[PoC] pg_upgrade: allow to upgrade publisher node

Started by Hayato Kuroda (Fujitsu), almost 3 years ago, 407 messages
#1 Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
1 attachment(s)

Dear hackers,
(CC: Amit and Julien)

This is a fork thread of Julien's thread, which allows subscribers to be upgraded
without losing changes [1].

I briefly implemented a prototype that allows upgrading the publisher node.
IIUC the key missing piece was that replication slots used for logical replication
could not be copied to the new node by the pg_upgrade command, so this patch allows
that. This feature can be used when '--include-replication-slot' is specified. Also,
I added a small test for the typical case; it may be helpful for understanding.

pg_upgrade internally executes pg_dump to dump database objects from the old node.
This feature follows that approach and adds a new option '--slot-only' to the pg_dump
command. When specified, pg_dump extracts the needed info from the old node and
generates an SQL file that executes pg_create_logical_replication_slot().
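For example, for a hypothetical slot named 'sub' that uses the pgoutput plugin, the
generated file would contain roughly one statement like the following per slot (the
third argument here is the dumped two_phase value):

    SELECT pg_create_logical_replication_slot('sub', 'pgoutput', 'f');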

The notable difference from the pre-existing behavior is that the slots are restored
at a different time. Currently pg_upgrade works with the following steps:

...
1. dump schema from old nodes
2. do pg_resetwal several times to new node
3. restore schema to new node
4. do pg_resetwal again to new node
...

The problem is that if we create replication slots at step 3, their restart_lsn and
confirmed_flush_lsn are set to the current_wal_insert_lsn at that time, whereas
pg_resetwal discards the WAL files. Such slots cannot extract changes.
To handle the issue, the restore is separated into two phases. In the first phase,
restoring is done at step 3, except for replication slots. In the second phase,
replication slots are restored at step 5, after doing pg_resetwal.

Before upgrading a publisher node, all the changes generated on the publisher must
be sent and applied on the subscriber. This is because the restart_lsn and
confirmed_flush_lsn of the copied replication slots are the same as the
current_wal_insert_lsn, so the new node loses the information about which WAL has
really been applied on the subscriber and starts over. Basically this is not
problematic because before shutting down the publisher, its walsender processes
confirm that all data has been replicated. See WalSndDone() and related code.
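(As an illustration only, not something done by the patch: before shutting down the
old publisher you can roughly confirm this condition by checking that every logical
slot reports caught_up below, keeping in mind that new WAL may still be generated
while the server is running:)

    SELECT slot_name, confirmed_flush_lsn,
           confirmed_flush_lsn = pg_current_wal_insert_lsn() AS caught_up
    FROM pg_replication_slots
    WHERE slot_type = 'logical' AND NOT temporary;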

Currently physical slots are ignored because they are out of scope for me.
I have not done any analysis of them.

[1]: /messages/by-id/20230217075433.u5mjly4d5cr4hcfe@jrouhaud

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:

0001-pg_upgrade-Add-include-replication-slot-option.patch (application/octet-stream)
From 3809abdaa4ecc351b7e9c52d6a8e751732a4f4f0 Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Tue, 4 Apr 2023 05:49:34 +0000
Subject: [PATCH] pg_upgrade: Add --include-replication-slot option

This commit introduces a new option called "--include-replication-slot".
This allows nodes with logical replication slots to be upgraded. The commit can
be divided into two parts: one for pg_dump and another for pg_upgrade.

For pg_dump this commit includes a new option called "--slot-only". This option
can be used to dump replication slots. When this option is specified, the slot_name,
plugin, and two_phase parameters are extracted from pg_replication_slots. An SQL
file is then generated which executes pg_create_logical_replication_slot() with
the extracted parameters.

For pg_upgrade, when '--include-replication-slot' is specified, it executes pg_dump
with added option and restore from the dump. Apart from restoring schema, pg_resetwal
must not be called after restoring replicaiton slots. This is because the command
discards WAL files and starts from a new segment, even if they are required by
replication slots. This leads an ERROR: "requested WAL segment XXX has already
been removed". To avoid this, replication slots are restored at a different time
than other objects, after running pg_resetwal.

The significant advantage of this commit is that it makes it easy to continue
logical replication even after upgrading the publisher node. Previously, pg_upgrade
allowed copying publications to a new node. With this new commit, adjusting the
connection string to the new publisher will cause the apply worker on the subscriber
to connect to the new publisher automatically. This enables seamless continuation
of logical replication, even after an upgrade.
---
 doc/src/sgml/ref/pg_dump.sgml                 |  10 ++
 doc/src/sgml/ref/pgupgrade.sgml               |  11 ++
 src/bin/pg_dump/pg_backup.h                   |   1 +
 src/bin/pg_dump/pg_dump.c                     | 148 +++++++++++++++++-
 src/bin/pg_dump/pg_dump.h                     |  15 +-
 src/bin/pg_dump/pg_dump_sort.c                |   4 +
 src/bin/pg_upgrade/dump.c                     |  22 +++
 src/bin/pg_upgrade/meson.build                |   1 +
 src/bin/pg_upgrade/option.c                   |   5 +
 src/bin/pg_upgrade/pg_upgrade.c               |  64 ++++++++
 src/bin/pg_upgrade/pg_upgrade.h               |   3 +
 .../pg_upgrade/t/003_logical_replication.pl   |  88 +++++++++++
 12 files changed, 370 insertions(+), 2 deletions(-)
 create mode 100644 src/bin/pg_upgrade/t/003_logical_replication.pl

diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index 77299878e0..7525cb521a 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -1201,6 +1201,16 @@ PostgreSQL documentation
       </listitem>
      </varlistentry>
 
+     <varlistentry>
+      <term><option>--slot-only</option></term>
+      <listitem>
+       <para>
+        Dump only replication slots, neither the schema (data definitions) nor
+        data. Mainly this is used for upgrading nodes.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
        <term><option>-?</option></term>
        <term><option>--help</option></term>
diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 7816b4c685..39c9e607d4 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -240,6 +240,17 @@ PostgreSQL documentation
       </listitem>
      </varlistentry>
 
+     <varlistentry>
+      <term><option>--include-replication-slot</option></term>
+      <listitem>
+       <para>
+        Transport replication slots. Currently this can work only for logical
+        slots, and temporary slots are ignored. Note that pg_upgrade does not
+        check the installation of plugins.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><option>-?</option></term>
       <term><option>--help</option></term>
diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index aba780ef4b..8a6f25cf2c 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -187,6 +187,7 @@ typedef struct _dumpOptions
 	int			use_setsessauth;
 	int			enable_row_security;
 	int			load_via_partition_root;
+	int			slot_only;
 
 	/* default, if no "inclusion" switches appear, is to dump everything */
 	bool		include_everything;
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 6abbcff683..484c7e961a 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -327,6 +327,9 @@ static void setupDumpWorker(Archive *AH);
 static TableInfo *getRootTableInfo(const TableInfo *tbinfo);
 static bool forcePartitionRootLoad(const TableInfo *tbinfo);
 
+static void getRepliactionSlots(Archive *fout);
+static void dumpReplicationSlot(Archive *fout,
+								const ReplicationSlotInfo *slotinfo);
 
 int
 main(int argc, char **argv)
@@ -430,7 +433,7 @@ main(int argc, char **argv)
 		{"table-and-children", required_argument, NULL, 12},
 		{"exclude-table-and-children", required_argument, NULL, 13},
 		{"exclude-table-data-and-children", required_argument, NULL, 14},
-
+		{"slot-only", no_argument, NULL, 15},
 		{NULL, 0, NULL, 0}
 	};
 
@@ -656,6 +659,11 @@ main(int argc, char **argv)
 										  optarg);
 				break;
 
+			case 15:			/* dump onlu replication slot(s) */
+				dopt.slot_only = true;
+				dopt.include_everything = false;
+				break;
+
 			default:
 				/* getopt_long already emitted a complaint */
 				pg_log_error_hint("Try \"%s --help\" for more information.", progname);
@@ -713,6 +721,11 @@ main(int argc, char **argv)
 	if (dopt.do_nothing && dopt.dump_inserts == 0)
 		pg_fatal("option --on-conflict-do-nothing requires option --inserts, --rows-per-insert, or --column-inserts");
 
+	if (dopt.slot_only && dopt.dataOnly)
+		pg_fatal("options --replicatin-slots and -a/--data-only cannot be used together");
+	if (dopt.slot_only && dopt.schemaOnly)
+		pg_fatal("options --replicatin-slots and -s/--schema-only cannot be used together");
+
 	/* Identify archive format to emit */
 	archiveFormat = parseArchiveFormat(format, &archiveMode);
 
@@ -892,6 +905,15 @@ main(int argc, char **argv)
 	 */
 	collectRoleNames(fout);
 
+	/*
+	 * If dumping replication slots are request, dumping them and skip others.
+	 */
+	if (dopt.slot_only)
+	{
+		getRepliactionSlots(fout);
+		goto dump;
+	}
+
 	/*
 	 * Now scan the database and create DumpableObject structs for all the
 	 * objects we intend to dump.
@@ -935,6 +957,8 @@ main(int argc, char **argv)
 	if (!dopt.no_security_labels)
 		collectSecLabels(fout);
 
+dump:
+
 	/* Lastly, create dummy objects to represent the section boundaries */
 	boundaryObjs = createBoundaryObjects();
 
@@ -1129,6 +1153,7 @@ help(const char *progname)
 	printf(_("  --use-set-session-authorization\n"
 			 "                               use SET SESSION AUTHORIZATION commands instead of\n"
 			 "                               ALTER OWNER commands to set ownership\n"));
+	printf(_("  --slot-only                  dump only replication slots, no schema and data\n"));
 
 	printf(_("\nConnection options:\n"));
 	printf(_("  -d, --dbname=DBNAME      database to dump\n"));
@@ -10251,6 +10276,9 @@ dumpDumpableObject(Archive *fout, DumpableObject *dobj)
 		case DO_SUBSCRIPTION:
 			dumpSubscription(fout, (const SubscriptionInfo *) dobj);
 			break;
+		case DO_REPICATION_SLOT:
+			dumpReplicationSlot(fout, (const ReplicationSlotInfo *) dobj);
+			break;
 		case DO_PRE_DATA_BOUNDARY:
 		case DO_POST_DATA_BOUNDARY:
 			/* never dumped, nothing to do */
@@ -18226,6 +18254,7 @@ addBoundaryDependencies(DumpableObject **dobjs, int numObjs,
 			case DO_PUBLICATION_REL:
 			case DO_PUBLICATION_TABLE_IN_SCHEMA:
 			case DO_SUBSCRIPTION:
+			case DO_REPICATION_SLOT:
 				/* Post-data objects: must come after the post-data boundary */
 				addObjectDependency(dobj, postDataBound->dumpId);
 				break;
@@ -18487,3 +18516,120 @@ appendReloptionsArrayAH(PQExpBuffer buffer, const char *reloptions,
 	if (!res)
 		pg_log_warning("could not parse %s array", "reloptions");
 }
+
+/*
+ * getRepliactionSlots
+ *	  get information about replication slots
+ */
+static void
+getRepliactionSlots(Archive *fout)
+{
+	PGresult   *res;
+	ReplicationSlotInfo *slotinfo;
+	PQExpBuffer query;
+	DumpOptions *dopt = fout->dopt;
+
+	int			i_slotname;
+	int			i_plugin;
+	int			i_twophase;
+	int			i,
+				ntups;
+
+	/* Check whether we should dump or not */
+	if (fout->remoteVersion < 160000 && !dopt->slot_only)
+		return;
+
+	query = createPQExpBuffer();
+
+	resetPQExpBuffer(query);
+
+	/*
+	 * Get replication slots.
+	 *
+	 * XXX: Which information must be extracted from old node? Currently three
+	 * attributes are extracted because they are used by
+	 * pg_create_logical_replication_slot().
+	 * XXX: Do we have to support physical slots?
+	 */
+	appendPQExpBufferStr(query,
+						 "SELECT r.slot_name, r.plugin, r.two_phase "
+						 "FROM pg_replication_slots r "
+						 "WHERE r.database = current_database() AND temporary = false "
+						 "AND wal_status IN ('reserved', 'extended');");
+
+	res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+
+	ntups = PQntuples(res);
+
+	i_slotname = PQfnumber(res, "slot_name");
+	i_plugin = PQfnumber(res, "plugin");
+	i_twophase = PQfnumber(res, "two_phase");
+
+	slotinfo = pg_malloc(ntups * sizeof(ReplicationSlotInfo));
+
+	for (i = 0; i < ntups; i++)
+	{
+		slotinfo[i].dobj.objType = DO_REPICATION_SLOT;
+
+		slotinfo[i].dobj.catId.tableoid = InvalidOid;
+		slotinfo[i].dobj.catId.oid = InvalidOid;
+		AssignDumpId(&slotinfo[i].dobj);
+
+		slotinfo[i].dobj.name = pg_strdup(PQgetvalue(res, i, i_slotname));
+
+		slotinfo[i].plugin = pg_strdup(PQgetvalue(res, i, i_plugin));
+		slotinfo[i].twophase = pg_strdup(PQgetvalue(res, i, i_twophase));
+
+		/* FIXME: force dumping */
+		slotinfo[i].dobj.dump = DUMP_COMPONENT_ALL;
+	}
+	PQclear(res);
+
+	destroyPQExpBuffer(query);
+}
+
+/*
+ * dumpReplicationSlot
+ *	  write down a script for pg_restore command
+ */
+static void
+dumpReplicationSlot(Archive *fout, const ReplicationSlotInfo *slotinfo)
+{
+	DumpOptions *dopt = fout->dopt;
+	PQExpBuffer query;
+	char *slotname;
+
+	if (!dopt->slot_only)
+		return;
+
+	slotname = pg_strdup(slotinfo->dobj.name);
+	query = createPQExpBuffer();
+
+	/*
+	 * XXX: For simplification, pg_create_logical_replication_slot() is used.
+	 * Is it sufficient?
+	 */
+	appendPQExpBuffer(query, "SELECT pg_create_logical_replication_slot('%s', ",
+					  slotname);
+	appendStringLiteralAH(query, slotinfo->plugin, fout);
+	appendPQExpBuffer(query, ", ");
+	appendStringLiteralAH(query, slotinfo->twophase, fout);
+	appendPQExpBuffer(query, ");");
+
+	if (slotinfo->dobj.dump & DUMP_COMPONENT_DEFINITION)
+		ArchiveEntry(fout, slotinfo->dobj.catId, slotinfo->dobj.dumpId,
+					 ARCHIVE_OPTS(.tag = slotname,
+								  .description = "REPICATION SLOT",
+								  .section = SECTION_POST_DATA,
+								  .createStmt = query->data));
+
+	/* XXX: do we have to dump security label? */
+
+	if (slotinfo->dobj.dump & DUMP_COMPONENT_COMMENT)
+		dumpComment(fout, "REPICATION SLOT", slotname,
+					NULL, NULL,
+					slotinfo->dobj.catId, 0, slotinfo->dobj.dumpId);
+
+	pfree(slotname);
+	destroyPQExpBuffer(query);
+}
diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h
index ed6ce41ad7..a27bff661b 100644
--- a/src/bin/pg_dump/pg_dump.h
+++ b/src/bin/pg_dump/pg_dump.h
@@ -82,7 +82,8 @@ typedef enum
 	DO_PUBLICATION,
 	DO_PUBLICATION_REL,
 	DO_PUBLICATION_TABLE_IN_SCHEMA,
-	DO_SUBSCRIPTION
+	DO_SUBSCRIPTION,
+	DO_REPICATION_SLOT
 } DumpableObjectType;
 
 /*
@@ -666,6 +667,18 @@ typedef struct _SubscriptionInfo
 	char	   *subpasswordrequired;
 } SubscriptionInfo;
 
+/*
+ * The ReplicationSlotInfo struct is used to represent replication slots.
+ * XXX: add more attrbutes if needed
+ */
+typedef struct _ReplicationSlotInfo
+{
+	DumpableObject dobj;
+	char *plugin;
+	char *slottype;
+	char *twophase;
+} ReplicationSlotInfo;
+
 /*
  *	common utility functions
  */
diff --git a/src/bin/pg_dump/pg_dump_sort.c b/src/bin/pg_dump/pg_dump_sort.c
index 8266c117a3..8e1fc1fda5 100644
--- a/src/bin/pg_dump/pg_dump_sort.c
+++ b/src/bin/pg_dump/pg_dump_sort.c
@@ -1497,6 +1497,10 @@ describeDumpableObject(DumpableObject *obj, char *buf, int bufsize)
 			snprintf(buf, bufsize,
 					 "SUBSCRIPTION (ID %d OID %u)",
 					 obj->dumpId, obj->catId.oid);
+		case DO_REPICATION_SLOT:
+			snprintf(buf, bufsize,
+					 "REPLICATION SLOT (ID %d NAME %s)",
+					 obj->dumpId, obj->name);
 			return;
 		case DO_PRE_DATA_BOUNDARY:
 			snprintf(buf, bufsize,
diff --git a/src/bin/pg_upgrade/dump.c b/src/bin/pg_upgrade/dump.c
index 6c8c82dca8..aecd284b48 100644
--- a/src/bin/pg_upgrade/dump.c
+++ b/src/bin/pg_upgrade/dump.c
@@ -59,6 +59,28 @@ generate_old_dump(void)
 						   log_opts.dumpdir,
 						   sql_file_name, escaped_connstr.data);
 
+		/*
+		 * Dump replicaiton slots if needed.
+		 *
+		 * XXX We cannot dump replication slots at the same time as the schema
+		 * dump because we need to separate the timing of restoring replication
+		 * slots and other objects. Replication slots, in particular, should
+		 * not be restored before executing the pg_resetwal command because it
+		 * will remove WALs that are required by the slots.
+		 */
+		if (user_opts.include_slots)
+		{
+			char		slots_file_name[MAXPGPATH];
+
+			snprintf(slots_file_name, sizeof(slots_file_name), DB_DUMP_FILE_MASK_FOR_SLOTS, old_db->db_oid);
+			parallel_exec_prog(log_file_name, NULL,
+							   "\"%s/pg_dump\" %s --slot-only --quote-all-identifiers "
+							   "--binary-upgrade %s --file=\"%s/%s\" %s",
+							   new_cluster.bindir, cluster_conn_opts(&old_cluster),
+							   log_opts.verbose ? "--verbose" : "",
+							   log_opts.dumpdir,
+							   slots_file_name, escaped_connstr.data);
+		}
 		termPQExpBuffer(&escaped_connstr);
 	}
 
diff --git a/src/bin/pg_upgrade/meson.build b/src/bin/pg_upgrade/meson.build
index 12a97f84e2..7f5d48b7e1 100644
--- a/src/bin/pg_upgrade/meson.build
+++ b/src/bin/pg_upgrade/meson.build
@@ -42,6 +42,7 @@ tests += {
     'tests': [
       't/001_basic.pl',
       't/002_pg_upgrade.pl',
+      't/003_logical_replication.pl',
     ],
     'test_kwargs': {'priority': 40}, # pg_upgrade tests are slow
   },
diff --git a/src/bin/pg_upgrade/option.c b/src/bin/pg_upgrade/option.c
index 8869b6b60d..9897e706d7 100644
--- a/src/bin/pg_upgrade/option.c
+++ b/src/bin/pg_upgrade/option.c
@@ -57,6 +57,7 @@ parseCommandLine(int argc, char *argv[])
 		{"verbose", no_argument, NULL, 'v'},
 		{"clone", no_argument, NULL, 1},
 		{"copy", no_argument, NULL, 2},
+		{"include-replication-slot", no_argument, NULL, 3},
 
 		{NULL, 0, NULL, 0}
 	};
@@ -199,6 +200,10 @@ parseCommandLine(int argc, char *argv[])
 				user_opts.transfer_mode = TRANSFER_MODE_COPY;
 				break;
 
+			case 3:
+				user_opts.include_slots = true;
+				break;
+
 			default:
 				fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
 						os_info.progname);
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 75bab0a04c..0236cd18c0 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -59,6 +59,7 @@ static void copy_xact_xlog_xid(void);
 static void set_frozenxids(bool minmxid_only);
 static void make_outputdirs(char *pgdata);
 static void setup(char *argv0, bool *live_check);
+static void create_replicaiton_slots(void);
 
 ClusterInfo old_cluster,
 			new_cluster;
@@ -188,6 +189,19 @@ main(int argc, char **argv)
 			  new_cluster.pgdata);
 	check_ok();
 
+	/*
+	 * Create replication slots if requested.
+	 *
+	 * XXX This must be done after doing pg_resetwal command because the
+	 * command will remove required WALs.
+	 */
+	if (user_opts.include_slots)
+	{
+		start_postmaster(&new_cluster, true);
+		create_replicaiton_slots();
+		stop_postmaster(false);
+	}
+
 	if (user_opts.do_sync)
 	{
 		prep_status("Sync data directory to disk");
@@ -860,3 +874,53 @@ set_frozenxids(bool minmxid_only)
 
 	check_ok();
 }
+
+/*
+ * create_replicaiton_slots()
+ *
+ * Similar to create_new_objects() but only restores replication slots.
+ */
+static void
+create_replicaiton_slots(void)
+{
+	int			dbnum;
+
+	prep_status_progress("Restoring replication slots in the new cluster");
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		char		slots_file_name[MAXPGPATH],
+					log_file_name[MAXPGPATH];
+		DbInfo	   *old_db = &old_cluster.dbarr.dbs[dbnum];
+		char	   *opts;
+
+		pg_log(PG_STATUS, "%s", old_db->db_name);
+
+		snprintf(slots_file_name, sizeof(slots_file_name),
+				 DB_DUMP_FILE_MASK_FOR_SLOTS, old_db->db_oid);
+		snprintf(log_file_name, sizeof(log_file_name),
+				 DB_DUMP_LOG_FILE_MASK, old_db->db_oid);
+
+		opts = "--echo-queries --set ON_ERROR_STOP=on --no-psqlrc";
+
+		parallel_exec_prog(log_file_name,
+						   NULL,
+						   "\"%s/psql\" %s %s --dbname %s -f \"%s/%s\"",
+						   new_cluster.bindir,
+						   cluster_conn_opts(&new_cluster),
+						   opts,
+						   old_db->db_name,
+						   log_opts.dumpdir,
+						   slots_file_name);
+	}
+
+	/* reap all children */
+	while (reap_child(true) == true)
+		;
+
+	end_progress_output();
+	check_ok();
+
+	/* update new_cluster info now that we have objects in the databases */
+	get_db_and_rel_infos(&new_cluster);
+}
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 3eea0139c7..82d7a89e24 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -29,6 +29,7 @@
 /* contains both global db information and CREATE DATABASE commands */
 #define GLOBALS_DUMP_FILE	"pg_upgrade_dump_globals.sql"
 #define DB_DUMP_FILE_MASK	"pg_upgrade_dump_%u.custom"
+#define DB_DUMP_FILE_MASK_FOR_SLOTS	"pg_upgrade_dump_%u_slots.custom"
 
 /*
  * Base directories that include all the files generated internally, from the
@@ -304,6 +305,8 @@ typedef struct
 	transferMode transfer_mode; /* copy files or link them? */
 	int			jobs;			/* number of processes/threads to use */
 	char	   *socketdir;		/* directory to use for Unix sockets */
+	bool		include_slots;	/* true -> dump and restore replication
+								 * slots */
 } UserOpts;
 
 typedef struct
diff --git a/src/bin/pg_upgrade/t/003_logical_replication.pl b/src/bin/pg_upgrade/t/003_logical_replication.pl
new file mode 100644
index 0000000000..27b36bea5b
--- /dev/null
+++ b/src/bin/pg_upgrade/t/003_logical_replication.pl
@@ -0,0 +1,88 @@
+# Copyright (c) 2021-2023, PostgreSQL Global Development Group
+
+# Tests for logical replication, especially for upgrading publisher
+use strict;
+use warnings;
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Can be changed to test the other modes.
+my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
+
+# Initialize publisher node
+my $old_publisher = PostgreSQL::Test::Cluster->new('old_publisher');
+$old_publisher->init(allows_streaming => 'logical');
+$old_publisher->start;
+
+# Create subscriber node
+my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
+$subscriber->init(allows_streaming => 'logical');
+$subscriber->start;
+
+$old_publisher->safe_psql('postgres',
+	"CREATE TABLE tbl AS SELECT generate_series(1,10) AS a");
+$subscriber->safe_psql('postgres',
+	"CREATE TABLE tbl (a int)");
+
+# Setup logical replication
+my $old_connstr = $old_publisher->connstr . ' dbname=postgres';
+$old_publisher->safe_psql('postgres', "CREATE PUBLICATION pub FOR ALL TABLES");
+$subscriber->safe_psql('postgres',
+	"CREATE SUBSCRIPTION sub CONNECTION '$old_connstr' PUBLICATION pub"
+);
+
+# Wait for initial table sync to finish
+$subscriber->wait_for_subscription_sync($old_publisher, 'sub');
+
+# Preparations for upgrading publisher
+$old_publisher->stop;
+$subscriber->safe_psql('postgres',
+	"ALTER SUBSCRIPTION sub DISABLE");
+
+my $new_publisher = PostgreSQL::Test::Cluster->new('new_publisher');
+$new_publisher->init(allows_streaming => 'logical');
+
+my $bindir = $new_publisher->config_data('--bindir');
+
+# Run pg_upgrade. pg_upgrade_output.d is removed at the end
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',        '-d', $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir, '-b', $bindir,
+		'-B',         $bindir,         '-s', $new_publisher->host,
+		'-p',         $old_publisher->port,     '-P', $new_publisher->port,
+		$mode, '--include-replication-slot'
+	],
+	'run of pg_upgrade for new publisher');
+ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ removed after pg_upgrade success");
+
+# Check whether the replication slot is copied
+$new_publisher->start;
+my $result =
+  $new_publisher->safe_psql('postgres', "SELECT count(*) FROM pg_replication_slots");
+is($result, qq(1),
+	'check the replication slot is copied to new publisher');
+
+# Change connection string and enable logical replication
+my $new_connstr = $new_publisher->connstr . ' dbname=postgres';
+
+$subscriber->safe_psql('postgres',
+	"ALTER SUBSCRIPTION sub CONNECTION '$new_connstr'");
+$subscriber->safe_psql('postgres',
+	"ALTER SUBSCRIPTION sub ENABLE");
+
+# Check whether changes on new publisher are shipped to subscriber
+$new_publisher->safe_psql('postgres',
+	"INSERT INTO tbl VALUES (generate_series(11, 20))"
+);
+
+$new_publisher->wait_for_catchup('sub');
+
+$result =
+  $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
+is($result, qq(20),
+	'check changes are shipped to subscriber');
+
+done_testing();
-- 
2.27.0

#2 Peter Smith
smithpb2250@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#1)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

Hi Kuroda-san.

This is a WIP review. I'm yet to do more testing and more study of the
POC patch's design.

While reading the code I kept a local list of my review comments.
Meanwhile, there is a long weekend coming up here, so I thought it
would be better to pass these to you now rather than next week in case
you want to address them.

======
General

1.
Since these two new options are made to work together, I think the
names should be more similar. e.g.

pg_dump: "--slot_only" --> "--replication-slots-only"
pg_upgrade: "--include-replication-slot" --> "--include-replication-slots"

help/comments/commit-message all should change accordingly, but I did
not give separate review comments for each of these.

~~~

2.
I felt there maybe should be some pg_dump test cases for that new
option, rather than the current patch where it only seems to be
testing the new pg_dump option via the pg_upgrade TAP tests.

======
Commit message

3.
This commit introduces a new option called "--include-replication-slot".
This allows nodes with logical replication slots to be upgraded. The commit can
be divided into two parts: one for pg_dump and another for pg_upgrade.

~

"new option" --> "new pg_upgrade" option

~~~

4.
For pg_upgrade, when '--include-replication-slot' is specified, it
executes pg_dump
with added option and restore from the dump. Apart from restoring
schema, pg_resetwal
must not be called after restoring replicaiton slots. This is because
the command
discards WAL files and starts from a new segment, even if they are required by
replication slots. This leads an ERROR: "requested WAL segment XXX has already
been removed". To avoid this, replication slots are restored at a different time
than other objects, after running pg_resetwal.

~

4a.
"with added option and restore from the dump" --> "with the new
"--slot-only" option and restores from the dump"

~

4b.
Typo: /replicaiton/replication/

~

4c
"leads an ERROR" --> "leads to an ERROR"

======

doc/src/sgml/ref/pg_dump.sgml

5.
+     <varlistentry>
+      <term><option>--slot-only</option></term>
+      <listitem>
+       <para>
+        Dump only replication slots, neither the schema (data definitions) nor
+        data. Mainly this is used for upgrading nodes.
+       </para>
+      </listitem>

SUGGESTION
Dump only replication slots; not the schema (data definitions), nor
data. This is mainly used when upgrading nodes.

======

doc/src/sgml/ref/pgupgrade.sgml

6.
+       <para>
+        Transport replication slots. Currently this can work only for logical
+        slots, and temporary slots are ignored. Note that pg_upgrade does not
+        check the installation of plugins.
+       </para>

SUGGESTION
Upgrade replication slots. Only logical replication slots are
currently supported, and temporary slots are ignored. Note that...

======

src/bin/pg_dump/pg_dump.c

7. main
{"exclude-table-data-and-children", required_argument, NULL, 14},
-
+ {"slot-only", no_argument, NULL, 15},
{NULL, 0, NULL, 0}

The blank line is misplaced.

~~~

8. main
+ case 15: /* dump onlu replication slot(s) */
+ dopt.slot_only = true;
+ dopt.include_everything = false;
+ break;

typo: /onlu/only/

~~~

9. main
+ if (dopt.slot_only && dopt.dataOnly)
+ pg_fatal("options --replicatin-slots and -a/--data-only cannot be
used together");
+ if (dopt.slot_only && dopt.schemaOnly)
+ pg_fatal("options --replicatin-slots and -s/--schema-only cannot be
used together");
+

9a.
typo: /replicatin/replication/

~

9b.
I am wondering if these checks are enough. E.g. is "slots-only"
compatible with "no-publications" ?

~~~

10. main
+ /*
+ * If dumping replication slots are request, dumping them and skip others.
+ */
+ if (dopt.slot_only)
+ {
+ getRepliactionSlots(fout);
+ goto dump;
+ }

10a.
SUGGESTION
If dump replication-slots-only was requested, dump only them and skip
everything else.

~

10b.
This code seems mutually exclusive to every other option. I'm
wondering if this code even needs 'collectRoleNames', or should the
slots option check be moved above that (and also above the 'Dumping
LOs' etc...)

~~~

11. help

+ printf(_(" --slot-only dump only replication
slots, no schema and data\n"));

11a.
SUGGESTION
"no schema and data" --> "no schema or data"

~

11b.
This help is misplaced. It should be in alphabetical order consistent
with all the other help.

~~~
12. getRepliactionSlots

+/*
+ * getRepliactionSlots
+ *   get information about replication slots
+ */
+static void
+getRepliactionSlots(Archive *fout)

Function name typo / getRepliactionSlots/ getReplicationSlots/
(also in the comment)

~~~

13. getRepliactionSlots

+ /* Check whether we should dump or not */
+ if (fout->remoteVersion < 160000 && !dopt->slot_only)
+ return;

Hmmm, is that condition correct? Shouldn't the && be || here?

~~~

14. dumpReplicationSlot

+static void
+dumpReplicationSlot(Archive *fout, const ReplicationSlotInfo *slotinfo)
+{
+ DumpOptions *dopt = fout->dopt;
+ PQExpBuffer query;
+ char *slotname;
+
+ if (!dopt->slot_only)
+ return;
+
+ slotname = pg_strdup(slotinfo->dobj.name);
+ query = createPQExpBuffer();
+
+ /*
+ * XXX: For simplification, pg_create_logical_replication_slot() is used.
+ * Is it sufficient?
+ */
+ appendPQExpBuffer(query, "SELECT pg_create_logical_replication_slot('%s', ",
+   slotname);
+ appendStringLiteralAH(query, slotinfo->plugin, fout);
+ appendPQExpBuffer(query, ", ");
+ appendStringLiteralAH(query, slotinfo->twophase, fout);
+ appendPQExpBuffer(query, ");");
+
+ if (slotinfo->dobj.dump & DUMP_COMPONENT_DEFINITION)
+ ArchiveEntry(fout, slotinfo->dobj.catId, slotinfo->dobj.dumpId,
+ ARCHIVE_OPTS(.tag = slotname,
+   .description = "REPICATION SLOT",
+   .section = SECTION_POST_DATA,
+   .createStmt = query->data));
+
+ /* XXX: do we have to dump security label? */
+
+ if (slotinfo->dobj.dump & DUMP_COMPONENT_COMMENT)
+ dumpComment(fout, "REPICATION SLOT", slotname,
+ NULL, NULL,
+ slotinfo->dobj.catId, 0, slotinfo->dobj.dumpId);
+
+ pfree(slotname);
+ destroyPQExpBuffer(query);
+}

14a.
Wouldn't it be better to check the "slotinfo->dobj.dump &
DUMP_COMPONENT_DEFINITION" condition first, before building the query?
For example, see other function dumpIndexAttach().

~

14b.
Typo: /REPICATION SLOT/REPLICATION SLOT/ in the ARCHIVE_OPTS description.

~

14c.
Typo: /REPICATION SLOT/REPLICATION SLOT/ in the dumpComment parameter.

======

src/bin/pg_dump/pg_dump.h

15. DumpableObjectType

@@ -82,7 +82,8 @@ typedef enum
  DO_PUBLICATION,
  DO_PUBLICATION_REL,
  DO_PUBLICATION_TABLE_IN_SCHEMA,
- DO_SUBSCRIPTION
+ DO_SUBSCRIPTION,
+ DO_REPICATION_SLOT
 } DumpableObjectType;

Typo /DO_REPICATION_SLOT/DO_REPLICATION_SLOT/

======

src/bin/pg_upgrade/dump.c

16. generate_old_dump

+ /*
+ * Dump replicaiton slots if needed.
+ *
+ * XXX We cannot dump replication slots at the same time as the schema
+ * dump because we need to separate the timing of restoring replication
+ * slots and other objects. Replication slots, in particular, should
+ * not be restored before executing the pg_resetwal command because it
+ * will remove WALs that are required by the slots.
+ */

Typo: /replicaiton/replication/

======

src/bin/pg_upgrade/pg_upgrade.c

17. main

+ /*
+ * Create replication slots if requested.
+ *
+ * XXX This must be done after doing pg_resetwal command because the
+ * command will remove required WALs.
+ */
+ if (user_opts.include_slots)
+ {
+ start_postmaster(&new_cluster, true);
+ create_replicaiton_slots();
+ stop_postmaster(false);
+ }
+

I don't think that warrants a "XXX" style comment. It is just a "Note:".

~~~

18. create_replicaiton_slots
+
+/*
+ * create_replicaiton_slots()
+ *
+ * Similar to create_new_objects() but only restores replication slots.
+ */
+static void
+create_replicaiton_slots(void)

Typo: /create_replicaiton_slots/create_replication_slots/

(Function name and comment)

~~~

19. create_replicaiton_slots

+ for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+ {
+ char slots_file_name[MAXPGPATH],
+ log_file_name[MAXPGPATH];
+ DbInfo    *old_db = &old_cluster.dbarr.dbs[dbnum];
+ char    *opts;
+
+ pg_log(PG_STATUS, "%s", old_db->db_name);
+
+ snprintf(slots_file_name, sizeof(slots_file_name),
+ DB_DUMP_FILE_MASK_FOR_SLOTS, old_db->db_oid);
+ snprintf(log_file_name, sizeof(log_file_name),
+ DB_DUMP_LOG_FILE_MASK, old_db->db_oid);
+
+ opts = "--echo-queries --set ON_ERROR_STOP=on --no-psqlrc";
+
+ parallel_exec_prog(log_file_name,
+    NULL,
+    "\"%s/psql\" %s %s --dbname %s -f \"%s/%s\"",
+    new_cluster.bindir,
+    cluster_conn_opts(&new_cluster),
+    opts,
+    old_db->db_name,
+    log_opts.dumpdir,
+    slots_file_name);
+ }

That 'opts' variable seems unnecessary. Why not just pass the string
literal directly when invoking parallel_exec_prog()?

Or if not removed, then at least make it const char *psql_opts =
"--echo-queries --set ON_ERROR_STOP=on --no-psqlrc";

======

src/bin/pg_upgrade/pg_upgrade.h

20.
+#define DB_DUMP_FILE_MASK_FOR_SLOTS "pg_upgrade_dump_%u_slots.custom"

20a.
For consistency with other mask names (e.g. DB_DUMP_LOG_FILE_MASK)
probably this should be called DB_DUMP_SLOTS_FILE_MASK.

~

20b.
Because the content of this dump/restore file is SQL (not custom
binary) wouldn't a filename suffix ".sql" be better?

======

.../pg_upgrade/t/003_logical_replication.pl

21.
Some parts (formatting, comments, etc) in this file are inconsistent.

21a
");" is sometimes alone on a line, sometimes not

~

21b.
"Init" versus "Create" nodes.

~

21c.
# Check whether changes on new publisher are shipped to subscriber

SUGGESTION
Check whether changes on the new publisher get replicated to the subscriber
~

21d.
$result =
$subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
is($result, qq(20),
'check changes are shipped to subscriber');

For symmetry with before/after, I think it would be better to do this
same command before the upgrade to confirm q(10) rows.

------
Kind Regards,
Peter Smith.
Fujitsu Australia

#3 Julien Rouhaud
rjuju123@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#1)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

Hi,

On Tue, Apr 04, 2023 at 07:00:01AM +0000, Hayato Kuroda (Fujitsu) wrote:

Dear hackers,
(CC: Amit and Julien)

(thanks for the Cc)

This is a fork thread of Julien's thread, which allows subscribers to be upgraded
without losing changes [1].

I briefly implemented a prototype that allows upgrading the publisher node.
IIUC the key missing piece was that replication slots used for logical replication
could not be copied to the new node by the pg_upgrade command, so this patch allows
that. This feature can be used when '--include-replication-slot' is specified. Also,
I added a small test for the typical case; it may be helpful for understanding.

pg_upgrade internally executes pg_dump to dump database objects from the old node.
This feature follows that approach and adds a new option '--slot-only' to the pg_dump
command. When specified, pg_dump extracts the needed info from the old node and
generates an SQL file that executes pg_create_logical_replication_slot().

The notable difference from the pre-existing behavior is that the slots are restored
at a different time. Currently pg_upgrade works with the following steps:

...
1. dump schema from old nodes
2. do pg_resetwal several times to new node
3. restore schema to new node
4. do pg_resetwal again to new node
...

The problem is that if we create replication slots at step 3, their restart_lsn and
confirmed_flush_lsn are set to the current_wal_insert_lsn at that time, whereas
pg_resetwal discards the WAL files. Such slots cannot extract changes.
To handle the issue, the restore is separated into two phases. In the first phase,
restoring is done at step 3, except for replication slots. In the second phase,
replication slots are restored at step 5, after doing pg_resetwal.

Before upgrading a publisher node, all the changes generated on the publisher must
be sent and applied on the subscriber. This is because the restart_lsn and
confirmed_flush_lsn of the copied replication slots are the same as the
current_wal_insert_lsn, so the new node loses the information about which WAL has
really been applied on the subscriber and starts over. Basically this is not
problematic because before shutting down the publisher, its walsender processes
confirm that all data has been replicated. See WalSndDone() and related code.

As I mentioned in my original thread, I'm not very familiar with that code, but
I'm a bit worried about "all the changes generated on the publisher must be sent
and applied". Is that a hard requirement for the feature to work reliably? If
yes, how does this work if some subscriber node isn't connected when the
publisher node is stopped? I guess you could add a check in pg_upgrade to make
sure that all logical slots are indeed caught up, and fail if that's not the case,
rather than assuming that a clean shutdown implies it. It would be good to
cover that in the TAP test, and also cover some corner cases, like checking that
any new row added on the publisher node after the pg_upgrade, but before the
subscriber is reconnected, is also replicated as expected.
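For instance, the naive shape of such a check could be to fail the upgrade if the
following query (just a sketch against pg_replication_slots, not something the patch
does today) returns any row on the old node:

    SELECT slot_name
    FROM pg_replication_slots
    WHERE slot_type = 'logical' AND NOT temporary
      AND (wal_status = 'lost'
           OR confirmed_flush_lsn <> pg_current_wal_insert_lsn());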

Currently physical slots are ignored because they are out of scope for me.
I have not done any analysis of them.

Agreed, but then shouldn't the option be named "--logical-slots-only" or
something like that, same for all internal function names?

#4 Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Julien Rouhaud (#3)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Julien,

Thank you for giving comments!

As I mentioned in my original thread, I'm not very familiar with that code, but
I'm a bit worried about "all the changes generated on the publisher must be sent
and applied". Is that a hard requirement for the feature to work reliably?

I think the requirement is needed because the existing WAL on the old node cannot
be transported to the new instance. The WAL gap from confirmed_flush to the current
position cannot be filled by the new instance.

If
yes, how does this work if some subscriber node isn't connected when the
publisher node is stopped? I guess you could add a check in pg_upgrade to make
sure that all logical slots are indeed caught up, and fail if that's not the case,
rather than assuming that a clean shutdown implies it. It would be good to
cover that in the TAP test, and also cover some corner cases, like checking that
any new row added on the publisher node after the pg_upgrade, but before the
subscriber is reconnected, is also replicated as expected.

Hmm, good point. The current patch cannot handle that case because walsenders for
such slots do not exist. I have tested your approach; however, I found that a
CHECKPOINT_SHUTDOWN record was generated twice when the publisher was shut down and
restarted. As a result, the confirmed_lsn of the slots was always behind the WAL
insert location, and the upgrade failed every time.
Right now I do not have a good idea to solve it... Does anyone have one?

Agreed, but then shouldn't the option be named "--logical-slots-only" or
something like that, same for all internal function names?

Seems right. It will be fixed in the next version. Maybe "--logical-replication-slots-only"
will be used, per Peter's suggestion [1].

[1]: /messages/by-id/CAHut+PvpBsyxj9SrB1ZZ9gP7r1AA5QoTYjpzMcVSjQO2xQy7aw@mail.gmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#5 Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Hayato Kuroda (Fujitsu) (#4)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Julien,

Agreed, but then shouldn't the option be named "--logical-slots-only" or
something like that, same for all internal function names?

Seems right. Will be fixed in next version. Maybe
"--logical-replication-slots-only"
will be used, per Peter's suggestion [1].

After considering it more, I decided not to include the word "logical" in the option
name at this point. This is because we have not decided yet whether we dump physical
replication slots or not. The current restriction exists just because of a lack of
analysis and consideration; if we decide not to support physical slots, the options
will be renamed accordingly.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#6 Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Peter Smith (#2)
1 attachment(s)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Peter,

Thank you for the quick review. PSA a new version.
If you can, I would like to ask your opinion about the check done by pg_upgrade [1].

======
General

1.
Since these two new options are made to work together, I think the
names should be more similar. e.g.

pg_dump: "--slot_only" --> "--replication-slots-only"
pg_upgrade: "--include-replication-slot" --> "--include-replication-slots"

help/comments/commit-message all should change accordingly, but I did
not give separate review comments for each of these.

OK, I renamed them. By the way, what do you think about the suggestion raised by Julien?
Currently I did not address it because the restriction was caused just by a lack of
analysis, and renaming may not be agreed in the community.
Or should we keep the names anyway?

2.
I felt there maybe should be some pg_dump test cases for that new
option, rather than the current patch where it only seems to be
testing the new pg_dump option via the pg_upgrade TAP tests.

Hmm, I assumed that the option would be used only for upgrading, so I'm not sure it
must be tested by pg_dump alone.

Commit message

3.
This commit introduces a new option called "--include-replication-slot".
This allows nodes with logical replication slots to be upgraded. The commit can
be divided into two parts: one for pg_dump and another for pg_upgrade.

~

"new option" --> "new pg_upgrade" option

Fixed.

4.
For pg_upgrade, when '--include-replication-slot' is specified, it
executes pg_dump
with added option and restore from the dump. Apart from restoring
schema, pg_resetwal
must not be called after restoring replicaiton slots. This is because
the command
discards WAL files and starts from a new segment, even if they are required by
replication slots. This leads an ERROR: "requested WAL segment XXX has already
been removed". To avoid this, replication slots are restored at a different time
than other objects, after running pg_resetwal.

~

4a.
"with added option and restore from the dump" --> "with the new
"--slot-only" option and restores from the dump"

Fixed.

4b.
Typo: /replicaiton/replication/

Fixed.

4c
"leads an ERROR" --> "leads to an ERROR"

Fixed.

doc/src/sgml/ref/pg_dump.sgml

5.
+     <varlistentry>
+      <term><option>--slot-only</option></term>
+      <listitem>
+       <para>
+        Dump only replication slots, neither the schema (data definitions) nor
+        data. Mainly this is used for upgrading nodes.
+       </para>
+      </listitem>

SUGGESTION
Dump only replication slots; not the schema (data definitions), nor
data. This is mainly used when upgrading nodes.

Fixed.

doc/src/sgml/ref/pgupgrade.sgml

6.
+       <para>
+        Transport replication slots. Currently this can work only for logical
+        slots, and temporary slots are ignored. Note that pg_upgrade does not
+        check the installation of plugins.
+       </para>

SUGGESTION
Upgrade replication slots. Only logical replication slots are
currently supported, and temporary slots are ignored. Note that...

Fixed.

src/bin/pg_dump/pg_dump.c

7. main
{"exclude-table-data-and-children", required_argument, NULL, 14},
-
+ {"slot-only", no_argument, NULL, 15},
{NULL, 0, NULL, 0}

The blank line is misplaced.

Fixed.

8. main
+ case 15: /* dump onlu replication slot(s) */
+ dopt.slot_only = true;
+ dopt.include_everything = false;
+ break;

typo: /onlu/only/

Fixed.

9. main
+ if (dopt.slot_only && dopt.dataOnly)
+ pg_fatal("options --replicatin-slots and -a/--data-only cannot be
used together");
+ if (dopt.slot_only && dopt.schemaOnly)
+ pg_fatal("options --replicatin-slots and -s/--schema-only cannot be
used together");
+

9a.
typo: /replicatin/replication/

Fixed. Additionally, a wrong parameter reference was also fixed.

9b.
I am wondering if these checks are enough. E.g. is "slots-only"
compatible with "no-publications" ?

I think there are some things that should be checked more, but I'm not sure about
"no-publications". There is a possibility that non-core logical replication is used,
and in that case these options do not contradict each other.

10. main
+ /*
+ * If dumping replication slots are request, dumping them and skip others.
+ */
+ if (dopt.slot_only)
+ {
+ getRepliactionSlots(fout);
+ goto dump;
+ }

10a.
SUGGESTION
If dump replication-slots-only was requested, dump only them and skip
everything else.

Fixed.

10b.
This code seems mutually exclusive to every other option. I'm
wondering if this code even needs 'collectRoleNames', or should the
slots option check be moved above that (and also above the 'Dumping
LOs' etc...)

I read it again, and I found that the collected user names are used to check the owner
of objects. IIUC replication slots are not owned by database users, so that is not
needed. Also, the LOs should not be dumped here. Based on that, I moved
getReplicationSlots() above them.

11. help

+ printf(_(" --slot-only dump only replication
slots, no schema and data\n"));

11a.
SUGGESTION
"no schema and data" --> "no schema or data"

Fixed.

11b.
This help is misplaced. It should be in alphabetical order consistent
with all the other help.

~~~
12. getRepliactionSlots

+/*
+ * getRepliactionSlots
+ *   get information about replication slots
+ */
+static void
+getRepliactionSlots(Archive *fout)

Function name typo / getRepliactionSlots/ getReplicationSlots/
(also in the comment)

Fixed.

13. getRepliactionSlots

+ /* Check whether we should dump or not */
+ if (fout->remoteVersion < 160000 && !dopt->slot_only)
+ return;

Hmmm, is that condition correct? Shouldn't the && be || here?

Right, fixed.

14. dumpReplicationSlot

+static void
+dumpReplicationSlot(Archive *fout, const ReplicationSlotInfo *slotinfo)
+{
+ DumpOptions *dopt = fout->dopt;
+ PQExpBuffer query;
+ char *slotname;
+
+ if (!dopt->slot_only)
+ return;
+
+ slotname = pg_strdup(slotinfo->dobj.name);
+ query = createPQExpBuffer();
+
+ /*
+ * XXX: For simplification, pg_create_logical_replication_slot() is used.
+ * Is it sufficient?
+ */
+ appendPQExpBuffer(query, "SELECT pg_create_logical_replication_slot('%s', ",
+   slotname);
+ appendStringLiteralAH(query, slotinfo->plugin, fout);
+ appendPQExpBuffer(query, ", ");
+ appendStringLiteralAH(query, slotinfo->twophase, fout);
+ appendPQExpBuffer(query, ");");
+
+ if (slotinfo->dobj.dump & DUMP_COMPONENT_DEFINITION)
+ ArchiveEntry(fout, slotinfo->dobj.catId, slotinfo->dobj.dumpId,
+ ARCHIVE_OPTS(.tag = slotname,
+   .description = "REPICATION SLOT",
+   .section = SECTION_POST_DATA,
+   .createStmt = query->data));
+
+ /* XXX: do we have to dump security label? */
+
+ if (slotinfo->dobj.dump & DUMP_COMPONENT_COMMENT)
+ dumpComment(fout, "REPICATION SLOT", slotname,
+ NULL, NULL,
+ slotinfo->dobj.catId, 0, slotinfo->dobj.dumpId);
+
+ pfree(slotname);
+ destroyPQExpBuffer(query);
+}

14a.
Wouldn't it be better to check the "slotinfo->dobj.dump &
DUMP_COMPONENT_DEFINITION" condition first, before building the query?
For example, see other function dumpIndexAttach().

The style was chosen because I previously referred to dumpSubscription(). But I read
the PG manual and understood that COMMENT and SECURITY LABEL cannot be set on
replication slots. Therefore, I removed the comments and the dump for
DUMP_COMPONENT_COMMENT, and then followed that style.

14b.
Typo: /REPICATION SLOT/REPLICATION SLOT/ in the ARCHIVE_OPTS
description.

~

14c.
Typo: /REPICATION SLOT/REPLICATION SLOT/ in the dumpComment parameter.

Both of them were fixed.

src/bin/pg_dump/pg_dump.h

15. DumpableObjectType

@@ -82,7 +82,8 @@ typedef enum
DO_PUBLICATION,
DO_PUBLICATION_REL,
DO_PUBLICATION_TABLE_IN_SCHEMA,
- DO_SUBSCRIPTION
+ DO_SUBSCRIPTION,
+ DO_REPICATION_SLOT
} DumpableObjectType;

Typo /DO_REPICATION_SLOT/DO_REPLICATION_SLOT/

Fixed.

src/bin/pg_upgrade/dump.c

16. generate_old_dump

+ /*
+ * Dump replicaiton slots if needed.
+ *
+ * XXX We cannot dump replication slots at the same time as the schema
+ * dump because we need to separate the timing of restoring replication
+ * slots and other objects. Replication slots, in particular, should
+ * not be restored before executing the pg_resetwal command because it
+ * will remove WALs that are required by the slots.
+ */

Typo: /replicaiton/replication/

Fixed.

src/bin/pg_upgrade/pg_upgrade.c

17. main

+ /*
+ * Create replication slots if requested.
+ *
+ * XXX This must be done after doing pg_resetwal command because the
+ * command will remove required WALs.
+ */
+ if (user_opts.include_slots)
+ {
+ start_postmaster(&new_cluster, true);
+ create_replicaiton_slots();
+ stop_postmaster(false);
+ }
+

I don't think that warrants a "XXX" style comment. It is just a "Note:".

Fixed. Could you please tell me how you classify them, if you can?

18. create_replicaiton_slots
+
+/*
+ * create_replicaiton_slots()
+ *
+ * Similar to create_new_objects() but only restores replication slots.
+ */
+static void
+create_replicaiton_slots(void)

Typo: /create_replicaiton_slots/create_replication_slots/

(Function name and comment)

All of them were replaced.

19. create_replicaiton_slots

+ for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+ {
+ char slots_file_name[MAXPGPATH],
+ log_file_name[MAXPGPATH];
+ DbInfo    *old_db = &old_cluster.dbarr.dbs[dbnum];
+ char    *opts;
+
+ pg_log(PG_STATUS, "%s", old_db->db_name);
+
+ snprintf(slots_file_name, sizeof(slots_file_name),
+ DB_DUMP_FILE_MASK_FOR_SLOTS, old_db->db_oid);
+ snprintf(log_file_name, sizeof(log_file_name),
+ DB_DUMP_LOG_FILE_MASK, old_db->db_oid);
+
+ opts = "--echo-queries --set ON_ERROR_STOP=on --no-psqlrc";
+
+ parallel_exec_prog(log_file_name,
+    NULL,
+    "\"%s/psql\" %s %s --dbname %s -f \"%s/%s\"",
+    new_cluster.bindir,
+    cluster_conn_opts(&new_cluster),
+    opts,
+    old_db->db_name,
+    log_opts.dumpdir,
+    slots_file_name);
+ }

That 'opts' variable seems unnecessary. Why not just pass the string
literal directly when invoking parallel_exec_prog()?

Or if not removed, then at least make it const char *psql_opts =
"--echo-queries --set ON_ERROR_STOP=on --no-psqlrc";

I had tried to follow the prepare_new_globals() style, but
I preferred your suggestion. Fixed.

src/bin/pg_upgrade/pg_upgrade.h

20.
+#define DB_DUMP_FILE_MASK_FOR_SLOTS
"pg_upgrade_dump_%u_slots.custom"

20a.
For consistency with other mask names (e.g. DB_DUMP_LOG_FILE_MASK)
probably this should be called DB_DUMP_SLOTS_FILE_MASK.

Fixed.

20b.
Because the content of this dump/restore file is SQL (not custom
binary) wouldn't a filename suffix ".sql" be better?

Right, fixed.

.../pg_upgrade/t/003_logical_replication.pl

21.
Some parts (formatting, comments, etc) in this file are inconsistent.

21a
");" is sometimes alone on a line, sometimes not

I ran pgperltidy, and the lonely ");" lines were removed.

21b.
"Init" versus "Create" nodes.

"Initialize" was chosen.

21c.
# Check whether changes on new publisher are shipped to subscriber

SUGGESTION
Check whether changes on the new publisher get replicated to the subscriber

Fixed.

21d.
$result =
$subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
is($result, qq(20),
'check changes are shipped to subscriber');

For symmetry with before/after, I think it would be better to do this
same command before the upgrade to confirm q(10) rows.

Added.

[1]: /messages/by-id/20230407024823.3j2s4doslsjemvis@jrouhaud

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:

v2-0001-pg_upgrade-Add-include-replication-slots-option.patch (application/octet-stream)
From 3ac0cdd9fef5bbaa3cdd152e3128a8e4747208a1 Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Tue, 4 Apr 2023 05:49:34 +0000
Subject: [PATCH v2] pg_upgrade: Add --include-replication-slots option

This commit introduces a new pg_upgrade option called "--include-replication-slots".
This allows nodes with logical replication slots to be upgraded. The commit can
be divided into two parts: one for pg_dump and another for pg_upgrade.

For pg_dump this commit includes a new option called "--replication-slots-only".
This option can be used to dump replication slots. When this option is specified,
the slot_name, plugin, and two_phase parameters are extracted from pg_replication_slots.
An SQL file is then generated which executes pg_create_logical_replication_slot()
with the extracted parameters.

For pg_upgrade, when '--include-replication-slots' is specified, it executes pg_dump
with the new "--replication-slots-only" option and restores from the dump. Apart
from restoring schema, pg_resetwal must not be called after restoring replication
slots. This is because the command discards WAL files and starts from a new segment,
even if they are required by replication slots. This leads to an ERROR: "requested
WAL segment XXX has already been removed". To avoid this, replication slots are
restored at a different time than other objects, after running pg_resetwal.

The significant advantage of this commit is that it makes it easy to continue
logical replication even after upgrading the publisher node. Previously, pg_upgrade
allowed copying publications to a new node. With this new commit, adjusting the
connection string to the new publisher will cause the apply worker on the subscriber
to connect to the new publisher automatically. This enables seamless continuation
of logical replication, even after an upgrade.

Author: Hayato Kuroda
Reviewed-by: Peter Smith, Julien Rouhaud
---
 doc/src/sgml/ref/pg_dump.sgml                 |  10 ++
 doc/src/sgml/ref/pgupgrade.sgml               |  11 ++
 src/bin/pg_dump/pg_backup.h                   |   1 +
 src/bin/pg_dump/pg_dump.c                     | 141 ++++++++++++++++++
 src/bin/pg_dump/pg_dump.h                     |  15 +-
 src/bin/pg_dump/pg_dump_sort.c                |   4 +
 src/bin/pg_upgrade/dump.c                     |  23 +++
 src/bin/pg_upgrade/meson.build                |   1 +
 src/bin/pg_upgrade/option.c                   |   6 +
 src/bin/pg_upgrade/pg_upgrade.c               |  61 ++++++++
 src/bin/pg_upgrade/pg_upgrade.h               |   2 +
 .../pg_upgrade/t/003_logical_replication.pl   |  89 +++++++++++
 12 files changed, 363 insertions(+), 1 deletion(-)
 create mode 100644 src/bin/pg_upgrade/t/003_logical_replication.pl

diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index e81e35c13b..2cd4fd10b0 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -1206,6 +1206,16 @@ PostgreSQL documentation
       </listitem>
      </varlistentry>
 
+     <varlistentry>
+      <term><option>--replication-slots-only</option></term>
+      <listitem>
+       <para>
+        Dump only replication slots; not the schema (data definitions), nor
+        data. This is mainly used when upgrading nodes.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
        <term><option>-?</option></term>
        <term><option>--help</option></term>
diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 7816b4c685..6505b0fd34 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -240,6 +240,17 @@ PostgreSQL documentation
       </listitem>
      </varlistentry>
 
+     <varlistentry>
+      <term><option>--include-replication-slots</option></term>
+      <listitem>
+       <para>
+        Upgrade replication slots. Only logical replication slots are currently
+        supported, and temporary slots are ignored. Note that pg_upgrade does
+        not check the installation of plugins.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><option>-?</option></term>
       <term><option>--help</option></term>
diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index aba780ef4b..8a6f25cf2c 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -187,6 +187,7 @@ typedef struct _dumpOptions
 	int			use_setsessauth;
 	int			enable_row_security;
 	int			load_via_partition_root;
+	int			slot_only;
 
 	/* default, if no "inclusion" switches appear, is to dump everything */
 	bool		include_everything;
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 7a504dfe25..78c7102d3e 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -328,6 +328,9 @@ static void setupDumpWorker(Archive *AH);
 static TableInfo *getRootTableInfo(const TableInfo *tbinfo);
 static bool forcePartitionRootLoad(const TableInfo *tbinfo);
 
+static void getReplicationSlots(Archive *fout);
+static void dumpReplicationSlot(Archive *fout,
+								const ReplicationSlotInfo * slotinfo);
 
 int
 main(int argc, char **argv)
@@ -431,6 +434,7 @@ main(int argc, char **argv)
 		{"table-and-children", required_argument, NULL, 12},
 		{"exclude-table-and-children", required_argument, NULL, 13},
 		{"exclude-table-data-and-children", required_argument, NULL, 14},
+		{"replication-slots-only", no_argument, NULL, 15},
 
 		{NULL, 0, NULL, 0}
 	};
@@ -657,6 +661,11 @@ main(int argc, char **argv)
 										  optarg);
 				break;
 
+			case 15:			/* dump only replication slot(s) */
+				dopt.slot_only = true;
+				dopt.include_everything = false;
+				break;
+
 			default:
 				/* getopt_long already emitted a complaint */
 				pg_log_error_hint("Try \"%s --help\" for more information.", progname);
@@ -714,6 +723,11 @@ main(int argc, char **argv)
 	if (dopt.do_nothing && dopt.dump_inserts == 0)
 		pg_fatal("option --on-conflict-do-nothing requires option --inserts, --rows-per-insert, or --column-inserts");
 
+	if (dopt.slot_only && dopt.dataOnly)
+		pg_fatal("options --replication-slots-only and -a/--data-only cannot be used together");
+	if (dopt.slot_only && dopt.schemaOnly)
+		pg_fatal("options --replication-slots-only and -s/--schema-only cannot be used together");
+
 	/* Identify archive format to emit */
 	archiveFormat = parseArchiveFormat(format, &archiveMode);
 
@@ -876,6 +890,16 @@ main(int argc, char **argv)
 			pg_fatal("no matching extensions were found");
 	}
 
+	/*
+	 * If dump replication-slots-only was requested, dump only them and skip
+	 * everything else.
+	 */
+	if (dopt.slot_only)
+	{
+		getReplicationSlots(fout);
+		goto dump;
+	}
+
 	/*
 	 * Dumping LOs is the default for dumps where an inclusion switch is not
 	 * used (an "include everything" dump).  -B can be used to exclude LOs
@@ -936,6 +960,8 @@ main(int argc, char **argv)
 	if (!dopt.no_security_labels)
 		collectSecLabels(fout);
 
+dump:
+
 	/* Lastly, create dummy objects to represent the section boundaries */
 	boundaryObjs = createBoundaryObjects();
 
@@ -1119,6 +1145,7 @@ help(const char *progname)
 	printf(_("  --no-unlogged-table-data     do not dump unlogged table data\n"));
 	printf(_("  --on-conflict-do-nothing     add ON CONFLICT DO NOTHING to INSERT commands\n"));
 	printf(_("  --quote-all-identifiers      quote all identifiers, even if not key words\n"));
+	printf(_("  --replication-slots-only     dump only replication slots, no schema or data\n"));
 	printf(_("  --rows-per-insert=NROWS      number of rows per INSERT; implies --inserts\n"));
 	printf(_("  --section=SECTION            dump named section (pre-data, data, or post-data)\n"));
 	printf(_("  --serializable-deferrable    wait until the dump can run without anomalies\n"));
@@ -10252,6 +10279,9 @@ dumpDumpableObject(Archive *fout, DumpableObject *dobj)
 		case DO_SUBSCRIPTION:
 			dumpSubscription(fout, (const SubscriptionInfo *) dobj);
 			break;
+		case DO_REPLICATION_SLOT:
+			dumpReplicationSlot(fout, (const ReplicationSlotInfo *) dobj);
+			break;
 		case DO_PRE_DATA_BOUNDARY:
 		case DO_POST_DATA_BOUNDARY:
 			/* never dumped, nothing to do */
@@ -18227,6 +18257,7 @@ addBoundaryDependencies(DumpableObject **dobjs, int numObjs,
 			case DO_PUBLICATION_REL:
 			case DO_PUBLICATION_TABLE_IN_SCHEMA:
 			case DO_SUBSCRIPTION:
+			case DO_REPLICATION_SLOT:
 				/* Post-data objects: must come after the post-data boundary */
 				addObjectDependency(dobj, postDataBound->dumpId);
 				break;
@@ -18488,3 +18519,113 @@ appendReloptionsArrayAH(PQExpBuffer buffer, const char *reloptions,
 	if (!res)
 		pg_log_warning("could not parse %s array", "reloptions");
 }
+
+/*
+ * getReplicationSlots
+ *	  get information about replication slots
+ */
+static void
+getReplicationSlots(Archive *fout)
+{
+	PGresult   *res;
+	ReplicationSlotInfo *slotinfo;
+	PQExpBuffer query;
+	DumpOptions *dopt = fout->dopt;
+
+	int			i_slotname;
+	int			i_plugin;
+	int			i_twophase;
+	int			i,
+				ntups;
+
+	/* Check whether we should dump or not */
+	if (fout->remoteVersion < 160000 || !dopt->slot_only)
+		return;
+
+	query = createPQExpBuffer();
+
+	resetPQExpBuffer(query);
+
+	/*
+	 * Get replication slots.
+	 *
+	 * XXX: Which information must be extracted from old node? Currently three
+	 * attributes are extracted because they are used by
+	 * pg_create_logical_replication_slot().
+	 * XXX: Do we have to support physical slots?
+	 */
+	appendPQExpBufferStr(query,
+						 "SELECT r.slot_name, r.plugin, r.two_phase "
+						 "FROM pg_replication_slots r "
+						 "WHERE r.database = current_database() AND temporary = false "
+						 "AND wal_status IN ('reserved', 'extended');");
+
+	res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+
+	ntups = PQntuples(res);
+
+	i_slotname = PQfnumber(res, "slot_name");
+	i_plugin = PQfnumber(res, "plugin");
+	i_twophase = PQfnumber(res, "two_phase");
+
+	slotinfo = pg_malloc(ntups * sizeof(ReplicationSlotInfo));
+
+	for (i = 0; i < ntups; i++)
+	{
+		slotinfo[i].dobj.objType = DO_REPLICATION_SLOT;
+
+		slotinfo[i].dobj.catId.tableoid = InvalidOid;
+		slotinfo[i].dobj.catId.oid = InvalidOid;
+		AssignDumpId(&slotinfo[i].dobj);
+
+		slotinfo[i].dobj.name = pg_strdup(PQgetvalue(res, i, i_slotname));
+
+		slotinfo[i].plugin = pg_strdup(PQgetvalue(res, i, i_plugin));
+		slotinfo[i].twophase = pg_strdup(PQgetvalue(res, i, i_twophase));
+
+		/* FIXME: force dumping */
+		slotinfo[i].dobj.dump = DUMP_COMPONENT_ALL;
+	}
+	PQclear(res);
+
+	destroyPQExpBuffer(query);
+}
+
+/*
+ * dumpReplicationSlot
+ *	  write down a script for pg_restore command
+ */
+static void
+dumpReplicationSlot(Archive *fout, const ReplicationSlotInfo * slotinfo)
+{
+	DumpOptions *dopt = fout->dopt;
+
+	if (!dopt->slot_only)
+		return;
+
+	if (slotinfo->dobj.dump & DUMP_COMPONENT_DEFINITION)
+	{
+		PQExpBuffer query = createPQExpBuffer();
+		char	   *slotname = pg_strdup(slotinfo->dobj.name);
+
+		/*
+		 * XXX: For simplification, pg_create_logical_replication_slot() is
+		 * used. Is it sufficient?
+		 */
+		appendPQExpBuffer(query, "SELECT pg_create_logical_replication_slot('%s', ",
+						  slotname);
+		appendStringLiteralAH(query, slotinfo->plugin, fout);
+		appendPQExpBuffer(query, ", ");
+		appendStringLiteralAH(query, slotinfo->twophase, fout);
+		appendPQExpBuffer(query, ");");
+
+		ArchiveEntry(fout, slotinfo->dobj.catId, slotinfo->dobj.dumpId,
+					 ARCHIVE_OPTS(.tag = slotname,
+								  .description = "REPLICATION SLOT",
+								  .section = SECTION_POST_DATA,
+								  .createStmt = query->data));
+
+		pfree(slotname);
+		destroyPQExpBuffer(query);
+	}
+}
diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h
index ed6ce41ad7..e59cfdd8fa 100644
--- a/src/bin/pg_dump/pg_dump.h
+++ b/src/bin/pg_dump/pg_dump.h
@@ -82,7 +82,8 @@ typedef enum
 	DO_PUBLICATION,
 	DO_PUBLICATION_REL,
 	DO_PUBLICATION_TABLE_IN_SCHEMA,
-	DO_SUBSCRIPTION
+	DO_SUBSCRIPTION,
+	DO_REPLICATION_SLOT
 } DumpableObjectType;
 
 /*
@@ -666,6 +667,18 @@ typedef struct _SubscriptionInfo
 	char	   *subpasswordrequired;
 } SubscriptionInfo;
 
+/*
+ * The ReplicationSlotInfo struct is used to represent replication slots.
+ * XXX: add more attrbutes if needed
+ */
+typedef struct _ReplicationSlotInfo
+{
+	DumpableObject dobj;
+	char	   *plugin;
+	char	   *slottype;
+	char	   *twophase;
+}			ReplicationSlotInfo;
+
 /*
  *	common utility functions
  */
diff --git a/src/bin/pg_dump/pg_dump_sort.c b/src/bin/pg_dump/pg_dump_sort.c
index 8266c117a3..4280283f0d 100644
--- a/src/bin/pg_dump/pg_dump_sort.c
+++ b/src/bin/pg_dump/pg_dump_sort.c
@@ -1497,6 +1497,10 @@ describeDumpableObject(DumpableObject *obj, char *buf, int bufsize)
 			snprintf(buf, bufsize,
 					 "SUBSCRIPTION (ID %d OID %u)",
 					 obj->dumpId, obj->catId.oid);
+		case DO_REPLICATION_SLOT:
+			snprintf(buf, bufsize,
+					 "REPLICATION SLOT (ID %d NAME %s)",
+					 obj->dumpId, obj->name);
 			return;
 		case DO_PRE_DATA_BOUNDARY:
 			snprintf(buf, bufsize,
diff --git a/src/bin/pg_upgrade/dump.c b/src/bin/pg_upgrade/dump.c
index 6c8c82dca8..f8d0c6ddde 100644
--- a/src/bin/pg_upgrade/dump.c
+++ b/src/bin/pg_upgrade/dump.c
@@ -59,6 +59,29 @@ generate_old_dump(void)
 						   log_opts.dumpdir,
 						   sql_file_name, escaped_connstr.data);
 
+		/*
+		 * Dump replication slots if needed.
+		 *
+		 * XXX We cannot dump replication slots at the same time as the schema
+		 * dump because we need to separate the timing of restoring
+		 * replication slots and other objects. Replication slots, in
+		 * particular, should not be restored before executing the pg_resetwal
+		 * command because it will remove WALs that are required by the slots.
+		 */
+		if (user_opts.include_slots)
+		{
+			char		slots_file_name[MAXPGPATH];
+
+			snprintf(slots_file_name, sizeof(slots_file_name), DB_DUMP_SLOTS_FILE_MASK, old_db->db_oid);
+			parallel_exec_prog(log_file_name, NULL,
+							   "\"%s/pg_dump\" %s --replication-slots-only "
+							   "--quote-all-identifiers --binary-upgrade %s "
+							   "--file=\"%s/%s\" %s",
+							   new_cluster.bindir, cluster_conn_opts(&old_cluster),
+							   log_opts.verbose ? "--verbose" : "",
+							   log_opts.dumpdir,
+							   slots_file_name, escaped_connstr.data);
+		}
 		termPQExpBuffer(&escaped_connstr);
 	}
 
diff --git a/src/bin/pg_upgrade/meson.build b/src/bin/pg_upgrade/meson.build
index 12a97f84e2..7f5d48b7e1 100644
--- a/src/bin/pg_upgrade/meson.build
+++ b/src/bin/pg_upgrade/meson.build
@@ -42,6 +42,7 @@ tests += {
     'tests': [
       't/001_basic.pl',
       't/002_pg_upgrade.pl',
+      't/003_logical_replication.pl',
     ],
     'test_kwargs': {'priority': 40}, # pg_upgrade tests are slow
   },
diff --git a/src/bin/pg_upgrade/option.c b/src/bin/pg_upgrade/option.c
index 8869b6b60d..d8d9f69b47 100644
--- a/src/bin/pg_upgrade/option.c
+++ b/src/bin/pg_upgrade/option.c
@@ -57,6 +57,7 @@ parseCommandLine(int argc, char *argv[])
 		{"verbose", no_argument, NULL, 'v'},
 		{"clone", no_argument, NULL, 1},
 		{"copy", no_argument, NULL, 2},
+		{"include-replication-slots", no_argument, NULL, 3},
 
 		{NULL, 0, NULL, 0}
 	};
@@ -199,6 +200,10 @@ parseCommandLine(int argc, char *argv[])
 				user_opts.transfer_mode = TRANSFER_MODE_COPY;
 				break;
 
+			case 3:
+				user_opts.include_slots = true;
+				break;
+
 			default:
 				fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
 						os_info.progname);
@@ -289,6 +294,7 @@ usage(void)
 	printf(_("  -V, --version                 display version information, then exit\n"));
 	printf(_("  --clone                       clone instead of copying files to new cluster\n"));
 	printf(_("  --copy                        copy files to new cluster (default)\n"));
+	printf(_("  --include-replication-slots   upgrade replication slots\n"));
 	printf(_("  -?, --help                    show this help, then exit\n"));
 	printf(_("\n"
 			 "Before running pg_upgrade you must:\n"
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 75bab0a04c..04bf1d867a 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -59,6 +59,7 @@ static void copy_xact_xlog_xid(void);
 static void set_frozenxids(bool minmxid_only);
 static void make_outputdirs(char *pgdata);
 static void setup(char *argv0, bool *live_check);
+static void create_replication_slots(void);
 
 ClusterInfo old_cluster,
 			new_cluster;
@@ -188,6 +189,19 @@ main(int argc, char **argv)
 			  new_cluster.pgdata);
 	check_ok();
 
+	/*
+	 * Create replication slots if requested.
+	 *
+	 * Note: This must be done after doing pg_resetwal command because the
+	 * command will remove required WALs.
+	 */
+	if (user_opts.include_slots)
+	{
+		start_postmaster(&new_cluster, true);
+		create_replication_slots();
+		stop_postmaster(false);
+	}
+
 	if (user_opts.do_sync)
 	{
 		prep_status("Sync data directory to disk");
@@ -860,3 +874,50 @@ set_frozenxids(bool minmxid_only)
 
 	check_ok();
 }
+
+/*
+ * create_replication_slots()
+ *
+ * Similar to create_new_objects() but only restores replication slots.
+ */
+static void
+create_replication_slots(void)
+{
+	int			dbnum;
+
+	prep_status_progress("Restoring replication slots in the new cluster");
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		char		slots_file_name[MAXPGPATH],
+					log_file_name[MAXPGPATH];
+		DbInfo	   *old_db = &old_cluster.dbarr.dbs[dbnum];
+
+		pg_log(PG_STATUS, "%s", old_db->db_name);
+
+		snprintf(slots_file_name, sizeof(slots_file_name),
+				 DB_DUMP_SLOTS_FILE_MASK, old_db->db_oid);
+		snprintf(log_file_name, sizeof(log_file_name),
+				 DB_DUMP_LOG_FILE_MASK, old_db->db_oid);
+
+		parallel_exec_prog(log_file_name,
+						   NULL,
+						   "\"%s/psql\" %s --echo-queries --set ON_ERROR_STOP=on "
+						   "--no-psqlrc --dbname %s -f \"%s/%s\"",
+						   new_cluster.bindir,
+						   cluster_conn_opts(&new_cluster),
+						   old_db->db_name,
+						   log_opts.dumpdir,
+						   slots_file_name);
+	}
+
+	/* reap all children */
+	while (reap_child(true) == true)
+		;
+
+	end_progress_output();
+	check_ok();
+
+	/* update new_cluster info now that we have objects in the databases */
+	get_db_and_rel_infos(&new_cluster);
+}
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 3eea0139c7..1aa41f68bc 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -29,6 +29,7 @@
 /* contains both global db information and CREATE DATABASE commands */
 #define GLOBALS_DUMP_FILE	"pg_upgrade_dump_globals.sql"
 #define DB_DUMP_FILE_MASK	"pg_upgrade_dump_%u.custom"
+#define DB_DUMP_SLOTS_FILE_MASK	"pg_upgrade_dump_%u_slots.sql"
 
 /*
  * Base directories that include all the files generated internally, from the
@@ -304,6 +305,7 @@ typedef struct
 	transferMode transfer_mode; /* copy files or link them? */
 	int			jobs;			/* number of processes/threads to use */
 	char	   *socketdir;		/* directory to use for Unix sockets */
+	bool		include_slots;	/* true -> dump and restore replication slots */
 } UserOpts;
 
 typedef struct
diff --git a/src/bin/pg_upgrade/t/003_logical_replication.pl b/src/bin/pg_upgrade/t/003_logical_replication.pl
new file mode 100644
index 0000000000..13ddde3d5f
--- /dev/null
+++ b/src/bin/pg_upgrade/t/003_logical_replication.pl
@@ -0,0 +1,89 @@
+# Copyright (c) 2021-2023, PostgreSQL Global Development Group
+
+# Tests for logical replication, especially for upgrading publisher
+use strict;
+use warnings;
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Can be changed to test the other modes.
+my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
+
+# Initialize publisher node
+my $old_publisher = PostgreSQL::Test::Cluster->new('old_publisher');
+$old_publisher->init(allows_streaming => 'logical');
+$old_publisher->start;
+
+# Initialize subscriber node
+my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
+$subscriber->init(allows_streaming => 'logical');
+$subscriber->start;
+
+$old_publisher->safe_psql('postgres',
+	"CREATE TABLE tbl AS SELECT generate_series(1,10) AS a");
+$subscriber->safe_psql('postgres', "CREATE TABLE tbl (a int)");
+
+# Setup logical replication
+my $old_connstr = $old_publisher->connstr . ' dbname=postgres';
+$old_publisher->safe_psql('postgres',
+	"CREATE PUBLICATION pub FOR ALL TABLES");
+$subscriber->safe_psql('postgres',
+	"CREATE SUBSCRIPTION sub CONNECTION '$old_connstr' PUBLICATION pub");
+
+# Wait for initial table sync to finish
+$subscriber->wait_for_subscription_sync($old_publisher, 'sub');
+
+my $result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
+is($result, qq(10), 'check initial rows on subscriber');
+
+# Preparations for upgrading publisher
+$old_publisher->stop;
+$subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION sub DISABLE");
+
+my $new_publisher = PostgreSQL::Test::Cluster->new('new_publisher');
+$new_publisher->init(allows_streaming => 'logical');
+
+my $bindir = $new_publisher->config_data('--bindir');
+
+# Run pg_upgrade. pg_upgrade_output.d is removed at the end
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,        '--include-replication-slot'
+	],
+	'run of pg_upgrade for new publisher');
+ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ removed after pg_upgrade success");
+
+# Check whether the replication slot is copied
+$new_publisher->start;
+$result =
+  $new_publisher->safe_psql('postgres',
+	"SELECT count(*) FROM pg_replication_slots");
+is($result, qq(1), 'check the replication slot is copied to new publisher');
+
+# Change connection string and enable logical replication
+my $new_connstr = $new_publisher->connstr . ' dbname=postgres';
+
+$subscriber->safe_psql('postgres',
+	"ALTER SUBSCRIPTION sub CONNECTION '$new_connstr'");
+$subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION sub ENABLE");
+
+# Check whether changes on the new publisher get replicated to the subscriber
+$new_publisher->safe_psql('postgres',
+	"INSERT INTO tbl VALUES (generate_series(11, 20))");
+
+$new_publisher->wait_for_catchup('sub');
+
+$result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
+is($result, qq(20), 'check changes are shipped to subscriber');
+
+done_testing();
-- 
2.27.0

#7Julien Rouhaud
rjuju123@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#4)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Fri, Apr 07, 2023 at 09:40:14AM +0000, Hayato Kuroda (Fujitsu) wrote:

As I mentioned in my original thread, I'm not very familiar with that code, but
I'm a bit worried about "all the changes generated on the publisher must be sent
and applied". Is that a hard requirement for the feature to work reliably?

I think the requirement is needed because the existing WAL on the old node cannot be
transferred to the new instance. The WAL hole from confirmed_flush to the current
position cannot be filled by the new instance.

I see; that was also the first blocker I could think of when Amit mentioned
that feature weeks ago, and I also don't see how that hole could be filled
either.

If
yes, how does this work if some subscriber node isn't connected when the
publisher node is stopped? I guess you could add a check in pg_upgrade to make
sure that all logical slots are indeed caught up, and fail if that's not the case
rather than assuming that a clean shutdown implies it. It would be good to
cover that in the TAP test, and also cover some corner cases, like checking that
any new row added on the publisher node after the pg_upgrade but before the
subscriber is reconnected is also replicated as expected.

Hmm, good point. The current patch cannot handle that case because walsenders
for such slots do not exist. I tested your approach; however, I found that the
CHECKPOINT_SHUTDOWN record was generated twice when the publisher was
shut down and restarted. As a result, the confirmed_flush LSN of the slots always
lagged behind the WAL insert location, and the upgrade failed every time.
I do not have a good idea for solving this yet... Does anyone have one?

I'm wondering if we could just check that each slot's LSN is exactly
sizeof(CHECKPOINT_SHUTDOWN) before the end of the WAL, or something like that?
That's hackish, but if pg_upgrade can run it means it was a clean shutdown, so it
should be safe to assume what the last record in the WAL was. As for the double
shutdown checkpoint, I'm not sure that I get the problem. The check should
only be done at the very beginning of pg_upgrade, so there should have been
only one shutdown checkpoint, right?
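
To make the idea concrete, a caught-up check along those lines could be sketched
in SQL roughly as below. This is only an illustration of the discussion, not
something the patch implements; pg_replication_slots and
pg_current_wal_insert_lsn() exist in core, but treating "confirmed_flush_lsn
equals the current insert position" as the pass criterion is an assumption here:

-- Hypothetical pre-upgrade check, run on the old cluster while it still
-- accepts connections: list the logical slots that are not yet caught up.
SELECT slot_name,
       confirmed_flush_lsn,
       pg_current_wal_insert_lsn() AS insert_lsn
FROM pg_replication_slots
WHERE slot_type = 'logical'
  AND NOT temporary
  AND confirmed_flush_lsn <> pg_current_wal_insert_lsn();

An empty result would mean every logical slot has confirmed everything generated
so far. The difficulty discussed above is that once the old cluster has been shut
down and restarted, the shutdown checkpoint (and possibly other records) has
already moved the end of WAL past confirmed_flush_lsn, so a naive equality test
fails.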

#8Julien Rouhaud
rjuju123@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#5)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Fri, Apr 07, 2023 at 12:51:51PM +0000, Hayato Kuroda (Fujitsu) wrote:

Dear Julien,

Agreed, but then shouldn't the option be named "--logical-slots-only" or
something like that, same for all internal function names?

Seems right. It will be fixed in the next version. Maybe
"--logical-replication-slots-only" will be used, per Peter's suggestion [1].

After considering it more, I decided not to include the word "logical" in the option
names at this point. This is because we have not yet decided whether we will dump
physical replication slots or not; the current restriction exists only because of a
lack of analysis and consideration. If we decide not to support them, then the options
will be renamed accordingly.

Well, even if physical replication slots were eventually preserved during
pg_upgrade, maybe users would like to keep only one kind or the other, so
having both options could make sense.

That being said, I have a hard time believing that we could actually preserve
physical replication slots. I don't think that pg_upgrade's final state is fully
reproducible: not all object oids are preserved, and the various pg_restore runs
are executed in parallel, so you're very likely to end up with small physical
differences that would be incompatible with physical replication. Even if we
could make it totally reproducible, it would probably be at the cost of making
pg_upgrade orders of magnitude slower. And since many people are already
complaining that it's too slow, that doesn't seem like something we would want.

#9Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Julien Rouhaud (#8)
1 attachment(s)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Julien,

Well, even if physical replication slots were eventually preserved during
pg_upgrade, maybe users would like to keep only one kind or the other, so
having both options could make sense.

You mean that we can rename the options to "logical-*" and later add a new
option for physical slots if needed, right? PSA the new patch, which addresses the comment.

That being said, I have a hard time believing that we could actually preserve
physical replication slots. I don't think that pg_upgrade's final state is fully
reproducible: not all object oids are preserved, and the various pg_restore runs
are executed in parallel, so you're very likely to end up with small physical
differences that would be incompatible with physical replication. Even if we
could make it totally reproducible, it would probably be at the cost of making
pg_upgrade orders of magnitude slower. And since many people are already
complaining that it's too slow, that doesn't seem like something we would want.

Your point makes sense to me. Thank you for sharing your opinion.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:

v3-0001-pg_upgrade-Add-include-logical-replication-slots-.patchapplication/octet-stream; name=v3-0001-pg_upgrade-Add-include-logical-replication-slots-.patchDownload
From 3f06eb0aa093ac70806d0c3fd6c8713d4e4c0b44 Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Tue, 4 Apr 2023 05:49:34 +0000
Subject: [PATCH v3] pg_upgrade: Add --include-logical-replication-slots option

This commit introduces a new pg_upgrade option called "--include-logical-replication-slots".
This allows nodes with logical replication slots to be upgraded. The commit can
be divided into two parts: one for pg_dump and another for pg_upgrade.

For pg_dump this commit includes a new option called "--logical-replication-slots-only".
This option can be used to dump logical replication slots. When this option is
specified, the slot_name, plugin, and two_phase parameters are extracted from
pg_replication_slots. An SQL file is then generated which executes
pg_create_logical_replication_slot() with the extracted parameters.

For pg_upgrade, when '--include-logical-replication-slots' is specified, it executes
pg_dump with the new "--logical-replication-slots-only" option and restores from the
dump. Apart from restoring schema, pg_resetwal must not be called after restoring
replication slots. This is because the command discards WAL files and starts from a
new segment, even if they are required by replication slots. This leads to an ERROR:
"requested WAL segment XXX has already been removed". To avoid this, replication slots
are restored at a different time than other objects, after running pg_resetwal.

The significant advantage of this commit is that it makes it easy to continue
logical replication even after upgrading the publisher node. Previously, pg_upgrade
allowed copying publications to a new node. With this new commit, adjusting the
connection string to the new publisher will cause the apply worker on the subscriber
to connect to the new publisher automatically. This enables seamless continuation
of logical replication, even after an upgrade.

Author: Hayato Kuroda
Reviewed-by: Peter Smith, Julien Rouhaud
---
 doc/src/sgml/ref/pg_dump.sgml                 |  10 ++
 doc/src/sgml/ref/pgupgrade.sgml               |  11 ++
 src/bin/pg_dump/pg_backup.h                   |   1 +
 src/bin/pg_dump/pg_dump.c                     | 144 ++++++++++++++++++
 src/bin/pg_dump/pg_dump.h                     |  16 +-
 src/bin/pg_dump/pg_dump_sort.c                |   4 +
 src/bin/pg_upgrade/dump.c                     |  24 +++
 src/bin/pg_upgrade/meson.build                |   1 +
 src/bin/pg_upgrade/option.c                   |   7 +
 src/bin/pg_upgrade/pg_upgrade.c               |  61 ++++++++
 src/bin/pg_upgrade/pg_upgrade.h               |   3 +
 .../pg_upgrade/t/003_logical_replication.pl   |  89 +++++++++++
 12 files changed, 370 insertions(+), 1 deletion(-)
 create mode 100644 src/bin/pg_upgrade/t/003_logical_replication.pl

diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index e81e35c13b..6e07f85281 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -1206,6 +1206,16 @@ PostgreSQL documentation
       </listitem>
      </varlistentry>
 
+     <varlistentry>
+      <term><option>--logical-replication-slots-only</option></term>
+      <listitem>
+       <para>
+        Dump only logical replication slots; not the schema (data definitions),
+        nor data. This is mainly used when upgrading nodes.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
        <term><option>-?</option></term>
        <term><option>--help</option></term>
diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 7816b4c685..7bcaa388b1 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -240,6 +240,17 @@ PostgreSQL documentation
       </listitem>
      </varlistentry>
 
+     <varlistentry>
+      <term><option>--include-logical-replication-slots</option></term>
+      <listitem>
+       <para>
+        Upgrade logical replication slots. Only permanent replication slots
+        included. Note that pg_upgrade does not check the installation of
+        plugins.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><option>-?</option></term>
       <term><option>--help</option></term>
diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index aba780ef4b..5c7832846f 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -187,6 +187,7 @@ typedef struct _dumpOptions
 	int			use_setsessauth;
 	int			enable_row_security;
 	int			load_via_partition_root;
+	int			logical_slot_only;
 
 	/* default, if no "inclusion" switches appear, is to dump everything */
 	bool		include_everything;
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 967ced4eed..77d87009e0 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -328,6 +328,9 @@ static void setupDumpWorker(Archive *AH);
 static TableInfo *getRootTableInfo(const TableInfo *tbinfo);
 static bool forcePartitionRootLoad(const TableInfo *tbinfo);
 
+static void getLogicalReplicationSlots(Archive *fout);
+static void dumpLogicalReplicationSlot(Archive *fout,
+									   const LogicalReplicationSlotInfo * slotinfo);
 
 int
 main(int argc, char **argv)
@@ -431,6 +434,7 @@ main(int argc, char **argv)
 		{"table-and-children", required_argument, NULL, 12},
 		{"exclude-table-and-children", required_argument, NULL, 13},
 		{"exclude-table-data-and-children", required_argument, NULL, 14},
+		{"logical-replication-slots-only", no_argument, NULL, 15},
 
 		{NULL, 0, NULL, 0}
 	};
@@ -657,6 +661,11 @@ main(int argc, char **argv)
 										  optarg);
 				break;
 
+			case 15:			/* dump only replication slot(s) */
+				dopt.logical_slot_only = true;
+				dopt.include_everything = false;
+				break;
+
 			default:
 				/* getopt_long already emitted a complaint */
 				pg_log_error_hint("Try \"%s --help\" for more information.", progname);
@@ -714,6 +723,11 @@ main(int argc, char **argv)
 	if (dopt.do_nothing && dopt.dump_inserts == 0)
 		pg_fatal("option --on-conflict-do-nothing requires option --inserts, --rows-per-insert, or --column-inserts");
 
+	if (dopt.logical_slot_only && dopt.dataOnly)
+		pg_fatal("options --logical-replication-slots-only and -a/--data-only cannot be used together");
+	if (dopt.logical_slot_only && dopt.schemaOnly)
+		pg_fatal("options --logical-replication-slots-only and -s/--schema-only cannot be used together");
+
 	/* Identify archive format to emit */
 	archiveFormat = parseArchiveFormat(format, &archiveMode);
 
@@ -876,6 +890,16 @@ main(int argc, char **argv)
 			pg_fatal("no matching extensions were found");
 	}
 
+	/*
+	 * If dump logical-replication-slots-only was requested, dump only them
+	 * and skip everything else.
+	 */
+	if (dopt.logical_slot_only)
+	{
+		getLogicalReplicationSlots(fout);
+		goto dump;
+	}
+
 	/*
 	 * Dumping LOs is the default for dumps where an inclusion switch is not
 	 * used (an "include everything" dump).  -B can be used to exclude LOs
@@ -936,6 +960,8 @@ main(int argc, char **argv)
 	if (!dopt.no_security_labels)
 		collectSecLabels(fout);
 
+dump:
+
 	/* Lastly, create dummy objects to represent the section boundaries */
 	boundaryObjs = createBoundaryObjects();
 
@@ -1119,6 +1145,8 @@ help(const char *progname)
 	printf(_("  --no-unlogged-table-data     do not dump unlogged table data\n"));
 	printf(_("  --on-conflict-do-nothing     add ON CONFLICT DO NOTHING to INSERT commands\n"));
 	printf(_("  --quote-all-identifiers      quote all identifiers, even if not key words\n"));
+	printf(_("  --logical-replication-slots-only\n"
+			 "                               dump only logical replication slots, no schema or data\n"));
 	printf(_("  --rows-per-insert=NROWS      number of rows per INSERT; implies --inserts\n"));
 	printf(_("  --section=SECTION            dump named section (pre-data, data, or post-data)\n"));
 	printf(_("  --serializable-deferrable    wait until the dump can run without anomalies\n"));
@@ -10381,6 +10409,10 @@ dumpDumpableObject(Archive *fout, DumpableObject *dobj)
 		case DO_SUBSCRIPTION:
 			dumpSubscription(fout, (const SubscriptionInfo *) dobj);
 			break;
+		case DO_LOGICAL_REPLICATION_SLOT:
+			dumpLogicalReplicationSlot(fout,
+									   (const LogicalReplicationSlotInfo *) dobj);
+			break;
 		case DO_PRE_DATA_BOUNDARY:
 		case DO_POST_DATA_BOUNDARY:
 			/* never dumped, nothing to do */
@@ -18382,6 +18414,7 @@ addBoundaryDependencies(DumpableObject **dobjs, int numObjs,
 			case DO_PUBLICATION_REL:
 			case DO_PUBLICATION_TABLE_IN_SCHEMA:
 			case DO_SUBSCRIPTION:
+			case DO_LOGICAL_REPLICATION_SLOT:
 				/* Post-data objects: must come after the post-data boundary */
 				addObjectDependency(dobj, postDataBound->dumpId);
 				break;
@@ -18643,3 +18676,114 @@ appendReloptionsArrayAH(PQExpBuffer buffer, const char *reloptions,
 	if (!res)
 		pg_log_warning("could not parse %s array", "reloptions");
 }
+
+/*
+ * getLogicalReplicationSlots
+ *	  get information about replication slots
+ */
+static void
+getLogicalReplicationSlots(Archive *fout)
+{
+	PGresult   *res;
+	LogicalReplicationSlotInfo *slotinfo;
+	PQExpBuffer query;
+	DumpOptions *dopt = fout->dopt;
+
+	int			i_slotname;
+	int			i_plugin;
+	int			i_twophase;
+	int			i,
+				ntups;
+
+	/* Check whether we should dump or not */
+	if (fout->remoteVersion < 160000 || !dopt->logical_slot_only)
+		return;
+
+	query = createPQExpBuffer();
+
+	resetPQExpBuffer(query);
+
+	/*
+	 * Get replication slots.
+	 *
+	 * XXX: Which information must be extracted from old node? Currently three
+	 * attributes are extracted because they are used by
+	 * pg_create_logical_replication_slot().
+	 * XXX: Do we have to support physical slots?
+	 */
+	appendPQExpBufferStr(query,
+						 "SELECT r.slot_name, r.plugin, r.two_phase "
+						 "FROM pg_replication_slots r "
+						 "WHERE r.database = current_database() AND temporary = false "
+						 "AND wal_status IN ('reserved', 'extended');");
+
+	res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+
+	ntups = PQntuples(res);
+
+	i_slotname = PQfnumber(res, "slot_name");
+	i_plugin = PQfnumber(res, "plugin");
+	i_twophase = PQfnumber(res, "two_phase");
+
+	slotinfo = pg_malloc(ntups * sizeof(LogicalReplicationSlotInfo));
+
+	for (i = 0; i < ntups; i++)
+	{
+		slotinfo[i].dobj.objType = DO_LOGICAL_REPLICATION_SLOT;
+
+		slotinfo[i].dobj.catId.tableoid = InvalidOid;
+		slotinfo[i].dobj.catId.oid = InvalidOid;
+		AssignDumpId(&slotinfo[i].dobj);
+
+		slotinfo[i].dobj.name = pg_strdup(PQgetvalue(res, i, i_slotname));
+
+		slotinfo[i].plugin = pg_strdup(PQgetvalue(res, i, i_plugin));
+		slotinfo[i].twophase = pg_strdup(PQgetvalue(res, i, i_twophase));
+
+		/* FIXME: force dumping */
+		slotinfo[i].dobj.dump = DUMP_COMPONENT_ALL;
+	}
+	PQclear(res);
+
+	destroyPQExpBuffer(query);
+}
+
+/*
+ * dumpLogicalReplicationSlot
+ *	  dump creation functions for the given logical replication slots
+ */
+static void
+dumpLogicalReplicationSlot(Archive *fout,
+						   const LogicalReplicationSlotInfo * slotinfo)
+{
+	DumpOptions *dopt = fout->dopt;
+
+	if (!dopt->logical_slot_only)
+		return;
+
+	if (slotinfo->dobj.dump & DUMP_COMPONENT_DEFINITION)
+	{
+		PQExpBuffer query = createPQExpBuffer();
+		char	   *slotname = pg_strdup(slotinfo->dobj.name);
+
+		/*
+		 * XXX: For simplification, pg_create_logical_replication_slot() is
+		 * used. Is it sufficient?
+		 */
+		appendPQExpBuffer(query, "SELECT pg_create_logical_replication_slot('%s', ",
+						  slotname);
+		appendStringLiteralAH(query, slotinfo->plugin, fout);
+		appendPQExpBuffer(query, ", ");
+		appendStringLiteralAH(query, slotinfo->twophase, fout);
+		appendPQExpBuffer(query, ");");
+
+		ArchiveEntry(fout, slotinfo->dobj.catId, slotinfo->dobj.dumpId,
+					 ARCHIVE_OPTS(.tag = slotname,
+								  .description = "REPLICATION SLOT",
+								  .section = SECTION_POST_DATA,
+								  .createStmt = query->data));
+
+		pfree(slotname);
+		destroyPQExpBuffer(query);
+	}
+}
diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h
index 765fe6399a..e587d4cca4 100644
--- a/src/bin/pg_dump/pg_dump.h
+++ b/src/bin/pg_dump/pg_dump.h
@@ -82,7 +82,8 @@ typedef enum
 	DO_PUBLICATION,
 	DO_PUBLICATION_REL,
 	DO_PUBLICATION_TABLE_IN_SCHEMA,
-	DO_SUBSCRIPTION
+	DO_SUBSCRIPTION,
+	DO_LOGICAL_REPLICATION_SLOT
 } DumpableObjectType;
 
 /*
@@ -666,6 +667,19 @@ typedef struct _SubscriptionInfo
 	char	   *subpasswordrequired;
 } SubscriptionInfo;
 
+/*
+ * The LogicalReplicationSlotInfo struct is used to represent replication
+ * slots.
+ * XXX: add more attrbutes if needed
+ */
+typedef struct _LogicalReplicationSlotInfo
+{
+	DumpableObject dobj;
+	char	   *plugin;
+	char	   *slottype;
+	char	   *twophase;
+}			LogicalReplicationSlotInfo;
+
 /*
  *	common utility functions
  */
diff --git a/src/bin/pg_dump/pg_dump_sort.c b/src/bin/pg_dump/pg_dump_sort.c
index 8266c117a3..0f3c89b9fb 100644
--- a/src/bin/pg_dump/pg_dump_sort.c
+++ b/src/bin/pg_dump/pg_dump_sort.c
@@ -1497,6 +1497,10 @@ describeDumpableObject(DumpableObject *obj, char *buf, int bufsize)
 			snprintf(buf, bufsize,
 					 "SUBSCRIPTION (ID %d OID %u)",
 					 obj->dumpId, obj->catId.oid);
+		case DO_LOGICAL_REPLICATION_SLOT:
+			snprintf(buf, bufsize,
+					 "REPLICATION SLOT (ID %d NAME %s)",
+					 obj->dumpId, obj->name);
 			return;
 		case DO_PRE_DATA_BOUNDARY:
 			snprintf(buf, bufsize,
diff --git a/src/bin/pg_upgrade/dump.c b/src/bin/pg_upgrade/dump.c
index 6c8c82dca8..e6b90864f5 100644
--- a/src/bin/pg_upgrade/dump.c
+++ b/src/bin/pg_upgrade/dump.c
@@ -59,6 +59,30 @@ generate_old_dump(void)
 						   log_opts.dumpdir,
 						   sql_file_name, escaped_connstr.data);
 
+		/*
+		 * Dump logical replication slots if needed.
+		 *
+		 * XXX We cannot dump replication slots at the same time as the schema
+		 * dump because we need to separate the timing of restoring
+		 * replication slots and other objects. Replication slots, in
+		 * particular, should not be restored before executing the pg_resetwal
+		 * command because it will remove WALs that are required by the slots.
+		 */
+		if (user_opts.include_logical_slots)
+		{
+			char		slots_file_name[MAXPGPATH];
+
+			snprintf(slots_file_name, sizeof(slots_file_name),
+					 DB_DUMP_LOGICAL_SLOTS_FILE_MASK, old_db->db_oid);
+			parallel_exec_prog(log_file_name, NULL,
+							   "\"%s/pg_dump\" %s --logical-replication-slots-only "
+							   "--quote-all-identifiers --binary-upgrade %s "
+							   "--file=\"%s/%s\" %s",
+							   new_cluster.bindir, cluster_conn_opts(&old_cluster),
+							   log_opts.verbose ? "--verbose" : "",
+							   log_opts.dumpdir,
+							   slots_file_name, escaped_connstr.data);
+		}
 		termPQExpBuffer(&escaped_connstr);
 	}
 
diff --git a/src/bin/pg_upgrade/meson.build b/src/bin/pg_upgrade/meson.build
index 12a97f84e2..7f5d48b7e1 100644
--- a/src/bin/pg_upgrade/meson.build
+++ b/src/bin/pg_upgrade/meson.build
@@ -42,6 +42,7 @@ tests += {
     'tests': [
       't/001_basic.pl',
       't/002_pg_upgrade.pl',
+      't/003_logical_replication.pl',
     ],
     'test_kwargs': {'priority': 40}, # pg_upgrade tests are slow
   },
diff --git a/src/bin/pg_upgrade/option.c b/src/bin/pg_upgrade/option.c
index 8869b6b60d..f7b5ec9879 100644
--- a/src/bin/pg_upgrade/option.c
+++ b/src/bin/pg_upgrade/option.c
@@ -57,6 +57,7 @@ parseCommandLine(int argc, char *argv[])
 		{"verbose", no_argument, NULL, 'v'},
 		{"clone", no_argument, NULL, 1},
 		{"copy", no_argument, NULL, 2},
+		{"include-logical-replication-slots", no_argument, NULL, 3},
 
 		{NULL, 0, NULL, 0}
 	};
@@ -199,6 +200,10 @@ parseCommandLine(int argc, char *argv[])
 				user_opts.transfer_mode = TRANSFER_MODE_COPY;
 				break;
 
+			case 3:
+				user_opts.include_logical_slots = true;
+				break;
+
 			default:
 				fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
 						os_info.progname);
@@ -289,6 +294,8 @@ usage(void)
 	printf(_("  -V, --version                 display version information, then exit\n"));
 	printf(_("  --clone                       clone instead of copying files to new cluster\n"));
 	printf(_("  --copy                        copy files to new cluster (default)\n"));
+	printf(_("  --include-logical-replication-slots\n"
+			 "                                upgrade logical replication slots\n"));
 	printf(_("  -?, --help                    show this help, then exit\n"));
 	printf(_("\n"
 			 "Before running pg_upgrade you must:\n"
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 75bab0a04c..1241060f4e 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -59,6 +59,7 @@ static void copy_xact_xlog_xid(void);
 static void set_frozenxids(bool minmxid_only);
 static void make_outputdirs(char *pgdata);
 static void setup(char *argv0, bool *live_check);
+static void create_logical_replication_slots(void);
 
 ClusterInfo old_cluster,
 			new_cluster;
@@ -188,6 +189,19 @@ main(int argc, char **argv)
 			  new_cluster.pgdata);
 	check_ok();
 
+	/*
+	 * Create logical replication slots if requested.
+	 *
+	 * Note: This must be done after doing pg_resetwal command because the
+	 * command will remove required WALs.
+	 */
+	if (user_opts.include_logical_slots)
+	{
+		start_postmaster(&new_cluster, true);
+		create_logical_replication_slots();
+		stop_postmaster(false);
+	}
+
 	if (user_opts.do_sync)
 	{
 		prep_status("Sync data directory to disk");
@@ -860,3 +874,50 @@ set_frozenxids(bool minmxid_only)
 
 	check_ok();
 }
+
+/*
+ * create_logical_replication_slots()
+ *
+ * Similar to create_new_objects() but only restores logical replication slots.
+ */
+static void
+create_logical_replication_slots(void)
+{
+	int			dbnum;
+
+	prep_status_progress("Restoring logical replication slots in the new cluster");
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		char		slots_file_name[MAXPGPATH],
+					log_file_name[MAXPGPATH];
+		DbInfo	   *old_db = &old_cluster.dbarr.dbs[dbnum];
+
+		pg_log(PG_STATUS, "%s", old_db->db_name);
+
+		snprintf(slots_file_name, sizeof(slots_file_name),
+				 DB_DUMP_LOGICAL_SLOTS_FILE_MASK, old_db->db_oid);
+		snprintf(log_file_name, sizeof(log_file_name),
+				 DB_DUMP_LOG_FILE_MASK, old_db->db_oid);
+
+		parallel_exec_prog(log_file_name,
+						   NULL,
+						   "\"%s/psql\" %s --echo-queries --set ON_ERROR_STOP=on "
+						   "--no-psqlrc --dbname %s -f \"%s/%s\"",
+						   new_cluster.bindir,
+						   cluster_conn_opts(&new_cluster),
+						   old_db->db_name,
+						   log_opts.dumpdir,
+						   slots_file_name);
+	}
+
+	/* reap all children */
+	while (reap_child(true) == true)
+		;
+
+	end_progress_output();
+	check_ok();
+
+	/* update new_cluster info now that we have objects in the databases */
+	get_db_and_rel_infos(&new_cluster);
+}
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 3eea0139c7..5f3d7a407e 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -29,6 +29,7 @@
 /* contains both global db information and CREATE DATABASE commands */
 #define GLOBALS_DUMP_FILE	"pg_upgrade_dump_globals.sql"
 #define DB_DUMP_FILE_MASK	"pg_upgrade_dump_%u.custom"
+#define DB_DUMP_LOGICAL_SLOTS_FILE_MASK	"pg_upgrade_dump_%u_logical_slots.sql"
 
 /*
  * Base directories that include all the files generated internally, from the
@@ -304,6 +305,8 @@ typedef struct
 	transferMode transfer_mode; /* copy files or link them? */
 	int			jobs;			/* number of processes/threads to use */
 	char	   *socketdir;		/* directory to use for Unix sockets */
+	bool		include_logical_slots;	/* true -> dump and restore logical
+										 * replication slots */
 } UserOpts;
 
 typedef struct
diff --git a/src/bin/pg_upgrade/t/003_logical_replication.pl b/src/bin/pg_upgrade/t/003_logical_replication.pl
new file mode 100644
index 0000000000..8b3be8b0d4
--- /dev/null
+++ b/src/bin/pg_upgrade/t/003_logical_replication.pl
@@ -0,0 +1,89 @@
+# Copyright (c) 2021-2023, PostgreSQL Global Development Group
+
+# Tests for logical replication, especially for upgrading publisher
+use strict;
+use warnings;
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Can be changed to test the other modes.
+my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
+
+# Initialize publisher node
+my $old_publisher = PostgreSQL::Test::Cluster->new('old_publisher');
+$old_publisher->init(allows_streaming => 'logical');
+$old_publisher->start;
+
+# Initialize subscriber node
+my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
+$subscriber->init(allows_streaming => 'logical');
+$subscriber->start;
+
+$old_publisher->safe_psql('postgres',
+	"CREATE TABLE tbl AS SELECT generate_series(1,10) AS a");
+$subscriber->safe_psql('postgres', "CREATE TABLE tbl (a int)");
+
+# Setup logical replication
+my $old_connstr = $old_publisher->connstr . ' dbname=postgres';
+$old_publisher->safe_psql('postgres',
+	"CREATE PUBLICATION pub FOR ALL TABLES");
+$subscriber->safe_psql('postgres',
+	"CREATE SUBSCRIPTION sub CONNECTION '$old_connstr' PUBLICATION pub");
+
+# Wait for initial table sync to finish
+$subscriber->wait_for_subscription_sync($old_publisher, 'sub');
+
+my $result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
+is($result, qq(10), 'check initial rows on subscriber');
+
+# Preparations for upgrading publisher
+$old_publisher->stop;
+$subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION sub DISABLE");
+
+my $new_publisher = PostgreSQL::Test::Cluster->new('new_publisher');
+$new_publisher->init(allows_streaming => 'logical');
+
+my $bindir = $new_publisher->config_data('--bindir');
+
+# Run pg_upgrade. pg_upgrade_output.d is removed at the end
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,        '--include-logical-replication-slot'
+	],
+	'run of pg_upgrade for new publisher');
+ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ removed after pg_upgrade success");
+
+# Check whether the replication slot is copied
+$new_publisher->start;
+$result =
+  $new_publisher->safe_psql('postgres',
+	"SELECT count(*) FROM pg_replication_slots");
+is($result, qq(1), 'check the replication slot is copied to new publisher');
+
+# Change connection string and enable logical replication
+my $new_connstr = $new_publisher->connstr . ' dbname=postgres';
+
+$subscriber->safe_psql('postgres',
+	"ALTER SUBSCRIPTION sub CONNECTION '$new_connstr'");
+$subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION sub ENABLE");
+
+# Check whether changes on the new publisher get replicated to the subscriber
+$new_publisher->safe_psql('postgres',
+	"INSERT INTO tbl VALUES (generate_series(11, 20))");
+
+$new_publisher->wait_for_catchup('sub');
+
+$result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
+is($result, qq(20), 'check changes are shipped to subscriber');
+
+done_testing();
-- 
2.27.0

#10Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Julien Rouhaud (#7)
1 attachment(s)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Julien,

Thank you for the idea! I have analyzed it.

If yes, how does this work if some subscriber node isn't connected when the
publisher node is stopped? I guess you could add a check in pg_upgrade to make
sure that all logical slots are indeed caught up, and fail if that's not the case
rather than assuming that a clean shutdown implies it. It would be good to
cover that in the TAP test, and also cover some corner cases, like checking that
any new row added on the publisher node after the pg_upgrade but before the
subscriber is reconnected is also replicated as expected.

Hmm, good point. The current patch cannot handle that case because walsenders
for such slots do not exist. I tested your approach; however, I found that the
CHECKPOINT_SHUTDOWN record was generated twice when the publisher was
shut down and restarted. As a result, the confirmed_flush LSN of the slots always
lagged behind the WAL insert location, and the upgrade failed every time.
I do not have a good idea for solving this yet... Does anyone have one?

I'm wondering if we could just check that each slot's LSN is exactly
sizeof(CHECKPOINT_SHUTDOWN) before the end of the WAL, or something like that?
That's hackish, but if pg_upgrade can run it means it was a clean shutdown, so it
should be safe to assume what the last record in the WAL was. As for the double
shutdown checkpoint, I'm not sure that I get the problem. The check should
only be done at the very beginning of pg_upgrade, so there should have been
only one shutdown checkpoint, right?

I have analyzed this point, but it seems to be difficult. This is because
some additional records like the following may be inserted. PSA the script that was
used for testing. Note that the "double CHECKPOINT_SHUTDOWN" issue might be wrong,
so I would like to withdraw it for now. Sorry for the noise.

* HEAP/HEAP2 records. These records may be inserted by the checkpointer.

IIUC, if there are tuples that have not been flushed yet when shutdown is requested,
the checkpointer writes all of them back to the heap files. At that time, many WAL
records are generated. I think we cannot predict the number of records beforehand.

* INVALIDATION(S) records. These records may be inserted by VACUUM.

There is a possibility that autovacuum runs and generates WAL records. I think we
cannot predict the number of records beforehand because it depends on the number
of objects.

* RUNNING_XACTS record

It might be a timing issue, but I found that sometimes the background writer generated
an XLOG_RUNNING_XACTS record. According to the function BackgroundWriterMain(), it is
generated when 15 seconds have passed since the last logging and important records
have been written in the meantime. I think it is difficult to predict whether this
will appear or not.
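
As a rough illustration of the kind of comparison being discussed, the gap between
a slot and the latest checkpoint can be inspected with core SQL while the old
cluster is still running. This query is only a hypothetical probe, not part of the
patch; pg_control_checkpoint() and pg_wal_lsn_diff() exist in core, but nothing
here is taken from the proposed code:

-- Hypothetical probe: how far is each logical slot behind the latest
-- checkpoint? Any of the record types listed above can widen this gap.
SELECT s.slot_name,
       s.confirmed_flush_lsn,
       c.checkpoint_lsn,
       pg_wal_lsn_diff(c.checkpoint_lsn, s.confirmed_flush_lsn) AS gap_bytes
FROM pg_replication_slots AS s,
     pg_control_checkpoint() AS c
WHERE s.slot_type = 'logical' AND NOT s.temporary;

Because HEAP/HEAP2, invalidation, and XLOG_RUNNING_XACTS records can appear between
a slot's confirmed_flush_lsn and the end of the WAL, the gap cannot be assumed to be
exactly the size of a single CHECKPOINT_SHUTDOWN record, which is the difficulty
described above.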

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:

test.shapplication/octet-stream; name=test.shDownload
#11Peter Smith
smithpb2250@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#9)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

Here are a few more review comments for patch v3-0001.

======
doc/src/sgml/ref/pgupgrade.sgml

1.
+     <varlistentry>
+      <term><option>--include-logical-replication-slots</option></term>
+      <listitem>
+       <para>
+        Upgrade logical replication slots. Only permanent replication slots
+        included. Note that pg_upgrade does not check the installation of
+        plugins.
+       </para>
+      </listitem>
+     </varlistentry>

Missing word.

"Only permanent replication slots included." --> "Only permanent
replication slots are included."

======
src/bin/pg_dump/pg_dump.c

2. help

@@ -1119,6 +1145,8 @@ help(const char *progname)
  printf(_("  --no-unlogged-table-data     do not dump unlogged table data\n"));
  printf(_("  --on-conflict-do-nothing     add ON CONFLICT DO NOTHING to INSERT commands\n"));
  printf(_("  --quote-all-identifiers      quote all identifiers, even if not key words\n"));
+ printf(_("  --logical-replication-slots-only\n"
+ "                               dump only logical replication slots, no schema or data\n"));
  printf(_("  --rows-per-insert=NROWS      number of rows per INSERT; implies --inserts\n"));

A previous review comment ([1] #11b) seems to have been missed. This
help is misplaced. It should be in alphabetical order consistent with
all the other help.

======
src/bin/pg_dump/pg_dump.h

3. _LogicalReplicationSlotInfo

+/*
+ * The LogicalReplicationSlotInfo struct is used to represent replication
+ * slots.
+ * XXX: add more attrbutes if needed
+ */
+typedef struct _LogicalReplicationSlotInfo
+{
+ DumpableObject dobj;
+ char    *plugin;
+ char    *slottype;
+ char    *twophase;
+} LogicalReplicationSlotInfo;
+

4a.
The indent of the 'LogicalReplicationSlotInfo' looks a bit strange,
unlike others in this file. Is it OK?

~

4b.
There was no typedefs.list file in this patch. Maybe the above
whitespace problem is a result of that omission.

======
.../pg_upgrade/t/003_logical_replication.pl

5.

+# Run pg_upgrade. pg_upgrade_output.d is removed at the end
+command_ok(
+ [
+ 'pg_upgrade', '--no-sync',
+ '-d',         $old_publisher->data_dir,
+ '-D',         $new_publisher->data_dir,
+ '-b',         $bindir,
+ '-B',         $bindir,
+ '-s',         $new_publisher->host,
+ '-p',         $old_publisher->port,
+ '-P',         $new_publisher->port,
+ $mode,        '--include-logical-replication-slot'
+ ],
+ 'run of pg_upgrade for new publisher');

5a.
How can this test even be working as-expected with those options?

Here it is passing option '--include-logical-replication-slot' but
AFAIK the proper option name everywhere else in this patch is
'--include-logical-replication-slots' (with the 's')

~

5b.
I'm not sure that "pg_upgrade for new publisher" makes sense.

It's more like "pg_upgrade of old publisher", or simply "pg_upgrade of
publisher"

------
[1]: /messages/by-id/TYCPR01MB5870E212F5012FD6272CE1E3F5969@TYCPR01MB5870.jpnprd01.prod.outlook.com

Kind Regards,
Peter Smith.
Fujitsu Australia

#12Peter Smith
smithpb2250@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#6)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Sat, Apr 8, 2023 at 12:00 AM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

...

17. main

+ /*
+ * Create replication slots if requested.
+ *
+ * XXX This must be done after doing pg_resetwal command because the
+ * command will remove required WALs.
+ */
+ if (user_opts.include_slots)
+ {
+ start_postmaster(&new_cluster, true);
+ create_replicaiton_slots();
+ stop_postmaster(false);
+ }
+

I don't think that warrants a "XXX" style comment. It is just a "Note:".

Fixed. Could you please tell me how to classify them, if you can?

Hopefully, someone will correct me if this explanation is wrong, but
my understanding of the different prefixes is like this --

"XXX" is used as a marker for future developers to consider maybe
revisiting/improving something that the comment refers to
e.g.
/* XXX - it would be better to code this using blah but for now we did
not.... */
/* XXX - option 'foo' is not currently supported but... */
/* XXX - it might be worth considering adding more checks or an assert
here because... */

OTOH, "Note" is just for highlighting why something is the way it is,
but with no implication that it should be revisited/changed in the
future.
e.g.
/* Note: We deliberately do not test the state here because... */
/* Note: This memory must be zeroed because... */
/* Note: This string has no '\0' terminator so... */

------
Kind Regards,
Peter Smith.
Fujitsu Australia

#13Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Peter Smith (#11)
1 attachment(s)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Peter,

Thank you for the comments! PSA the new version.

======
doc/src/sgml/ref/pgupgrade.sgml

1.
+     <varlistentry>
+      <term><option>--include-logical-replication-slots</option></term>
+      <listitem>
+       <para>
+        Upgrade logical replication slots. Only permanent replication slots
+        included. Note that pg_upgrade does not check the installation of
+        plugins.
+       </para>
+      </listitem>
+     </varlistentry>

Missing word.

"Only permanent replication slots included." --> "Only permanent
replication slots are included."

Fixed.

======
src/bin/pg_dump/pg_dump.c

2. help

@@ -1119,6 +1145,8 @@ help(const char *progname)
printf(_("  --no-unlogged-table-data     do not dump unlogged table data\n"));
printf(_("  --on-conflict-do-nothing     add ON CONFLICT DO NOTHING to INSERT commands\n"));
printf(_("  --quote-all-identifiers      quote all identifiers, even if not key words\n"));
+ printf(_("  --logical-replication-slots-only\n"
+ "                               dump only logical replication slots, no schema or data\n"));
printf(_("  --rows-per-insert=NROWS      number of rows per INSERT; implies --inserts\n"));

A previous review comment ([1] #11b) seems to have been missed. This
help is misplaced. It should be in alphabetical order consistent with
all the other help.

Sorry, fixed.

src/bin/pg_dump/pg_dump.h

3. _LogicalReplicationSlotInfo

+/*
+ * The LogicalReplicationSlotInfo struct is used to represent replication
+ * slots.
+ * XXX: add more attrbutes if needed
+ */
+typedef struct _LogicalReplicationSlotInfo
+{
+ DumpableObject dobj;
+ char    *plugin;
+ char    *slottype;
+ char    *twophase;
+} LogicalReplicationSlotInfo;
+

4a.
The indent of the 'LogicalReplicationSlotInfo' looks a bit strange,
unlike others in this file. Is it OK?

I was tripped up by pgindent, for the reason you pointed out.
Fixed.

4b.
There was no typedefs.list file in this patch. Maybe the above
whitespace problem is a result of that omission.

Your analysis is correct. Added.

.../pg_upgrade/t/003_logical_replication.pl

5.

+# Run pg_upgrade. pg_upgrade_output.d is removed at the end
+command_ok(
+ [
+ 'pg_upgrade', '--no-sync',
+ '-d',         $old_publisher->data_dir,
+ '-D',         $new_publisher->data_dir,
+ '-b',         $bindir,
+ '-B',         $bindir,
+ '-s',         $new_publisher->host,
+ '-p',         $old_publisher->port,
+ '-P',         $new_publisher->port,
+ $mode,        '--include-logical-replication-slot'
+ ],
+ 'run of pg_upgrade for new publisher');

5a.
How can this test even be working as-expected with those options?

Here it is passing option '--include-logical-replication-slot' but
AFAIK the proper option name everywhere else in this patch is
'--include-logical-replication-slots' (with the 's')

This is because the GNU implementation of getopt_long accepts an abbreviated long
option as long as the correct one can be determined unambiguously from the input,
so '--include-logical-replication-slot' was silently treated as
'--include-logical-replication-slots'. E.g. pg_upgrade on Linux accepts
`--ve` as `--verbose`, whereas a binary built on Windows does not.

Anyway, the discrepancy was not intentional. Fixed.

5b.
I'm not sure that "pg_upgrade for new publisher" makes sense.

It's more like "pg_upgrade of old publisher", or simply "pg_upgrade of
publisher"

Fixed.

Additionally, I fixed two bugs which were detected by AddressSanitizer.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:

v4-0001-pg_upgrade-Add-include-logical-replication-slots-.patchapplication/octet-stream; name=v4-0001-pg_upgrade-Add-include-logical-replication-slots-.patchDownload
From 2cc29a84cb4dae1a7db8742f8b165dca42e39ac7 Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Tue, 4 Apr 2023 05:49:34 +0000
Subject: [PATCH v4] pg_upgrade: Add --include-logical-replication-slots option

This commit introduces a new pg_upgrade option called "--include-logical-replication-slots".
This allows nodes with logical replication slots to be upgraded. The commit can
be divided into two parts: one for pg_dump and another for pg_upgrade.

For pg_dump this commit includes a new option called "--logical-replication-slots-only".
This option can be used to dump logical replication slots. When this option is
specified, the slot_name, plugin, and two_phase parameters are extracted from
pg_replication_slots. An SQL file is then generated which executes
pg_create_logical_replication_slot() with the extracted parameters.

For pg_upgrade, when '--include-logical-replication-slots' is specified, it executes
pg_dump with the new "--logical-replication-slots-only" option and restores from the
dump. Apart from restoring schema, pg_resetwal must not be called after restoring
replication slots. This is because the command discards WAL files and starts from a
new segment, even if they are required by replication slots. This leads to an ERROR:
"requested WAL segment XXX has already been removed". To avoid this, replication slots
are restored at a different time than other objects, after running pg_resetwal.

The significant advantage of this commit is that it makes it easy to continue
logical replication even after upgrading the publisher node. Previously, pg_upgrade
allowed copying publications to a new node. With this new commit, adjusting the
connection string to the new publisher will cause the apply worker on the subscriber
to connect to the new publisher automatically. This enables seamless continuation
of logical replication, even after an upgrade.

Author: Hayato Kuroda
Reviewed-by: Peter Smith, Julien Rouhaud
---
 doc/src/sgml/ref/pg_dump.sgml                 |  10 ++
 doc/src/sgml/ref/pgupgrade.sgml               |  11 ++
 src/bin/pg_dump/pg_backup.h                   |   1 +
 src/bin/pg_dump/pg_dump.c                     | 144 ++++++++++++++++++
 src/bin/pg_dump/pg_dump.h                     |  16 +-
 src/bin/pg_dump/pg_dump_sort.c                |  11 +-
 src/bin/pg_upgrade/dump.c                     |  24 +++
 src/bin/pg_upgrade/meson.build                |   1 +
 src/bin/pg_upgrade/option.c                   |   7 +
 src/bin/pg_upgrade/pg_upgrade.c               |  61 ++++++++
 src/bin/pg_upgrade/pg_upgrade.h               |   3 +
 .../pg_upgrade/t/003_logical_replication.pl   |  89 +++++++++++
 src/tools/pgindent/typedefs.list              |   1 +
 13 files changed, 376 insertions(+), 3 deletions(-)
 create mode 100644 src/bin/pg_upgrade/t/003_logical_replication.pl

diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index e81e35c13b..6e07f85281 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -1206,6 +1206,16 @@ PostgreSQL documentation
       </listitem>
      </varlistentry>
 
+     <varlistentry>
+      <term><option>--logical-replication-slots-only</option></term>
+      <listitem>
+       <para>
+        Dump only logical replication slots; not the schema (data definitions),
+        nor data. This is mainly used when upgrading nodes.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
        <term><option>-?</option></term>
        <term><option>--help</option></term>
diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 7816b4c685..94e90ff506 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -240,6 +240,17 @@ PostgreSQL documentation
       </listitem>
      </varlistentry>
 
+     <varlistentry>
+      <term><option>--include-logical-replication-slots</option></term>
+      <listitem>
+       <para>
+        Upgrade logical replication slots. Only permanent replication slots
+        are included. Note that pg_upgrade does not check the installation of
+        plugins.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><option>-?</option></term>
       <term><option>--help</option></term>
diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index aba780ef4b..5c7832846f 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -187,6 +187,7 @@ typedef struct _dumpOptions
 	int			use_setsessauth;
 	int			enable_row_security;
 	int			load_via_partition_root;
+	int			logical_slot_only;
 
 	/* default, if no "inclusion" switches appear, is to dump everything */
 	bool		include_everything;
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 967ced4eed..05ca3f8677 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -328,6 +328,9 @@ static void setupDumpWorker(Archive *AH);
 static TableInfo *getRootTableInfo(const TableInfo *tbinfo);
 static bool forcePartitionRootLoad(const TableInfo *tbinfo);
 
+static void getLogicalReplicationSlots(Archive *fout);
+static void dumpLogicalReplicationSlot(Archive *fout,
+									   const LogicalReplicationSlotInfo *slotinfo);
 
 int
 main(int argc, char **argv)
@@ -431,6 +434,7 @@ main(int argc, char **argv)
 		{"table-and-children", required_argument, NULL, 12},
 		{"exclude-table-and-children", required_argument, NULL, 13},
 		{"exclude-table-data-and-children", required_argument, NULL, 14},
+		{"logical-replication-slots-only", no_argument, NULL, 15},
 
 		{NULL, 0, NULL, 0}
 	};
@@ -657,6 +661,11 @@ main(int argc, char **argv)
 										  optarg);
 				break;
 
+			case 15:			/* dump only replication slot(s) */
+				dopt.logical_slot_only = true;
+				dopt.include_everything = false;
+				break;
+
 			default:
 				/* getopt_long already emitted a complaint */
 				pg_log_error_hint("Try \"%s --help\" for more information.", progname);
@@ -714,6 +723,11 @@ main(int argc, char **argv)
 	if (dopt.do_nothing && dopt.dump_inserts == 0)
 		pg_fatal("option --on-conflict-do-nothing requires option --inserts, --rows-per-insert, or --column-inserts");
 
+	if (dopt.logical_slot_only && dopt.dataOnly)
+		pg_fatal("options --logical-replication-slots-only and -a/--data-only cannot be used together");
+	if (dopt.logical_slot_only && dopt.schemaOnly)
+		pg_fatal("options --logical-replication-slots-only and -s/--schema-only cannot be used together");
+
 	/* Identify archive format to emit */
 	archiveFormat = parseArchiveFormat(format, &archiveMode);
 
@@ -876,6 +890,16 @@ main(int argc, char **argv)
 			pg_fatal("no matching extensions were found");
 	}
 
+	/*
+	 * If dump logical-replication-slots-only was requested, dump only them
+	 * and skip everything else.
+	 */
+	if (dopt.logical_slot_only)
+	{
+		getLogicalReplicationSlots(fout);
+		goto dump;
+	}
+
 	/*
 	 * Dumping LOs is the default for dumps where an inclusion switch is not
 	 * used (an "include everything" dump).  -B can be used to exclude LOs
@@ -936,6 +960,8 @@ main(int argc, char **argv)
 	if (!dopt.no_security_labels)
 		collectSecLabels(fout);
 
+dump:
+
 	/* Lastly, create dummy objects to represent the section boundaries */
 	boundaryObjs = createBoundaryObjects();
 
@@ -1109,6 +1135,8 @@ help(const char *progname)
 			 "                               servers matching PATTERN\n"));
 	printf(_("  --inserts                    dump data as INSERT commands, rather than COPY\n"));
 	printf(_("  --load-via-partition-root    load partitions via the root table\n"));
+	printf(_("  --logical-replication-slots-only\n"
+			 "                               dump only logical replication slots, no schema or data\n"));
 	printf(_("  --no-comments                do not dump comments\n"));
 	printf(_("  --no-publications            do not dump publications\n"));
 	printf(_("  --no-security-labels         do not dump security label assignments\n"));
@@ -10381,6 +10409,10 @@ dumpDumpableObject(Archive *fout, DumpableObject *dobj)
 		case DO_SUBSCRIPTION:
 			dumpSubscription(fout, (const SubscriptionInfo *) dobj);
 			break;
+		case DO_LOGICAL_REPLICATION_SLOT:
+			dumpLogicalReplicationSlot(fout,
+									   (const LogicalReplicationSlotInfo *) dobj);
+			break;
 		case DO_PRE_DATA_BOUNDARY:
 		case DO_POST_DATA_BOUNDARY:
 			/* never dumped, nothing to do */
@@ -18382,6 +18414,7 @@ addBoundaryDependencies(DumpableObject **dobjs, int numObjs,
 			case DO_PUBLICATION_REL:
 			case DO_PUBLICATION_TABLE_IN_SCHEMA:
 			case DO_SUBSCRIPTION:
+			case DO_LOGICAL_REPLICATION_SLOT:
 				/* Post-data objects: must come after the post-data boundary */
 				addObjectDependency(dobj, postDataBound->dumpId);
 				break;
@@ -18643,3 +18676,114 @@ appendReloptionsArrayAH(PQExpBuffer buffer, const char *reloptions,
 	if (!res)
 		pg_log_warning("could not parse %s array", "reloptions");
 }
+
+/*
+ * getLogicalReplicationSlots
+ *	  get information about replication slots
+ */
+static void
+getLogicalReplicationSlots(Archive *fout)
+{
+	PGresult   *res;
+	LogicalReplicationSlotInfo *slotinfo;
+	PQExpBuffer query;
+	DumpOptions *dopt = fout->dopt;
+
+	int			i_slotname;
+	int			i_plugin;
+	int			i_twophase;
+	int			i,
+				ntups;
+
+	/* Check whether we should dump or not */
+	if (fout->remoteVersion < 160000 || !dopt->logical_slot_only)
+		return;
+
+	query = createPQExpBuffer();
+
+	resetPQExpBuffer(query);
+
+	/*
+	 * Get replication slots.
+	 *
+	 * XXX: Which information must be extracted from old node? Currently three
+	 * attributes are extracted because they are used by
+	 * pg_create_logical_replication_slot().
+	 * XXX: Do we have to support physical slots?
+	 */
+	appendPQExpBufferStr(query,
+						 "SELECT r.slot_name, r.plugin, r.two_phase "
+						 "FROM pg_replication_slots r "
+						 "WHERE r.database = current_database() AND temporary = false "
+						 "AND wal_status IN ('reserved', 'extended');");
+
+	res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+
+	ntups = PQntuples(res);
+
+	i_slotname = PQfnumber(res, "slot_name");
+	i_plugin = PQfnumber(res, "plugin");
+	i_twophase = PQfnumber(res, "two_phase");
+
+	slotinfo = pg_malloc(ntups * sizeof(LogicalReplicationSlotInfo));
+
+	for (i = 0; i < ntups; i++)
+	{
+		slotinfo[i].dobj.objType = DO_LOGICAL_REPLICATION_SLOT;
+
+		slotinfo[i].dobj.catId.tableoid = InvalidOid;
+		slotinfo[i].dobj.catId.oid = InvalidOid;
+		AssignDumpId(&slotinfo[i].dobj);
+
+		slotinfo[i].dobj.name = pg_strdup(PQgetvalue(res, i, i_slotname));
+
+		slotinfo[i].plugin = pg_strdup(PQgetvalue(res, i, i_plugin));
+		slotinfo[i].twophase = pg_strdup(PQgetvalue(res, i, i_twophase));
+
+		/* FIXME: force dumping */
+		slotinfo[i].dobj.dump = DUMP_COMPONENT_ALL;
+	}
+	PQclear(res);
+
+	destroyPQExpBuffer(query);
+}
+
+/*
+ * dumpLogicalReplicationSlot
+ *	  dump creation functions for the given logical replication slots
+ */
+static void
+dumpLogicalReplicationSlot(Archive *fout,
+						   const LogicalReplicationSlotInfo *slotinfo)
+{
+	DumpOptions *dopt = fout->dopt;
+
+	if (!dopt->logical_slot_only)
+		return;
+
+	if (slotinfo->dobj.dump & DUMP_COMPONENT_DEFINITION)
+	{
+		PQExpBuffer query = createPQExpBuffer();
+		char	   *slotname = pg_strdup(slotinfo->dobj.name);
+
+		/*
+		 * XXX: For simplification, pg_create_logical_replication_slot() is
+		 * used. Is it sufficient?
+		 */
+		appendPQExpBuffer(query, "SELECT pg_create_logical_replication_slot('%s', ",
+						  slotname);
+		appendStringLiteralAH(query, slotinfo->plugin, fout);
+		appendPQExpBuffer(query, ", ");
+		appendStringLiteralAH(query, slotinfo->twophase, fout);
+		appendPQExpBuffer(query, ");");
+
+		ArchiveEntry(fout, slotinfo->dobj.catId, slotinfo->dobj.dumpId,
+					 ARCHIVE_OPTS(.tag = slotname,
+								  .description = "REPLICATION SLOT",
+								  .section = SECTION_POST_DATA,
+								  .createStmt = query->data));
+
+		pfree(slotname);
+		destroyPQExpBuffer(query);
+	}
+}
diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h
index 765fe6399a..f5535816d2 100644
--- a/src/bin/pg_dump/pg_dump.h
+++ b/src/bin/pg_dump/pg_dump.h
@@ -82,7 +82,8 @@ typedef enum
 	DO_PUBLICATION,
 	DO_PUBLICATION_REL,
 	DO_PUBLICATION_TABLE_IN_SCHEMA,
-	DO_SUBSCRIPTION
+	DO_SUBSCRIPTION,
+	DO_LOGICAL_REPLICATION_SLOT
 } DumpableObjectType;
 
 /*
@@ -666,6 +667,19 @@ typedef struct _SubscriptionInfo
 	char	   *subpasswordrequired;
 } SubscriptionInfo;
 
+/*
+ * The LogicalReplicationSlotInfo struct is used to represent replication
+ * slots.
+ * XXX: add more attrbutes if needed
+ */
+typedef struct _LogicalReplicationSlotInfo
+{
+	DumpableObject dobj;
+	char	   *plugin;
+	char	   *slottype;
+	char	   *twophase;
+} LogicalReplicationSlotInfo;
+
 /*
  *	common utility functions
  */
diff --git a/src/bin/pg_dump/pg_dump_sort.c b/src/bin/pg_dump/pg_dump_sort.c
index 8266c117a3..b36ced8c8e 100644
--- a/src/bin/pg_dump/pg_dump_sort.c
+++ b/src/bin/pg_dump/pg_dump_sort.c
@@ -93,6 +93,7 @@ enum dbObjectTypePriorities
 	PRIO_PUBLICATION_REL,
 	PRIO_PUBLICATION_TABLE_IN_SCHEMA,
 	PRIO_SUBSCRIPTION,
+	PRIO_LOGICAL_REPLICATION_SLOT,
 	PRIO_DEFAULT_ACL,			/* done in ACL pass */
 	PRIO_EVENT_TRIGGER,			/* must be next to last! */
 	PRIO_REFRESH_MATVIEW		/* must be last! */
@@ -146,10 +147,11 @@ static const int dbObjectTypePriority[] =
 	PRIO_PUBLICATION,			/* DO_PUBLICATION */
 	PRIO_PUBLICATION_REL,		/* DO_PUBLICATION_REL */
 	PRIO_PUBLICATION_TABLE_IN_SCHEMA,	/* DO_PUBLICATION_TABLE_IN_SCHEMA */
-	PRIO_SUBSCRIPTION			/* DO_SUBSCRIPTION */
+	PRIO_SUBSCRIPTION,			/* DO_SUBSCRIPTION */
+	PRIO_LOGICAL_REPLICATION_SLOT	/* DO_LOGICAL_REPLICATION_SLOT */
 };
 
-StaticAssertDecl(lengthof(dbObjectTypePriority) == (DO_SUBSCRIPTION + 1),
+StaticAssertDecl(lengthof(dbObjectTypePriority) == (DO_LOGICAL_REPLICATION_SLOT + 1),
 				 "array length mismatch");
 
 static DumpId preDataBoundId;
@@ -1498,6 +1500,11 @@ describeDumpableObject(DumpableObject *obj, char *buf, int bufsize)
 					 "SUBSCRIPTION (ID %d OID %u)",
 					 obj->dumpId, obj->catId.oid);
 			return;
+		case DO_LOGICAL_REPLICATION_SLOT:
+			snprintf(buf, bufsize,
+					 "REPLICATION SLOT (ID %d NAME %s)",
+					 obj->dumpId, obj->name);
+			return;
 		case DO_PRE_DATA_BOUNDARY:
 			snprintf(buf, bufsize,
 					 "PRE-DATA BOUNDARY  (ID %d)",
diff --git a/src/bin/pg_upgrade/dump.c b/src/bin/pg_upgrade/dump.c
index 6c8c82dca8..e6b90864f5 100644
--- a/src/bin/pg_upgrade/dump.c
+++ b/src/bin/pg_upgrade/dump.c
@@ -59,6 +59,30 @@ generate_old_dump(void)
 						   log_opts.dumpdir,
 						   sql_file_name, escaped_connstr.data);
 
+		/*
+		 * Dump logical replication slots if needed.
+		 *
+		 * XXX We cannot dump replication slots at the same time as the schema
+		 * dump because we need to separate the timing of restoring
+		 * replication slots and other objects. Replication slots, in
+		 * particular, should not be restored before executing the pg_resetwal
+		 * command because it will remove WALs that are required by the slots.
+		 */
+		if (user_opts.include_logical_slots)
+		{
+			char		slots_file_name[MAXPGPATH];
+
+			snprintf(slots_file_name, sizeof(slots_file_name),
+					 DB_DUMP_LOGICAL_SLOTS_FILE_MASK, old_db->db_oid);
+			parallel_exec_prog(log_file_name, NULL,
+							   "\"%s/pg_dump\" %s --logical-replication-slots-only "
+							   "--quote-all-identifiers --binary-upgrade %s "
+							   "--file=\"%s/%s\" %s",
+							   new_cluster.bindir, cluster_conn_opts(&old_cluster),
+							   log_opts.verbose ? "--verbose" : "",
+							   log_opts.dumpdir,
+							   slots_file_name, escaped_connstr.data);
+		}
 		termPQExpBuffer(&escaped_connstr);
 	}
 
diff --git a/src/bin/pg_upgrade/meson.build b/src/bin/pg_upgrade/meson.build
index 12a97f84e2..7f5d48b7e1 100644
--- a/src/bin/pg_upgrade/meson.build
+++ b/src/bin/pg_upgrade/meson.build
@@ -42,6 +42,7 @@ tests += {
     'tests': [
       't/001_basic.pl',
       't/002_pg_upgrade.pl',
+      't/003_logical_replication.pl',
     ],
     'test_kwargs': {'priority': 40}, # pg_upgrade tests are slow
   },
diff --git a/src/bin/pg_upgrade/option.c b/src/bin/pg_upgrade/option.c
index 8869b6b60d..f7b5ec9879 100644
--- a/src/bin/pg_upgrade/option.c
+++ b/src/bin/pg_upgrade/option.c
@@ -57,6 +57,7 @@ parseCommandLine(int argc, char *argv[])
 		{"verbose", no_argument, NULL, 'v'},
 		{"clone", no_argument, NULL, 1},
 		{"copy", no_argument, NULL, 2},
+		{"include-logical-replication-slots", no_argument, NULL, 3},
 
 		{NULL, 0, NULL, 0}
 	};
@@ -199,6 +200,10 @@ parseCommandLine(int argc, char *argv[])
 				user_opts.transfer_mode = TRANSFER_MODE_COPY;
 				break;
 
+			case 3:
+				user_opts.include_logical_slots = true;
+				break;
+
 			default:
 				fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
 						os_info.progname);
@@ -289,6 +294,8 @@ usage(void)
 	printf(_("  -V, --version                 display version information, then exit\n"));
 	printf(_("  --clone                       clone instead of copying files to new cluster\n"));
 	printf(_("  --copy                        copy files to new cluster (default)\n"));
+	printf(_("  --include-logical-replication-slots\n"
+			 "                                upgrade logical replication slots\n"));
 	printf(_("  -?, --help                    show this help, then exit\n"));
 	printf(_("\n"
 			 "Before running pg_upgrade you must:\n"
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 75bab0a04c..1241060f4e 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -59,6 +59,7 @@ static void copy_xact_xlog_xid(void);
 static void set_frozenxids(bool minmxid_only);
 static void make_outputdirs(char *pgdata);
 static void setup(char *argv0, bool *live_check);
+static void create_logical_replication_slots(void);
 
 ClusterInfo old_cluster,
 			new_cluster;
@@ -188,6 +189,19 @@ main(int argc, char **argv)
 			  new_cluster.pgdata);
 	check_ok();
 
+	/*
+	 * Create logical replication slots if requested.
+	 *
+	 * Note: This must be done after doing pg_resetwal command because the
+	 * command will remove required WALs.
+	 */
+	if (user_opts.include_logical_slots)
+	{
+		start_postmaster(&new_cluster, true);
+		create_logical_replication_slots();
+		stop_postmaster(false);
+	}
+
 	if (user_opts.do_sync)
 	{
 		prep_status("Sync data directory to disk");
@@ -860,3 +874,50 @@ set_frozenxids(bool minmxid_only)
 
 	check_ok();
 }
+
+/*
+ * create_logical_replication_slots()
+ *
+ * Similar to create_new_objects() but only restores logical replication slots.
+ */
+static void
+create_logical_replication_slots(void)
+{
+	int			dbnum;
+
+	prep_status_progress("Restoring logical replication slots in the new cluster");
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		char		slots_file_name[MAXPGPATH],
+					log_file_name[MAXPGPATH];
+		DbInfo	   *old_db = &old_cluster.dbarr.dbs[dbnum];
+
+		pg_log(PG_STATUS, "%s", old_db->db_name);
+
+		snprintf(slots_file_name, sizeof(slots_file_name),
+				 DB_DUMP_LOGICAL_SLOTS_FILE_MASK, old_db->db_oid);
+		snprintf(log_file_name, sizeof(log_file_name),
+				 DB_DUMP_LOG_FILE_MASK, old_db->db_oid);
+
+		parallel_exec_prog(log_file_name,
+						   NULL,
+						   "\"%s/psql\" %s --echo-queries --set ON_ERROR_STOP=on "
+						   "--no-psqlrc --dbname %s -f \"%s/%s\"",
+						   new_cluster.bindir,
+						   cluster_conn_opts(&new_cluster),
+						   old_db->db_name,
+						   log_opts.dumpdir,
+						   slots_file_name);
+	}
+
+	/* reap all children */
+	while (reap_child(true) == true)
+		;
+
+	end_progress_output();
+	check_ok();
+
+	/* update new_cluster info now that we have objects in the databases */
+	get_db_and_rel_infos(&new_cluster);
+}
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 3eea0139c7..5f3d7a407e 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -29,6 +29,7 @@
 /* contains both global db information and CREATE DATABASE commands */
 #define GLOBALS_DUMP_FILE	"pg_upgrade_dump_globals.sql"
 #define DB_DUMP_FILE_MASK	"pg_upgrade_dump_%u.custom"
+#define DB_DUMP_LOGICAL_SLOTS_FILE_MASK	"pg_upgrade_dump_%u_logical_slots.sql"
 
 /*
  * Base directories that include all the files generated internally, from the
@@ -304,6 +305,8 @@ typedef struct
 	transferMode transfer_mode; /* copy files or link them? */
 	int			jobs;			/* number of processes/threads to use */
 	char	   *socketdir;		/* directory to use for Unix sockets */
+	bool		include_logical_slots;	/* true -> dump and restore logical
+										 * replication slots */
 } UserOpts;
 
 typedef struct
diff --git a/src/bin/pg_upgrade/t/003_logical_replication.pl b/src/bin/pg_upgrade/t/003_logical_replication.pl
new file mode 100644
index 0000000000..4067535fa4
--- /dev/null
+++ b/src/bin/pg_upgrade/t/003_logical_replication.pl
@@ -0,0 +1,89 @@
+# Copyright (c) 2021-2023, PostgreSQL Global Development Group
+
+# Tests for logical replication, especially for upgrading publisher
+use strict;
+use warnings;
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Can be changed to test the other modes.
+my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
+
+# Initialize publisher node
+my $old_publisher = PostgreSQL::Test::Cluster->new('old_publisher');
+$old_publisher->init(allows_streaming => 'logical');
+$old_publisher->start;
+
+# Initialize subscriber node
+my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
+$subscriber->init(allows_streaming => 'logical');
+$subscriber->start;
+
+$old_publisher->safe_psql('postgres',
+	"CREATE TABLE tbl AS SELECT generate_series(1,10) AS a");
+$subscriber->safe_psql('postgres', "CREATE TABLE tbl (a int)");
+
+# Setup logical replication
+my $old_connstr = $old_publisher->connstr . ' dbname=postgres';
+$old_publisher->safe_psql('postgres',
+	"CREATE PUBLICATION pub FOR ALL TABLES");
+$subscriber->safe_psql('postgres',
+	"CREATE SUBSCRIPTION sub CONNECTION '$old_connstr' PUBLICATION pub");
+
+# Wait for initial table sync to finish
+$subscriber->wait_for_subscription_sync($old_publisher, 'sub');
+
+my $result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
+is($result, qq(10), 'check initial rows on subscriber');
+
+# Preparations for upgrading publisher
+$old_publisher->stop;
+$subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION sub DISABLE");
+
+my $new_publisher = PostgreSQL::Test::Cluster->new('new_publisher');
+$new_publisher->init(allows_streaming => 'logical');
+
+my $bindir = $new_publisher->config_data('--bindir');
+
+# Run pg_upgrade. pg_upgrade_output.d is removed at the end
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,        '--include-logical-replication-slots'
+	],
+	'run of pg_upgrade of old publisher');
+ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ removed after pg_upgrade success");
+
+# Check whether the replication slot is copied
+$new_publisher->start;
+$result =
+  $new_publisher->safe_psql('postgres',
+	"SELECT count(*) FROM pg_replication_slots");
+is($result, qq(1), 'check the replication slot is copied to new publisher');
+
+# Change connection string and enable logical replication
+my $new_connstr = $new_publisher->connstr . ' dbname=postgres';
+
+$subscriber->safe_psql('postgres',
+	"ALTER SUBSCRIPTION sub CONNECTION '$new_connstr'");
+$subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION sub ENABLE");
+
+# Check whether changes on the new publisher get replicated to the subscriber
+$new_publisher->safe_psql('postgres',
+	"INSERT INTO tbl VALUES (generate_series(11, 20))");
+
+$new_publisher->wait_for_catchup('sub');
+
+$result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
+is($result, qq(20), 'check changes are shipped to subscriber');
+
+done_testing();
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index b4058b88c3..7e999726c2 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1479,6 +1479,7 @@ LogicalRepBeginData
 LogicalRepCommitData
 LogicalRepCommitPreparedTxnData
 LogicalRepCtxStruct
+LogicalReplicationSlotInfo
 LogicalRepMode
 LogicalRepMsgType
 LogicalRepPartMapEntry
-- 
2.27.0

#14Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Peter Smith (#12)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Peter,

Thank you for the explanation.

Hopefully, someone will correct me if this explanation is wrong, but
my understanding of the different prefixes is like this --

"XXX" is used as a marker for future developers to consider maybe
revisiting/improving something that the comment refers to
e.g.
/* XXX - it would be better to code this using blah but for now we did
not.... */
/* XXX - option 'foo' is not currently supported but... */
/* XXX - it might be worth considering adding more checks or an assert
here because... */

OTOH, "Note" is just for highlighting why something is the way it is,
but with no implication that it should be revisited/changed in the
future.
e.g.
/* Note: We deliberately do not test the state here because... */
/* Note: This memory must be zeroed because... */
/* Note: This string has no '\0' terminator so... */

I confirmed that the current "XXX" comments should eventually be resolved, either
by improving the code or by making a decision, so I have kept that annotation.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#15Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Hayato Kuroda (Fujitsu) (#1)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear hackers,

My PoC does not read or copy logical mappings files to the new node, but I
had not analyzed in detail whether that is correct. I have now done so and
concluded that they do not have to be copied, because transactions which executed
at the same time as the rewrite are no longer decoded. What do you think?
My analysis follows.

## What are logical mappings files?

Logical mappings files are used so that logical decoding can keep tracking the
system catalogs even after their heap files have been rewritten. Catalog heap
files are sometimes modified, or completely replaced by new files via VACUUM FULL
or CLUSTER, and the reorder buffer cannot follow the new files as-is; mappings
files make that possible.

Each file contains key-value mappings from old tuples to new tuples. The file
name also contains the LSN at which the triggering event happened.

Mappings files are needed when transactions which modify catalogs are decoded.
Once a file's LSN is older than the slots' restart_lsn, it is no longer needed
and is removed. Please see CheckPointLogicalRewriteHeap().

## Do they need to be copied?

I think not.
Currently pg_upgrade dumps the required information from the old publisher and then
executes pg_create_logical_replication_slot() on the new one. Unlike
pg_copy_logical_replication_slot(), the restart_lsn and confirmed_flush_lsn of the
old slot are not carried over to the new slot; they are recalculated on the new node
while the slot is created. This means that transactions which modified catalog heaps
on the old publisher are no longer decoded on the new publisher.

Therefore, the mappings files on the old publisher are not needed on the new one.
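
To make the recalculation point concrete, here is a minimal SQL sketch (an
illustration only, not the exact contents of the generated dump file); the slot
name 'sub' and plugin 'pgoutput' are just the ones used in the attached TAP test:

-- The slot is created from scratch on the new cluster, so its LSNs are
-- computed from the new cluster's current WAL position.
SELECT pg_catalog.pg_create_logical_replication_slot('sub', 'pgoutput');

-- Both values below point into the new cluster's WAL; nothing here can refer
-- to the old cluster's WAL or its logical mappings files.
SELECT slot_name, restart_lsn, confirmed_flush_lsn
  FROM pg_catalog.pg_replication_slots
 WHERE slot_name = 'sub';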

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#16Peter Smith
smithpb2250@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#13)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

FYI, here are some minor review comments for v4-0001

======
src/bin/pg_dump/pg_backup.h

1.
+ int logical_slot_only;

The field should be plural - "logical_slots_only"

======
src/bin/pg_dump/pg_dump.c

2.
+ appendPQExpBufferStr(query,
+ "SELECT r.slot_name, r.plugin, r.two_phase "
+ "FROM pg_replication_slots r "
+ "WHERE r.database = current_database() AND temporary = false "
+ "AND wal_status IN ('reserved', 'extended');");

The alias 'r' may not be needed at all here, but since you already
have it IMO it looks a bit strange that you used it for only some of
the columns but not others.

~~~

3.
+
+ /* FIXME: force dumping */
+ slotinfo[i].dobj.dump = DUMP_COMPONENT_ALL;

Why the "FIXME" here? Are you intending to replace this code with
something else?

------
Kind Regards,
Peter Smith.
Fujitsu Australia

#17Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Peter Smith (#16)
1 attachment(s)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Peter,

Thank you for giving comments. PSA new version.

src/bin/pg_dump/pg_backup.h

1.
+ int logical_slot_only;

The field should be plural - "logical_slots_only"

Fixed.

src/bin/pg_dump/pg_dump.c

2.
+ appendPQExpBufferStr(query,
+ "SELECT r.slot_name, r.plugin, r.two_phase "
+ "FROM pg_replication_slots r "
+ "WHERE r.database = current_database() AND temporary = false "
+ "AND wal_status IN ('reserved', 'extended');");

The alias 'r' may not be needed at all here, but since you already
have it IMO it looks a bit strange that you used it for only some of
the columns but not others.

Right, I removed the alias. Moreover, the 'pg_catalog' schema is now specified explicitly.

3.
+
+ /* FIXME: force dumping */
+ slotinfo[i].dobj.dump = DUMP_COMPONENT_ALL;

Why the "FIXME" here? Are you intending to replace this code with
something else?

I added the FIXME because I was not sure whether a selectDumpable...() function
was needed or not. I now think such a function is not needed, so I replaced the
comment. In more detail:

Replication slots cannot be members of an extension because pg_create_logical_replication_slot()
cannot be called within an extension install script, so checkExtensionMembership()
is not needed. Moreover, we do not have any options to include/exclude slots
when dumping, so there is no need to check slot names the way selectDumpableExtension()
does. Based on these points, I think the function is not needed.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:

v5-0001-pg_upgrade-Add-include-logical-replication-slots-.patchapplication/octet-stream; name=v5-0001-pg_upgrade-Add-include-logical-replication-slots-.patchDownload
From 7dc3d0a297e111ade378e47a43135652d7949715 Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Tue, 4 Apr 2023 05:49:34 +0000
Subject: [PATCH v5] pg_upgrade: Add --include-logical-replication-slots option

This commit introduces a new pg_upgrade option called "--include-logical-replication-slots".
This allows nodes with logical replication slots to be upgraded. The commit can
be divided into two parts: one for pg_dump and another for pg_upgrade.

For pg_dump this commit includes a new option called "--logical-replication-slots-only".
This option can be used to dump logical replication slots. When this option is
specified, the slot_name, plugin, and two_phase parameters are extracted from
pg_replication_slots. An SQL file is then generated which executes
pg_create_logical_replication_slot() with the extracted parameters.

For pg_upgrade, when '--include-logical-replication-slots' is specified, it executes
pg_dump with the new "--logical-replication-slots-only" option and restores from the
dump. Apart from restoring schema, pg_resetwal must not be called after restoring
replication slots. This is because the command discards WAL files and starts from a
new segment, even if they are required by replication slots. This leads to an ERROR:
"requested WAL segment XXX has already been removed". To avoid this, replication slots
are restored at a different time than other objects, after running pg_resetwal.

The significant advantage of this commit is that it makes it easy to continue
logical replication even after upgrading the publisher node. Previously, pg_upgrade
allowed copying publications to a new node. With this new commit, adjusting the
connection string to the new publisher will cause the apply worker on the subscriber
to connect to the new publisher automatically. This enables seamless continuation
of logical replication, even after an upgrade.

Author: Hayato Kuroda
Reviewed-by: Peter Smith, Julien Rouhaud
---
 doc/src/sgml/ref/pg_dump.sgml                 |  10 ++
 doc/src/sgml/ref/pgupgrade.sgml               |  11 ++
 src/bin/pg_dump/pg_backup.h                   |   1 +
 src/bin/pg_dump/pg_dump.c                     | 146 ++++++++++++++++++
 src/bin/pg_dump/pg_dump.h                     |  16 +-
 src/bin/pg_dump/pg_dump_sort.c                |  11 +-
 src/bin/pg_upgrade/dump.c                     |  24 +++
 src/bin/pg_upgrade/meson.build                |   1 +
 src/bin/pg_upgrade/option.c                   |   7 +
 src/bin/pg_upgrade/pg_upgrade.c               |  61 ++++++++
 src/bin/pg_upgrade/pg_upgrade.h               |   3 +
 .../pg_upgrade/t/003_logical_replication.pl   |  89 +++++++++++
 src/tools/pgindent/typedefs.list              |   1 +
 13 files changed, 378 insertions(+), 3 deletions(-)
 create mode 100644 src/bin/pg_upgrade/t/003_logical_replication.pl

diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index e81e35c13b..6e07f85281 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -1206,6 +1206,16 @@ PostgreSQL documentation
       </listitem>
      </varlistentry>
 
+     <varlistentry>
+      <term><option>--logical-replication-slots-only</option></term>
+      <listitem>
+       <para>
+        Dump only logical replication slots; not the schema (data definitions),
+        nor data. This is mainly used when upgrading nodes.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
        <term><option>-?</option></term>
        <term><option>--help</option></term>
diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 7816b4c685..94e90ff506 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -240,6 +240,17 @@ PostgreSQL documentation
       </listitem>
      </varlistentry>
 
+     <varlistentry>
+      <term><option>--include-logical-replication-slots</option></term>
+      <listitem>
+       <para>
+        Upgrade logical replication slots. Only permanent replication slots
+        are included. Note that pg_upgrade does not check the installation of
+        plugins.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><option>-?</option></term>
       <term><option>--help</option></term>
diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index aba780ef4b..0a4e931f9b 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -187,6 +187,7 @@ typedef struct _dumpOptions
 	int			use_setsessauth;
 	int			enable_row_security;
 	int			load_via_partition_root;
+	int			logical_slots_only;
 
 	/* default, if no "inclusion" switches appear, is to dump everything */
 	bool		include_everything;
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 967ced4eed..49526b0486 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -328,6 +328,9 @@ static void setupDumpWorker(Archive *AH);
 static TableInfo *getRootTableInfo(const TableInfo *tbinfo);
 static bool forcePartitionRootLoad(const TableInfo *tbinfo);
 
+static void getLogicalReplicationSlots(Archive *fout);
+static void dumpLogicalReplicationSlot(Archive *fout,
+									   const LogicalReplicationSlotInfo *slotinfo);
 
 int
 main(int argc, char **argv)
@@ -431,6 +434,7 @@ main(int argc, char **argv)
 		{"table-and-children", required_argument, NULL, 12},
 		{"exclude-table-and-children", required_argument, NULL, 13},
 		{"exclude-table-data-and-children", required_argument, NULL, 14},
+		{"logical-replication-slots-only", no_argument, NULL, 15},
 
 		{NULL, 0, NULL, 0}
 	};
@@ -657,6 +661,10 @@ main(int argc, char **argv)
 										  optarg);
 				break;
 
+			case 15:			/* dump only replication slot(s) */
+				dopt.logical_slots_only = true;
+				break;
+
 			default:
 				/* getopt_long already emitted a complaint */
 				pg_log_error_hint("Try \"%s --help\" for more information.", progname);
@@ -714,6 +722,11 @@ main(int argc, char **argv)
 	if (dopt.do_nothing && dopt.dump_inserts == 0)
 		pg_fatal("option --on-conflict-do-nothing requires option --inserts, --rows-per-insert, or --column-inserts");
 
+	if (dopt.logical_slots_only && dopt.dataOnly)
+		pg_fatal("options --logical-replication-slots-only and -a/--data-only cannot be used together");
+	if (dopt.logical_slots_only && dopt.schemaOnly)
+		pg_fatal("options --logical-replication-slots-only and -s/--schema-only cannot be used together");
+
 	/* Identify archive format to emit */
 	archiveFormat = parseArchiveFormat(format, &archiveMode);
 
@@ -876,6 +889,16 @@ main(int argc, char **argv)
 			pg_fatal("no matching extensions were found");
 	}
 
+	/*
+	 * If dump logical-replication-slots-only was requested, dump only them
+	 * and skip everything else.
+	 */
+	if (dopt.logical_slots_only)
+	{
+		getLogicalReplicationSlots(fout);
+		goto dump;
+	}
+
 	/*
 	 * Dumping LOs is the default for dumps where an inclusion switch is not
 	 * used (an "include everything" dump).  -B can be used to exclude LOs
@@ -936,6 +959,8 @@ main(int argc, char **argv)
 	if (!dopt.no_security_labels)
 		collectSecLabels(fout);
 
+dump:
+
 	/* Lastly, create dummy objects to represent the section boundaries */
 	boundaryObjs = createBoundaryObjects();
 
@@ -1109,6 +1134,8 @@ help(const char *progname)
 			 "                               servers matching PATTERN\n"));
 	printf(_("  --inserts                    dump data as INSERT commands, rather than COPY\n"));
 	printf(_("  --load-via-partition-root    load partitions via the root table\n"));
+	printf(_("  --logical-replication-slots-only\n"
+			 "                               dump only logical replication slots, no schema or data\n"));
 	printf(_("  --no-comments                do not dump comments\n"));
 	printf(_("  --no-publications            do not dump publications\n"));
 	printf(_("  --no-security-labels         do not dump security label assignments\n"));
@@ -10381,6 +10408,10 @@ dumpDumpableObject(Archive *fout, DumpableObject *dobj)
 		case DO_SUBSCRIPTION:
 			dumpSubscription(fout, (const SubscriptionInfo *) dobj);
 			break;
+		case DO_LOGICAL_REPLICATION_SLOT:
+			dumpLogicalReplicationSlot(fout,
+									   (const LogicalReplicationSlotInfo *) dobj);
+			break;
 		case DO_PRE_DATA_BOUNDARY:
 		case DO_POST_DATA_BOUNDARY:
 			/* never dumped, nothing to do */
@@ -18382,6 +18413,7 @@ addBoundaryDependencies(DumpableObject **dobjs, int numObjs,
 			case DO_PUBLICATION_REL:
 			case DO_PUBLICATION_TABLE_IN_SCHEMA:
 			case DO_SUBSCRIPTION:
+			case DO_LOGICAL_REPLICATION_SLOT:
 				/* Post-data objects: must come after the post-data boundary */
 				addObjectDependency(dobj, postDataBound->dumpId);
 				break;
@@ -18643,3 +18675,117 @@ appendReloptionsArrayAH(PQExpBuffer buffer, const char *reloptions,
 	if (!res)
 		pg_log_warning("could not parse %s array", "reloptions");
 }
+
+/*
+ * getLogicalReplicationSlots
+ *	  get information about replication slots
+ */
+static void
+getLogicalReplicationSlots(Archive *fout)
+{
+	PGresult   *res;
+	LogicalReplicationSlotInfo *slotinfo;
+	PQExpBuffer query;
+	DumpOptions *dopt = fout->dopt;
+
+	int			i_slotname;
+	int			i_plugin;
+	int			i_twophase;
+	int			i,
+				ntups;
+
+	/* Check whether we should dump or not */
+	if (fout->remoteVersion < 160000 || !dopt->logical_slots_only)
+		return;
+
+	query = createPQExpBuffer();
+
+	resetPQExpBuffer(query);
+
+	/*
+	 * Get replication slots.
+	 *
+	 * XXX: Which information must be extracted from old node? Currently three
+	 * attributes are extracted because they are used by
+	 * pg_create_logical_replication_slot().
+	 * XXX: Do we have to support physical slots?
+	 */
+	appendPQExpBufferStr(query,
+						 "SELECT slot_name, plugin, two_phase "
+						 "FROM pg_catalog.pg_replication_slots "
+						 "WHERE database = current_database() AND temporary = false "
+						 "AND wal_status IN ('reserved', 'extended');");
+
+	res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+
+	ntups = PQntuples(res);
+
+	i_slotname = PQfnumber(res, "slot_name");
+	i_plugin = PQfnumber(res, "plugin");
+	i_twophase = PQfnumber(res, "two_phase");
+
+	slotinfo = pg_malloc(ntups * sizeof(LogicalReplicationSlotInfo));
+
+	for (i = 0; i < ntups; i++)
+	{
+		slotinfo[i].dobj.objType = DO_LOGICAL_REPLICATION_SLOT;
+
+		slotinfo[i].dobj.catId.tableoid = InvalidOid;
+		slotinfo[i].dobj.catId.oid = InvalidOid;
+		AssignDumpId(&slotinfo[i].dobj);
+
+		slotinfo[i].dobj.name = pg_strdup(PQgetvalue(res, i, i_slotname));
+
+		slotinfo[i].plugin = pg_strdup(PQgetvalue(res, i, i_plugin));
+		slotinfo[i].twophase = pg_strdup(PQgetvalue(res, i, i_twophase));
+
+		/*
+		 * Note: Currently we do not have any options to include/exclude slots
+		 * in dumping, so all the slots must be selected.
+		 */
+		slotinfo[i].dobj.dump = DUMP_COMPONENT_ALL;
+	}
+	PQclear(res);
+
+	destroyPQExpBuffer(query);
+}
+
+/*
+ * dumpLogicalReplicationSlot
+ *	  dump creation functions for the given logical replication slots
+ */
+static void
+dumpLogicalReplicationSlot(Archive *fout,
+						   const LogicalReplicationSlotInfo *slotinfo)
+{
+	DumpOptions *dopt = fout->dopt;
+
+	if (!dopt->logical_slots_only)
+		return;
+
+	if (slotinfo->dobj.dump & DUMP_COMPONENT_DEFINITION)
+	{
+		PQExpBuffer query = createPQExpBuffer();
+		char	   *slotname = pg_strdup(slotinfo->dobj.name);
+
+		/*
+		 * XXX: For simplification, pg_create_logical_replication_slot() is
+		 * used. Is it sufficient?
+		 */
+		appendPQExpBuffer(query, "SELECT pg_catalog.pg_create_logical_replication_slot('%s', ",
+						  slotname);
+		appendStringLiteralAH(query, slotinfo->plugin, fout);
+		appendPQExpBuffer(query, ", ");
+		appendStringLiteralAH(query, slotinfo->twophase, fout);
+		appendPQExpBuffer(query, ");");
+
+		ArchiveEntry(fout, slotinfo->dobj.catId, slotinfo->dobj.dumpId,
+					 ARCHIVE_OPTS(.tag = slotname,
+								  .description = "REPLICATION SLOT",
+								  .section = SECTION_POST_DATA,
+								  .createStmt = query->data));
+
+		pfree(slotname);
+		destroyPQExpBuffer(query);
+	}
+}
diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h
index 765fe6399a..f5535816d2 100644
--- a/src/bin/pg_dump/pg_dump.h
+++ b/src/bin/pg_dump/pg_dump.h
@@ -82,7 +82,8 @@ typedef enum
 	DO_PUBLICATION,
 	DO_PUBLICATION_REL,
 	DO_PUBLICATION_TABLE_IN_SCHEMA,
-	DO_SUBSCRIPTION
+	DO_SUBSCRIPTION,
+	DO_LOGICAL_REPLICATION_SLOT
 } DumpableObjectType;
 
 /*
@@ -666,6 +667,19 @@ typedef struct _SubscriptionInfo
 	char	   *subpasswordrequired;
 } SubscriptionInfo;
 
+/*
+ * The LogicalReplicationSlotInfo struct is used to represent replication
+ * slots.
+ * XXX: add more attrbutes if needed
+ */
+typedef struct _LogicalReplicationSlotInfo
+{
+	DumpableObject dobj;
+	char	   *plugin;
+	char	   *slottype;
+	char	   *twophase;
+} LogicalReplicationSlotInfo;
+
 /*
  *	common utility functions
  */
diff --git a/src/bin/pg_dump/pg_dump_sort.c b/src/bin/pg_dump/pg_dump_sort.c
index 8266c117a3..b36ced8c8e 100644
--- a/src/bin/pg_dump/pg_dump_sort.c
+++ b/src/bin/pg_dump/pg_dump_sort.c
@@ -93,6 +93,7 @@ enum dbObjectTypePriorities
 	PRIO_PUBLICATION_REL,
 	PRIO_PUBLICATION_TABLE_IN_SCHEMA,
 	PRIO_SUBSCRIPTION,
+	PRIO_LOGICAL_REPLICATION_SLOT,
 	PRIO_DEFAULT_ACL,			/* done in ACL pass */
 	PRIO_EVENT_TRIGGER,			/* must be next to last! */
 	PRIO_REFRESH_MATVIEW		/* must be last! */
@@ -146,10 +147,11 @@ static const int dbObjectTypePriority[] =
 	PRIO_PUBLICATION,			/* DO_PUBLICATION */
 	PRIO_PUBLICATION_REL,		/* DO_PUBLICATION_REL */
 	PRIO_PUBLICATION_TABLE_IN_SCHEMA,	/* DO_PUBLICATION_TABLE_IN_SCHEMA */
-	PRIO_SUBSCRIPTION			/* DO_SUBSCRIPTION */
+	PRIO_SUBSCRIPTION,			/* DO_SUBSCRIPTION */
+	PRIO_LOGICAL_REPLICATION_SLOT	/* DO_LOGICAL_REPLICATION_SLOT */
 };
 
-StaticAssertDecl(lengthof(dbObjectTypePriority) == (DO_SUBSCRIPTION + 1),
+StaticAssertDecl(lengthof(dbObjectTypePriority) == (DO_LOGICAL_REPLICATION_SLOT + 1),
 				 "array length mismatch");
 
 static DumpId preDataBoundId;
@@ -1498,6 +1500,11 @@ describeDumpableObject(DumpableObject *obj, char *buf, int bufsize)
 					 "SUBSCRIPTION (ID %d OID %u)",
 					 obj->dumpId, obj->catId.oid);
 			return;
+		case DO_LOGICAL_REPLICATION_SLOT:
+			snprintf(buf, bufsize,
+					 "REPLICATION SLOT (ID %d NAME %s)",
+					 obj->dumpId, obj->name);
+			return;
 		case DO_PRE_DATA_BOUNDARY:
 			snprintf(buf, bufsize,
 					 "PRE-DATA BOUNDARY  (ID %d)",
diff --git a/src/bin/pg_upgrade/dump.c b/src/bin/pg_upgrade/dump.c
index 6c8c82dca8..e6b90864f5 100644
--- a/src/bin/pg_upgrade/dump.c
+++ b/src/bin/pg_upgrade/dump.c
@@ -59,6 +59,30 @@ generate_old_dump(void)
 						   log_opts.dumpdir,
 						   sql_file_name, escaped_connstr.data);
 
+		/*
+		 * Dump logical replication slots if needed.
+		 *
+		 * XXX We cannot dump replication slots at the same time as the schema
+		 * dump because we need to separate the timing of restoring
+		 * replication slots and other objects. Replication slots, in
+		 * particular, should not be restored before executing the pg_resetwal
+		 * command because it will remove WALs that are required by the slots.
+		 */
+		if (user_opts.include_logical_slots)
+		{
+			char		slots_file_name[MAXPGPATH];
+
+			snprintf(slots_file_name, sizeof(slots_file_name),
+					 DB_DUMP_LOGICAL_SLOTS_FILE_MASK, old_db->db_oid);
+			parallel_exec_prog(log_file_name, NULL,
+							   "\"%s/pg_dump\" %s --logical-replication-slots-only "
+							   "--quote-all-identifiers --binary-upgrade %s "
+							   "--file=\"%s/%s\" %s",
+							   new_cluster.bindir, cluster_conn_opts(&old_cluster),
+							   log_opts.verbose ? "--verbose" : "",
+							   log_opts.dumpdir,
+							   slots_file_name, escaped_connstr.data);
+		}
 		termPQExpBuffer(&escaped_connstr);
 	}
 
diff --git a/src/bin/pg_upgrade/meson.build b/src/bin/pg_upgrade/meson.build
index 12a97f84e2..7f5d48b7e1 100644
--- a/src/bin/pg_upgrade/meson.build
+++ b/src/bin/pg_upgrade/meson.build
@@ -42,6 +42,7 @@ tests += {
     'tests': [
       't/001_basic.pl',
       't/002_pg_upgrade.pl',
+      't/003_logical_replication.pl',
     ],
     'test_kwargs': {'priority': 40}, # pg_upgrade tests are slow
   },
diff --git a/src/bin/pg_upgrade/option.c b/src/bin/pg_upgrade/option.c
index 8869b6b60d..f7b5ec9879 100644
--- a/src/bin/pg_upgrade/option.c
+++ b/src/bin/pg_upgrade/option.c
@@ -57,6 +57,7 @@ parseCommandLine(int argc, char *argv[])
 		{"verbose", no_argument, NULL, 'v'},
 		{"clone", no_argument, NULL, 1},
 		{"copy", no_argument, NULL, 2},
+		{"include-logical-replication-slots", no_argument, NULL, 3},
 
 		{NULL, 0, NULL, 0}
 	};
@@ -199,6 +200,10 @@ parseCommandLine(int argc, char *argv[])
 				user_opts.transfer_mode = TRANSFER_MODE_COPY;
 				break;
 
+			case 3:
+				user_opts.include_logical_slots = true;
+				break;
+
 			default:
 				fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
 						os_info.progname);
@@ -289,6 +294,8 @@ usage(void)
 	printf(_("  -V, --version                 display version information, then exit\n"));
 	printf(_("  --clone                       clone instead of copying files to new cluster\n"));
 	printf(_("  --copy                        copy files to new cluster (default)\n"));
+	printf(_("  --include-logical-replication-slots\n"
+			 "                                upgrade logical replication slots\n"));
 	printf(_("  -?, --help                    show this help, then exit\n"));
 	printf(_("\n"
 			 "Before running pg_upgrade you must:\n"
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 75bab0a04c..1241060f4e 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -59,6 +59,7 @@ static void copy_xact_xlog_xid(void);
 static void set_frozenxids(bool minmxid_only);
 static void make_outputdirs(char *pgdata);
 static void setup(char *argv0, bool *live_check);
+static void create_logical_replication_slots(void);
 
 ClusterInfo old_cluster,
 			new_cluster;
@@ -188,6 +189,19 @@ main(int argc, char **argv)
 			  new_cluster.pgdata);
 	check_ok();
 
+	/*
+	 * Create logical replication slots if requested.
+	 *
+	 * Note: This must be done after doing pg_resetwal command because the
+	 * command will remove required WALs.
+	 */
+	if (user_opts.include_logical_slots)
+	{
+		start_postmaster(&new_cluster, true);
+		create_logical_replication_slots();
+		stop_postmaster(false);
+	}
+
 	if (user_opts.do_sync)
 	{
 		prep_status("Sync data directory to disk");
@@ -860,3 +874,50 @@ set_frozenxids(bool minmxid_only)
 
 	check_ok();
 }
+
+/*
+ * create_logical_replication_slots()
+ *
+ * Similar to create_new_objects() but only restores logical replication slots.
+ */
+static void
+create_logical_replication_slots(void)
+{
+	int			dbnum;
+
+	prep_status_progress("Restoring logical replication slots in the new cluster");
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		char		slots_file_name[MAXPGPATH],
+					log_file_name[MAXPGPATH];
+		DbInfo	   *old_db = &old_cluster.dbarr.dbs[dbnum];
+
+		pg_log(PG_STATUS, "%s", old_db->db_name);
+
+		snprintf(slots_file_name, sizeof(slots_file_name),
+				 DB_DUMP_LOGICAL_SLOTS_FILE_MASK, old_db->db_oid);
+		snprintf(log_file_name, sizeof(log_file_name),
+				 DB_DUMP_LOG_FILE_MASK, old_db->db_oid);
+
+		parallel_exec_prog(log_file_name,
+						   NULL,
+						   "\"%s/psql\" %s --echo-queries --set ON_ERROR_STOP=on "
+						   "--no-psqlrc --dbname %s -f \"%s/%s\"",
+						   new_cluster.bindir,
+						   cluster_conn_opts(&new_cluster),
+						   old_db->db_name,
+						   log_opts.dumpdir,
+						   slots_file_name);
+	}
+
+	/* reap all children */
+	while (reap_child(true) == true)
+		;
+
+	end_progress_output();
+	check_ok();
+
+	/* update new_cluster info now that we have objects in the databases */
+	get_db_and_rel_infos(&new_cluster);
+}
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 3eea0139c7..5f3d7a407e 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -29,6 +29,7 @@
 /* contains both global db information and CREATE DATABASE commands */
 #define GLOBALS_DUMP_FILE	"pg_upgrade_dump_globals.sql"
 #define DB_DUMP_FILE_MASK	"pg_upgrade_dump_%u.custom"
+#define DB_DUMP_LOGICAL_SLOTS_FILE_MASK	"pg_upgrade_dump_%u_logical_slots.sql"
 
 /*
  * Base directories that include all the files generated internally, from the
@@ -304,6 +305,8 @@ typedef struct
 	transferMode transfer_mode; /* copy files or link them? */
 	int			jobs;			/* number of processes/threads to use */
 	char	   *socketdir;		/* directory to use for Unix sockets */
+	bool		include_logical_slots;	/* true -> dump and restore logical
+										 * replication slots */
 } UserOpts;
 
 typedef struct
diff --git a/src/bin/pg_upgrade/t/003_logical_replication.pl b/src/bin/pg_upgrade/t/003_logical_replication.pl
new file mode 100644
index 0000000000..4067535fa4
--- /dev/null
+++ b/src/bin/pg_upgrade/t/003_logical_replication.pl
@@ -0,0 +1,89 @@
+# Copyright (c) 2021-2023, PostgreSQL Global Development Group
+
+# Tests for logical replication, especially for upgrading publisher
+use strict;
+use warnings;
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Can be changed to test the other modes.
+my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
+
+# Initialize publisher node
+my $old_publisher = PostgreSQL::Test::Cluster->new('old_publisher');
+$old_publisher->init(allows_streaming => 'logical');
+$old_publisher->start;
+
+# Initialize subscriber node
+my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
+$subscriber->init(allows_streaming => 'logical');
+$subscriber->start;
+
+$old_publisher->safe_psql('postgres',
+	"CREATE TABLE tbl AS SELECT generate_series(1,10) AS a");
+$subscriber->safe_psql('postgres', "CREATE TABLE tbl (a int)");
+
+# Setup logical replication
+my $old_connstr = $old_publisher->connstr . ' dbname=postgres';
+$old_publisher->safe_psql('postgres',
+	"CREATE PUBLICATION pub FOR ALL TABLES");
+$subscriber->safe_psql('postgres',
+	"CREATE SUBSCRIPTION sub CONNECTION '$old_connstr' PUBLICATION pub");
+
+# Wait for initial table sync to finish
+$subscriber->wait_for_subscription_sync($old_publisher, 'sub');
+
+my $result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
+is($result, qq(10), 'check initial rows on subscriber');
+
+# Preparations for upgrading publisher
+$old_publisher->stop;
+$subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION sub DISABLE");
+
+my $new_publisher = PostgreSQL::Test::Cluster->new('new_publisher');
+$new_publisher->init(allows_streaming => 'logical');
+
+my $bindir = $new_publisher->config_data('--bindir');
+
+# Run pg_upgrade. pg_upgrade_output.d is removed at the end
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,        '--include-logical-replication-slots'
+	],
+	'run of pg_upgrade of old publisher');
+ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ removed after pg_upgrade success");
+
+# Check whether the replication slot is copied
+$new_publisher->start;
+$result =
+  $new_publisher->safe_psql('postgres',
+	"SELECT count(*) FROM pg_replication_slots");
+is($result, qq(1), 'check the replication slot is copied to new publisher');
+
+# Change connection string and enable logical replication
+my $new_connstr = $new_publisher->connstr . ' dbname=postgres';
+
+$subscriber->safe_psql('postgres',
+	"ALTER SUBSCRIPTION sub CONNECTION '$new_connstr'");
+$subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION sub ENABLE");
+
+# Check whether changes on the new publisher get replicated to the subscriber
+$new_publisher->safe_psql('postgres',
+	"INSERT INTO tbl VALUES (generate_series(11, 20))");
+
+$new_publisher->wait_for_catchup('sub');
+
+$result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
+is($result, qq(20), 'check changes are shipped to subscriber');
+
+done_testing();
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index b4058b88c3..7e999726c2 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1479,6 +1479,7 @@ LogicalRepBeginData
 LogicalRepCommitData
 LogicalRepCommitPreparedTxnData
 LogicalRepCtxStruct
+LogicalReplicationSlotInfo
 LogicalRepMode
 LogicalRepMsgType
 LogicalRepPartMapEntry
-- 
2.27.0

#18Peter Smith
smithpb2250@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#17)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

Hi Kuroda-san.

I do not have any more review comments for the v5 patch, but here are
a few remaining nitpick items.

======
General

1.
There were a couple of comments that I thought would appear less
squished (aka more readable) if there was a blank line preceding the
XXX.

1a. This one is in getLogicalReplicationSlots

+ /*
+ * Get replication slots.
+ *
+ * XXX: Which information must be extracted from old node? Currently three
+ * attributes are extracted because they are used by
+ * pg_create_logical_replication_slot().
+ * XXX: Do we have to support physical slots?
+ */

~

1b. This one is for the LogicalReplicationSlotInfo typedef

+/*
+ * The LogicalReplicationSlotInfo struct is used to represent replication
+ * slots.
+ * XXX: add more attrbutes if needed
+ */

BTW -- I just noticed there is a typo in that comment. /attrbutes/attributes/

======
src/bin/pg_dump/pg_dump_sort.c

2. describeDumpableObject

+ case DO_LOGICAL_REPLICATION_SLOT:
+ snprintf(buf, bufsize,
+ "REPLICATION SLOT (ID %d NAME %s)",
+ obj->dumpId, obj->name);
+ return;

Since everything else was changed to say logical replication slot,
should this string be changed to "LOGICAL REPLICATION SLOT (ID %d NAME
%s)"?

======
.../pg_upgrade/t/003_logical_replication.pl

3.
Should the name of this TAP test file really be 003_logical_replication_slots.pl?

------
Kind Regards,
Peter Smith.
Fujitsu Australia

#19Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Peter Smith (#18)
1 attachment(s)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Peter,

Thank you for checking. Then we can wait for comments from others.
PSA modified version.

1.
There were a couple of comments that I thought would appear less
squished (aka more readable) if there was a blank line preceding the
XXX.

1a. This one is in getLogicalReplicationSlots

+ /*
+ * Get replication slots.
+ *
+ * XXX: Which information must be extracted from old node? Currently three
+ * attributes are extracted because they are used by
+ * pg_create_logical_replication_slot().
+ * XXX: Do we have to support physical slots?
+ */

Added.

1b. This one is for the LogicalReplicationSlotInfo typedef

+/*
+ * The LogicalReplicationSlotInfo struct is used to represent replication
+ * slots.
+ * XXX: add more attrbutes if needed
+ */

Added.

BTW -- I just noticed there is a typo in that comment. /attrbutes/attributes/

Good finding, replaced.

src/bin/pg_dump/pg_dump_sort.c

2. describeDumpableObject

+ case DO_LOGICAL_REPLICATION_SLOT:
+ snprintf(buf, bufsize,
+ "REPLICATION SLOT (ID %d NAME %s)",
+ obj->dumpId, obj->name);
+ return;

Since everything else was changed to say logical replication slot,
should this string be changed to "LOGICAL REPLICATION SLOT (ID %d NAME
%s)"?

I missed replacing this one; changed.

.../pg_upgrade/t/003_logical_replication.pl

3.
Should the name of this TAP test file really be 003_logical_replication_slots.pl?

Hmm, not sure. I have renamed it once according to your advice, but personally I think
another feature which allows upgrading the subscriber [1] should be tested in the same
Perl file. That's why I named it "003_logical_replication.pl".

[1]: /messages/by-id/20230217075433.u5mjly4d5cr4hcfe@jrouhaud

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:

v6-0001-pg_upgrade-Add-include-logical-replication-slots-.patch (application/octet-stream)
From 498c48d2573118617fda33fcd7433d874f8cad48 Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Tue, 4 Apr 2023 05:49:34 +0000
Subject: [PATCH v6] pg_upgrade: Add --include-logical-replication-slots option

This commit introduces a new pg_upgrade option called "--include-logical-replication-slots".
This allows nodes with logical replication slots to be upgraded. The commit can
be divided into two parts: one for pg_dump and another for pg_upgrade.

For pg_dump this commit includes a new option called "--logical-replication-slots-only".
This option can be used to dump logical replication slots. When this option is
specified, the slot_name, plugin, and two_phase parameters are extracted from
pg_replication_slots. An SQL file is then generated which executes
pg_create_logical_replication_slot() with the extracted parameters.

For pg_upgrade, when '--include-logical-replication-slots' is specified, it executes
pg_dump with the new "--logical-replication-slots-only" option and restores from the
dump. Apart from restoring schema, pg_resetwal must not be called after restoring
replication slots. This is because the command discards WAL files and starts from a
new segment, even if they are required by replication slots. This leads to an ERROR:
"requested WAL segment XXX has already been removed". To avoid this, replication slots
are restored at a different time than other objects, after running pg_resetwal.

The significant advantage of this commit is that it makes it easy to continue
logical replication even after upgrading the publisher node. Previously, pg_upgrade
allowed copying publications to a new node. With this new commit, adjusting the
connection string to the new publisher will cause the apply worker on the subscriber
to connect to the new publisher automatically. This enables seamless continuation
of logical replication, even after an upgrade.

Author: Hayato Kuroda
Reviewed-by: Peter Smith, Julien Rouhaud
---
 doc/src/sgml/ref/pg_dump.sgml                 |  10 ++
 doc/src/sgml/ref/pgupgrade.sgml               |  11 ++
 src/bin/pg_dump/pg_backup.h                   |   1 +
 src/bin/pg_dump/pg_dump.c                     | 147 ++++++++++++++++++
 src/bin/pg_dump/pg_dump.h                     |  17 +-
 src/bin/pg_dump/pg_dump_sort.c                |  11 +-
 src/bin/pg_upgrade/dump.c                     |  24 +++
 src/bin/pg_upgrade/meson.build                |   1 +
 src/bin/pg_upgrade/option.c                   |   7 +
 src/bin/pg_upgrade/pg_upgrade.c               |  61 ++++++++
 src/bin/pg_upgrade/pg_upgrade.h               |   3 +
 .../t/003_logical_replication_slots.pl        |  90 +++++++++++
 src/tools/pgindent/typedefs.list              |   1 +
 13 files changed, 381 insertions(+), 3 deletions(-)
 create mode 100644 src/bin/pg_upgrade/t/003_logical_replication_slots.pl

diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index e81e35c13b..6e07f85281 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -1206,6 +1206,16 @@ PostgreSQL documentation
       </listitem>
      </varlistentry>
 
+     <varlistentry>
+      <term><option>--logical-replication-slots-only</option></term>
+      <listitem>
+       <para>
+        Dump only logical replication slots; not the schema (data definitions),
+        nor data. This is mainly used when upgrading nodes.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
        <term><option>-?</option></term>
        <term><option>--help</option></term>
diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 7816b4c685..94e90ff506 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -240,6 +240,17 @@ PostgreSQL documentation
       </listitem>
      </varlistentry>
 
+     <varlistentry>
+      <term><option>--include-logical-replication-slots</option></term>
+      <listitem>
+       <para>
+        Upgrade logical replication slots. Only permanent replication slots
+        are included. Note that pg_upgrade does not check the installation of
+        plugins.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><option>-?</option></term>
       <term><option>--help</option></term>
diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index aba780ef4b..0a4e931f9b 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -187,6 +187,7 @@ typedef struct _dumpOptions
 	int			use_setsessauth;
 	int			enable_row_security;
 	int			load_via_partition_root;
+	int			logical_slots_only;
 
 	/* default, if no "inclusion" switches appear, is to dump everything */
 	bool		include_everything;
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 73a6c964ba..6cff4009b3 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -328,6 +328,9 @@ static void setupDumpWorker(Archive *AH);
 static TableInfo *getRootTableInfo(const TableInfo *tbinfo);
 static bool forcePartitionRootLoad(const TableInfo *tbinfo);
 
+static void getLogicalReplicationSlots(Archive *fout);
+static void dumpLogicalReplicationSlot(Archive *fout,
+									   const LogicalReplicationSlotInfo *slotinfo);
 
 int
 main(int argc, char **argv)
@@ -431,6 +434,7 @@ main(int argc, char **argv)
 		{"table-and-children", required_argument, NULL, 12},
 		{"exclude-table-and-children", required_argument, NULL, 13},
 		{"exclude-table-data-and-children", required_argument, NULL, 14},
+		{"logical-replication-slots-only", no_argument, NULL, 15},
 
 		{NULL, 0, NULL, 0}
 	};
@@ -657,6 +661,10 @@ main(int argc, char **argv)
 										  optarg);
 				break;
 
+			case 15:			/* dump only replication slot(s) */
+				dopt.logical_slots_only = true;
+				break;
+
 			default:
 				/* getopt_long already emitted a complaint */
 				pg_log_error_hint("Try \"%s --help\" for more information.", progname);
@@ -714,6 +722,11 @@ main(int argc, char **argv)
 	if (dopt.do_nothing && dopt.dump_inserts == 0)
 		pg_fatal("option --on-conflict-do-nothing requires option --inserts, --rows-per-insert, or --column-inserts");
 
+	if (dopt.logical_slots_only && dopt.dataOnly)
+		pg_fatal("options --logical-replication-slots-only and -a/--data-only cannot be used together");
+	if (dopt.logical_slots_only && dopt.schemaOnly)
+		pg_fatal("options --logical-replication-slots-only and -s/--schema-only cannot be used together");
+
 	/* Identify archive format to emit */
 	archiveFormat = parseArchiveFormat(format, &archiveMode);
 
@@ -876,6 +889,16 @@ main(int argc, char **argv)
 			pg_fatal("no matching extensions were found");
 	}
 
+	/*
+	 * If dump logical-replication-slots-only was requested, dump only them
+	 * and skip everything else.
+	 */
+	if (dopt.logical_slots_only)
+	{
+		getLogicalReplicationSlots(fout);
+		goto dump;
+	}
+
 	/*
 	 * Dumping LOs is the default for dumps where an inclusion switch is not
 	 * used (an "include everything" dump).  -B can be used to exclude LOs
@@ -936,6 +959,8 @@ main(int argc, char **argv)
 	if (!dopt.no_security_labels)
 		collectSecLabels(fout);
 
+dump:
+
 	/* Lastly, create dummy objects to represent the section boundaries */
 	boundaryObjs = createBoundaryObjects();
 
@@ -1109,6 +1134,8 @@ help(const char *progname)
 			 "                               servers matching PATTERN\n"));
 	printf(_("  --inserts                    dump data as INSERT commands, rather than COPY\n"));
 	printf(_("  --load-via-partition-root    load partitions via the root table\n"));
+	printf(_("  --logical-replication-slots-only\n"
+			 "                               dump only logical replication slots, no schema or data\n"));
 	printf(_("  --no-comments                do not dump comments\n"));
 	printf(_("  --no-publications            do not dump publications\n"));
 	printf(_("  --no-security-labels         do not dump security label assignments\n"));
@@ -10252,6 +10279,10 @@ dumpDumpableObject(Archive *fout, DumpableObject *dobj)
 		case DO_SUBSCRIPTION:
 			dumpSubscription(fout, (const SubscriptionInfo *) dobj);
 			break;
+		case DO_LOGICAL_REPLICATION_SLOT:
+			dumpLogicalReplicationSlot(fout,
+									   (const LogicalReplicationSlotInfo *) dobj);
+			break;
 		case DO_PRE_DATA_BOUNDARY:
 		case DO_POST_DATA_BOUNDARY:
 			/* never dumped, nothing to do */
@@ -18227,6 +18258,7 @@ addBoundaryDependencies(DumpableObject **dobjs, int numObjs,
 			case DO_PUBLICATION_REL:
 			case DO_PUBLICATION_TABLE_IN_SCHEMA:
 			case DO_SUBSCRIPTION:
+			case DO_LOGICAL_REPLICATION_SLOT:
 				/* Post-data objects: must come after the post-data boundary */
 				addObjectDependency(dobj, postDataBound->dumpId);
 				break;
@@ -18488,3 +18520,118 @@ appendReloptionsArrayAH(PQExpBuffer buffer, const char *reloptions,
 	if (!res)
 		pg_log_warning("could not parse %s array", "reloptions");
 }
+
+/*
+ * getLogicalReplicationSlots
+ *	  get information about replication slots
+ */
+static void
+getLogicalReplicationSlots(Archive *fout)
+{
+	PGresult   *res;
+	LogicalReplicationSlotInfo *slotinfo;
+	PQExpBuffer query;
+	DumpOptions *dopt = fout->dopt;
+
+	int			i_slotname;
+	int			i_plugin;
+	int			i_twophase;
+	int			i,
+				ntups;
+
+	/* Check whether we should dump or not */
+	if (fout->remoteVersion < 160000 || !dopt->logical_slots_only)
+		return;
+
+	query = createPQExpBuffer();
+
+	resetPQExpBuffer(query);
+
+	/*
+	 * Get replication slots.
+	 *
+	 * XXX: Which information must be extracted from old node? Currently three
+	 * attributes are extracted because they are used by
+	 * pg_create_logical_replication_slot().
+	 *
+	 * XXX: Do we have to support physical slots?
+	 */
+	appendPQExpBufferStr(query,
+						 "SELECT slot_name, plugin, two_phase "
+						 "FROM pg_catalog.pg_replication_slots "
+						 "WHERE database = current_database() AND temporary = false "
+						 "AND wal_status IN ('reserved', 'extended');");
+
+	res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+
+	ntups = PQntuples(res);
+
+	i_slotname = PQfnumber(res, "slot_name");
+	i_plugin = PQfnumber(res, "plugin");
+	i_twophase = PQfnumber(res, "two_phase");
+
+	slotinfo = pg_malloc(ntups * sizeof(LogicalReplicationSlotInfo));
+
+	for (i = 0; i < ntups; i++)
+	{
+		slotinfo[i].dobj.objType = DO_LOGICAL_REPLICATION_SLOT;
+
+		slotinfo[i].dobj.catId.tableoid = InvalidOid;
+		slotinfo[i].dobj.catId.oid = InvalidOid;
+		AssignDumpId(&slotinfo[i].dobj);
+
+		slotinfo[i].dobj.name = pg_strdup(PQgetvalue(res, i, i_slotname));
+
+		slotinfo[i].plugin = pg_strdup(PQgetvalue(res, i, i_plugin));
+		slotinfo[i].twophase = pg_strdup(PQgetvalue(res, i, i_twophase));
+
+		/*
+		 * Note: Currently we do not have any options to include/exclude slots
+		 * in dumping, so all the slots must be selected.
+		 */
+		slotinfo[i].dobj.dump = DUMP_COMPONENT_ALL;
+	}
+	PQclear(res);
+
+	destroyPQExpBuffer(query);
+}
+
+/*
+ * dumpLogicalReplicationSlot
+ *	  dump creation functions for the given logical replication slots
+ */
+static void
+dumpLogicalReplicationSlot(Archive *fout,
+						   const LogicalReplicationSlotInfo *slotinfo)
+{
+	DumpOptions *dopt = fout->dopt;
+
+	if (!dopt->logical_slots_only)
+		return;
+
+	if (slotinfo->dobj.dump & DUMP_COMPONENT_DEFINITION)
+	{
+		PQExpBuffer query = createPQExpBuffer();
+		char	   *slotname = pg_strdup(slotinfo->dobj.name);
+
+		/*
+		 * XXX: For simplification, pg_create_logical_replication_slot() is
+		 * used. Is it sufficient?
+		 */
+		appendPQExpBuffer(query, "SELECT pg_catalog.pg_create_logical_replication_slot('%s', ",
+						  slotname);
+		appendStringLiteralAH(query, slotinfo->plugin, fout);
+		appendPQExpBuffer(query, ", ");
+		appendStringLiteralAH(query, slotinfo->twophase, fout);
+		appendPQExpBuffer(query, ");");
+
+		ArchiveEntry(fout, slotinfo->dobj.catId, slotinfo->dobj.dumpId,
+					 ARCHIVE_OPTS(.tag = slotname,
+								  .description = "REPLICATION SLOT",
+								  .section = SECTION_POST_DATA,
+								  .createStmt = query->data));
+
+		pfree(slotname);
+		destroyPQExpBuffer(query);
+	}
+}
diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h
index ed6ce41ad7..8028ccf6ff 100644
--- a/src/bin/pg_dump/pg_dump.h
+++ b/src/bin/pg_dump/pg_dump.h
@@ -82,7 +82,8 @@ typedef enum
 	DO_PUBLICATION,
 	DO_PUBLICATION_REL,
 	DO_PUBLICATION_TABLE_IN_SCHEMA,
-	DO_SUBSCRIPTION
+	DO_SUBSCRIPTION,
+	DO_LOGICAL_REPLICATION_SLOT
 } DumpableObjectType;
 
 /*
@@ -666,6 +667,20 @@ typedef struct _SubscriptionInfo
 	char	   *subpasswordrequired;
 } SubscriptionInfo;
 
+/*
+ * The LogicalReplicationSlotInfo struct is used to represent replication
+ * slots.
+ *
+ * XXX: add more attributes if needed
+ */
+typedef struct _LogicalReplicationSlotInfo
+{
+	DumpableObject dobj;
+	char	   *plugin;
+	char	   *slottype;
+	char	   *twophase;
+} LogicalReplicationSlotInfo;
+
 /*
  *	common utility functions
  */
diff --git a/src/bin/pg_dump/pg_dump_sort.c b/src/bin/pg_dump/pg_dump_sort.c
index 8266c117a3..7c37873743 100644
--- a/src/bin/pg_dump/pg_dump_sort.c
+++ b/src/bin/pg_dump/pg_dump_sort.c
@@ -93,6 +93,7 @@ enum dbObjectTypePriorities
 	PRIO_PUBLICATION_REL,
 	PRIO_PUBLICATION_TABLE_IN_SCHEMA,
 	PRIO_SUBSCRIPTION,
+	PRIO_LOGICAL_REPLICATION_SLOT,
 	PRIO_DEFAULT_ACL,			/* done in ACL pass */
 	PRIO_EVENT_TRIGGER,			/* must be next to last! */
 	PRIO_REFRESH_MATVIEW		/* must be last! */
@@ -146,10 +147,11 @@ static const int dbObjectTypePriority[] =
 	PRIO_PUBLICATION,			/* DO_PUBLICATION */
 	PRIO_PUBLICATION_REL,		/* DO_PUBLICATION_REL */
 	PRIO_PUBLICATION_TABLE_IN_SCHEMA,	/* DO_PUBLICATION_TABLE_IN_SCHEMA */
-	PRIO_SUBSCRIPTION			/* DO_SUBSCRIPTION */
+	PRIO_SUBSCRIPTION,			/* DO_SUBSCRIPTION */
+	PRIO_LOGICAL_REPLICATION_SLOT	/* DO_LOGICAL_REPLICATION_SLOT */
 };
 
-StaticAssertDecl(lengthof(dbObjectTypePriority) == (DO_SUBSCRIPTION + 1),
+StaticAssertDecl(lengthof(dbObjectTypePriority) == (DO_LOGICAL_REPLICATION_SLOT + 1),
 				 "array length mismatch");
 
 static DumpId preDataBoundId;
@@ -1498,6 +1500,11 @@ describeDumpableObject(DumpableObject *obj, char *buf, int bufsize)
 					 "SUBSCRIPTION (ID %d OID %u)",
 					 obj->dumpId, obj->catId.oid);
 			return;
+		case DO_LOGICAL_REPLICATION_SLOT:
+			snprintf(buf, bufsize,
+					 "LOGICAL REPLICATION SLOT (ID %d NAME %s)",
+					 obj->dumpId, obj->name);
+			return;
 		case DO_PRE_DATA_BOUNDARY:
 			snprintf(buf, bufsize,
 					 "PRE-DATA BOUNDARY  (ID %d)",
diff --git a/src/bin/pg_upgrade/dump.c b/src/bin/pg_upgrade/dump.c
index 6c8c82dca8..e6b90864f5 100644
--- a/src/bin/pg_upgrade/dump.c
+++ b/src/bin/pg_upgrade/dump.c
@@ -59,6 +59,30 @@ generate_old_dump(void)
 						   log_opts.dumpdir,
 						   sql_file_name, escaped_connstr.data);
 
+		/*
+		 * Dump logical replication slots if needed.
+		 *
+		 * XXX We cannot dump replication slots at the same time as the schema
+		 * dump because we need to separate the timing of restoring
+		 * replication slots and other objects. Replication slots, in
+		 * particular, should not be restored before executing the pg_resetwal
+		 * command because it will remove WALs that are required by the slots.
+		 */
+		if (user_opts.include_logical_slots)
+		{
+			char		slots_file_name[MAXPGPATH];
+
+			snprintf(slots_file_name, sizeof(slots_file_name),
+					 DB_DUMP_LOGICAL_SLOTS_FILE_MASK, old_db->db_oid);
+			parallel_exec_prog(log_file_name, NULL,
+							   "\"%s/pg_dump\" %s --logical-replication-slots-only "
+							   "--quote-all-identifiers --binary-upgrade %s "
+							   "--file=\"%s/%s\" %s",
+							   new_cluster.bindir, cluster_conn_opts(&old_cluster),
+							   log_opts.verbose ? "--verbose" : "",
+							   log_opts.dumpdir,
+							   slots_file_name, escaped_connstr.data);
+		}
 		termPQExpBuffer(&escaped_connstr);
 	}
 
diff --git a/src/bin/pg_upgrade/meson.build b/src/bin/pg_upgrade/meson.build
index 12a97f84e2..228f29b688 100644
--- a/src/bin/pg_upgrade/meson.build
+++ b/src/bin/pg_upgrade/meson.build
@@ -42,6 +42,7 @@ tests += {
     'tests': [
       't/001_basic.pl',
       't/002_pg_upgrade.pl',
+      't/003_logical_replication_slots.pl',
     ],
     'test_kwargs': {'priority': 40}, # pg_upgrade tests are slow
   },
diff --git a/src/bin/pg_upgrade/option.c b/src/bin/pg_upgrade/option.c
index 8869b6b60d..f7b5ec9879 100644
--- a/src/bin/pg_upgrade/option.c
+++ b/src/bin/pg_upgrade/option.c
@@ -57,6 +57,7 @@ parseCommandLine(int argc, char *argv[])
 		{"verbose", no_argument, NULL, 'v'},
 		{"clone", no_argument, NULL, 1},
 		{"copy", no_argument, NULL, 2},
+		{"include-logical-replication-slots", no_argument, NULL, 3},
 
 		{NULL, 0, NULL, 0}
 	};
@@ -199,6 +200,10 @@ parseCommandLine(int argc, char *argv[])
 				user_opts.transfer_mode = TRANSFER_MODE_COPY;
 				break;
 
+			case 3:
+				user_opts.include_logical_slots = true;
+				break;
+
 			default:
 				fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
 						os_info.progname);
@@ -289,6 +294,8 @@ usage(void)
 	printf(_("  -V, --version                 display version information, then exit\n"));
 	printf(_("  --clone                       clone instead of copying files to new cluster\n"));
 	printf(_("  --copy                        copy files to new cluster (default)\n"));
+	printf(_("  --include-logical-replication-slots\n"
+			 "                                upgrade logical replication slots\n"));
 	printf(_("  -?, --help                    show this help, then exit\n"));
 	printf(_("\n"
 			 "Before running pg_upgrade you must:\n"
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 75bab0a04c..1241060f4e 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -59,6 +59,7 @@ static void copy_xact_xlog_xid(void);
 static void set_frozenxids(bool minmxid_only);
 static void make_outputdirs(char *pgdata);
 static void setup(char *argv0, bool *live_check);
+static void create_logical_replication_slots(void);
 
 ClusterInfo old_cluster,
 			new_cluster;
@@ -188,6 +189,19 @@ main(int argc, char **argv)
 			  new_cluster.pgdata);
 	check_ok();
 
+	/*
+	 * Create logical replication slots if requested.
+	 *
+	 * Note: This must be done after doing pg_resetwal command because the
+	 * command will remove required WALs.
+	 */
+	if (user_opts.include_logical_slots)
+	{
+		start_postmaster(&new_cluster, true);
+		create_logical_replication_slots();
+		stop_postmaster(false);
+	}
+
 	if (user_opts.do_sync)
 	{
 		prep_status("Sync data directory to disk");
@@ -860,3 +874,50 @@ set_frozenxids(bool minmxid_only)
 
 	check_ok();
 }
+
+/*
+ * create_logical_replication_slots()
+ *
+ * Similar to create_new_objects() but only restores logical replication slots.
+ */
+static void
+create_logical_replication_slots(void)
+{
+	int			dbnum;
+
+	prep_status_progress("Restoring logical replication slots in the new cluster");
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		char		slots_file_name[MAXPGPATH],
+					log_file_name[MAXPGPATH];
+		DbInfo	   *old_db = &old_cluster.dbarr.dbs[dbnum];
+
+		pg_log(PG_STATUS, "%s", old_db->db_name);
+
+		snprintf(slots_file_name, sizeof(slots_file_name),
+				 DB_DUMP_LOGICAL_SLOTS_FILE_MASK, old_db->db_oid);
+		snprintf(log_file_name, sizeof(log_file_name),
+				 DB_DUMP_LOG_FILE_MASK, old_db->db_oid);
+
+		parallel_exec_prog(log_file_name,
+						   NULL,
+						   "\"%s/psql\" %s --echo-queries --set ON_ERROR_STOP=on "
+						   "--no-psqlrc --dbname %s -f \"%s/%s\"",
+						   new_cluster.bindir,
+						   cluster_conn_opts(&new_cluster),
+						   old_db->db_name,
+						   log_opts.dumpdir,
+						   slots_file_name);
+	}
+
+	/* reap all children */
+	while (reap_child(true) == true)
+		;
+
+	end_progress_output();
+	check_ok();
+
+	/* update new_cluster info now that we have objects in the databases */
+	get_db_and_rel_infos(&new_cluster);
+}
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 3eea0139c7..5f3d7a407e 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -29,6 +29,7 @@
 /* contains both global db information and CREATE DATABASE commands */
 #define GLOBALS_DUMP_FILE	"pg_upgrade_dump_globals.sql"
 #define DB_DUMP_FILE_MASK	"pg_upgrade_dump_%u.custom"
+#define DB_DUMP_LOGICAL_SLOTS_FILE_MASK	"pg_upgrade_dump_%u_logical_slots.sql"
 
 /*
  * Base directories that include all the files generated internally, from the
@@ -304,6 +305,8 @@ typedef struct
 	transferMode transfer_mode; /* copy files or link them? */
 	int			jobs;			/* number of processes/threads to use */
 	char	   *socketdir;		/* directory to use for Unix sockets */
+	bool		include_logical_slots;	/* true -> dump and restore logical
+										 * replication slots */
 } UserOpts;
 
 typedef struct
diff --git a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
new file mode 100644
index 0000000000..81917f3b14
--- /dev/null
+++ b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
@@ -0,0 +1,90 @@
+# Copyright (c) 2021-2023, PostgreSQL Global Development Group
+
+# Tests for logical replication, especially for upgrading replication slots
+
+use strict;
+use warnings;
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Can be changed to test the other modes.
+my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
+
+# Initialize publisher node
+my $old_publisher = PostgreSQL::Test::Cluster->new('old_publisher');
+$old_publisher->init(allows_streaming => 'logical');
+$old_publisher->start;
+
+# Initialize subscriber node
+my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
+$subscriber->init(allows_streaming => 'logical');
+$subscriber->start;
+
+$old_publisher->safe_psql('postgres',
+	"CREATE TABLE tbl AS SELECT generate_series(1,10) AS a");
+$subscriber->safe_psql('postgres', "CREATE TABLE tbl (a int)");
+
+# Setup logical replication
+my $old_connstr = $old_publisher->connstr . ' dbname=postgres';
+$old_publisher->safe_psql('postgres',
+	"CREATE PUBLICATION pub FOR ALL TABLES");
+$subscriber->safe_psql('postgres',
+	"CREATE SUBSCRIPTION sub CONNECTION '$old_connstr' PUBLICATION pub");
+
+# Wait for initial table sync to finish
+$subscriber->wait_for_subscription_sync($old_publisher, 'sub');
+
+my $result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
+is($result, qq(10), 'check initial rows on subscriber');
+
+# Preparations for upgrading publisher
+$old_publisher->stop;
+$subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION sub DISABLE");
+
+my $new_publisher = PostgreSQL::Test::Cluster->new('new_publisher');
+$new_publisher->init(allows_streaming => 'logical');
+
+my $bindir = $new_publisher->config_data('--bindir');
+
+# Run pg_upgrade. pg_upgrade_output.d is removed at the end
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,        '--include-logical-replication-slots'
+	],
+	'run of pg_upgrade of old publisher');
+ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ removed after pg_upgrade success");
+
+# Check whether the replication slot is copied
+$new_publisher->start;
+$result =
+  $new_publisher->safe_psql('postgres',
+	"SELECT count(*) FROM pg_replication_slots");
+is($result, qq(1), 'check the replication slot is copied to new publisher');
+
+# Change connection string and enable logical replication
+my $new_connstr = $new_publisher->connstr . ' dbname=postgres';
+
+$subscriber->safe_psql('postgres',
+	"ALTER SUBSCRIPTION sub CONNECTION '$new_connstr'");
+$subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION sub ENABLE");
+
+# Check whether changes on the new publisher get replicated to the subscriber
+$new_publisher->safe_psql('postgres',
+	"INSERT INTO tbl VALUES (generate_series(11, 20))");
+
+$new_publisher->wait_for_catchup('sub');
+
+$result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
+is($result, qq(20), 'check changes are shipped to subscriber');
+
+done_testing();
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index b4058b88c3..7e999726c2 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1479,6 +1479,7 @@ LogicalRepBeginData
 LogicalRepCommitData
 LogicalRepCommitPreparedTxnData
 LogicalRepCtxStruct
+LogicalReplicationSlotInfo
 LogicalRepMode
 LogicalRepMsgType
 LogicalRepPartMapEntry
-- 
2.27.0

#20Julien Rouhaud
rjuju123@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#10)
1 attachment(s)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

Hi,

Sorry for the delay, I didn't have time to come back to it until this afternoon.

On Mon, Apr 10, 2023 at 09:18:46AM +0000, Hayato Kuroda (Fujitsu) wrote:

I have analyzed the point, but it seemed to be difficult. This is because some additional
records like the following may be inserted. PSA the script used for testing. Note that the
"double CHECKPOINT_SHUTDOWN" issue might be wrong, so I wanted to withdraw it for now.
Sorry for the noise.

* HEAP/HEAP2 records. These records may be inserted by the checkpointer.

IIUC, if there are tuples which have not been flushed yet when shutdown is requested,
the checkpointer writes all of them back to the heap files. Many WAL records are
generated at that time, and I think we cannot predict the number of records beforehand.

* INVALIDATION(S) records. These records may be inserted by VACUUM.

There is a possibility that autovacuum runs and generates WAL records. I think we
cannot predict the number of records beforehand because it depends on the number
of objects.

* RUNNING_XACTS record

It might be a timing issue, but I found that the background writer sometimes generated
an XLOG_RUNNING_XACTS record. According to BackgroundWriterMain(), it is generated when
15 seconds have passed since the last logging and important records have been inserted
since then. I think it is difficult to predict whether this record will appear or not.

I don't think that your analysis is correct. Slots are guaranteed to be
stopped after all the normal backends have been stopped, exactly to avoid such
extraneous records.

What is happening here is that the slot's confirmed_flush_lsn is properly
updated in memory and ends up being the same as the current LSN before the
shutdown. But as it's a logical slot and those records aren't decoded, the
slot isn't marked as dirty and therefore isn't saved to disk. You don't see
that behavior when doing a manual checkpoint before (per your script comment),
as in that case the checkpoint also tries to save the slot to disk but then
finds a slot that was marked as dirty and therefore saves it.

In your script's scenario, when you restart the server the previous slot data
is restored and the confirmed_flush_lsn goes backward, which explains those
extraneous records.
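
For reference, this is the path taken in LogicalConfirmReceivedLocation() when there is
no candidate xmin or restart_lsn to update. A simplified sketch of that path (the function
name below is mine for illustration, not a verbatim quote of the source):

#include "postgres.h"
#include "replication/slot.h"

static void
confirm_received_location_sketch(ReplicationSlot *slot, XLogRecPtr lsn)
{
	/* The new position is only updated in shared memory... */
	SpinLockAcquire(&slot->mutex);
	slot->data.confirmed_flush = lsn;
	SpinLockRelease(&slot->mutex);

	/*
	 * ...and neither ReplicationSlotMarkDirty() nor ReplicationSlotSave()
	 * is called, so a regular checkpoint sees a clean slot and skips it.
	 */
}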

It's probably totally harmless to throw away that value for now (and probably
also doesn't lead to a crazy amount of work after restart, I really don't know
much about the logical slot code), but clearly becomes problematic with your
usecase. One easy way to fix this is to teach the checkpoint code to force
saving the logical slots to disk even if they're not marked as dirty during a
shutdown checkpoint, as done in the attached v1 patch (renamed as .txt to not
interfere with the cfbot). With this patch applied I reliably only see a final
shutdown checkpoint record with your scenario.

Now such a change will make shutdown a bit more expensive when using logical
replication, even if in 99% of cases you will not need to save the
confirmed_flush_lsn value, so I don't know if that's acceptable or not.

Attachments:

v1-0001-Always-persist-to-disk-logical-slots-during-a-shu.patch.txt (text/plain; charset=us-ascii)
From 77c3d2d361893de857627e036d0eaaf01cfe91c1 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Fri, 14 Apr 2023 13:49:09 +0800
Subject: [PATCH v1] Always persist to disk logical slots during a shutdown
 checkpoint.

It's entirely possible for a logical slot to have a confirmed_flush_lsn higher
than the last value saved on disk while not being marked as dirty.  It's
currently not a problem to lose that value during a clean shutdown / restart
cycle, but a later patch adding support for pg_upgrade of publications and
logical slots will rely on that value being properly persisted to disk.

Author: Julien Rouhaud
Reviewed-by: FIXME
Discussion: FIXME
---
 src/backend/access/transam/xlog.c |  2 +-
 src/backend/replication/slot.c    | 26 ++++++++++++++++----------
 src/include/replication/slot.h    |  2 +-
 3 files changed, 18 insertions(+), 12 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index b540ee293b..8100ca656e 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7011,7 +7011,7 @@ static void
 CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
 {
 	CheckPointRelationMap();
-	CheckPointReplicationSlots();
+	CheckPointReplicationSlots(flags & CHECKPOINT_IS_SHUTDOWN);
 	CheckPointSnapBuild();
 	CheckPointLogicalRewriteHeap();
 	CheckPointReplicationOrigin();
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 8021aaa0a8..2bbf2af770 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -109,7 +109,8 @@ static void ReplicationSlotDropPtr(ReplicationSlot *slot);
 /* internal persistency functions */
 static void RestoreSlotFromDisk(const char *name);
 static void CreateSlotOnDisk(ReplicationSlot *slot);
-static void SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel);
+static void SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel,
+							bool is_shutdown);
 
 /*
  * Report shared-memory space needed by ReplicationSlotsShmemInit.
@@ -783,7 +784,7 @@ ReplicationSlotSave(void)
 	Assert(MyReplicationSlot != NULL);
 
 	sprintf(path, "pg_replslot/%s", NameStr(MyReplicationSlot->data.name));
-	SaveSlotToPath(MyReplicationSlot, path, ERROR);
+	SaveSlotToPath(MyReplicationSlot, path, ERROR, false);
 }
 
 /*
@@ -1565,11 +1566,10 @@ restart:
 /*
  * Flush all replication slots to disk.
  *
- * This needn't actually be part of a checkpoint, but it's a convenient
- * location.
+ * is_shutdown is true in case of a shutdown checkpoint.
  */
 void
-CheckPointReplicationSlots(void)
+CheckPointReplicationSlots(bool is_shutdown)
 {
 	int			i;
 
@@ -1594,7 +1594,7 @@ CheckPointReplicationSlots(void)
 
 		/* save the slot to disk, locking is handled in SaveSlotToPath() */
 		sprintf(path, "pg_replslot/%s", NameStr(s->data.name));
-		SaveSlotToPath(s, path, LOG);
+		SaveSlotToPath(s, path, LOG, is_shutdown);
 	}
 	LWLockRelease(ReplicationSlotAllocationLock);
 }
@@ -1700,7 +1700,7 @@ CreateSlotOnDisk(ReplicationSlot *slot)
 
 	/* Write the actual state file. */
 	slot->dirty = true;			/* signal that we really need to write */
-	SaveSlotToPath(slot, tmppath, ERROR);
+	SaveSlotToPath(slot, tmppath, ERROR, false);
 
 	/* Rename the directory into place. */
 	if (rename(tmppath, path) != 0)
@@ -1726,7 +1726,8 @@ CreateSlotOnDisk(ReplicationSlot *slot)
  * Shared functionality between saving and creating a replication slot.
  */
 static void
-SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel)
+SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel,
+			   bool is_shutdown)
 {
 	char		tmppath[MAXPGPATH];
 	char		path[MAXPGPATH];
@@ -1740,8 +1741,13 @@ SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel)
 	slot->just_dirtied = false;
 	SpinLockRelease(&slot->mutex);
 
-	/* and don't do anything if there's nothing to write */
-	if (!was_dirty)
+	/*
+	 * and don't do anything if there's nothing to write, unless this is
+	 * called for a logical slot during a shutdown checkpoint, as we want to
+	 * persist the confirmed_flush_lsn in that case, even if that's the only
+	 * modification.
+	 */
+	if (!was_dirty && !(SlotIsLogical(slot) && is_shutdown))
 		return;
 
 	LWLockAcquire(&slot->io_in_progress_lock, LW_EXCLUSIVE);
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index a8a89dc784..7ca37c9f70 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -241,7 +241,7 @@ extern void ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslo
 extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok);
 
 extern void StartupReplicationSlots(void);
-extern void CheckPointReplicationSlots(void);
+extern void CheckPointReplicationSlots(bool is_shutdown);
 
 extern void CheckSlotRequirements(void);
 extern void CheckSlotPermissions(void);
-- 
2.37.0

#21Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Julien Rouhaud (#20)
3 attachment(s)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Julien,

Sorry for the delay, I didn't have time to come back to it until this afternoon.

No issues, everyone is busy:-).

I don't think that your analysis is correct. Slots are guaranteed to be
stopped after all the normal backends have been stopped, exactly to avoid such
extraneous records.

What is happening here is that the slot's confirmed_flush_lsn is properly
updated in memory and ends up being the same as the current LSN before the
shutdown. But as it's a logical slot and those records aren't decoded, the
slot isn't marked as dirty and therefore isn't saved to disk. You don't see
that behavior when doing a manual checkpoint before (per your script comment),
as in that case the checkpoint also tries to save the slot to disk but then
finds a slot that was marked as dirty and therefore saves it.

In your script's scenario, when you restart the server the previous slot data
is restored and the confirmed_flush_lsn goes backward, which explains those
extraneous records.

So you mean that the key point is that records which are not sent to the subscriber do
not mark the slot as dirty, and hence the updated confirmed_flush was not written into the
slot file. Is that right? LogicalConfirmReceivedLocation() is called by the walsender when
the process gets a reply from the apply worker, so your analysis seems correct.

It's probably totally harmless to throw away that value for now (and probably
also doesn't lead to a crazy amount of work after restart, I really don't know
much about the logical slot code), but clearly becomes problematic with your
usecase. One easy way to fix this is to teach the checkpoint code to force
saving the logical slots to disk even if they're not marked as dirty during a
shutdown checkpoint, as done in the attached v1 patch (renamed as .txt to not
interfere with the cfbot). With this patch applied I reliably only see a final
shutdown checkpoint record with your scenario.

Now such a change will make shutdown a bit more expensive when using logical
replication, even if in 99% of cases you will not need to save the
confirmed_flush_lsn value, so I don't know if that's acceptable or not.

In any case, we must advance past these records at some point. IIUC, currently such records
are read after rebooting but ignored, and this patch just skips them. I have not measured it,
but there is a possibility that this is not additional overhead, just a trade-off.

Currently I have not come up with another solution, so I have included your patch.
Please see 0002.

Additionally, I added a checking function in 0003.
According to pg_resetwal and other functions, the length of a CHECKPOINT_SHUTDOWN
record seems to be (SizeOfXLogRecord + SizeOfXLogRecordDataHeaderShort + sizeof(CheckPoint)).
Therefore, the function ensures that the difference between the current insert position
and confirmed_flush_lsn is not larger than that (plus a page header).
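
To illustrate the idea, here is a rough sketch of the check (the function name and the exact
condition are only my illustration, not the actual code in 0003); the intent is that the only
WAL allowed after confirmed_flush_lsn is the final shutdown checkpoint record:

#include "postgres.h"
#include "access/xlog_internal.h"
#include "access/xlogrecord.h"
#include "catalog/pg_control.h"

static bool
slot_is_caught_up(XLogRecPtr confirmed_flush, XLogRecPtr current_insert_lsn)
{
	/* Expected size of a CHECKPOINT_SHUTDOWN record, as reasoned above */
	uint64		shutdown_record_size = SizeOfXLogRecord +
		SizeOfXLogRecordDataHeaderShort + sizeof(CheckPoint);

	/* Allow for at most one (long) page header between the two positions */
	return (current_insert_lsn - confirmed_flush) <=
		(shutdown_record_size + SizeOfXLogLongPHD);
}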

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:

v7-0001-pg_upgrade-Add-include-logical-replication-slots-.patch (application/octet-stream)
From 512b817c55d8d2dec7f70684250e5c67f1562711 Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Tue, 4 Apr 2023 05:49:34 +0000
Subject: [PATCH v7 1/3] pg_upgrade: Add --include-logical-replication-slots
 option

This commit introduces a new pg_upgrade option called "--include-logical-replication-slots".
This allows nodes with logical replication slots to be upgraded. The commit can
be divided into two parts: one for pg_dump and another for pg_upgrade.

For pg_dump this commit includes a new option called "--logical-replication-slots-only".
This option can be used to dump logical replication slots. When this option is
specified, the slot_name, plugin, and two_phase parameters are extracted from
pg_replication_slots. An SQL file is then generated which executes
pg_create_logical_replication_slot() with the extracted parameters.

For pg_upgrade, when '--include-logical-replication-slots' is specified, it executes
pg_dump with the new "--logical-replication-slots-only" option and restores from the
dump. Apart from restoring schema, pg_resetwal must not be called after restoring
replication slots. This is because the command discards WAL files and starts from a
new segment, even if they are required by replication slots. This leads to an ERROR:
"requested WAL segment XXX has already been removed". To avoid this, replication slots
are restored at a different time than other objects, after running pg_resetwal.

The significant advantage of this commit is that it makes it easy to continue
logical replication even after upgrading the publisher node. Previously, pg_upgrade
allowed copying publications to a new node. With this new commit, adjusting the
connection string to the new publisher will cause the apply worker on the subscriber
to connect to the new publisher automatically. This enables seamless continuation
of logical replication, even after an upgrade.

Author: Hayato Kuroda
Reviewed-by: Peter Smith, Julien Rouhaud
---
 doc/src/sgml/ref/pg_dump.sgml                 |  10 ++
 doc/src/sgml/ref/pgupgrade.sgml               |  11 ++
 src/bin/pg_dump/pg_backup.h                   |   1 +
 src/bin/pg_dump/pg_dump.c                     | 147 ++++++++++++++++++
 src/bin/pg_dump/pg_dump.h                     |  17 +-
 src/bin/pg_dump/pg_dump_sort.c                |  11 +-
 src/bin/pg_upgrade/dump.c                     |  24 +++
 src/bin/pg_upgrade/meson.build                |   1 +
 src/bin/pg_upgrade/option.c                   |   7 +
 src/bin/pg_upgrade/pg_upgrade.c               |  61 ++++++++
 src/bin/pg_upgrade/pg_upgrade.h               |   3 +
 .../t/003_logical_replication_slots.pl        |  90 +++++++++++
 src/tools/pgindent/typedefs.list              |   1 +
 13 files changed, 381 insertions(+), 3 deletions(-)
 create mode 100644 src/bin/pg_upgrade/t/003_logical_replication_slots.pl

diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index e81e35c13b..6e07f85281 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -1206,6 +1206,16 @@ PostgreSQL documentation
       </listitem>
      </varlistentry>
 
+     <varlistentry>
+      <term><option>--logical-replication-slots-only</option></term>
+      <listitem>
+       <para>
+        Dump only logical replication slots; not the schema (data definitions),
+        nor data. This is mainly used when upgrading nodes.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
        <term><option>-?</option></term>
        <term><option>--help</option></term>
diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 7816b4c685..94e90ff506 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -240,6 +240,17 @@ PostgreSQL documentation
       </listitem>
      </varlistentry>
 
+     <varlistentry>
+      <term><option>--include-logical-replication-slots</option></term>
+      <listitem>
+       <para>
+        Upgrade logical replication slots. Only permanent replication slots
+        are included. Note that pg_upgrade does not check the installation of
+        plugins.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><option>-?</option></term>
       <term><option>--help</option></term>
diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index aba780ef4b..0a4e931f9b 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -187,6 +187,7 @@ typedef struct _dumpOptions
 	int			use_setsessauth;
 	int			enable_row_security;
 	int			load_via_partition_root;
+	int			logical_slots_only;
 
 	/* default, if no "inclusion" switches appear, is to dump everything */
 	bool		include_everything;
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 73a6c964ba..6cff4009b3 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -328,6 +328,9 @@ static void setupDumpWorker(Archive *AH);
 static TableInfo *getRootTableInfo(const TableInfo *tbinfo);
 static bool forcePartitionRootLoad(const TableInfo *tbinfo);
 
+static void getLogicalReplicationSlots(Archive *fout);
+static void dumpLogicalReplicationSlot(Archive *fout,
+									   const LogicalReplicationSlotInfo *slotinfo);
 
 int
 main(int argc, char **argv)
@@ -431,6 +434,7 @@ main(int argc, char **argv)
 		{"table-and-children", required_argument, NULL, 12},
 		{"exclude-table-and-children", required_argument, NULL, 13},
 		{"exclude-table-data-and-children", required_argument, NULL, 14},
+		{"logical-replication-slots-only", no_argument, NULL, 15},
 
 		{NULL, 0, NULL, 0}
 	};
@@ -657,6 +661,10 @@ main(int argc, char **argv)
 										  optarg);
 				break;
 
+			case 15:			/* dump only replication slot(s) */
+				dopt.logical_slots_only = true;
+				break;
+
 			default:
 				/* getopt_long already emitted a complaint */
 				pg_log_error_hint("Try \"%s --help\" for more information.", progname);
@@ -714,6 +722,11 @@ main(int argc, char **argv)
 	if (dopt.do_nothing && dopt.dump_inserts == 0)
 		pg_fatal("option --on-conflict-do-nothing requires option --inserts, --rows-per-insert, or --column-inserts");
 
+	if (dopt.logical_slots_only && dopt.dataOnly)
+		pg_fatal("options --logical-replication-slots-only and -a/--data-only cannot be used together");
+	if (dopt.logical_slots_only && dopt.schemaOnly)
+		pg_fatal("options --logical-replication-slots-only and -s/--schema-only cannot be used together");
+
 	/* Identify archive format to emit */
 	archiveFormat = parseArchiveFormat(format, &archiveMode);
 
@@ -876,6 +889,16 @@ main(int argc, char **argv)
 			pg_fatal("no matching extensions were found");
 	}
 
+	/*
+	 * If dump logical-replication-slots-only was requested, dump only them
+	 * and skip everything else.
+	 */
+	if (dopt.logical_slots_only)
+	{
+		getLogicalReplicationSlots(fout);
+		goto dump;
+	}
+
 	/*
 	 * Dumping LOs is the default for dumps where an inclusion switch is not
 	 * used (an "include everything" dump).  -B can be used to exclude LOs
@@ -936,6 +959,8 @@ main(int argc, char **argv)
 	if (!dopt.no_security_labels)
 		collectSecLabels(fout);
 
+dump:
+
 	/* Lastly, create dummy objects to represent the section boundaries */
 	boundaryObjs = createBoundaryObjects();
 
@@ -1109,6 +1134,8 @@ help(const char *progname)
 			 "                               servers matching PATTERN\n"));
 	printf(_("  --inserts                    dump data as INSERT commands, rather than COPY\n"));
 	printf(_("  --load-via-partition-root    load partitions via the root table\n"));
+	printf(_("  --logical-replication-slots-only\n"
+			 "                               dump only logical replication slots, no schema or data\n"));
 	printf(_("  --no-comments                do not dump comments\n"));
 	printf(_("  --no-publications            do not dump publications\n"));
 	printf(_("  --no-security-labels         do not dump security label assignments\n"));
@@ -10252,6 +10279,10 @@ dumpDumpableObject(Archive *fout, DumpableObject *dobj)
 		case DO_SUBSCRIPTION:
 			dumpSubscription(fout, (const SubscriptionInfo *) dobj);
 			break;
+		case DO_LOGICAL_REPLICATION_SLOT:
+			dumpLogicalReplicationSlot(fout,
+									   (const LogicalReplicationSlotInfo *) dobj);
+			break;
 		case DO_PRE_DATA_BOUNDARY:
 		case DO_POST_DATA_BOUNDARY:
 			/* never dumped, nothing to do */
@@ -18227,6 +18258,7 @@ addBoundaryDependencies(DumpableObject **dobjs, int numObjs,
 			case DO_PUBLICATION_REL:
 			case DO_PUBLICATION_TABLE_IN_SCHEMA:
 			case DO_SUBSCRIPTION:
+			case DO_LOGICAL_REPLICATION_SLOT:
 				/* Post-data objects: must come after the post-data boundary */
 				addObjectDependency(dobj, postDataBound->dumpId);
 				break;
@@ -18488,3 +18520,118 @@ appendReloptionsArrayAH(PQExpBuffer buffer, const char *reloptions,
 	if (!res)
 		pg_log_warning("could not parse %s array", "reloptions");
 }
+
+/*
+ * getLogicalReplicationSlots
+ *	  get information about replication slots
+ */
+static void
+getLogicalReplicationSlots(Archive *fout)
+{
+	PGresult   *res;
+	LogicalReplicationSlotInfo *slotinfo;
+	PQExpBuffer query;
+	DumpOptions *dopt = fout->dopt;
+
+	int			i_slotname;
+	int			i_plugin;
+	int			i_twophase;
+	int			i,
+				ntups;
+
+	/* Check whether we should dump or not */
+	if (fout->remoteVersion < 160000 || !dopt->logical_slots_only)
+		return;
+
+	query = createPQExpBuffer();
+
+	resetPQExpBuffer(query);
+
+	/*
+	 * Get replication slots.
+	 *
+	 * XXX: Which information must be extracted from old node? Currently three
+	 * attributes are extracted because they are used by
+	 * pg_create_logical_replication_slot().
+	 *
+	 * XXX: Do we have to support physical slots?
+	 */
+	appendPQExpBufferStr(query,
+						 "SELECT slot_name, plugin, two_phase "
+						 "FROM pg_catalog.pg_replication_slots "
+						 "WHERE database = current_database() AND temporary = false "
+						 "AND wal_status IN ('reserved', 'extended');");
+
+	res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+
+	ntups = PQntuples(res);
+
+	i_slotname = PQfnumber(res, "slot_name");
+	i_plugin = PQfnumber(res, "plugin");
+	i_twophase = PQfnumber(res, "two_phase");
+
+	slotinfo = pg_malloc(ntups * sizeof(LogicalReplicationSlotInfo));
+
+	for (i = 0; i < ntups; i++)
+	{
+		slotinfo[i].dobj.objType = DO_LOGICAL_REPLICATION_SLOT;
+
+		slotinfo[i].dobj.catId.tableoid = InvalidOid;
+		slotinfo[i].dobj.catId.oid = InvalidOid;
+		AssignDumpId(&slotinfo[i].dobj);
+
+		slotinfo[i].dobj.name = pg_strdup(PQgetvalue(res, i, i_slotname));
+
+		slotinfo[i].plugin = pg_strdup(PQgetvalue(res, i, i_plugin));
+		slotinfo[i].twophase = pg_strdup(PQgetvalue(res, i, i_twophase));
+
+		/*
+		 * Note: Currently we do not have any options to include/exclude slots
+		 * in dumping, so all the slots must be selected.
+		 */
+		slotinfo[i].dobj.dump = DUMP_COMPONENT_ALL;
+	}
+	PQclear(res);
+
+	destroyPQExpBuffer(query);
+}
+
+/*
+ * dumpLogicalReplicationSlot
+ *	  dump creation functions for the given logical replication slots
+ */
+static void
+dumpLogicalReplicationSlot(Archive *fout,
+						   const LogicalReplicationSlotInfo *slotinfo)
+{
+	DumpOptions *dopt = fout->dopt;
+
+	if (!dopt->logical_slots_only)
+		return;
+
+	if (slotinfo->dobj.dump & DUMP_COMPONENT_DEFINITION)
+	{
+		PQExpBuffer query = createPQExpBuffer();
+		char	   *slotname = pg_strdup(slotinfo->dobj.name);
+
+		/*
+		 * XXX: For simplification, pg_create_logical_replication_slot() is
+		 * used. Is it sufficient?
+		 */
+		appendPQExpBuffer(query, "SELECT pg_catalog.pg_create_logical_replication_slot('%s', ",
+						  slotname);
+		appendStringLiteralAH(query, slotinfo->plugin, fout);
+		appendPQExpBuffer(query, ", ");
+		appendStringLiteralAH(query, slotinfo->twophase, fout);
+		appendPQExpBuffer(query, ");");
+
+		ArchiveEntry(fout, slotinfo->dobj.catId, slotinfo->dobj.dumpId,
+					 ARCHIVE_OPTS(.tag = slotname,
+								  .description = "REPLICATION SLOT",
+								  .section = SECTION_POST_DATA,
+								  .createStmt = query->data));
+
+		pfree(slotname);
+		destroyPQExpBuffer(query);
+	}
+}
diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h
index ed6ce41ad7..8028ccf6ff 100644
--- a/src/bin/pg_dump/pg_dump.h
+++ b/src/bin/pg_dump/pg_dump.h
@@ -82,7 +82,8 @@ typedef enum
 	DO_PUBLICATION,
 	DO_PUBLICATION_REL,
 	DO_PUBLICATION_TABLE_IN_SCHEMA,
-	DO_SUBSCRIPTION
+	DO_SUBSCRIPTION,
+	DO_LOGICAL_REPLICATION_SLOT
 } DumpableObjectType;
 
 /*
@@ -666,6 +667,20 @@ typedef struct _SubscriptionInfo
 	char	   *subpasswordrequired;
 } SubscriptionInfo;
 
+/*
+ * The LogicalReplicationSlotInfo struct is used to represent replication
+ * slots.
+ *
+ * XXX: add more attributes if needed
+ */
+typedef struct _LogicalReplicationSlotInfo
+{
+	DumpableObject dobj;
+	char	   *plugin;
+	char	   *slottype;
+	char	   *twophase;
+} LogicalReplicationSlotInfo;
+
 /*
  *	common utility functions
  */
diff --git a/src/bin/pg_dump/pg_dump_sort.c b/src/bin/pg_dump/pg_dump_sort.c
index 8266c117a3..7c37873743 100644
--- a/src/bin/pg_dump/pg_dump_sort.c
+++ b/src/bin/pg_dump/pg_dump_sort.c
@@ -93,6 +93,7 @@ enum dbObjectTypePriorities
 	PRIO_PUBLICATION_REL,
 	PRIO_PUBLICATION_TABLE_IN_SCHEMA,
 	PRIO_SUBSCRIPTION,
+	PRIO_LOGICAL_REPLICATION_SLOT,
 	PRIO_DEFAULT_ACL,			/* done in ACL pass */
 	PRIO_EVENT_TRIGGER,			/* must be next to last! */
 	PRIO_REFRESH_MATVIEW		/* must be last! */
@@ -146,10 +147,11 @@ static const int dbObjectTypePriority[] =
 	PRIO_PUBLICATION,			/* DO_PUBLICATION */
 	PRIO_PUBLICATION_REL,		/* DO_PUBLICATION_REL */
 	PRIO_PUBLICATION_TABLE_IN_SCHEMA,	/* DO_PUBLICATION_TABLE_IN_SCHEMA */
-	PRIO_SUBSCRIPTION			/* DO_SUBSCRIPTION */
+	PRIO_SUBSCRIPTION,			/* DO_SUBSCRIPTION */
+	PRIO_LOGICAL_REPLICATION_SLOT	/* DO_LOGICAL_REPLICATION_SLOT */
 };
 
-StaticAssertDecl(lengthof(dbObjectTypePriority) == (DO_SUBSCRIPTION + 1),
+StaticAssertDecl(lengthof(dbObjectTypePriority) == (DO_LOGICAL_REPLICATION_SLOT + 1),
 				 "array length mismatch");
 
 static DumpId preDataBoundId;
@@ -1498,6 +1500,11 @@ describeDumpableObject(DumpableObject *obj, char *buf, int bufsize)
 					 "SUBSCRIPTION (ID %d OID %u)",
 					 obj->dumpId, obj->catId.oid);
 			return;
+		case DO_LOGICAL_REPLICATION_SLOT:
+			snprintf(buf, bufsize,
+					 "LOGICAL REPLICATION SLOT (ID %d NAME %s)",
+					 obj->dumpId, obj->name);
+			return;
 		case DO_PRE_DATA_BOUNDARY:
 			snprintf(buf, bufsize,
 					 "PRE-DATA BOUNDARY  (ID %d)",
diff --git a/src/bin/pg_upgrade/dump.c b/src/bin/pg_upgrade/dump.c
index 6c8c82dca8..e6b90864f5 100644
--- a/src/bin/pg_upgrade/dump.c
+++ b/src/bin/pg_upgrade/dump.c
@@ -59,6 +59,30 @@ generate_old_dump(void)
 						   log_opts.dumpdir,
 						   sql_file_name, escaped_connstr.data);
 
+		/*
+		 * Dump logical replication slots if needed.
+		 *
+		 * XXX We cannot dump replication slots at the same time as the schema
+		 * dump because we need to separate the timing of restoring
+		 * replication slots and other objects. Replication slots, in
+		 * particular, should not be restored before executing the pg_resetwal
+		 * command because it will remove WALs that are required by the slots.
+		 */
+		if (user_opts.include_logical_slots)
+		{
+			char		slots_file_name[MAXPGPATH];
+
+			snprintf(slots_file_name, sizeof(slots_file_name),
+					 DB_DUMP_LOGICAL_SLOTS_FILE_MASK, old_db->db_oid);
+			parallel_exec_prog(log_file_name, NULL,
+							   "\"%s/pg_dump\" %s --logical-replication-slots-only "
+							   "--quote-all-identifiers --binary-upgrade %s "
+							   "--file=\"%s/%s\" %s",
+							   new_cluster.bindir, cluster_conn_opts(&old_cluster),
+							   log_opts.verbose ? "--verbose" : "",
+							   log_opts.dumpdir,
+							   slots_file_name, escaped_connstr.data);
+		}
 		termPQExpBuffer(&escaped_connstr);
 	}
 
diff --git a/src/bin/pg_upgrade/meson.build b/src/bin/pg_upgrade/meson.build
index 12a97f84e2..228f29b688 100644
--- a/src/bin/pg_upgrade/meson.build
+++ b/src/bin/pg_upgrade/meson.build
@@ -42,6 +42,7 @@ tests += {
     'tests': [
       't/001_basic.pl',
       't/002_pg_upgrade.pl',
+      't/003_logical_replication_slots.pl',
     ],
     'test_kwargs': {'priority': 40}, # pg_upgrade tests are slow
   },
diff --git a/src/bin/pg_upgrade/option.c b/src/bin/pg_upgrade/option.c
index 8869b6b60d..f7b5ec9879 100644
--- a/src/bin/pg_upgrade/option.c
+++ b/src/bin/pg_upgrade/option.c
@@ -57,6 +57,7 @@ parseCommandLine(int argc, char *argv[])
 		{"verbose", no_argument, NULL, 'v'},
 		{"clone", no_argument, NULL, 1},
 		{"copy", no_argument, NULL, 2},
+		{"include-logical-replication-slots", no_argument, NULL, 3},
 
 		{NULL, 0, NULL, 0}
 	};
@@ -199,6 +200,10 @@ parseCommandLine(int argc, char *argv[])
 				user_opts.transfer_mode = TRANSFER_MODE_COPY;
 				break;
 
+			case 3:
+				user_opts.include_logical_slots = true;
+				break;
+
 			default:
 				fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
 						os_info.progname);
@@ -289,6 +294,8 @@ usage(void)
 	printf(_("  -V, --version                 display version information, then exit\n"));
 	printf(_("  --clone                       clone instead of copying files to new cluster\n"));
 	printf(_("  --copy                        copy files to new cluster (default)\n"));
+	printf(_("  --include-logical-replication-slots\n"
+			 "                                upgrade logical replication slots\n"));
 	printf(_("  -?, --help                    show this help, then exit\n"));
 	printf(_("\n"
 			 "Before running pg_upgrade you must:\n"
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 75bab0a04c..1241060f4e 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -59,6 +59,7 @@ static void copy_xact_xlog_xid(void);
 static void set_frozenxids(bool minmxid_only);
 static void make_outputdirs(char *pgdata);
 static void setup(char *argv0, bool *live_check);
+static void create_logical_replication_slots(void);
 
 ClusterInfo old_cluster,
 			new_cluster;
@@ -188,6 +189,19 @@ main(int argc, char **argv)
 			  new_cluster.pgdata);
 	check_ok();
 
+	/*
+	 * Create logical replication slots if requested.
+	 *
+	 * Note: This must be done after doing pg_resetwal command because the
+	 * command will remove required WALs.
+	 */
+	if (user_opts.include_logical_slots)
+	{
+		start_postmaster(&new_cluster, true);
+		create_logical_replication_slots();
+		stop_postmaster(false);
+	}
+
 	if (user_opts.do_sync)
 	{
 		prep_status("Sync data directory to disk");
@@ -860,3 +874,50 @@ set_frozenxids(bool minmxid_only)
 
 	check_ok();
 }
+
+/*
+ * create_logical_replication_slots()
+ *
+ * Similar to create_new_objects() but only restores logical replication slots.
+ */
+static void
+create_logical_replication_slots(void)
+{
+	int			dbnum;
+
+	prep_status_progress("Restoring logical replication slots in the new cluster");
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		char		slots_file_name[MAXPGPATH],
+					log_file_name[MAXPGPATH];
+		DbInfo	   *old_db = &old_cluster.dbarr.dbs[dbnum];
+
+		pg_log(PG_STATUS, "%s", old_db->db_name);
+
+		snprintf(slots_file_name, sizeof(slots_file_name),
+				 DB_DUMP_LOGICAL_SLOTS_FILE_MASK, old_db->db_oid);
+		snprintf(log_file_name, sizeof(log_file_name),
+				 DB_DUMP_LOG_FILE_MASK, old_db->db_oid);
+
+		parallel_exec_prog(log_file_name,
+						   NULL,
+						   "\"%s/psql\" %s --echo-queries --set ON_ERROR_STOP=on "
+						   "--no-psqlrc --dbname %s -f \"%s/%s\"",
+						   new_cluster.bindir,
+						   cluster_conn_opts(&new_cluster),
+						   old_db->db_name,
+						   log_opts.dumpdir,
+						   slots_file_name);
+	}
+
+	/* reap all children */
+	while (reap_child(true) == true)
+		;
+
+	end_progress_output();
+	check_ok();
+
+	/* update new_cluster info now that we have objects in the databases */
+	get_db_and_rel_infos(&new_cluster);
+}
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 3eea0139c7..5f3d7a407e 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -29,6 +29,7 @@
 /* contains both global db information and CREATE DATABASE commands */
 #define GLOBALS_DUMP_FILE	"pg_upgrade_dump_globals.sql"
 #define DB_DUMP_FILE_MASK	"pg_upgrade_dump_%u.custom"
+#define DB_DUMP_LOGICAL_SLOTS_FILE_MASK	"pg_upgrade_dump_%u_logical_slots.sql"
 
 /*
  * Base directories that include all the files generated internally, from the
@@ -304,6 +305,8 @@ typedef struct
 	transferMode transfer_mode; /* copy files or link them? */
 	int			jobs;			/* number of processes/threads to use */
 	char	   *socketdir;		/* directory to use for Unix sockets */
+	bool		include_logical_slots;	/* true -> dump and restore logical
+										 * replication slots */
 } UserOpts;
 
 typedef struct
diff --git a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
new file mode 100644
index 0000000000..81917f3b14
--- /dev/null
+++ b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
@@ -0,0 +1,90 @@
+# Copyright (c) 2021-2023, PostgreSQL Global Development Group
+
+# Tests for logical replication, especially for upgrading replication slots
+
+use strict;
+use warnings;
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Can be changed to test the other modes.
+my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
+
+# Initialize publisher node
+my $old_publisher = PostgreSQL::Test::Cluster->new('old_publisher');
+$old_publisher->init(allows_streaming => 'logical');
+$old_publisher->start;
+
+# Initialize subscriber node
+my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
+$subscriber->init(allows_streaming => 'logical');
+$subscriber->start;
+
+$old_publisher->safe_psql('postgres',
+	"CREATE TABLE tbl AS SELECT generate_series(1,10) AS a");
+$subscriber->safe_psql('postgres', "CREATE TABLE tbl (a int)");
+
+# Setup logical replication
+my $old_connstr = $old_publisher->connstr . ' dbname=postgres';
+$old_publisher->safe_psql('postgres',
+	"CREATE PUBLICATION pub FOR ALL TABLES");
+$subscriber->safe_psql('postgres',
+	"CREATE SUBSCRIPTION sub CONNECTION '$old_connstr' PUBLICATION pub");
+
+# Wait for initial table sync to finish
+$subscriber->wait_for_subscription_sync($old_publisher, 'sub');
+
+my $result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
+is($result, qq(10), 'check initial rows on subscriber');
+
+# Preparations for upgrading publisher
+$old_publisher->stop;
+$subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION sub DISABLE");
+
+my $new_publisher = PostgreSQL::Test::Cluster->new('new_publisher');
+$new_publisher->init(allows_streaming => 'logical');
+
+my $bindir = $new_publisher->config_data('--bindir');
+
+# Run pg_upgrade. pg_upgrade_output.d is removed at the end
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,        '--include-logical-replication-slots'
+	],
+	'run of pg_upgrade of old publisher');
+ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ removed after pg_upgrade success");
+
+# Check whether the replication slot is copied
+$new_publisher->start;
+$result =
+  $new_publisher->safe_psql('postgres',
+	"SELECT count(*) FROM pg_replication_slots");
+is($result, qq(1), 'check the replication slot is copied to new publisher');
+
+# Change connection string and enable logical replication
+my $new_connstr = $new_publisher->connstr . ' dbname=postgres';
+
+$subscriber->safe_psql('postgres',
+	"ALTER SUBSCRIPTION sub CONNECTION '$new_connstr'");
+$subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION sub ENABLE");
+
+# Check whether changes on the new publisher get replicated to the subscriber
+$new_publisher->safe_psql('postgres',
+	"INSERT INTO tbl VALUES (generate_series(11, 20))");
+
+$new_publisher->wait_for_catchup('sub');
+
+$result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
+is($result, qq(20), 'check changes are shipped to subscriber');
+
+done_testing();
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index b4058b88c3..7e999726c2 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1479,6 +1479,7 @@ LogicalRepBeginData
 LogicalRepCommitData
 LogicalRepCommitPreparedTxnData
 LogicalRepCtxStruct
+LogicalReplicationSlotInfo
 LogicalRepMode
 LogicalRepMsgType
 LogicalRepPartMapEntry
-- 
2.27.0

v7-0002-Always-persist-to-disk-logical-slots-during-a-shu.patchapplication/octet-stream; name=v7-0002-Always-persist-to-disk-logical-slots-during-a-shu.patchDownload
From baf1b71bd2c51b130fba324ce7c8d1dddd7a1a5a Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Fri, 14 Apr 2023 13:49:09 +0800
Subject: [PATCH v7 2/3] Always persist to disk logical slots during a shutdown
 checkpoint.

It's entirely possible for a logical slot to have a confirmed_flush_lsn higher
than the last value saved on disk while not being marked as dirty.  It's
currently not a problem to lose that value during a clean shutdown / restart
cycle, but a later patch adding support for pg_upgrade of publications and
logical slots will rely on that value being properly persisted to disk.

Author: Julien Rouhaud
Reviewed-by: FIXME
Discussion: FIXME
---
 src/backend/access/transam/xlog.c |  2 +-
 src/backend/replication/slot.c    | 26 ++++++++++++++++----------
 src/include/replication/slot.h    |  2 +-
 3 files changed, 18 insertions(+), 12 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index b540ee293b..8100ca656e 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7011,7 +7011,7 @@ static void
 CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
 {
 	CheckPointRelationMap();
-	CheckPointReplicationSlots();
+	CheckPointReplicationSlots(flags & CHECKPOINT_IS_SHUTDOWN);
 	CheckPointSnapBuild();
 	CheckPointLogicalRewriteHeap();
 	CheckPointReplicationOrigin();
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 8021aaa0a8..aeea6ffd1f 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -109,7 +109,8 @@ static void ReplicationSlotDropPtr(ReplicationSlot *slot);
 /* internal persistency functions */
 static void RestoreSlotFromDisk(const char *name);
 static void CreateSlotOnDisk(ReplicationSlot *slot);
-static void SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel);
+static void SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel,
+						   bool is_shutdown);
 
 /*
  * Report shared-memory space needed by ReplicationSlotsShmemInit.
@@ -783,7 +784,7 @@ ReplicationSlotSave(void)
 	Assert(MyReplicationSlot != NULL);
 
 	sprintf(path, "pg_replslot/%s", NameStr(MyReplicationSlot->data.name));
-	SaveSlotToPath(MyReplicationSlot, path, ERROR);
+	SaveSlotToPath(MyReplicationSlot, path, ERROR, false);
 }
 
 /*
@@ -1565,11 +1566,10 @@ restart:
 /*
  * Flush all replication slots to disk.
  *
- * This needn't actually be part of a checkpoint, but it's a convenient
- * location.
+ * is_shutdown is true in case of a shutdown checkpoint.
  */
 void
-CheckPointReplicationSlots(void)
+CheckPointReplicationSlots(bool is_shutdown)
 {
 	int			i;
 
@@ -1594,7 +1594,7 @@ CheckPointReplicationSlots(void)
 
 		/* save the slot to disk, locking is handled in SaveSlotToPath() */
 		sprintf(path, "pg_replslot/%s", NameStr(s->data.name));
-		SaveSlotToPath(s, path, LOG);
+		SaveSlotToPath(s, path, LOG, is_shutdown);
 	}
 	LWLockRelease(ReplicationSlotAllocationLock);
 }
@@ -1700,7 +1700,7 @@ CreateSlotOnDisk(ReplicationSlot *slot)
 
 	/* Write the actual state file. */
 	slot->dirty = true;			/* signal that we really need to write */
-	SaveSlotToPath(slot, tmppath, ERROR);
+	SaveSlotToPath(slot, tmppath, ERROR, false);
 
 	/* Rename the directory into place. */
 	if (rename(tmppath, path) != 0)
@@ -1726,7 +1726,8 @@ CreateSlotOnDisk(ReplicationSlot *slot)
  * Shared functionality between saving and creating a replication slot.
  */
 static void
-SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel)
+SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel,
+			   bool is_shutdown)
 {
 	char		tmppath[MAXPGPATH];
 	char		path[MAXPGPATH];
@@ -1740,8 +1741,13 @@ SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel)
 	slot->just_dirtied = false;
 	SpinLockRelease(&slot->mutex);
 
-	/* and don't do anything if there's nothing to write */
-	if (!was_dirty)
+	/*
+	 * and don't do anything if there's nothing to write, unless this is
+	 * called for a logical slot during a shutdown checkpoint, as we want to
+	 * persist the confirmed_flush_lsn in that case, even if that's the only
+	 * modification.
+	 */
+	if (!was_dirty && !(SlotIsLogical(slot) && is_shutdown))
 		return;
 
 	LWLockAcquire(&slot->io_in_progress_lock, LW_EXCLUSIVE);
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index a8a89dc784..7ca37c9f70 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -241,7 +241,7 @@ extern void ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslo
 extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok);
 
 extern void StartupReplicationSlots(void);
-extern void CheckPointReplicationSlots(void);
+extern void CheckPointReplicationSlots(bool is_shutdown);
 
 extern void CheckSlotRequirements(void);
 extern void CheckSlotPermissions(void);
-- 
2.27.0

v7-0003-pg_upgrade-Add-check-function-for-include-logical.patchapplication/octet-stream; name=v7-0003-pg_upgrade-Add-check-function-for-include-logical.patchDownload
From ddd3739eff35141bf00e64a9c5e5500f3c97383b Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Fri, 14 Apr 2023 08:59:03 +0000
Subject: [PATCH v7 3/3] pg_upgrade: Add check function for
 --include-logical-replication-slots option

---
 src/bin/pg_upgrade/check.c                    | 76 +++++++++++++++++++
 .../t/003_logical_replication_slots.pl        | 58 +++++++++++---
 2 files changed, 123 insertions(+), 11 deletions(-)

diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index fea159689e..8dda227bf2 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -9,7 +9,10 @@
 
 #include "postgres_fe.h"
 
+#include "access/xlogrecord.h"
+#include "access/xlog_internal.h"
 #include "catalog/pg_authid_d.h"
+#include "catalog/pg_control.h"
 #include "catalog/pg_collation.h"
 #include "fe_utils/string_utils.h"
 #include "mb/pg_wchar.h"
@@ -30,6 +33,7 @@ static void check_for_jsonb_9_4_usage(ClusterInfo *cluster);
 static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(ClusterInfo *new_cluster);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
+static void check_for_logical_replication_slots(ClusterInfo *cluster);
 
 
 /*
@@ -103,6 +107,8 @@ check_and_dump_old_cluster(bool live_check)
 	check_for_composite_data_type_usage(&old_cluster);
 	check_for_reg_data_type_usage(&old_cluster);
 	check_for_isn_and_int8_passing_mismatch(&old_cluster);
+	if (user_opts.include_logical_slots)
+		check_for_logical_replication_slots(&old_cluster);
 
 	/*
 	 * PG 16 increased the size of the 'aclitem' type, which breaks the on-disk
@@ -1402,3 +1408,73 @@ check_for_user_defined_encoding_conversions(ClusterInfo *cluster)
 	else
 		check_ok();
 }
+
+/*
+ * Verify that all logical replication slots have consumed all WAL, except
+ * for a CHECKPOINT_SHUTDOWN record.
+ */
+static void
+check_for_logical_replication_slots(ClusterInfo *cluster)
+{
+	int			i,
+				ntups,
+				i_slotname;
+	bool		is_error = false;
+	PGresult   *res;
+	DbInfo	   *active_db = &cluster->dbarr.dbs[0];
+	PGconn	   *conn = connectToServer(cluster, active_db->db_name);
+
+	Assert(user_opts.include_logical_slots);
+
+	/* --include-logical-replication-slots can be used since PG16. */
+	if (GET_MAJOR_VERSION(cluster->major_version) < 1600)
+		return;
+
+	prep_status("Checking for logical replication slots");
+
+
+	/*
+	 * Check that all logical replication slots have reached the current WAL
+	 * position, except for the CHECKPOINT_SHUTDOWN record. Even if all WALs
+	 * are consumed before shutting down the node, the checkpointer generates
+	 * a CHECKPOINT_SHUTDOWN record at shutdown, which cannot be consumed by
+	 * any slots. Therefore, we must allow for a difference between
+	 * pg_current_wal_insert_lsn() and confirmed_flush_lsn.
+	 */
+#define SHUTDOWN_RECORD_SIZE  (SizeOfXLogRecord + \
+							   SizeOfXLogRecordDataHeaderShort + \
+							   sizeof(CheckPoint))
+
+	res = executeQueryOrDie(conn,
+							"SELECT slot_name FROM pg_catalog.pg_replication_slots "
+							"WHERE (pg_catalog.pg_current_wal_insert_lsn() - confirmed_flush_lsn) > %d "
+							"AND temporary = false AND wal_status IN ('reserved', 'extended');",
+							(int) (SizeOfXLogShortPHD + SHUTDOWN_RECORD_SIZE));
+
+#undef SHUTDOWN_RECORD_SIZE
+
+	ntups = PQntuples(res);
+	i_slotname = PQfnumber(res, "slot_name");
+
+	for (i = 0; i < ntups; i++)
+	{
+		char	   *slotname;
+
+		is_error = true;
+
+		slotname = PQgetvalue(res, i, i_slotname);
+
+		pg_log(PG_WARNING,
+			   "\nWARNING: logical replication slot \"%s\" has not consumed all WAL yet.",
+			   slotname);
+	}
+
+	PQclear(res);
+	PQfinish(conn);
+
+	if (is_error)
+		pg_fatal("--include-logical-replication-slots requires that all "
+				 "logical replication slots have consumed all the WALs");
+
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
index 81917f3b14..527d448a00 100644
--- a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
+++ b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
@@ -4,6 +4,9 @@
 
 use strict;
 use warnings;
+
+use File::Path qw(rmtree);
+
 use PostgreSQL::Test::Cluster;
 use PostgreSQL::Test::Utils;
 use Test::More;
@@ -11,7 +14,7 @@ use Test::More;
 # Can be changed to test the other modes.
 my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
 
-# Initialize publisher node
+# Initialize old publisher node
 my $old_publisher = PostgreSQL::Test::Cluster->new('old_publisher');
 $old_publisher->init(allows_streaming => 'logical');
 $old_publisher->start;
@@ -21,10 +24,48 @@ my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
 $subscriber->init(allows_streaming => 'logical');
 $subscriber->start;
 
+# Initialize new publisher node
+my $new_publisher = PostgreSQL::Test::Cluster->new('new_publisher');
+$new_publisher->init(allows_streaming => 'logical');
+my $bindir = $new_publisher->config_data('--bindir');
+
+# Schema setup
 $old_publisher->safe_psql('postgres',
 	"CREATE TABLE tbl AS SELECT generate_series(1,10) AS a");
 $subscriber->safe_psql('postgres', "CREATE TABLE tbl (a int)");
 
+# Create a dummy slot on old publisher to fail the test
+$old_publisher->safe_psql('postgres',
+	"SELECT pg_create_logical_replication_slot('dropme_slot', 'pgoutput')");
+$old_publisher->safe_psql('postgres',
+	"INSERT INTO tbl VALUES (generate_series(11, 20))");
+$old_publisher->stop;
+
+# Cause a failure at the start of pg_upgrade because dropme_slot is not
+# advanced till the INSERT statement
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,        '--include-logical-replication-slots',
+		'--check',
+	],
+	'run of pg_upgrade --check for new instance with inactive slot');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT pg_drop_replication_slot('dropme_slot')");
+
 # Setup logical replication
 my $old_connstr = $old_publisher->connstr . ' dbname=postgres';
 $old_publisher->safe_psql('postgres',
@@ -36,18 +77,13 @@ $subscriber->safe_psql('postgres',
 $subscriber->wait_for_subscription_sync($old_publisher, 'sub');
 
 my $result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
-is($result, qq(10), 'check initial rows on subscriber');
+is($result, qq(20), 'check initial rows on subscriber');
 
 # Preparations for upgrading publisher
 $old_publisher->stop;
 $subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION sub DISABLE");
 
-my $new_publisher = PostgreSQL::Test::Cluster->new('new_publisher');
-$new_publisher->init(allows_streaming => 'logical');
-
-my $bindir = $new_publisher->config_data('--bindir');
-
-# Run pg_upgrade. pg_upgrade_output.d is removed at the end
+# Run pg_upgrade again. pg_upgrade_output.d is removed at the end
 command_ok(
 	[
 		'pg_upgrade', '--no-sync',
@@ -64,7 +100,7 @@ command_ok(
 ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
 	"pg_upgrade_output.d/ removed after pg_upgrade success");
 
-# Check whether the replication slot is copied
+# Check whether the replication slot is copied to new publisher
 $new_publisher->start;
 $result =
   $new_publisher->safe_psql('postgres',
@@ -80,11 +116,11 @@ $subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION sub ENABLE");
 
 # Check whether changes on the new publisher get replicated to the subscriber
 $new_publisher->safe_psql('postgres',
-	"INSERT INTO tbl VALUES (generate_series(11, 20))");
+	"INSERT INTO tbl VALUES (generate_series(21, 30))");
 
 $new_publisher->wait_for_catchup('sub');
 
 $result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
-is($result, qq(20), 'check changes are shipped to subscriber');
+is($result, qq(30), 'check changes are shipped to subscriber');
 
 done_testing();
-- 
2.27.0

#22vignesh C
vignesh21@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#21)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Fri, 14 Apr 2023 at 16:00, Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

Dear Julien,

Sorry for the delay, I didn't have time to come back to it until this afternoon.

No issues, everyone is busy:-).

I don't think that your analysis is correct. Slots are guaranteed to be
stopped after all the normal backends have been stopped, exactly to avoid such
extraneous records.

What is happening here is that the slot's confirmed_flush_lsn is properly
updated in memory and ends up being the same as the current LSN before the
shutdown. But as it's a logical slot and those records aren't decoded, the
slot isn't marked as dirty and therefore isn't saved to disk. You don't see
that behavior when doing a manual checkpoint before (per your script comment),
as in that case the checkpoint also tries to save the slot to disk but then
finds a slot that was marked as dirty and therefore saves it.

In your script's scenario, when you restart the server the previous slot data
is restored and the confirmed_flush_lsn goes backward, which explains those
extraneous records.

So you mean that the key point is that records which are not sent to the
subscriber do not mark the slot as dirty, and hence the updated confirmed_flush
is not written into the slot file. Is that right? LogicalConfirmReceivedLocation()
is called by the walsender when it gets a reply from the apply worker, so your
analysis seems correct.
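
Just to illustrate (not part of the patch): the symptom can be seen by running the
query below, cleanly restarting the server, and running it again; without the fix
the value restored from disk can be behind what was last reported in memory.

SELECT slot_name, confirmed_flush_lsn, pg_catalog.pg_current_wal_insert_lsn()
FROM pg_catalog.pg_replication_slots
WHERE slot_type = 'logical';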

It's probably totally harmless to throw away that value for now (and probably
also doesn't lead to crazy amount of work after restart, I really don't know
much about the logical slot code), but clearly becomes problematic with your
usecase. One easy way to fix this is to teach the checkpoint code to force
saving the logical slots to disk even if they're not marked as dirty during a
shutdown checkpoint, as done in the attached v1 patch (renamed as .txt to not
interfere with the cfbot). With this patch applied I reliably only see a final
shutdown checkpoint record with your scenario.

Now such a change will make shutdown a bit more expensive when using logical
replication, even if in 99% of cases you will not need to save the
confirmed_flush_lsn value, so I don't know if that's acceptable or not.

In any case, the slot must be advanced past these records. IIUC, currently such
records are read after rebooting but ignored, and this patch just skips them. I have
not measured it, but this might not be additional overhead, just a trade-off.

For now I have not come up with another solution, so I have included your patch.
Please see 0002.

Additionally, I added a checking function in 0003.
According to pg_resetwal and other code, the length of a CHECKPOINT_SHUTDOWN
record seems to be (SizeOfXLogRecord + SizeOfXLogRecordDataHeaderShort + sizeof(CheckPoint)).
Therefore, the function ensures that the difference between the current insert position
and confirmed_flush_lsn is no more than that plus the page header.
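
As a rough illustration of what the 0003 check runs against the old cluster (the
threshold below is only a placeholder; the patch computes the exact byte count from
SizeOfXLogShortPHD, SizeOfXLogRecord, SizeOfXLogRecordDataHeaderShort, and
sizeof(CheckPoint)):

SELECT slot_name
FROM pg_catalog.pg_replication_slots
WHERE (pg_catalog.pg_current_wal_insert_lsn() - confirmed_flush_lsn) > 128  -- placeholder threshold
  AND temporary = false
  AND wal_status IN ('reserved', 'extended');

Any slot returned by this query has not consumed all of its WAL, and the check
reports it.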

Thanks for the patches.
Currently, two_phase-enabled slots are not restored correctly from the dumped
contents. This is because the twophase value is passed in the position of the
temporary parameter of the pg_create_logical_replication_slot function (see [1]),
so the restore internally creates the slot as a temporary slot in this case:
+               appendPQExpBuffer(query, "SELECT pg_catalog.pg_create_logical_replication_slot('%s', ",
+                                 slotname);
+               appendStringLiteralAH(query, slotinfo->plugin, fout);
+               appendPQExpBuffer(query, ", ");
+               appendStringLiteralAH(query, slotinfo->twophase, fout);
+               appendPQExpBuffer(query, ");");
+
+               ArchiveEntry(fout, slotinfo->dobj.catId, slotinfo->dobj.dumpId,
+                            ARCHIVE_OPTS(.tag = slotname,
+                                         .description = "REPLICATION SLOT",
+                                         .section = SECTION_POST_DATA,
+                                         .createStmt = query->data));
+
+               pfree(slotname);
+               destroyPQExpBuffer(query);
+       }
+}
Since we are dumping only permanent slots, we could set the temporary
parameter to false:
+               appendPQExpBuffer(query, "SELECT pg_catalog.pg_create_logical_replication_slot('%s', ",
+                                 slotname);
+               appendStringLiteralAH(query, slotinfo->plugin, fout);
+               appendPQExpBuffer(query, ", f, ");
+               appendStringLiteralAH(query, slotinfo->twophase, fout);
+               appendPQExpBuffer(query, ");");

[1]: https://www.postgresql.org/docs/devel/functions-admin.html
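
For reference, the parameters are (slot_name, plugin, temporary, twophase), so the
dump should end up emitting something like the following (slot name and plugin are
just examples):

SELECT pg_catalog.pg_create_logical_replication_slot('sub', 'pgoutput', false, true);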

Regards,
Vignesh

#23vignesh C
vignesh21@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#21)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Fri, 14 Apr 2023 at 16:00, Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

Dear Julien,

Sorry for the delay, I didn't have time to come back to it until this afternoon.

No issues, everyone is busy:-).

I don't think that your analysis is correct. Slots are guaranteed to be
stopped after all the normal backends have been stopped, exactly to avoid such
extraneous records.

What is happening here is that the slot's confirmed_flush_lsn is properly
updated in memory and ends up being the same as the current LSN before the
shutdown. But as it's a logical slot and those records aren't decoded, the
slot isn't marked as dirty and therefore isn't saved to disk. You don't see
that behavior when doing a manual checkpoint before (per your script comment),
as in that case the checkpoint also tries to save the slot to disk but then
finds a slot that was marked as dirty and therefore saves it.

In your script's scenario, when you restart the server the previous slot data
is restored and the confirmed_flush_lsn goes backward, which explains those
extraneous records.

So you mean that the key point is that records which are not sent to the
subscriber do not mark the slot as dirty, and hence the updated confirmed_flush
is not written into the slot file. Is that right? LogicalConfirmReceivedLocation()
is called by the walsender when it gets a reply from the apply worker, so your
analysis seems correct.

It's probably totally harmless to throw away that value for now (and probably
also doesn't lead to crazy amount of work after restart, I really don't know
much about the logical slot code), but clearly becomes problematic with your
usecase. One easy way to fix this is to teach the checkpoint code to force
saving the logical slots to disk even if they're not marked as dirty during a
shutdown checkpoint, as done in the attached v1 patch (renamed as .txt to not
interfere with the cfbot). With this patch applied I reliably only see a final
shutdown checkpoint record with your scenario.

Now such a change will make shutdown a bit more expensive when using logical
replication, even if in 99% of cases you will not need to save the
confirmed_flush_lsn value, so I don't know if that's acceptable or not.

In any case, the slot must be advanced past these records. IIUC, currently such
records are read after rebooting but ignored, and this patch just skips them. I have
not measured it, but this might not be additional overhead, just a trade-off.

For now I have not come up with another solution, so I have included your patch.
Please see 0002.

Additionally, I added a checking function in 0003.
According to pg_resetwal and other code, the length of a CHECKPOINT_SHUTDOWN
record seems to be (SizeOfXLogRecord + SizeOfXLogRecordDataHeaderShort + sizeof(CheckPoint)).
Therefore, the function ensures that the difference between the current insert position
and confirmed_flush_lsn is no more than that plus the page header.

Logical replication slots can be created only if wal_level >= logical, but
currently there is no check that wal_level >= logical when the
"--include-logical-replication-slots" option is specified. If the option is
specified, pg_upgrade creates the replication slots only after a lot of steps
(performing prechecks, analyzing, freezing, deleting, restoring, copying, and
setting related objects), so it would error out only after a lot of time; in
many cases pg_upgrade takes hours to perform these operations. I feel it would
be better to add a check at the beginning, somewhere in check_new_cluster(), to
see whether wal_level is set appropriately when the
--include-logical-replication-slots option is used, so the error is detected
and thrown early.
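
A minimal sketch of what such an early check could verify on the new cluster
(just an illustration; the exact implementation may differ):

SHOW wal_level;              -- must be 'logical'
SHOW max_replication_slots;  -- must be greater than 0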

Regards,
Vignesh

#24Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: vignesh C (#22)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Vignesh,

Thanks for the comment! A new patch will be available soon.

Thanks for the patches.
Currently, two_phase-enabled slots are not restored correctly from the dumped
contents. This is because the twophase value is passed in the position of the
temporary parameter of the pg_create_logical_replication_slot function (see [1]),
so the restore internally creates the slot as a temporary slot in this case:
+               appendPQExpBuffer(query, "SELECT pg_catalog.pg_create_logical_replication_slot('%s', ",
+                                 slotname);
+               appendStringLiteralAH(query, slotinfo->plugin, fout);
+               appendPQExpBuffer(query, ", ");
+               appendStringLiteralAH(query, slotinfo->twophase, fout);
+               appendPQExpBuffer(query, ");");
+
+               ArchiveEntry(fout, slotinfo->dobj.catId, slotinfo->dobj.dumpId,
+                            ARCHIVE_OPTS(.tag = slotname,
+                                         .description = "REPLICATION SLOT",
+                                         .section = SECTION_POST_DATA,
+                                         .createStmt = query->data));
+
+               pfree(slotname);
+               destroyPQExpBuffer(query);
+       }
+}
Since we are dumping only permanent slots, we could set the temporary
parameter to false:
+               appendPQExpBuffer(query, "SELECT pg_catalog.pg_create_logical_replication_slot('%s', ",
+                                 slotname);
+               appendStringLiteralAH(query, slotinfo->plugin, fout);
+               appendPQExpBuffer(query, ", f, ");
+               appendStringLiteralAH(query, slotinfo->twophase, fout);
+               appendPQExpBuffer(query, ");");

[1] - https://www.postgresql.org/docs/devel/functions-admin.html

Yeah, you are right. I misread the interface of the function.
Fixed it and added a new test.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#25Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: vignesh C (#23)
3 attachment(s)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Vignesh,

Thank you for reviewing! PSA new patchset.

Additionally, I added a checking function in 0003.
According to pg_resetwal and other code, the length of a CHECKPOINT_SHUTDOWN
record seems to be (SizeOfXLogRecord + SizeOfXLogRecordDataHeaderShort + sizeof(CheckPoint)).
Therefore, the function ensures that the difference between the current insert position
and confirmed_flush_lsn is no more than that plus the page header.

Logical replication slots can be created only if wal_level >= logical, but
currently there is no check that wal_level >= logical when the
"--include-logical-replication-slots" option is specified. If the option is
specified, pg_upgrade creates the replication slots only after a lot of steps
(performing prechecks, analyzing, freezing, deleting, restoring, copying, and
setting related objects), so it would error out only after a lot of time; in
many cases pg_upgrade takes hours to perform these operations. I feel it would
be better to add a check at the beginning, somewhere in check_new_cluster(), to
see whether wal_level is set appropriately when the
--include-logical-replication-slots option is used, so the error is detected
and thrown early.

I see your point. Moreover, I think max_replication_slots != 0 must also be checked.
I added a checking function and a related test in 0001.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:

v8-0001-pg_upgrade-Add-include-logical-replication-slots-.patchapplication/octet-stream; name=v8-0001-pg_upgrade-Add-include-logical-replication-slots-.patchDownload
From 94a5cffd6475ecd97f7a4699f993ecc61e0f5ead Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Tue, 4 Apr 2023 05:49:34 +0000
Subject: [PATCH v8 1/3] pg_upgrade: Add --include-logical-replication-slots
 option

This commit introduces a new pg_upgrade option called "--include-logical-replication-slots".
This allows nodes with logical replication slots to be upgraded. The commit can
be divided into two parts: one for pg_dump and another for pg_upgrade.

For pg_dump this commit includes a new option called "--logical-replication-slots-only".
This option can be used to dump logical replication slots. When this option is
specified, the slot_name, plugin, and two_phase parameters are extracted from
pg_replication_slots. An SQL file is then generated which executes
pg_create_logical_replication_slot() with the extracted parameters.

For pg_upgrade, when '--include-logical-replication-slots' is specified, it executes
pg_dump with the new "--logical-replication-slots-only" option and restores from the
dump. Unlike the schema restore, pg_resetwal must not be called after the replication
slots are restored. This is because the command discards WAL files and starts from a
new segment, even if they are required by replication slots. This leads to an ERROR:
"requested WAL segment XXX has already been removed". To avoid this, replication slots
are restored at a different time than other objects, after running pg_resetwal.

The significant advantage of this commit is that it makes it easy to continue
logical replication even after upgrading the publisher node. Previously, pg_upgrade
allowed copying publications to a new node. With this new commit, adjusting the
connection string to the new publisher will cause the apply worker on the subscriber
to connect to the new publisher automatically. This enables seamless continuation
of logical replication, even after an upgrade.

Author: Hayato Kuroda
Reviewed-by: Peter Smith, Julien Rouhaud, Vignesh C
---
 doc/src/sgml/ref/pg_dump.sgml                 |  10 ++
 doc/src/sgml/ref/pgupgrade.sgml               |  11 ++
 src/bin/pg_dump/pg_backup.h                   |   1 +
 src/bin/pg_dump/pg_dump.c                     | 147 +++++++++++++++++
 src/bin/pg_dump/pg_dump.h                     |  17 +-
 src/bin/pg_dump/pg_dump_sort.c                |  11 +-
 src/bin/pg_upgrade/check.c                    |  39 +++++
 src/bin/pg_upgrade/dump.c                     |  24 +++
 src/bin/pg_upgrade/meson.build                |   1 +
 src/bin/pg_upgrade/option.c                   |   7 +
 src/bin/pg_upgrade/pg_upgrade.c               |  61 +++++++
 src/bin/pg_upgrade/pg_upgrade.h               |   3 +
 .../t/003_logical_replication_slots.pl        | 150 ++++++++++++++++++
 src/tools/pgindent/typedefs.list              |   1 +
 14 files changed, 480 insertions(+), 3 deletions(-)
 create mode 100644 src/bin/pg_upgrade/t/003_logical_replication_slots.pl

diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index e81e35c13b..6e07f85281 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -1206,6 +1206,16 @@ PostgreSQL documentation
       </listitem>
      </varlistentry>
 
+     <varlistentry>
+      <term><option>--logical-replication-slots-only</option></term>
+      <listitem>
+       <para>
+        Dump only logical replication slots; not the schema (data definitions),
+        nor data. This is mainly used when upgrading nodes.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
        <term><option>-?</option></term>
        <term><option>--help</option></term>
diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 7816b4c685..94e90ff506 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -240,6 +240,17 @@ PostgreSQL documentation
       </listitem>
      </varlistentry>
 
+     <varlistentry>
+      <term><option>--include-logical-replication-slots</option></term>
+      <listitem>
+       <para>
+        Upgrade logical replication slots. Only permanent replication slots
+        are included. Note that pg_upgrade does not check the installation of
+        plugins.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><option>-?</option></term>
       <term><option>--help</option></term>
diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index aba780ef4b..0a4e931f9b 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -187,6 +187,7 @@ typedef struct _dumpOptions
 	int			use_setsessauth;
 	int			enable_row_security;
 	int			load_via_partition_root;
+	int			logical_slots_only;
 
 	/* default, if no "inclusion" switches appear, is to dump everything */
 	bool		include_everything;
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 058244cd17..d824150d79 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -328,6 +328,9 @@ static void setupDumpWorker(Archive *AH);
 static TableInfo *getRootTableInfo(const TableInfo *tbinfo);
 static bool forcePartitionRootLoad(const TableInfo *tbinfo);
 
+static void getLogicalReplicationSlots(Archive *fout);
+static void dumpLogicalReplicationSlot(Archive *fout,
+									   const LogicalReplicationSlotInfo *slotinfo);
 
 int
 main(int argc, char **argv)
@@ -431,6 +434,7 @@ main(int argc, char **argv)
 		{"table-and-children", required_argument, NULL, 12},
 		{"exclude-table-and-children", required_argument, NULL, 13},
 		{"exclude-table-data-and-children", required_argument, NULL, 14},
+		{"logical-replication-slots-only", no_argument, NULL, 15},
 
 		{NULL, 0, NULL, 0}
 	};
@@ -657,6 +661,10 @@ main(int argc, char **argv)
 										  optarg);
 				break;
 
+			case 15:			/* dump only replication slot(s) */
+				dopt.logical_slots_only = true;
+				break;
+
 			default:
 				/* getopt_long already emitted a complaint */
 				pg_log_error_hint("Try \"%s --help\" for more information.", progname);
@@ -714,6 +722,11 @@ main(int argc, char **argv)
 	if (dopt.do_nothing && dopt.dump_inserts == 0)
 		pg_fatal("option --on-conflict-do-nothing requires option --inserts, --rows-per-insert, or --column-inserts");
 
+	if (dopt.logical_slots_only && dopt.dataOnly)
+		pg_fatal("options --logical-replication-slots-only and -a/--data-only cannot be used together");
+	if (dopt.logical_slots_only && dopt.schemaOnly)
+		pg_fatal("options --logical-replication-slots-only and -s/--schema-only cannot be used together");
+
 	/* Identify archive format to emit */
 	archiveFormat = parseArchiveFormat(format, &archiveMode);
 
@@ -876,6 +889,16 @@ main(int argc, char **argv)
 			pg_fatal("no matching extensions were found");
 	}
 
+	/*
+	 * If --logical-replication-slots-only was requested, dump only the
+	 * logical replication slots and skip everything else.
+	 */
+	if (dopt.logical_slots_only)
+	{
+		getLogicalReplicationSlots(fout);
+		goto dump;
+	}
+
 	/*
 	 * Dumping LOs is the default for dumps where an inclusion switch is not
 	 * used (an "include everything" dump).  -B can be used to exclude LOs
@@ -936,6 +959,8 @@ main(int argc, char **argv)
 	if (!dopt.no_security_labels)
 		collectSecLabels(fout);
 
+dump:
+
 	/* Lastly, create dummy objects to represent the section boundaries */
 	boundaryObjs = createBoundaryObjects();
 
@@ -1109,6 +1134,8 @@ help(const char *progname)
 			 "                               servers matching PATTERN\n"));
 	printf(_("  --inserts                    dump data as INSERT commands, rather than COPY\n"));
 	printf(_("  --load-via-partition-root    load partitions via the root table\n"));
+	printf(_("  --logical-replication-slots-only\n"
+			 "                               dump only logical replication slots, no schema or data\n"));
 	printf(_("  --no-comments                do not dump comments\n"));
 	printf(_("  --no-publications            do not dump publications\n"));
 	printf(_("  --no-security-labels         do not dump security label assignments\n"));
@@ -10252,6 +10279,10 @@ dumpDumpableObject(Archive *fout, DumpableObject *dobj)
 		case DO_SUBSCRIPTION:
 			dumpSubscription(fout, (const SubscriptionInfo *) dobj);
 			break;
+		case DO_LOGICAL_REPLICATION_SLOT:
+			dumpLogicalReplicationSlot(fout,
+									   (const LogicalReplicationSlotInfo *) dobj);
+			break;
 		case DO_PRE_DATA_BOUNDARY:
 		case DO_POST_DATA_BOUNDARY:
 			/* never dumped, nothing to do */
@@ -18227,6 +18258,7 @@ addBoundaryDependencies(DumpableObject **dobjs, int numObjs,
 			case DO_PUBLICATION_REL:
 			case DO_PUBLICATION_TABLE_IN_SCHEMA:
 			case DO_SUBSCRIPTION:
+			case DO_LOGICAL_REPLICATION_SLOT:
 				/* Post-data objects: must come after the post-data boundary */
 				addObjectDependency(dobj, postDataBound->dumpId);
 				break;
@@ -18488,3 +18520,118 @@ appendReloptionsArrayAH(PQExpBuffer buffer, const char *reloptions,
 	if (!res)
 		pg_log_warning("could not parse %s array", "reloptions");
 }
+
+/*
+ * getLogicalReplicationSlots
+ *	  get information about replication slots
+ */
+static void
+getLogicalReplicationSlots(Archive *fout)
+{
+	PGresult   *res;
+	LogicalReplicationSlotInfo *slotinfo;
+	PQExpBuffer query;
+	DumpOptions *dopt = fout->dopt;
+
+	int			i_slotname;
+	int			i_plugin;
+	int			i_twophase;
+	int			i,
+				ntups;
+
+	/* Check whether we should dump or not */
+	if (fout->remoteVersion < 160000 || !dopt->logical_slots_only)
+		return;
+
+	query = createPQExpBuffer();
+
+	resetPQExpBuffer(query);
+
+	/*
+	 * Get replication slots.
+	 *
+	 * XXX: Which information must be extracted from old node? Currently three
+	 * attributes are extracted because they are used by
+	 * pg_create_logical_replication_slot().
+	 *
+	 * XXX: Do we have to support physical slots?
+	 */
+	appendPQExpBufferStr(query,
+						 "SELECT slot_name, plugin, two_phase "
+						 "FROM pg_catalog.pg_replication_slots "
+						 "WHERE database = current_database() AND temporary = false "
+						 "AND wal_status IN ('reserved', 'extended');");
+
+	res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+
+	ntups = PQntuples(res);
+
+	i_slotname = PQfnumber(res, "slot_name");
+	i_plugin = PQfnumber(res, "plugin");
+	i_twophase = PQfnumber(res, "two_phase");
+
+	slotinfo = pg_malloc(ntups * sizeof(LogicalReplicationSlotInfo));
+
+	for (i = 0; i < ntups; i++)
+	{
+		slotinfo[i].dobj.objType = DO_LOGICAL_REPLICATION_SLOT;
+
+		slotinfo[i].dobj.catId.tableoid = InvalidOid;
+		slotinfo[i].dobj.catId.oid = InvalidOid;
+		AssignDumpId(&slotinfo[i].dobj);
+
+		slotinfo[i].dobj.name = pg_strdup(PQgetvalue(res, i, i_slotname));
+
+		slotinfo[i].plugin = pg_strdup(PQgetvalue(res, i, i_plugin));
+		slotinfo[i].twophase = pg_strdup(PQgetvalue(res, i, i_twophase));
+
+		/*
+		 * Note: Currently we do not have any options to include/exclude slots
+		 * in dumping, so all the slots must be selected.
+		 */
+		slotinfo[i].dobj.dump = DUMP_COMPONENT_ALL;
+	}
+	PQclear(res);
+
+	destroyPQExpBuffer(query);
+}
+
+/*
+ * dumpLogicalReplicationSlot
+ *	  dump the creation command for the given logical replication slot
+ */
+static void
+dumpLogicalReplicationSlot(Archive *fout,
+						   const LogicalReplicationSlotInfo *slotinfo)
+{
+	DumpOptions *dopt = fout->dopt;
+
+	if (!dopt->logical_slots_only)
+		return;
+
+	if (slotinfo->dobj.dump & DUMP_COMPONENT_DEFINITION)
+	{
+		PQExpBuffer query = createPQExpBuffer();
+		char	   *slotname = pg_strdup(slotinfo->dobj.name);
+
+		/*
+		 * XXX: For simplification, pg_create_logical_replication_slot() is
+		 * used. Is it sufficient?
+		 */
+		appendPQExpBuffer(query, "SELECT pg_catalog.pg_create_logical_replication_slot('%s', ",
+						  slotname);
+		appendStringLiteralAH(query, slotinfo->plugin, fout);
+		appendPQExpBuffer(query, ", false, ");
+		appendStringLiteralAH(query, slotinfo->twophase, fout);
+		appendPQExpBuffer(query, ");");
+
+		ArchiveEntry(fout, slotinfo->dobj.catId, slotinfo->dobj.dumpId,
+					 ARCHIVE_OPTS(.tag = slotname,
+								  .description = "REPLICATION SLOT",
+								  .section = SECTION_POST_DATA,
+								  .createStmt = query->data));
+
+		pfree(slotname);
+		destroyPQExpBuffer(query);
+	}
+}
diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h
index ed6ce41ad7..8028ccf6ff 100644
--- a/src/bin/pg_dump/pg_dump.h
+++ b/src/bin/pg_dump/pg_dump.h
@@ -82,7 +82,8 @@ typedef enum
 	DO_PUBLICATION,
 	DO_PUBLICATION_REL,
 	DO_PUBLICATION_TABLE_IN_SCHEMA,
-	DO_SUBSCRIPTION
+	DO_SUBSCRIPTION,
+	DO_LOGICAL_REPLICATION_SLOT
 } DumpableObjectType;
 
 /*
@@ -666,6 +667,20 @@ typedef struct _SubscriptionInfo
 	char	   *subpasswordrequired;
 } SubscriptionInfo;
 
+/*
+ * The LogicalReplicationSlotInfo struct is used to represent replication
+ * slots.
+ *
+ * XXX: add more attributes if needed
+ */
+typedef struct _LogicalReplicationSlotInfo
+{
+	DumpableObject dobj;
+	char	   *plugin;
+	char	   *slottype;
+	char	   *twophase;
+} LogicalReplicationSlotInfo;
+
 /*
  *	common utility functions
  */
diff --git a/src/bin/pg_dump/pg_dump_sort.c b/src/bin/pg_dump/pg_dump_sort.c
index 745578d855..4e12e46dc5 100644
--- a/src/bin/pg_dump/pg_dump_sort.c
+++ b/src/bin/pg_dump/pg_dump_sort.c
@@ -93,6 +93,7 @@ enum dbObjectTypePriorities
 	PRIO_PUBLICATION_REL,
 	PRIO_PUBLICATION_TABLE_IN_SCHEMA,
 	PRIO_SUBSCRIPTION,
+	PRIO_LOGICAL_REPLICATION_SLOT,
 	PRIO_DEFAULT_ACL,			/* done in ACL pass */
 	PRIO_EVENT_TRIGGER,			/* must be next to last! */
 	PRIO_REFRESH_MATVIEW		/* must be last! */
@@ -146,10 +147,11 @@ static const int dbObjectTypePriority[] =
 	PRIO_PUBLICATION,			/* DO_PUBLICATION */
 	PRIO_PUBLICATION_REL,		/* DO_PUBLICATION_REL */
 	PRIO_PUBLICATION_TABLE_IN_SCHEMA,	/* DO_PUBLICATION_TABLE_IN_SCHEMA */
-	PRIO_SUBSCRIPTION			/* DO_SUBSCRIPTION */
+	PRIO_SUBSCRIPTION,			/* DO_SUBSCRIPTION */
+	PRIO_LOGICAL_REPLICATION_SLOT	/* DO_LOGICAL_REPLICATION_SLOT */
 };
 
-StaticAssertDecl(lengthof(dbObjectTypePriority) == (DO_SUBSCRIPTION + 1),
+StaticAssertDecl(lengthof(dbObjectTypePriority) == (DO_LOGICAL_REPLICATION_SLOT + 1),
 				 "array length mismatch");
 
 static DumpId preDataBoundId;
@@ -1498,6 +1500,11 @@ describeDumpableObject(DumpableObject *obj, char *buf, int bufsize)
 					 "SUBSCRIPTION (ID %d OID %u)",
 					 obj->dumpId, obj->catId.oid);
 			return;
+		case DO_LOGICAL_REPLICATION_SLOT:
+			snprintf(buf, bufsize,
+					 "LOGICAL REPLICATION SLOT (ID %d NAME %s)",
+					 obj->dumpId, obj->name);
+			return;
 		case DO_PRE_DATA_BOUNDARY:
 			snprintf(buf, bufsize,
 					 "PRE-DATA BOUNDARY  (ID %d)",
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index fea159689e..46a7d64448 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -30,6 +30,7 @@ static void check_for_jsonb_9_4_usage(ClusterInfo *cluster);
 static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(ClusterInfo *new_cluster);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
+static void check_for_parameter_settings(ClusterInfo *new_cluster);
 
 
 /*
@@ -210,6 +211,9 @@ check_new_cluster(void)
 	check_for_prepared_transactions(&new_cluster);
 
 	check_for_new_tablespace_dir(&new_cluster);
+
+	if (user_opts.include_logical_slots)
+		check_for_parameter_settings(&new_cluster);
 }
 
 
@@ -1402,3 +1406,38 @@ check_for_user_defined_encoding_conversions(ClusterInfo *cluster)
 	else
 		check_ok();
 }
+
+/*
+ * Verify parameter settings for creating logical replication slots
+ */
+static void
+check_for_parameter_settings(ClusterInfo *new_cluster)
+{
+	PGresult   *res;
+	PGconn	   *conn = connectToServer(new_cluster, "template1");
+	int			max_replication_slots;
+	char	   *wal_level;
+
+	prep_status("Checking for logical replication slots");
+
+	res = executeQueryOrDie(conn, "SHOW max_replication_slots;");
+	max_replication_slots = atoi(PQgetvalue(res, 0, 0));
+
+	if (max_replication_slots == 0)
+		pg_fatal("max_replication_slots must be greater than 0");
+
+	PQclear(res);
+
+	res = executeQueryOrDie(conn, "SHOW wal_level;");
+	wal_level = PQgetvalue(res, 0, 0);
+
+	if (strcmp(wal_level, "logical") != 0)
+		pg_fatal("wal_level must be \"logical\", but is set to \"%s\"",
+				 wal_level);
+
+	PQclear(res);
+
+	PQfinish(conn);
+
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/dump.c b/src/bin/pg_upgrade/dump.c
index 6c8c82dca8..e6b90864f5 100644
--- a/src/bin/pg_upgrade/dump.c
+++ b/src/bin/pg_upgrade/dump.c
@@ -59,6 +59,30 @@ generate_old_dump(void)
 						   log_opts.dumpdir,
 						   sql_file_name, escaped_connstr.data);
 
+		/*
+		 * Dump logical replication slots if needed.
+		 *
+		 * XXX We cannot dump replication slots at the same time as the schema
+		 * dump because we need to separate the timing of restoring
+		 * replication slots and other objects. Replication slots, in
+		 * particular, should not be restored before executing the pg_resetwal
+		 * command because it will remove WALs that are required by the slots.
+		 */
+		if (user_opts.include_logical_slots)
+		{
+			char		slots_file_name[MAXPGPATH];
+
+			snprintf(slots_file_name, sizeof(slots_file_name),
+					 DB_DUMP_LOGICAL_SLOTS_FILE_MASK, old_db->db_oid);
+			parallel_exec_prog(log_file_name, NULL,
+							   "\"%s/pg_dump\" %s --logical-replication-slots-only "
+							   "--quote-all-identifiers --binary-upgrade %s "
+							   "--file=\"%s/%s\" %s",
+							   new_cluster.bindir, cluster_conn_opts(&old_cluster),
+							   log_opts.verbose ? "--verbose" : "",
+							   log_opts.dumpdir,
+							   slots_file_name, escaped_connstr.data);
+		}
 		termPQExpBuffer(&escaped_connstr);
 	}
 
diff --git a/src/bin/pg_upgrade/meson.build b/src/bin/pg_upgrade/meson.build
index 12a97f84e2..228f29b688 100644
--- a/src/bin/pg_upgrade/meson.build
+++ b/src/bin/pg_upgrade/meson.build
@@ -42,6 +42,7 @@ tests += {
     'tests': [
       't/001_basic.pl',
       't/002_pg_upgrade.pl',
+      't/003_logical_replication_slots.pl',
     ],
     'test_kwargs': {'priority': 40}, # pg_upgrade tests are slow
   },
diff --git a/src/bin/pg_upgrade/option.c b/src/bin/pg_upgrade/option.c
index 8869b6b60d..f7b5ec9879 100644
--- a/src/bin/pg_upgrade/option.c
+++ b/src/bin/pg_upgrade/option.c
@@ -57,6 +57,7 @@ parseCommandLine(int argc, char *argv[])
 		{"verbose", no_argument, NULL, 'v'},
 		{"clone", no_argument, NULL, 1},
 		{"copy", no_argument, NULL, 2},
+		{"include-logical-replication-slots", no_argument, NULL, 3},
 
 		{NULL, 0, NULL, 0}
 	};
@@ -199,6 +200,10 @@ parseCommandLine(int argc, char *argv[])
 				user_opts.transfer_mode = TRANSFER_MODE_COPY;
 				break;
 
+			case 3:
+				user_opts.include_logical_slots = true;
+				break;
+
 			default:
 				fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
 						os_info.progname);
@@ -289,6 +294,8 @@ usage(void)
 	printf(_("  -V, --version                 display version information, then exit\n"));
 	printf(_("  --clone                       clone instead of copying files to new cluster\n"));
 	printf(_("  --copy                        copy files to new cluster (default)\n"));
+	printf(_("  --include-logical-replication-slots\n"
+			 "                                upgrade logical replication slots\n"));
 	printf(_("  -?, --help                    show this help, then exit\n"));
 	printf(_("\n"
 			 "Before running pg_upgrade you must:\n"
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 75bab0a04c..1241060f4e 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -59,6 +59,7 @@ static void copy_xact_xlog_xid(void);
 static void set_frozenxids(bool minmxid_only);
 static void make_outputdirs(char *pgdata);
 static void setup(char *argv0, bool *live_check);
+static void create_logical_replication_slots(void);
 
 ClusterInfo old_cluster,
 			new_cluster;
@@ -188,6 +189,19 @@ main(int argc, char **argv)
 			  new_cluster.pgdata);
 	check_ok();
 
+	/*
+	 * Create logical replication slots if requested.
+	 *
+	 * Note: This must be done after doing pg_resetwal command because the
+	 * command will remove required WALs.
+	 */
+	if (user_opts.include_logical_slots)
+	{
+		start_postmaster(&new_cluster, true);
+		create_logical_replication_slots();
+		stop_postmaster(false);
+	}
+
 	if (user_opts.do_sync)
 	{
 		prep_status("Sync data directory to disk");
@@ -860,3 +874,50 @@ set_frozenxids(bool minmxid_only)
 
 	check_ok();
 }
+
+/*
+ * create_logical_replication_slots()
+ *
+ * Similar to create_new_objects() but only restores logical replication slots.
+ */
+static void
+create_logical_replication_slots(void)
+{
+	int			dbnum;
+
+	prep_status_progress("Restoring logical replication slots in the new cluster");
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		char		slots_file_name[MAXPGPATH],
+					log_file_name[MAXPGPATH];
+		DbInfo	   *old_db = &old_cluster.dbarr.dbs[dbnum];
+
+		pg_log(PG_STATUS, "%s", old_db->db_name);
+
+		snprintf(slots_file_name, sizeof(slots_file_name),
+				 DB_DUMP_LOGICAL_SLOTS_FILE_MASK, old_db->db_oid);
+		snprintf(log_file_name, sizeof(log_file_name),
+				 DB_DUMP_LOG_FILE_MASK, old_db->db_oid);
+
+		parallel_exec_prog(log_file_name,
+						   NULL,
+						   "\"%s/psql\" %s --echo-queries --set ON_ERROR_STOP=on "
+						   "--no-psqlrc --dbname %s -f \"%s/%s\"",
+						   new_cluster.bindir,
+						   cluster_conn_opts(&new_cluster),
+						   old_db->db_name,
+						   log_opts.dumpdir,
+						   slots_file_name);
+	}
+
+	/* reap all children */
+	while (reap_child(true) == true)
+		;
+
+	end_progress_output();
+	check_ok();
+
+	/* update new_cluster info now that we have objects in the databases */
+	get_db_and_rel_infos(&new_cluster);
+}
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 3eea0139c7..5f3d7a407e 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -29,6 +29,7 @@
 /* contains both global db information and CREATE DATABASE commands */
 #define GLOBALS_DUMP_FILE	"pg_upgrade_dump_globals.sql"
 #define DB_DUMP_FILE_MASK	"pg_upgrade_dump_%u.custom"
+#define DB_DUMP_LOGICAL_SLOTS_FILE_MASK	"pg_upgrade_dump_%u_logical_slots.sql"
 
 /*
  * Base directories that include all the files generated internally, from the
@@ -304,6 +305,8 @@ typedef struct
 	transferMode transfer_mode; /* copy files or link them? */
 	int			jobs;			/* number of processes/threads to use */
 	char	   *socketdir;		/* directory to use for Unix sockets */
+	bool		include_logical_slots;	/* true -> dump and restore logical
+										 * replication slots */
 } UserOpts;
 
 typedef struct
diff --git a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
new file mode 100644
index 0000000000..1159487f94
--- /dev/null
+++ b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
@@ -0,0 +1,150 @@
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+# Tests for logical replication, especially for upgrading replication slots
+
+use strict;
+use warnings;
+
+use File::Path qw(rmtree);
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Can be changed to test the other modes.
+my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
+
+# Initialize old publisher node
+my $old_publisher = PostgreSQL::Test::Cluster->new('old_publisher');
+$old_publisher->init(allows_streaming => 'logical');
+$old_publisher->start;
+
+# Initialize subscriber node
+my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
+$subscriber->init(allows_streaming => 'logical');
+$subscriber->start;
+
+# Schema setup
+$old_publisher->safe_psql('postgres',
+	"CREATE TABLE tbl AS SELECT generate_series(1,10) AS a");
+$subscriber->safe_psql('postgres', "CREATE TABLE tbl (a int)");
+
+# Initialize new publisher node
+my $new_publisher = PostgreSQL::Test::Cluster->new('new_publisher');
+$new_publisher->init(allows_streaming => 1);
+
+my $bindir = $new_publisher->config_data('--bindir');
+
+$old_publisher->stop;
+
+# Cause a failure at the start of pg_upgrade because wal_level is replica
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,        '--include-logical-replication-slots',
+		'--check'
+	],
+	'run of pg_upgrade of old publisher with wrong wal_level');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+
+$new_publisher->append_conf('postgresql.conf', "wal_level = 'logical'");
+$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 0");
+
+# Cause a failure at the start of pg_upgrade because max_replication_slots is 0
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,        '--include-logical-replication-slots',
+		'--check'
+	],
+	'run of pg_upgrade of old publisher with wrong max_replication_slots');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 10");
+
+$old_publisher->start;
+
+# Setup logical replication
+my $old_connstr = $old_publisher->connstr . ' dbname=postgres';
+$old_publisher->safe_psql('postgres',
+	"CREATE PUBLICATION pub FOR ALL TABLES");
+$subscriber->safe_psql('postgres',
+	"CREATE SUBSCRIPTION sub CONNECTION '$old_connstr' PUBLICATION pub");
+
+# Wait for initial table sync to finish
+$subscriber->wait_for_subscription_sync($old_publisher, 'sub');
+
+my $result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
+is($result, qq(10), 'check initial rows on subscriber');
+
+# Define another replication slot which allows to decode prepared transactions
+$old_publisher->safe_psql('postgres',
+	"SELECT pg_catalog.pg_create_logical_replication_slot('twophase_slot', 'pgoutput', false, true)");
+
+# Preparations for upgrading publisher
+$old_publisher->stop;
+$subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION sub DISABLE");
+
+# Actual run, pg_upgrade_output.d is removed at the end.
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,        '--include-logical-replication-slots'
+	],
+	'run of pg_upgrade of old publisher');
+ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ removed after pg_upgrade success");
+
+# Check whether the replication slot is copied to new publisher
+$new_publisher->start;
+$result =
+  $new_publisher->safe_psql('postgres',
+	"SELECT slot_name, two_phase FROM pg_replication_slots");
+is($result, qq(sub|f
+twophase_slot|t), 'check the replication slot is copied to new publisher');
+
+# Change connection string and enable logical replication
+my $new_connstr = $new_publisher->connstr . ' dbname=postgres';
+
+$subscriber->safe_psql('postgres',
+	"ALTER SUBSCRIPTION sub CONNECTION '$new_connstr'");
+$subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION sub ENABLE");
+
+# Check whether changes on the new publisher get replicated to the subscriber
+$new_publisher->safe_psql('postgres',
+	"INSERT INTO tbl VALUES (generate_series(11, 20))");
+
+$new_publisher->wait_for_catchup('sub');
+
+$result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
+is($result, qq(20), 'check changes are shipped to subscriber');
+
+done_testing();
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index b4058b88c3..7e999726c2 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1479,6 +1479,7 @@ LogicalRepBeginData
 LogicalRepCommitData
 LogicalRepCommitPreparedTxnData
 LogicalRepCtxStruct
+LogicalReplicationSlotInfo
 LogicalRepMode
 LogicalRepMsgType
 LogicalRepPartMapEntry
-- 
2.27.0

v8-0002-Always-persist-to-disk-logical-slots-during-a-shu.patch (application/octet-stream)
From 8af435355351dc655c00e732a568b323466a1f87 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Fri, 14 Apr 2023 13:49:09 +0800
Subject: [PATCH v8 2/3] Always persist to disk logical slots during a shutdown
 checkpoint.

It's entirely possible for a logical slot to have a confirmed_flush_lsn higher
than the last value saved on disk while not being marked as dirty.  It's
currently not a problem to lose that value during a clean shutdown / restart
cycle, but a later patch adding support for pg_upgrade of publications and
logical slots will rely on that value being properly persisted to disk.

Author: Julien Rouhaud
Reviewed-by: FIXME
Discussion: FIXME
---
 src/backend/access/transam/xlog.c |  2 +-
 src/backend/replication/slot.c    | 26 ++++++++++++++++----------
 src/include/replication/slot.h    |  2 +-
 3 files changed, 18 insertions(+), 12 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 63481d826f..07d775cf33 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7011,7 +7011,7 @@ static void
 CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
 {
 	CheckPointRelationMap();
-	CheckPointReplicationSlots();
+	CheckPointReplicationSlots(flags & CHECKPOINT_IS_SHUTDOWN);
 	CheckPointSnapBuild();
 	CheckPointLogicalRewriteHeap();
 	CheckPointReplicationOrigin();
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 8021aaa0a8..aeea6ffd1f 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -109,7 +109,8 @@ static void ReplicationSlotDropPtr(ReplicationSlot *slot);
 /* internal persistency functions */
 static void RestoreSlotFromDisk(const char *name);
 static void CreateSlotOnDisk(ReplicationSlot *slot);
-static void SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel);
+static void SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel,
+						   bool is_shutdown);
 
 /*
  * Report shared-memory space needed by ReplicationSlotsShmemInit.
@@ -783,7 +784,7 @@ ReplicationSlotSave(void)
 	Assert(MyReplicationSlot != NULL);
 
 	sprintf(path, "pg_replslot/%s", NameStr(MyReplicationSlot->data.name));
-	SaveSlotToPath(MyReplicationSlot, path, ERROR);
+	SaveSlotToPath(MyReplicationSlot, path, ERROR, false);
 }
 
 /*
@@ -1565,11 +1566,10 @@ restart:
 /*
  * Flush all replication slots to disk.
  *
- * This needn't actually be part of a checkpoint, but it's a convenient
- * location.
+ * is_shutdown is true in case of a shutdown checkpoint.
  */
 void
-CheckPointReplicationSlots(void)
+CheckPointReplicationSlots(bool is_shutdown)
 {
 	int			i;
 
@@ -1594,7 +1594,7 @@ CheckPointReplicationSlots(void)
 
 		/* save the slot to disk, locking is handled in SaveSlotToPath() */
 		sprintf(path, "pg_replslot/%s", NameStr(s->data.name));
-		SaveSlotToPath(s, path, LOG);
+		SaveSlotToPath(s, path, LOG, is_shutdown);
 	}
 	LWLockRelease(ReplicationSlotAllocationLock);
 }
@@ -1700,7 +1700,7 @@ CreateSlotOnDisk(ReplicationSlot *slot)
 
 	/* Write the actual state file. */
 	slot->dirty = true;			/* signal that we really need to write */
-	SaveSlotToPath(slot, tmppath, ERROR);
+	SaveSlotToPath(slot, tmppath, ERROR, false);
 
 	/* Rename the directory into place. */
 	if (rename(tmppath, path) != 0)
@@ -1726,7 +1726,8 @@ CreateSlotOnDisk(ReplicationSlot *slot)
  * Shared functionality between saving and creating a replication slot.
  */
 static void
-SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel)
+SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel,
+			   bool is_shutdown)
 {
 	char		tmppath[MAXPGPATH];
 	char		path[MAXPGPATH];
@@ -1740,8 +1741,13 @@ SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel)
 	slot->just_dirtied = false;
 	SpinLockRelease(&slot->mutex);
 
-	/* and don't do anything if there's nothing to write */
-	if (!was_dirty)
+	/*
+	 * and don't do anything if there's nothing to write, unless this is
+	 * called for a logical slot during a shutdown checkpoint, as we want to
+	 * persist the confirmed_flush_lsn in that case, even if that's the only
+	 * modification.
+	 */
+	if (!was_dirty && !is_shutdown && !SlotIsLogical(slot))
 		return;
 
 	LWLockAcquire(&slot->io_in_progress_lock, LW_EXCLUSIVE);
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index a8a89dc784..7ca37c9f70 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -241,7 +241,7 @@ extern void ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslo
 extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok);
 
 extern void StartupReplicationSlots(void);
-extern void CheckPointReplicationSlots(void);
+extern void CheckPointReplicationSlots(bool is_shutdown);
 
 extern void CheckSlotRequirements(void);
 extern void CheckSlotPermissions(void);
-- 
2.27.0

v8-0003-pg_upgrade-Add-check-function-for-include-logical.patch (application/octet-stream)
From 18cc4b037143d576c025d3a37a8eccbcea66ce63 Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Fri, 14 Apr 2023 08:59:03 +0000
Subject: [PATCH v8 3/3] pg_upgrade: Add check function for
 --include-logical-replication-slots option

---
 src/bin/pg_upgrade/check.c                    | 81 ++++++++++++++++++-
 .../t/003_logical_replication_slots.pl        | 38 ++++++++-
 2 files changed, 115 insertions(+), 4 deletions(-)

diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 46a7d64448..b1e1f516c9 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -9,7 +9,10 @@
 
 #include "postgres_fe.h"
 
+#include "access/xlogrecord.h"
+#include "access/xlog_internal.h"
 #include "catalog/pg_authid_d.h"
+#include "catalog/pg_control.h"
 #include "catalog/pg_collation.h"
 #include "fe_utils/string_utils.h"
 #include "mb/pg_wchar.h"
@@ -31,7 +34,7 @@ static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(ClusterInfo *new_cluster);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
 static void check_for_parameter_settings(ClusterInfo *new_cluster);
-
+static void check_for_confirmed_flush_lsn(ClusterInfo *cluster);
 
 /*
  * fix_path_separator
@@ -104,6 +107,8 @@ check_and_dump_old_cluster(bool live_check)
 	check_for_composite_data_type_usage(&old_cluster);
 	check_for_reg_data_type_usage(&old_cluster);
 	check_for_isn_and_int8_passing_mismatch(&old_cluster);
+	if (user_opts.include_logical_slots)
+		check_for_confirmed_flush_lsn(&old_cluster);
 
 	/*
 	 * PG 16 increased the size of the 'aclitem' type, which breaks the on-disk
@@ -1418,6 +1423,10 @@ check_for_parameter_settings(ClusterInfo *new_cluster)
 	int			max_replication_slots;
 	char	   *wal_level;
 
+	/* --include-logical-replication-slots can be used since PG16. */
+	if (GET_MAJOR_VERSION(new_cluster->major_version) < 1600)
+		return;
+
 	prep_status("Checking for logical replication slots");
 
 	res = executeQueryOrDie(conn, "SHOW max_replication_slots;");
@@ -1441,3 +1450,73 @@ check_for_parameter_settings(ClusterInfo *new_cluster)
 
 	check_ok();
 }
+
+/*
+ * Verify that all logical replication slots consumed all WALs, except a
+ * CHECKPOINT_SHUTDOWN record.
+ */
+static void
+check_for_confirmed_flush_lsn(ClusterInfo *cluster)
+{
+	int			i,
+				ntups,
+				i_slotname;
+	bool		is_error = false;
+	PGresult   *res;
+	DbInfo	   *active_db = &cluster->dbarr.dbs[0];
+	PGconn	   *conn = connectToServer(cluster, active_db->db_name);
+
+	Assert(user_opts.include_logical_slots);
+
+	/* --include-logical-replication-slots can be used since PG16. */
+	if (GET_MAJOR_VERSION(cluster->major_version) < 1600)
+		return;
+
+	prep_status("Checking for logical replication slots");
+
+
+	/*
+	 * Check that all logical replication slots have reached the current WAL
+	 * position, except for the CHECKPOINT_SHUTDOWN record. Even if all WALs
+	 * are consumed before shutting down the node, the checkpointer generates
+	 * a CHECKPOINT_SHUTDOWN record at shutdown, which cannot be consumed by
+	 * any slots. Therefore, we must allow for a difference between
+	 * pg_current_wal_insert_lsn() and confirmed_flush_lsn.
+	 */
+#define SHUTDOWN_RECORD_SIZE  (SizeOfXLogRecord + \
+							   SizeOfXLogRecordDataHeaderShort + \
+							   sizeof(CheckPoint))
+
+	res = executeQueryOrDie(conn,
+							"SELECT slot_name FROM pg_catalog.pg_replication_slots "
+							"WHERE (pg_catalog.pg_current_wal_insert_lsn() - confirmed_flush_lsn) > %d "
+							"AND temporary = false AND wal_status IN ('reserved', 'extended');",
+							(int) (SizeOfXLogShortPHD + SHUTDOWN_RECORD_SIZE));
+
+#undef SHUTDOWN_RECORD_SIZE
+
+	ntups = PQntuples(res);
+	i_slotname = PQfnumber(res, "slot_name");
+
+	for (i = 0; i < ntups; i++)
+	{
+		char	   *slotname;
+
+		is_error = true;
+
+		slotname = PQgetvalue(res, i, i_slotname);
+
+		pg_log(PG_WARNING,
+			   "\nWARNING: logical replication slot \"%s\" has not consumed WALs yet.",
+			   slotname);
+	}
+
+	PQclear(res);
+	PQfinish(conn);
+
+	if (is_error)
+		pg_fatal("--include-logical-replication-slots requires that all "
+				 "logical replication slots have consumed the WALs");
+
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
index 1159487f94..cb7acb3302 100644
--- a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
+++ b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
@@ -85,6 +85,38 @@ $new_publisher->append_conf('postgresql.conf', "max_replication_slots = 10");
 
 $old_publisher->start;
 
+# Create a dummy slot on old publisher to fail the test
+$old_publisher->safe_psql('postgres',
+	"SELECT pg_create_logical_replication_slot('dropme_slot', 'pgoutput')");
+$old_publisher->safe_psql('postgres',
+	"INSERT INTO tbl VALUES (generate_series(11, 20))");
+$old_publisher->stop;
+
+# Cause a failure at the start of pg_upgrade because dropme_slot has not
+# advanced past the INSERT statement above
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,        '--include-logical-replication-slots',
+		'--check'
+	],
+	'run of pg_upgrade of old publisher with idle replication slots');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT pg_drop_replication_slot('dropme_slot')");
+
 # Setup logical replication
 my $old_connstr = $old_publisher->connstr . ' dbname=postgres';
 $old_publisher->safe_psql('postgres',
@@ -96,7 +128,7 @@ $subscriber->safe_psql('postgres',
 $subscriber->wait_for_subscription_sync($old_publisher, 'sub');
 
 my $result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
-is($result, qq(10), 'check initial rows on subscriber');
+is($result, qq(20), 'check initial rows on subscriber');
 
 # Define another replication slot which allows to decode prepared transactions
 $old_publisher->safe_psql('postgres',
@@ -140,11 +172,11 @@ $subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION sub ENABLE");
 
 # Check whether changes on the new publisher get replicated to the subscriber
 $new_publisher->safe_psql('postgres',
-	"INSERT INTO tbl VALUES (generate_series(11, 20))");
+	"INSERT INTO tbl VALUES (generate_series(21, 30))");
 
 $new_publisher->wait_for_catchup('sub');
 
 $result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
-is($result, qq(20), 'check changes are shipped to subscriber');
+is($result, qq(30), 'check changes are shipped to subscriber');
 
 done_testing();
-- 
2.27.0

#26Julien Rouhaud
rjuju123@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#25)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

Hi,

On Thu, Apr 20, 2023 at 05:31:16AM +0000, Hayato Kuroda (Fujitsu) wrote:

Dear Vignesh,

Thank you for reviewing! PSA new patchset.

Additionally, I added a checking function in 0003.
According to pg_resetwal and other functions, the length of a
CHECKPOINT_SHUTDOWN record seems to be (SizeOfXLogRecord +
SizeOfXLogRecordDataHeaderShort + sizeof(CheckPoint)).
Therefore, the function ensures that the difference between the current
insert position and confirmed_flush_lsn is less than (the above + the page
header).

I think that this test should be different when just checking for the
prerequisites (live_check / --check) compared to actually doing the upgrade,
as it's almost guaranteed that the slots won't have sent everything when the
source server is up and running.

Maybe simply check that all logical slots are currently active when running the
live check, so at least there's a good chance that they will still be active at
shutdown, and will therefore send all the data to the subscribers? Having a
regression test for that scenario would also be a good idea; an
uncommitted write transaction should be enough to cover it.
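
For example, something along these lines could be added to
003_logical_replication_slots.pl (just a rough sketch, assuming the
PostgreSQL::Test::BackgroundPsql helpers; the values are illustrative):

# Sketch only: keep an uncommitted write transaction open on the running
# old publisher, so the slot stays active while confirmed_flush_lsn lags
# behind the current insert position.
my $bg_psql = $old_publisher->background_psql('postgres');
$bg_psql->query_safe('BEGIN');
$bg_psql->query_safe('INSERT INTO tbl VALUES (0)');

# Run "pg_upgrade --check" here against the running old publisher: a strict
# confirmed_flush_lsn comparison would fail, whereas an "all slots are
# active" check should pass.

$bg_psql->query_safe('ROLLBACK');
$bg_psql->quit;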

#27vignesh C
vignesh21@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#25)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Thu, 20 Apr 2023 at 11:01, Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

Dear Vignesh,

Thank you for reviewing! PSA new patchset.

Additionally, I added a checking function in 0003.
According to pg_resetwal and other functions, the length of a
CHECKPOINT_SHUTDOWN record seems to be (SizeOfXLogRecord +
SizeOfXLogRecordDataHeaderShort + sizeof(CheckPoint)).
Therefore, the function ensures that the difference between the current
insert position and confirmed_flush_lsn is less than (the above + the page
header).

Logical replication slots can be created only if wal_level >= logical, but
currently we do not have any check for that when the
"--include-logical-replication-slots" option is specified. If
--include-logical-replication-slots is specified with pg_upgrade, the
replication slots are created only after a lot of steps -- performing
prechecks, analyzing, freezing, deleting, restoring, copying, and setting
related objects -- so we would error out only after a lot of time (in many
cases pg_upgrade takes hours to perform these operations). I feel it would
be better to add a check at the beginning, somewhere in check_new_cluster,
to see if wal_level is set appropriately when
--include-logical-replication-slots is given, so that the error is detected
and reported early.

I see your point. Moreover, I think max_replication_slots != 0 must also be checked.
I added a checking function and a related test in 0001.

Thanks for the updated patch.
A few comments:
1) If the verbose option is enabled, we should print the new slot
information; we could add a function print_slot_infos, similar to
print_rel_infos, which would print the slot name and whether two_phase
is enabled.
+       end_progress_output();
+       check_ok();
+
+       /* update new_cluster info now that we have objects in the databases */
+       get_db_and_rel_infos(&new_cluster);
+}
2) Since we will be using this option with pg_upgrade, should we allow
it only together with the --binary-upgrade option?
+       if (dopt.logical_slots_only && dopt.dataOnly)
+               pg_fatal("options --logical-replication-slots-only and
-a/--data-only cannot be used together");
+       if (dopt.logical_slots_only && dopt.schemaOnly)
+               pg_fatal("options --logical-replication-slots-only and
-s/--schema-only cannot be used together");
3) Since two_phase is a boolean, can we use the bool data type instead of a string:
+               slotinfo[i].dobj.objType = DO_LOGICAL_REPLICATION_SLOT;
+
+               slotinfo[i].dobj.catId.tableoid = InvalidOid;
+               slotinfo[i].dobj.catId.oid = InvalidOid;
+               AssignDumpId(&slotinfo[i].dobj);
+
+               slotinfo[i].dobj.name = pg_strdup(PQgetvalue(res, i,
i_slotname));
+
+               slotinfo[i].plugin = pg_strdup(PQgetvalue(res, i, i_plugin));
+               slotinfo[i].twophase = pg_strdup(PQgetvalue(res, i,
i_twophase));

We can change it to something like:
if (strcmp(PQgetvalue(res, i, i_twophase), "t") == 0)
slotinfo[i].twophase = true;
else
slotinfo[i].twophase = false;

4) The comments are inconsistent: some end with a period and
some don't. We can keep them consistent:
+# Can be changed to test the other modes.
+my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
+
+# Initialize old publisher node
+my $old_publisher = PostgreSQL::Test::Cluster->new('old_publisher');
+$old_publisher->init(allows_streaming => 'logical');
+$old_publisher->start;
+
+# Initialize subscriber node
+my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
+$subscriber->init(allows_streaming => 'logical');
+$subscriber->start;
+
+# Schema setup
+$old_publisher->safe_psql('postgres',
+       "CREATE TABLE tbl AS SELECT generate_series(1,10) AS a");
+$subscriber->safe_psql('postgres', "CREATE TABLE tbl (a int)");
+
+# Initialize new publisher node
5) Should we use free instead of pfree, as is done in other functions like
dumpForeignServer:
+               appendPQExpBuffer(query, ");");
+
+               ArchiveEntry(fout, slotinfo->dobj.catId, slotinfo->dobj.dumpId,
+                                        ARCHIVE_OPTS(.tag = slotname,
+
.description = "REPLICATION SLOT",
+
.section = SECTION_POST_DATA,
+
.createStmt = query->data));
+
+               pfree(slotname);
+               destroyPQExpBuffer(query);
+       }
+}

Regards,
Vignesh

#28Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Julien Rouhaud (#26)
4 attachment(s)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Julien,

Thank you for giving comments! PSA new version.

I think that this test should be different when just checking for the
prerequisites (live_check / --check) compared to actually doing the upgrade,
as it's almost guaranteed that the slots won't have sent everything when the
source server is up and running.

Hmm, you assumed that the user application is still running and data is coming
in continuously when doing --check, right? Personally, I had thought that the
--check operation is executed just before the actual upgrade, so I'm not sure
your assumption is a real problem. Also, I could not find any existing checks
whose behavior changes based on the --check option.

Anyway, I included your opinion in the 0004 patch. We can ask other reviewers
whether it is necessary.

Maybe simply check that all logical slots are currently active when running the
live check,

Yeah, if we support that case, checking pg_replication_slots.active may be sufficient.
Actually, this cannot handle the case where pg_create_logical_replication_slot()
is executed just before upgrading, but I'm not sure it needs to be handled.

so at least there's a good chance that they will still be active at
shutdown, and will therefore send all the data to the subscribers? Having a
regression test for that scenario would also be a good idea; an
uncommitted write transaction should be enough to cover it.

I think background_psql() can be used for this purpose. Before running pg_upgrade
--check, a transaction is opened and kept open. This means that confirmed_flush_lsn
has not reached the current WAL position yet, but the check reports OK
because all slots are active.
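
For example (a rough sketch only, not from the patch; it assumes the existing
safe_psql() helper and the current pg_replication_slots columns), the test
could then observe:

# Sketch: while the transaction described above is still open, the slot is
# reported as active even though confirmed_flush_lsn has not caught up to
# the current insert position.
my $slot_state = $old_publisher->safe_psql('postgres',
	"SELECT active, pg_current_wal_insert_lsn() > confirmed_flush_lsn"
	  . " FROM pg_replication_slots WHERE slot_name = 'sub'");
is($slot_state, qq(t|t), 'slot is active but has not consumed all WAL yet');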

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:

v9-0001-pg_upgrade-Add-include-logical-replication-slots-.patch (application/octet-stream)
From abc975cb9eae0088231b72fc6264024d0f5869fe Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Tue, 4 Apr 2023 05:49:34 +0000
Subject: [PATCH v9 1/4] pg_upgrade: Add --include-logical-replication-slots
 option

This commit introduces a new pg_upgrade option called "--include-logical-replication-slots".
This allows nodes with logical replication slots to be upgraded. The commit can
be divided into two parts: one for pg_dump and another for pg_upgrade.

For pg_dump this commit includes a new option called "--logical-replication-slots-only".
This option can be used to dump logical replication slots. When this option is
specified, the slot_name, plugin, and two_phase parameters are extracted from
pg_replication_slots. An SQL file is then generated which executes
pg_create_logical_replication_slot() with the extracted parameters.

For pg_upgrade, when '--include-logical-replication-slots' is specified, it executes
pg_dump with the new "--logical-replication-slots-only" option and restores from the
dump. Unlike the schema restore, pg_resetwal must not be called after the replication
slots have been restored, because pg_resetwal discards WAL files and starts from a
new segment even if those files are still required by the slots. That leads to an ERROR:
"requested WAL segment XXX has already been removed". To avoid this, replication slots
are restored at a different time than other objects, after pg_resetwal has run.

The significant advantage of this commit is that it makes it easy to continue
logical replication even after upgrading the publisher node. Previously, pg_upgrade
allowed copying publications to a new node. With this new commit, adjusting the
connection string to the new publisher will cause the apply worker on the subscriber
to connect to the new publisher automatically. This enables seamless continuation
of logical replication, even after an upgrade.

Author: Hayato Kuroda
Reviewed-by: Peter Smith, Julien Rouhaud, Vignesh C
---
 doc/src/sgml/ref/pg_dump.sgml                 |  10 ++
 doc/src/sgml/ref/pgupgrade.sgml               |  11 ++
 src/bin/pg_dump/pg_backup.h                   |   1 +
 src/bin/pg_dump/pg_dump.c                     | 150 ++++++++++++++++++
 src/bin/pg_dump/pg_dump.h                     |  17 +-
 src/bin/pg_dump/pg_dump_sort.c                |  11 +-
 src/bin/pg_upgrade/check.c                    |  57 +++++++
 src/bin/pg_upgrade/dump.c                     |  24 +++
 src/bin/pg_upgrade/info.c                     | 116 +++++++++++++-
 src/bin/pg_upgrade/meson.build                |   1 +
 src/bin/pg_upgrade/option.c                   |   7 +
 src/bin/pg_upgrade/pg_upgrade.c               |  61 +++++++
 src/bin/pg_upgrade/pg_upgrade.h               |  18 +++
 .../t/003_logical_replication_slots.pl        | 150 ++++++++++++++++++
 src/tools/pgindent/typedefs.list              |   3 +
 15 files changed, 633 insertions(+), 4 deletions(-)
 create mode 100644 src/bin/pg_upgrade/t/003_logical_replication_slots.pl

diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index e81e35c13b..6e07f85281 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -1206,6 +1206,16 @@ PostgreSQL documentation
       </listitem>
      </varlistentry>
 
+     <varlistentry>
+      <term><option>--logical-replication-slots-only</option></term>
+      <listitem>
+       <para>
+        Dump only logical replication slots, not the schema (data definitions)
+        or data. This is mainly used when upgrading nodes.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
        <term><option>-?</option></term>
        <term><option>--help</option></term>
diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 7816b4c685..94e90ff506 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -240,6 +240,17 @@ PostgreSQL documentation
       </listitem>
      </varlistentry>
 
+     <varlistentry>
+      <term><option>--include-logical-replication-slots</option></term>
+      <listitem>
+       <para>
+        Upgrade logical replication slots. Only permanent replication slots
+        are included. Note that pg_upgrade does not check whether the required
+        output plugins are installed.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><option>-?</option></term>
       <term><option>--help</option></term>
diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index aba780ef4b..0a4e931f9b 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -187,6 +187,7 @@ typedef struct _dumpOptions
 	int			use_setsessauth;
 	int			enable_row_security;
 	int			load_via_partition_root;
+	int			logical_slots_only;
 
 	/* default, if no "inclusion" switches appear, is to dump everything */
 	bool		include_everything;
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 058244cd17..ed98572cce 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -328,6 +328,9 @@ static void setupDumpWorker(Archive *AH);
 static TableInfo *getRootTableInfo(const TableInfo *tbinfo);
 static bool forcePartitionRootLoad(const TableInfo *tbinfo);
 
+static void getLogicalReplicationSlots(Archive *fout);
+static void dumpLogicalReplicationSlot(Archive *fout,
+									   const LogicalReplicationSlotInfo *slotinfo);
 
 int
 main(int argc, char **argv)
@@ -431,6 +434,7 @@ main(int argc, char **argv)
 		{"table-and-children", required_argument, NULL, 12},
 		{"exclude-table-and-children", required_argument, NULL, 13},
 		{"exclude-table-data-and-children", required_argument, NULL, 14},
+		{"logical-replication-slots-only", no_argument, NULL, 15},
 
 		{NULL, 0, NULL, 0}
 	};
@@ -657,6 +661,10 @@ main(int argc, char **argv)
 										  optarg);
 				break;
 
+			case 15:			/* dump only replication slot(s) */
+				dopt.logical_slots_only = true;
+				break;
+
 			default:
 				/* getopt_long already emitted a complaint */
 				pg_log_error_hint("Try \"%s --help\" for more information.", progname);
@@ -714,6 +722,14 @@ main(int argc, char **argv)
 	if (dopt.do_nothing && dopt.dump_inserts == 0)
 		pg_fatal("option --on-conflict-do-nothing requires option --inserts, --rows-per-insert, or --column-inserts");
 
+	if (dopt.logical_slots_only && !dopt.binary_upgrade)
+		pg_fatal("option --logical-replication-slots-only requires option --binary-upgrade");
+
+	if (dopt.logical_slots_only && dopt.dataOnly)
+		pg_fatal("options --logical-replication-slots-only and -a/--data-only cannot be used together");
+	if (dopt.logical_slots_only && dopt.schemaOnly)
+		pg_fatal("options --logical-replication-slots-only and -s/--schema-only cannot be used together");
+
 	/* Identify archive format to emit */
 	archiveFormat = parseArchiveFormat(format, &archiveMode);
 
@@ -876,6 +892,16 @@ main(int argc, char **argv)
 			pg_fatal("no matching extensions were found");
 	}
 
+	/*
+	 * If --logical-replication-slots-only was requested, dump only the
+	 * replication slots and skip everything else.
+	 */
+	if (dopt.logical_slots_only)
+	{
+		getLogicalReplicationSlots(fout);
+		goto dump;
+	}
+
 	/*
 	 * Dumping LOs is the default for dumps where an inclusion switch is not
 	 * used (an "include everything" dump).  -B can be used to exclude LOs
@@ -936,6 +962,8 @@ main(int argc, char **argv)
 	if (!dopt.no_security_labels)
 		collectSecLabels(fout);
 
+dump:
+
 	/* Lastly, create dummy objects to represent the section boundaries */
 	boundaryObjs = createBoundaryObjects();
 
@@ -1109,6 +1137,8 @@ help(const char *progname)
 			 "                               servers matching PATTERN\n"));
 	printf(_("  --inserts                    dump data as INSERT commands, rather than COPY\n"));
 	printf(_("  --load-via-partition-root    load partitions via the root table\n"));
+	printf(_("  --logical-replication-slots-only\n"
+			 "                               dump only logical replication slots, no schema or data\n"));
 	printf(_("  --no-comments                do not dump comments\n"));
 	printf(_("  --no-publications            do not dump publications\n"));
 	printf(_("  --no-security-labels         do not dump security label assignments\n"));
@@ -10252,6 +10282,10 @@ dumpDumpableObject(Archive *fout, DumpableObject *dobj)
 		case DO_SUBSCRIPTION:
 			dumpSubscription(fout, (const SubscriptionInfo *) dobj);
 			break;
+		case DO_LOGICAL_REPLICATION_SLOT:
+			dumpLogicalReplicationSlot(fout,
+									   (const LogicalReplicationSlotInfo *) dobj);
+			break;
 		case DO_PRE_DATA_BOUNDARY:
 		case DO_POST_DATA_BOUNDARY:
 			/* never dumped, nothing to do */
@@ -18227,6 +18261,7 @@ addBoundaryDependencies(DumpableObject **dobjs, int numObjs,
 			case DO_PUBLICATION_REL:
 			case DO_PUBLICATION_TABLE_IN_SCHEMA:
 			case DO_SUBSCRIPTION:
+			case DO_LOGICAL_REPLICATION_SLOT:
 				/* Post-data objects: must come after the post-data boundary */
 				addObjectDependency(dobj, postDataBound->dumpId);
 				break;
@@ -18488,3 +18523,118 @@ appendReloptionsArrayAH(PQExpBuffer buffer, const char *reloptions,
 	if (!res)
 		pg_log_warning("could not parse %s array", "reloptions");
 }
+
+/*
+ * getLogicalReplicationSlots
+ *	  get information about replication slots
+ */
+static void
+getLogicalReplicationSlots(Archive *fout)
+{
+	PGresult   *res;
+	LogicalReplicationSlotInfo *slotinfo;
+	PQExpBuffer query;
+	DumpOptions *dopt = fout->dopt;
+
+	int			i_slotname;
+	int			i_plugin;
+	int			i_twophase;
+	int			i,
+				ntups;
+
+	/* Check whether we should dump or not */
+	if (fout->remoteVersion < 160000 || !dopt->logical_slots_only)
+		return;
+
+	query = createPQExpBuffer();
+
+	resetPQExpBuffer(query);
+
+	/*
+	 * Get replication slots.
+	 *
+	 * XXX: Which information must be extracted from old node? Currently three
+	 * attributes are extracted because they are used by
+	 * pg_create_logical_replication_slot().
+	 *
+	 * XXX: Do we have to support physical slots?
+	 */
+	appendPQExpBufferStr(query,
+						 "SELECT slot_name, plugin, two_phase "
+						 "FROM pg_catalog.pg_replication_slots "
+						 "WHERE database = current_database() AND temporary = false "
+						 "AND wal_status IN ('reserved', 'extended');");
+
+	res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+
+	ntups = PQntuples(res);
+
+	i_slotname = PQfnumber(res, "slot_name");
+	i_plugin = PQfnumber(res, "plugin");
+	i_twophase = PQfnumber(res, "two_phase");
+
+	slotinfo = pg_malloc(ntups * sizeof(LogicalReplicationSlotInfo));
+
+	for (i = 0; i < ntups; i++)
+	{
+		slotinfo[i].dobj.objType = DO_LOGICAL_REPLICATION_SLOT;
+
+		slotinfo[i].dobj.catId.tableoid = InvalidOid;
+		slotinfo[i].dobj.catId.oid = InvalidOid;
+		AssignDumpId(&slotinfo[i].dobj);
+
+		slotinfo[i].dobj.name = pg_strdup(PQgetvalue(res, i, i_slotname));
+
+		slotinfo[i].plugin = pg_strdup(PQgetvalue(res, i, i_plugin));
+		slotinfo[i].twophase = (strcmp(PQgetvalue(res, i, i_twophase), "t") == 0);
+
+		/*
+		 * Note: Currently we do not have any options to include/exclude slots
+		 * in dumping, so all the slots must be selected.
+		 */
+		slotinfo[i].dobj.dump = DUMP_COMPONENT_ALL;
+	}
+	PQclear(res);
+
+	destroyPQExpBuffer(query);
+}
+
+/*
+ * dumpLogicalReplicationSlot
+ *	  dump creation functions for the given logical replication slots
+ */
+static void
+dumpLogicalReplicationSlot(Archive *fout,
+						   const LogicalReplicationSlotInfo *slotinfo)
+{
+	DumpOptions *dopt = fout->dopt;
+
+	if (!dopt->logical_slots_only)
+		return;
+
+	if (slotinfo->dobj.dump & DUMP_COMPONENT_DEFINITION)
+	{
+		PQExpBuffer query = createPQExpBuffer();
+		char	   *slotname = pg_strdup(slotinfo->dobj.name);
+
+		/*
+		 * XXX: For simplification, pg_create_logical_replication_slot() is
+		 * used. Is it sufficient?
+		 */
+		appendPQExpBuffer(query, "SELECT pg_catalog.pg_create_logical_replication_slot(");
+		appendStringLiteralAH(query, slotname, fout);
+		appendPQExpBuffer(query, ", ");
+		appendStringLiteralAH(query, slotinfo->plugin, fout);
+		appendPQExpBuffer(query, ", false, %s);",
+						  slotinfo->twophase ? "true" : "false");
+
+		ArchiveEntry(fout, slotinfo->dobj.catId, slotinfo->dobj.dumpId,
+					 ARCHIVE_OPTS(.tag = slotname,
+								  .description = "REPLICATION SLOT",
+								  .section = SECTION_POST_DATA,
+								  .createStmt = query->data));
+
+		free(slotname);
+		destroyPQExpBuffer(query);
+	}
+}
diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h
index ed6ce41ad7..de081c35ae 100644
--- a/src/bin/pg_dump/pg_dump.h
+++ b/src/bin/pg_dump/pg_dump.h
@@ -82,7 +82,8 @@ typedef enum
 	DO_PUBLICATION,
 	DO_PUBLICATION_REL,
 	DO_PUBLICATION_TABLE_IN_SCHEMA,
-	DO_SUBSCRIPTION
+	DO_SUBSCRIPTION,
+	DO_LOGICAL_REPLICATION_SLOT
 } DumpableObjectType;
 
 /*
@@ -666,6 +667,20 @@ typedef struct _SubscriptionInfo
 	char	   *subpasswordrequired;
 } SubscriptionInfo;
 
+/*
+ * The LogicalReplicationSlotInfo struct is used to represent replication
+ * slots.
+ *
+ * XXX: add more attributes if needed
+ */
+typedef struct _LogicalReplicationSlotInfo
+{
+	DumpableObject dobj;
+	char	   *plugin;
+	char	   *slottype;
+	bool		twophase;
+} LogicalReplicationSlotInfo;
+
 /*
  *	common utility functions
  */
diff --git a/src/bin/pg_dump/pg_dump_sort.c b/src/bin/pg_dump/pg_dump_sort.c
index 745578d855..4e12e46dc5 100644
--- a/src/bin/pg_dump/pg_dump_sort.c
+++ b/src/bin/pg_dump/pg_dump_sort.c
@@ -93,6 +93,7 @@ enum dbObjectTypePriorities
 	PRIO_PUBLICATION_REL,
 	PRIO_PUBLICATION_TABLE_IN_SCHEMA,
 	PRIO_SUBSCRIPTION,
+	PRIO_LOGICAL_REPLICATION_SLOT,
 	PRIO_DEFAULT_ACL,			/* done in ACL pass */
 	PRIO_EVENT_TRIGGER,			/* must be next to last! */
 	PRIO_REFRESH_MATVIEW		/* must be last! */
@@ -146,10 +147,11 @@ static const int dbObjectTypePriority[] =
 	PRIO_PUBLICATION,			/* DO_PUBLICATION */
 	PRIO_PUBLICATION_REL,		/* DO_PUBLICATION_REL */
 	PRIO_PUBLICATION_TABLE_IN_SCHEMA,	/* DO_PUBLICATION_TABLE_IN_SCHEMA */
-	PRIO_SUBSCRIPTION			/* DO_SUBSCRIPTION */
+	PRIO_SUBSCRIPTION,			/* DO_SUBSCRIPTION */
+	PRIO_LOGICAL_REPLICATION_SLOT	/* DO_LOGICAL_REPLICATION_SLOT */
 };
 
-StaticAssertDecl(lengthof(dbObjectTypePriority) == (DO_SUBSCRIPTION + 1),
+StaticAssertDecl(lengthof(dbObjectTypePriority) == (DO_LOGICAL_REPLICATION_SLOT + 1),
 				 "array length mismatch");
 
 static DumpId preDataBoundId;
@@ -1498,6 +1500,11 @@ describeDumpableObject(DumpableObject *obj, char *buf, int bufsize)
 					 "SUBSCRIPTION (ID %d OID %u)",
 					 obj->dumpId, obj->catId.oid);
 			return;
+		case DO_LOGICAL_REPLICATION_SLOT:
+			snprintf(buf, bufsize,
+					 "LOGICAL REPLICATION SLOT (ID %d NAME %s)",
+					 obj->dumpId, obj->name);
+			return;
 		case DO_PRE_DATA_BOUNDARY:
 			snprintf(buf, bufsize,
 					 "PRE-DATA BOUNDARY  (ID %d)",
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index fea159689e..9674c1a2ce 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -30,6 +30,7 @@ static void check_for_jsonb_9_4_usage(ClusterInfo *cluster);
 static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(ClusterInfo *new_cluster);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
+static void check_for_parameter_settings(ClusterInfo *new_cluster);
 
 
 /*
@@ -88,6 +89,7 @@ check_and_dump_old_cluster(bool live_check)
 
 	/* Extract a list of databases and tables from the old cluster */
 	get_db_and_rel_infos(&old_cluster);
+	get_logical_slot_infos(&old_cluster);
 
 	init_tablespaces();
 
@@ -188,6 +190,7 @@ void
 check_new_cluster(void)
 {
 	get_db_and_rel_infos(&new_cluster);
+	get_logical_slot_infos(&new_cluster);
 
 	check_new_cluster_is_empty();
 
@@ -210,6 +213,9 @@ check_new_cluster(void)
 	check_for_prepared_transactions(&new_cluster);
 
 	check_for_new_tablespace_dir(&new_cluster);
+
+	if (user_opts.include_logical_slots)
+		check_for_parameter_settings(&new_cluster);
 }
 
 
@@ -364,6 +370,22 @@ check_new_cluster_is_empty(void)
 						 rel_arr->rels[relnum].nspname,
 						 rel_arr->rels[relnum].relname);
 		}
+
+		/*
+		 * If --include-logical-replication-slots is specified, check whether
+		 * any slots already exist
+		 */
+		if (user_opts.include_logical_slots)
+		{
+			LogicalSlotInfoArr *slot_arr = &new_cluster.dbarr.dbs[dbnum].slot_arr;
+
+			/* if nslots > 0, report just first entry and exit */
+			if (slot_arr->nslots)
+				pg_fatal("New cluster database \"%s\" is not empty: found logical replication slot \"%s\"",
+						 new_cluster.dbarr.dbs[dbnum].db_name,
+						 slot_arr->slots[0].slotname);
+		}
+
 	}
 }
 
@@ -1402,3 +1424,38 @@ check_for_user_defined_encoding_conversions(ClusterInfo *cluster)
 	else
 		check_ok();
 }
+
+/*
+ * Verify parameter settings for creating logical replication slots
+ */
+static void
+check_for_parameter_settings(ClusterInfo *new_cluster)
+{
+	PGresult   *res;
+	PGconn	   *conn = connectToServer(new_cluster, "template1");
+	int			max_replication_slots;
+	char	   *wal_level;
+
+	prep_status("Checking for logical replication slots");
+
+	res = executeQueryOrDie(conn, "SHOW max_replication_slots;");
+	max_replication_slots = atoi(PQgetvalue(res, 0, 0));
+
+	if (max_replication_slots == 0)
+		pg_fatal("max_replication_slots must be greater than 0");
+
+	PQclear(res);
+
+	res = executeQueryOrDie(conn, "SHOW wal_level;");
+	wal_level = PQgetvalue(res, 0, 0);
+
+	if (strcmp(wal_level, "logical") != 0)
+		pg_fatal("wal_level must be \"logical\", but set to \"%s\"",
+				 wal_level);
+
+	PQclear(res);
+
+	PQfinish(conn);
+
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/dump.c b/src/bin/pg_upgrade/dump.c
index 6c8c82dca8..e6b90864f5 100644
--- a/src/bin/pg_upgrade/dump.c
+++ b/src/bin/pg_upgrade/dump.c
@@ -59,6 +59,30 @@ generate_old_dump(void)
 						   log_opts.dumpdir,
 						   sql_file_name, escaped_connstr.data);
 
+		/*
+		 * Dump logical replication slots if needed.
+		 *
+		 * XXX We cannot dump replication slots at the same time as the schema
+		 * dump because we need to separate the timing of restoring
+		 * replication slots and other objects. Replication slots, in
+		 * particular, should not be restored before executing the pg_resetwal
+		 * command because it will remove WALs that are required by the slots.
+		 */
+		if (user_opts.include_logical_slots)
+		{
+			char		slots_file_name[MAXPGPATH];
+
+			snprintf(slots_file_name, sizeof(slots_file_name),
+					 DB_DUMP_LOGICAL_SLOTS_FILE_MASK, old_db->db_oid);
+			parallel_exec_prog(log_file_name, NULL,
+							   "\"%s/pg_dump\" %s --logical-replication-slots-only "
+							   "--quote-all-identifiers --binary-upgrade %s "
+							   "--file=\"%s/%s\" %s",
+							   new_cluster.bindir, cluster_conn_opts(&old_cluster),
+							   log_opts.verbose ? "--verbose" : "",
+							   log_opts.dumpdir,
+							   slots_file_name, escaped_connstr.data);
+		}
 		termPQExpBuffer(&escaped_connstr);
 	}
 
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index 85ed15ae4a..0996fdb0a8 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -23,10 +23,12 @@ static void free_db_and_rel_infos(DbInfoArr *db_arr);
 static void get_template0_info(ClusterInfo *cluster);
 static void get_db_infos(ClusterInfo *cluster);
 static void get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo);
+static void get_logical_slot_infos_per_db(ClusterInfo *cluster, DbInfo *dbinfo);
 static void free_rel_infos(RelInfoArr *rel_arr);
+static void free_logical_slot_infos(LogicalSlotInfoArr *slot_arr);
 static void print_db_infos(DbInfoArr *db_arr);
 static void print_rel_infos(RelInfoArr *rel_arr);
-
+static void print_slot_infos(LogicalSlotInfoArr *slot_arr);
 
 /*
  * gen_db_file_maps()
@@ -600,6 +602,93 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 	dbinfo->rel_arr.nrels = num_rels;
 }
 
+/*
+ * Higher level routine to generate LogicalSlotInfoArr for all databases.
+ */
+void
+get_logical_slot_infos(ClusterInfo *cluster)
+{
+	int			dbnum;
+
+	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
+	{
+		if (!cluster->dbarr.dbs[dbnum].slot_arr.slots)
+			free_logical_slot_infos(&cluster->dbarr.dbs[dbnum].slot_arr);
+
+		get_logical_slot_infos_per_db(cluster, &cluster->dbarr.dbs[dbnum]);
+	}
+
+	if (cluster == &old_cluster)
+		pg_log(PG_VERBOSE, "\nsource databases:");
+	else
+		pg_log(PG_VERBOSE, "\ntarget databases:");
+
+	if (log_opts.verbose)
+	{
+		for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
+		{
+			pg_log(PG_VERBOSE, "Database: %s", cluster->dbarr.dbs[dbnum].db_name);
+			print_slot_infos(&cluster->dbarr.dbs[dbnum].slot_arr);
+		}
+	}
+}
+
+/*
+ * get_logical_slot_infos_per_db()
+ *
+ * gets the LogicalSlotInfos for all the logical replication slots of the database
+ * referred to by "dbinfo".
+ */
+static void
+get_logical_slot_infos_per_db(ClusterInfo *cluster, DbInfo *dbinfo)
+{
+	PGconn	   *conn = connectToServer(cluster,
+									   dbinfo->db_name);
+	PGresult   *res;
+	LogicalSlotInfo *slotinfos;
+
+	int			ntups;
+	int			slotnum;
+	int			num_slots = 0;
+
+	int			i_slotname;
+	int			i_plugin;
+	int			i_twophase;
+
+	char		query[QUERY_ALLOC];
+
+	query[0] = '\0';			/* initialize query string to empty */
+
+	snprintf(query + strlen(query), sizeof(query) - strlen(query),
+			 "SELECT slot_name, plugin, two_phase "
+			 "FROM pg_catalog.pg_replication_slots "
+			 "WHERE database = current_database() AND temporary = false "
+			 "AND wal_status IN ('reserved', 'extended');");
+
+	res = executeQueryOrDie(conn, "%s", query);
+
+	ntups = PQntuples(res);
+
+	slotinfos = (LogicalSlotInfo *) pg_malloc(sizeof(LogicalSlotInfo) * ntups);
+
+	i_slotname = PQfnumber(res, "slot_name");
+	i_plugin = PQfnumber(res, "plugin");
+	i_twophase = PQfnumber(res, "two_phase");
+
+	for (slotnum = 0; slotnum < ntups; slotnum++)
+	{
+		LogicalSlotInfo *curr = &slotinfos[num_slots++];
+
+		curr->slotname = pg_strdup(PQgetvalue(res, slotnum, i_slotname));
+		curr->plugin = pg_strdup(PQgetvalue(res, slotnum, i_plugin));
+		curr->two_phase = (strcmp(PQgetvalue(res, slotnum, i_twophase), "t") == 0);
+	}
+
+	PQfinish(conn);
+
+	dbinfo->slot_arr.slots = slotinfos;
+	dbinfo->slot_arr.nslots = num_slots;
+}
 
 static void
 free_db_and_rel_infos(DbInfoArr *db_arr)
@@ -634,6 +723,19 @@ free_rel_infos(RelInfoArr *rel_arr)
 	rel_arr->nrels = 0;
 }
 
+static void
+free_logical_slot_infos(LogicalSlotInfoArr *slot_arr)
+{
+	int			slotnum;
+
+	for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+	{
+		pg_free(slot_arr->slots[slotnum].slotname);
+		pg_free(slot_arr->slots[slotnum].plugin);
+	}
+	pg_free(slot_arr->slots);
+	slot_arr->nslots = 0;
+}
 
 static void
 print_db_infos(DbInfoArr *db_arr)
@@ -660,3 +762,15 @@ print_rel_infos(RelInfoArr *rel_arr)
 			   rel_arr->rels[relnum].reloid,
 			   rel_arr->rels[relnum].tablespace);
 }
+
+static void
+print_slot_infos(LogicalSlotInfoArr *slot_arr)
+{
+	int			slotnum;
+
+	for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		pg_log(PG_VERBOSE, "slotname: %s: plugin: %s: two_phase %d",
+			   slot_arr->slots[slotnum].slotname,
+			   slot_arr->slots[slotnum].plugin,
+			   slot_arr->slots[slotnum].two_phase);
+}
diff --git a/src/bin/pg_upgrade/meson.build b/src/bin/pg_upgrade/meson.build
index 12a97f84e2..228f29b688 100644
--- a/src/bin/pg_upgrade/meson.build
+++ b/src/bin/pg_upgrade/meson.build
@@ -42,6 +42,7 @@ tests += {
     'tests': [
       't/001_basic.pl',
       't/002_pg_upgrade.pl',
+      't/003_logical_replication_slots.pl',
     ],
     'test_kwargs': {'priority': 40}, # pg_upgrade tests are slow
   },
diff --git a/src/bin/pg_upgrade/option.c b/src/bin/pg_upgrade/option.c
index 8869b6b60d..f7b5ec9879 100644
--- a/src/bin/pg_upgrade/option.c
+++ b/src/bin/pg_upgrade/option.c
@@ -57,6 +57,7 @@ parseCommandLine(int argc, char *argv[])
 		{"verbose", no_argument, NULL, 'v'},
 		{"clone", no_argument, NULL, 1},
 		{"copy", no_argument, NULL, 2},
+		{"include-logical-replication-slots", no_argument, NULL, 3},
 
 		{NULL, 0, NULL, 0}
 	};
@@ -199,6 +200,10 @@ parseCommandLine(int argc, char *argv[])
 				user_opts.transfer_mode = TRANSFER_MODE_COPY;
 				break;
 
+			case 3:
+				user_opts.include_logical_slots = true;
+				break;
+
 			default:
 				fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
 						os_info.progname);
@@ -289,6 +294,8 @@ usage(void)
 	printf(_("  -V, --version                 display version information, then exit\n"));
 	printf(_("  --clone                       clone instead of copying files to new cluster\n"));
 	printf(_("  --copy                        copy files to new cluster (default)\n"));
+	printf(_("  --include-logical-replication-slots\n"
+			 "                                upgrade logical replication slots\n"));
 	printf(_("  -?, --help                    show this help, then exit\n"));
 	printf(_("\n"
 			 "Before running pg_upgrade you must:\n"
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 75bab0a04c..373a9ef490 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -59,6 +59,7 @@ static void copy_xact_xlog_xid(void);
 static void set_frozenxids(bool minmxid_only);
 static void make_outputdirs(char *pgdata);
 static void setup(char *argv0, bool *live_check);
+static void create_logical_replication_slots(void);
 
 ClusterInfo old_cluster,
 			new_cluster;
@@ -188,6 +189,19 @@ main(int argc, char **argv)
 			  new_cluster.pgdata);
 	check_ok();
 
+	/*
+	 * Create logical replication slots if requested.
+	 *
+	 * Note: This must be done after doing pg_resetwal command because the
+	 * command will remove required WALs.
+	 */
+	if (user_opts.include_logical_slots)
+	{
+		start_postmaster(&new_cluster, true);
+		create_logical_replication_slots();
+		stop_postmaster(false);
+	}
+
 	if (user_opts.do_sync)
 	{
 		prep_status("Sync data directory to disk");
@@ -860,3 +874,50 @@ set_frozenxids(bool minmxid_only)
 
 	check_ok();
 }
+
+/*
+ * create_logical_replication_slots()
+ *
+ * Similar to create_new_objects() but only restores logical replication slots.
+ */
+static void
+create_logical_replication_slots(void)
+{
+	int			dbnum;
+
+	prep_status_progress("Restoring logical replication slots in the new cluster");
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		char		slots_file_name[MAXPGPATH],
+					log_file_name[MAXPGPATH];
+		DbInfo	   *old_db = &old_cluster.dbarr.dbs[dbnum];
+
+		pg_log(PG_STATUS, "%s", old_db->db_name);
+
+		snprintf(slots_file_name, sizeof(slots_file_name),
+				 DB_DUMP_LOGICAL_SLOTS_FILE_MASK, old_db->db_oid);
+		snprintf(log_file_name, sizeof(log_file_name),
+				 DB_DUMP_LOG_FILE_MASK, old_db->db_oid);
+
+		parallel_exec_prog(log_file_name,
+						   NULL,
+						   "\"%s/psql\" %s --echo-queries --set ON_ERROR_STOP=on "
+						   "--no-psqlrc --dbname %s -f \"%s/%s\"",
+						   new_cluster.bindir,
+						   cluster_conn_opts(&new_cluster),
+						   old_db->db_name,
+						   log_opts.dumpdir,
+						   slots_file_name);
+	}
+
+	/* reap all children */
+	while (reap_child(true) == true)
+		;
+
+	end_progress_output();
+	check_ok();
+
+	/* update new_cluster info again */
+	get_logical_slot_infos(&new_cluster);
+}
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 3eea0139c7..16d529668c 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -29,6 +29,7 @@
 /* contains both global db information and CREATE DATABASE commands */
 #define GLOBALS_DUMP_FILE	"pg_upgrade_dump_globals.sql"
 #define DB_DUMP_FILE_MASK	"pg_upgrade_dump_%u.custom"
+#define DB_DUMP_LOGICAL_SLOTS_FILE_MASK	"pg_upgrade_dump_%u_logical_slots.sql"
 
 /*
  * Base directories that include all the files generated internally, from the
@@ -150,6 +151,19 @@ typedef struct
 	int			nrels;
 } RelInfoArr;
 
+typedef struct
+{
+	char	   *slotname;		/* slot name */
+	char	   *plugin;			/* plugin */
+	bool		two_phase;		/* Can the slot decode 2PC? */
+} LogicalSlotInfo;
+
+typedef struct
+{
+	LogicalSlotInfo *slots;
+	int			nslots;
+} LogicalSlotInfoArr;
+
 /*
  * The following structure represents a relation mapping.
  */
@@ -176,6 +190,7 @@ typedef struct
 	char		db_tablespace[MAXPGPATH];	/* database default tablespace
 											 * path */
 	RelInfoArr	rel_arr;		/* array of all user relinfos */
+	LogicalSlotInfoArr slot_arr;	/* array of all logicalslotinfos */
 } DbInfo;
 
 /*
@@ -304,6 +319,8 @@ typedef struct
 	transferMode transfer_mode; /* copy files or link them? */
 	int			jobs;			/* number of processes/threads to use */
 	char	   *socketdir;		/* directory to use for Unix sockets */
+	bool		include_logical_slots;	/* true -> dump and restore logical
+										 * replication slots */
 } UserOpts;
 
 typedef struct
@@ -400,6 +417,7 @@ FileNameMap *gen_db_file_maps(DbInfo *old_db,
 							  DbInfo *new_db, int *nmaps, const char *old_pgdata,
 							  const char *new_pgdata);
 void		get_db_and_rel_infos(ClusterInfo *cluster);
+void		get_logical_slot_infos(ClusterInfo *cluster);
 
 /* option.c */
 
diff --git a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
new file mode 100644
index 0000000000..e25bcd0142
--- /dev/null
+++ b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
@@ -0,0 +1,150 @@
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+# Tests for logical replication, especially for upgrading replication slots
+
+use strict;
+use warnings;
+
+use File::Path qw(rmtree);
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Can be changed to test the other modes
+my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
+
+# Initialize old publisher node
+my $old_publisher = PostgreSQL::Test::Cluster->new('old_publisher');
+$old_publisher->init(allows_streaming => 'logical');
+$old_publisher->start;
+
+# Initialize subscriber node
+my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
+$subscriber->init(allows_streaming => 'logical');
+$subscriber->start;
+
+# Schema setup
+$old_publisher->safe_psql('postgres',
+	"CREATE TABLE tbl AS SELECT generate_series(1,10) AS a");
+$subscriber->safe_psql('postgres', "CREATE TABLE tbl (a int)");
+
+# Initialize new publisher node
+my $new_publisher = PostgreSQL::Test::Cluster->new('new_publisher');
+$new_publisher->init(allows_streaming => 1);
+
+my $bindir = $new_publisher->config_data('--bindir');
+
+$old_publisher->stop;
+
+# Cause a failure at the start of pg_upgrade because wal_level is replica
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,        '--include-logical-replication-slots',
+		'--check'
+	],
+	'run of pg_upgrade of old publisher with wrong wal_level');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+
+$new_publisher->append_conf('postgresql.conf', "wal_level = 'logical'");
+$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 0");
+
+# Cause a failure at the start of pg_upgrade because max_replication_slots is 0
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,        '--include-logical-replication-slots',
+		'--check'
+	],
+	'run of pg_upgrade of old publisher with wrong max_replication_slots');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 10");
+
+$old_publisher->start;
+
+# Setup logical replication
+my $old_connstr = $old_publisher->connstr . ' dbname=postgres';
+$old_publisher->safe_psql('postgres',
+	"CREATE PUBLICATION pub FOR ALL TABLES");
+$subscriber->safe_psql('postgres',
+	"CREATE SUBSCRIPTION sub CONNECTION '$old_connstr' PUBLICATION pub");
+
+# Wait for initial table sync to finish
+$subscriber->wait_for_subscription_sync($old_publisher, 'sub');
+
+my $result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
+is($result, qq(10), 'check initial rows on subscriber');
+
+# Define another replication slot which allows to decode prepared transactions
+$old_publisher->safe_psql('postgres',
+	"SELECT pg_catalog.pg_create_logical_replication_slot('twophase_slot', 'pgoutput', false, true)");
+
+# Preparations for upgrading publisher
+$old_publisher->stop;
+$subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION sub DISABLE");
+
+# Actual run, pg_upgrade_output.d is removed at the end
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,        '--include-logical-replication-slots'
+	],
+	'run of pg_upgrade of old publisher');
+ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ removed after pg_upgrade success");
+
+# Check whether the replication slot is copied to new publisher
+$new_publisher->start;
+$result =
+  $new_publisher->safe_psql('postgres',
+	"SELECT slot_name, two_phase FROM pg_replication_slots");
+is($result, qq(sub|f
+twophase_slot|t), 'check the replication slot is copied to new publisher');
+
+# Change connection string and enable logical replication
+my $new_connstr = $new_publisher->connstr . ' dbname=postgres';
+
+$subscriber->safe_psql('postgres',
+	"ALTER SUBSCRIPTION sub CONNECTION '$new_connstr'");
+$subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION sub ENABLE");
+
+# Check whether changes on the new publisher get replicated to the subscriber
+$new_publisher->safe_psql('postgres',
+	"INSERT INTO tbl VALUES (generate_series(11, 20))");
+
+$new_publisher->wait_for_catchup('sub');
+
+$result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
+is($result, qq(20), 'check changes are shipped to subscriber');
+
+done_testing();
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index b4058b88c3..5944cb34ea 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1479,6 +1479,7 @@ LogicalRepBeginData
 LogicalRepCommitData
 LogicalRepCommitPreparedTxnData
 LogicalRepCtxStruct
+LogicalReplicationSlotInfo
 LogicalRepMode
 LogicalRepMsgType
 LogicalRepPartMapEntry
@@ -1492,6 +1493,8 @@ LogicalRepTupleData
 LogicalRepTyp
 LogicalRepWorker
 LogicalRewriteMappingData
+LogicalSlotInfo
+LogicalSlotInfoArr
 LogicalTape
 LogicalTapeSet
 LsnReadQueue
-- 
2.27.0

v9-0002-Always-persist-to-disk-logical-slots-during-a-shu.patchapplication/octet-stream; name=v9-0002-Always-persist-to-disk-logical-slots-during-a-shu.patchDownload
From b1b7b0d5eb56dda0263dab52578d9c1680ad2ccd Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Fri, 14 Apr 2023 13:49:09 +0800
Subject: [PATCH v9 2/4] Always persist to disk logical slots during a shutdown
 checkpoint.

It's entirely possible for a logical slot to have a confirmed_flush_lsn higher
than the last value saved on disk while not being marked as dirty.  It's
currently not a problem to lose that value during a clean shutdown / restart
cycle, but a later patch adding support for pg_upgrade of publications and
logical slots will rely on that value being properly persisted to disk.

Author: Julien Rouhaud
Reviewed-by: FIXME
Discussion: FIXME
---
 src/backend/access/transam/xlog.c |  2 +-
 src/backend/replication/slot.c    | 26 ++++++++++++++++----------
 src/include/replication/slot.h    |  2 +-
 3 files changed, 18 insertions(+), 12 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 63481d826f..07d775cf33 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7011,7 +7011,7 @@ static void
 CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
 {
 	CheckPointRelationMap();
-	CheckPointReplicationSlots();
+	CheckPointReplicationSlots(flags & CHECKPOINT_IS_SHUTDOWN);
 	CheckPointSnapBuild();
 	CheckPointLogicalRewriteHeap();
 	CheckPointReplicationOrigin();
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 8021aaa0a8..aeea6ffd1f 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -109,7 +109,8 @@ static void ReplicationSlotDropPtr(ReplicationSlot *slot);
 /* internal persistency functions */
 static void RestoreSlotFromDisk(const char *name);
 static void CreateSlotOnDisk(ReplicationSlot *slot);
-static void SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel);
+static void SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel,
+						   bool is_shutdown);
 
 /*
  * Report shared-memory space needed by ReplicationSlotsShmemInit.
@@ -783,7 +784,7 @@ ReplicationSlotSave(void)
 	Assert(MyReplicationSlot != NULL);
 
 	sprintf(path, "pg_replslot/%s", NameStr(MyReplicationSlot->data.name));
-	SaveSlotToPath(MyReplicationSlot, path, ERROR);
+	SaveSlotToPath(MyReplicationSlot, path, ERROR, false);
 }
 
 /*
@@ -1565,11 +1566,10 @@ restart:
 /*
  * Flush all replication slots to disk.
  *
- * This needn't actually be part of a checkpoint, but it's a convenient
- * location.
+ * is_shutdown is true in case of a shutdown checkpoint.
  */
 void
-CheckPointReplicationSlots(void)
+CheckPointReplicationSlots(bool is_shutdown)
 {
 	int			i;
 
@@ -1594,7 +1594,7 @@ CheckPointReplicationSlots(void)
 
 		/* save the slot to disk, locking is handled in SaveSlotToPath() */
 		sprintf(path, "pg_replslot/%s", NameStr(s->data.name));
-		SaveSlotToPath(s, path, LOG);
+		SaveSlotToPath(s, path, LOG, is_shutdown);
 	}
 	LWLockRelease(ReplicationSlotAllocationLock);
 }
@@ -1700,7 +1700,7 @@ CreateSlotOnDisk(ReplicationSlot *slot)
 
 	/* Write the actual state file. */
 	slot->dirty = true;			/* signal that we really need to write */
-	SaveSlotToPath(slot, tmppath, ERROR);
+	SaveSlotToPath(slot, tmppath, ERROR, false);
 
 	/* Rename the directory into place. */
 	if (rename(tmppath, path) != 0)
@@ -1726,7 +1726,8 @@ CreateSlotOnDisk(ReplicationSlot *slot)
  * Shared functionality between saving and creating a replication slot.
  */
 static void
-SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel)
+SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel,
+			   bool is_shutdown)
 {
 	char		tmppath[MAXPGPATH];
 	char		path[MAXPGPATH];
@@ -1740,8 +1741,13 @@ SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel)
 	slot->just_dirtied = false;
 	SpinLockRelease(&slot->mutex);
 
-	/* and don't do anything if there's nothing to write */
-	if (!was_dirty)
+	/*
+	 * and don't do anything if there's nothing to write, unless this is
+	 * called for a logical slot during a shutdown checkpoint, as we want to
+	 * persist the confirmed_flush_lsn in that case, even if that's the only
+	 * modification.
+	 */
+	if (!was_dirty && !(SlotIsLogical(slot) && is_shutdown))
 		return;
 
 	LWLockAcquire(&slot->io_in_progress_lock, LW_EXCLUSIVE);
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index a8a89dc784..7ca37c9f70 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -241,7 +241,7 @@ extern void ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslo
 extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok);
 
 extern void StartupReplicationSlots(void);
-extern void CheckPointReplicationSlots(void);
+extern void CheckPointReplicationSlots(bool is_shutdown);
 
 extern void CheckSlotRequirements(void);
 extern void CheckSlotPermissions(void);
-- 
2.27.0

v9-0003-pg_upgrade-Add-check-function-for-include-logical.patchapplication/octet-stream; name=v9-0003-pg_upgrade-Add-check-function-for-include-logical.patchDownload
From 1b189e20c2348aaddda568534268a0ac3d51507d Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Fri, 14 Apr 2023 08:59:03 +0000
Subject: [PATCH v9 3/4] pg_upgrade: Add check function for
 --include-logical-replication-slots option

---
 src/bin/pg_upgrade/check.c                    | 81 ++++++++++++++++++-
 .../t/003_logical_replication_slots.pl        | 38 ++++++++-
 2 files changed, 115 insertions(+), 4 deletions(-)

diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 9674c1a2ce..294668e4dc 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -9,7 +9,10 @@
 
 #include "postgres_fe.h"
 
+#include "access/xlogrecord.h"
+#include "access/xlog_internal.h"
 #include "catalog/pg_authid_d.h"
+#include "catalog/pg_control.h"
 #include "catalog/pg_collation.h"
 #include "fe_utils/string_utils.h"
 #include "mb/pg_wchar.h"
@@ -31,7 +34,7 @@ static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(ClusterInfo *new_cluster);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
 static void check_for_parameter_settings(ClusterInfo *new_cluster);
-
+static void check_for_confirmed_flush_lsn(ClusterInfo *cluster);
 
 /*
  * fix_path_separator
@@ -105,6 +108,8 @@ check_and_dump_old_cluster(bool live_check)
 	check_for_composite_data_type_usage(&old_cluster);
 	check_for_reg_data_type_usage(&old_cluster);
 	check_for_isn_and_int8_passing_mismatch(&old_cluster);
+	if (user_opts.include_logical_slots)
+		check_for_confirmed_flush_lsn(&old_cluster);
 
 	/*
 	 * PG 16 increased the size of the 'aclitem' type, which breaks the on-disk
@@ -1436,6 +1441,10 @@ check_for_parameter_settings(ClusterInfo *new_cluster)
 	int			max_replication_slots;
 	char	   *wal_level;
 
+	/* --include-logical-replication-slots can be used since PG16. */
+	if (GET_MAJOR_VERSION(new_cluster->major_version) < 1600)
+		return;
+
 	prep_status("Checking for logical replication slots");
 
 	res = executeQueryOrDie(conn, "SHOW max_replication_slots;");
@@ -1459,3 +1468,73 @@ check_for_parameter_settings(ClusterInfo *new_cluster)
 
 	check_ok();
 }
+
+/*
+ * Verify that all logical replication slots consumed all WALs, except a
+ * CHECKPOINT_SHUTDOWN record.
+ */
+static void
+check_for_confirmed_flush_lsn(ClusterInfo *cluster)
+{
+	int			i,
+				ntups,
+				i_slotname;
+	bool		is_error = false;
+	PGresult   *res;
+	DbInfo	   *active_db = &cluster->dbarr.dbs[0];
+	PGconn	   *conn = connectToServer(cluster, active_db->db_name);
+
+	Assert(user_opts.include_logical_slots);
+
+	/* --include-logical-replication-slots can be used since PG16. */
+	if (GET_MAJOR_VERSION(cluster->major_version) < 1600)
+		return;
+
+	prep_status("Checking for logical replication slots");
+
+
+	/*
+	 * Check that all logical replication slots have reached the current WAL
+	 * position, except for the CHECKPOINT_SHUTDOWN record. Even if all WALs
+	 * are consumed before shutting down the node, the checkpointer generates
+	 * a CHECKPOINT_SHUTDOWN record at shutdown, which cannot be consumed by
+	 * any slots. Therefore, we must allow for a difference between
+	 * pg_current_wal_insert_lsn() and confirmed_flush_lsn.
+	 */
+#define SHUTDOWN_RECORD_SIZE  (SizeOfXLogRecord + \
+							   SizeOfXLogRecordDataHeaderShort + \
+							   sizeof(CheckPoint))
+
+	res = executeQueryOrDie(conn,
+							"SELECT slot_name FROM pg_catalog.pg_replication_slots "
+							"WHERE (pg_catalog.pg_current_wal_insert_lsn() - confirmed_flush_lsn) > %d "
+							"AND temporary = false AND wal_status IN ('reserved', 'extended');",
+							(int) (SizeOfXLogShortPHD + SHUTDOWN_RECORD_SIZE));
+
+#undef SHUTDOWN_RECORD_SIZE
+
+	ntups = PQntuples(res);
+	i_slotname = PQfnumber(res, "slot_name");
+
+	for (i = 0; i < ntups; i++)
+	{
+		char	   *slotname;
+
+		is_error = true;
+
+		slotname = PQgetvalue(res, i, i_slotname);
+
+		pg_log(PG_WARNING,
+			   "\nWARNING: logical replication slot \"%s\" has not consumed WALs yet.",
+			   slotname);
+	}
+
+	PQclear(res);
+	PQfinish(conn);
+
+	if (is_error)
+		pg_fatal("--include-logical-replication-slots requires that all "
+				 "logical replication slots have consumed all the WALs");
+
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
index e25bcd0142..5a49ac96bb 100644
--- a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
+++ b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
@@ -85,6 +85,38 @@ $new_publisher->append_conf('postgresql.conf', "max_replication_slots = 10");
 
 $old_publisher->start;
 
+# Create a dummy slot on the old publisher to make the following pg_upgrade fail
+$old_publisher->safe_psql('postgres',
+	"SELECT pg_create_logical_replication_slot('dropme_slot', 'pgoutput')");
+$old_publisher->safe_psql('postgres',
+	"INSERT INTO tbl VALUES (generate_series(11, 20))");
+$old_publisher->stop;
+
+# Cause a failure at the start of pg_upgrade because dropme_slot has not been
+# advanced past the INSERT statement
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,        '--include-logical-replication-slots',
+		'--check'
+	],
+	'run of pg_upgrade of old publisher with idle replication slots');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT pg_drop_replication_slot('dropme_slot')");
+
 # Setup logical replication
 my $old_connstr = $old_publisher->connstr . ' dbname=postgres';
 $old_publisher->safe_psql('postgres',
@@ -96,7 +128,7 @@ $subscriber->safe_psql('postgres',
 $subscriber->wait_for_subscription_sync($old_publisher, 'sub');
 
 my $result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
-is($result, qq(10), 'check initial rows on subscriber');
+is($result, qq(20), 'check initial rows on subscriber');
 
 # Define another replication slot which allows to decode prepared transactions
 $old_publisher->safe_psql('postgres',
@@ -140,11 +172,11 @@ $subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION sub ENABLE");
 
 # Check whether changes on the new publisher get replicated to the subscriber
 $new_publisher->safe_psql('postgres',
-	"INSERT INTO tbl VALUES (generate_series(11, 20))");
+	"INSERT INTO tbl VALUES (generate_series(21, 30))");
 
 $new_publisher->wait_for_catchup('sub');
 
 $result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
-is($result, qq(20), 'check changes are shipped to subscriber');
+is($result, qq(30), 'check changes are shipped to subscriber');
 
 done_testing();
-- 
2.27.0

v9-0004-Change-the-method-used-to-check-logical-replicati.patchapplication/octet-stream; name=v9-0004-Change-the-method-used-to-check-logical-replicati.patchDownload
From 4dd3cc2fcb51757b3e9a1e733e2efb334b3574e8 Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Mon, 24 Apr 2023 11:03:28 +0000
Subject: [PATCH v9 4/4] Change the method used to check logical replication
 slots during the live check

When a live check is requested, there is a possibility of additional changes
occurring, which may cause the current WAL position to exceed the confirmed_flush_lsn
of the slot. As a result, during a live check we verify that each logical slot is
active instead. This is sufficient, as all remaining WAL records will be sent during
the publisher's shutdown.
---
 src/bin/pg_upgrade/check.c                    | 68 ++++++++++++++++++-
 .../t/003_logical_replication_slots.pl        | 50 ++++++++++++--
 2 files changed, 111 insertions(+), 7 deletions(-)

diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 294668e4dc..985614acab 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -35,6 +35,7 @@ static void check_for_new_tablespace_dir(ClusterInfo *new_cluster);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
 static void check_for_parameter_settings(ClusterInfo *new_cluster);
 static void check_for_confirmed_flush_lsn(ClusterInfo *cluster);
+static void check_are_logical_slots_active(ClusterInfo *cluster);
 
 /*
  * fix_path_separator
@@ -109,7 +110,19 @@ check_and_dump_old_cluster(bool live_check)
 	check_for_reg_data_type_usage(&old_cluster);
 	check_for_isn_and_int8_passing_mismatch(&old_cluster);
 	if (user_opts.include_logical_slots)
-		check_for_confirmed_flush_lsn(&old_cluster);
+	{
+		/*
+		 * The method used to check logical replication slots is dependent on
+		 * the value of the live_check parameter. This change was implemented
+		 * because, during a live check, it is possible for additional changes
+		 * to occur at the old node, which could cause the current WAL position
+		 * to exceed the confirmed_flush_lsn of the slot.
+		 */
+		if (live_check)
+			check_are_logical_slots_active(&old_cluster);
+		else
+			check_for_confirmed_flush_lsn(&old_cluster);
+	}
 
 	/*
 	 * PG 16 increased the size of the 'aclitem' type, which breaks the on-disk
@@ -1469,6 +1482,59 @@ check_for_parameter_settings(ClusterInfo *new_cluster)
 	check_ok();
 }
 
+/*
+ * Verify that all logical replication slots are active
+ */
+static void
+check_are_logical_slots_active(ClusterInfo *cluster)
+{
+	int			i,
+				ntups,
+				i_slotname;
+	bool		is_error = false;
+	PGresult   *res;
+	DbInfo	   *active_db = &cluster->dbarr.dbs[0];
+	PGconn	   *conn = connectToServer(cluster, active_db->db_name);
+
+	Assert(user_opts.include_logical_slots);
+
+	/* --include-logical-replication-slots can be used since PG16. */
+	if (GET_MAJOR_VERSION(cluster->major_version) < 1600)
+		return;
+
+	prep_status("Checking for logical replication slots");
+
+	res = executeQueryOrDie(conn,
+							"SELECT slot_name FROM pg_catalog.pg_replication_slots "
+							"WHERE active IS FALSE "
+							"AND temporary = false AND wal_status IN ('reserved', 'extended');");
+
+	ntups = PQntuples(res);
+	i_slotname = PQfnumber(res, "slot_name");
+
+	for (i = 0; i < ntups; i++)
+	{
+		char	   *slotname;
+
+		is_error = true;
+
+		slotname = PQgetvalue(res, i, i_slotname);
+
+		pg_log(PG_WARNING,
+			   "\nWARNING: logical replication slot \"%s\" is not active.",
+			   slotname);
+	}
+
+	PQclear(res);
+	PQfinish(conn);
+
+	if (is_error)
+		pg_fatal("--include-logical-replication-slots with --check requires that "
+				 "all logical replication slots are active");
+
+	check_ok();
+}
+
 /*
  * Verify that all logical replication slots consumed all WALs, except a
  * CHECKPOINT_SHUTDOWN record.
diff --git a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
index 5a49ac96bb..a3260e0c88 100644
--- a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
+++ b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
@@ -73,7 +73,6 @@ command_fails(
 		'-p',         $old_publisher->port,
 		'-P',         $new_publisher->port,
 		$mode,        '--include-logical-replication-slots',
-		'--check'
 	],
 	'run of pg_upgrade of old publisher with wrong max_replication_slots');
 ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
@@ -105,7 +104,6 @@ command_fails(
 		'-p',         $old_publisher->port,
 		'-P',         $new_publisher->port,
 		$mode,        '--include-logical-replication-slots',
-		'--check'
 	],
 	'run of pg_upgrade of old publisher with idle replication slots');
 ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
@@ -130,9 +128,49 @@ $subscriber->wait_for_subscription_sync($old_publisher, 'sub');
 my $result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
 is($result, qq(20), 'check initial rows on subscriber');
 
+# Start a background session and open a transaction (not committed yet)
+my $bsession = $old_publisher->background_psql('postgres');
+$bsession->query_safe(
+	q{
+BEGIN;
+INSERT INTO tbl VALUES (generate_series(21, 30))
+});
+
+$result = $old_publisher->safe_psql('postgres',
+	"SELECT count(*) FROM pg_replication_slots WHERE pg_current_wal_insert_lsn() > confirmed_flush_lsn"
+);
+is($result, qq(1),
+	'check the current WAL position exceeds confirmed_flush_lsn');
+
+
+# Run pg_upgrade --check. The status of each logical slot will be checked,
+# and the command should succeed.
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,        '--include-logical-replication-slots',
+		'--check'
+	],
+	'run of pg_upgrade of old publisher');
+ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ removed after pg_upgrade success");
+
+# Cleanup
+$bsession->query_safe("COMMIT");
+$bsession->quit;
+$old_publisher->wait_for_catchup('sub');
+
 # Define another replication slot which allows to decode prepared transactions
 $old_publisher->safe_psql('postgres',
-	"SELECT pg_catalog.pg_create_logical_replication_slot('twophase_slot', 'pgoutput', false, true)");
+	"SELECT pg_catalog.pg_create_logical_replication_slot('twophase_slot', 'pgoutput', false, true)"
+);
 
 # Preparations for upgrading publisher
 $old_publisher->stop;
@@ -160,7 +198,7 @@ $new_publisher->start;
 $result =
   $new_publisher->safe_psql('postgres',
 	"SELECT slot_name, two_phase FROM pg_replication_slots");
-is($result, qq(sub|f
+is( $result, qq(sub|f
 twophase_slot|t), 'check the replication slot is copied to new publisher');
 
 # Change connection string and enable logical replication
@@ -172,11 +210,11 @@ $subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION sub ENABLE");
 
 # Check whether changes on the new publisher get replicated to the subscriber
 $new_publisher->safe_psql('postgres',
-	"INSERT INTO tbl VALUES (generate_series(21, 30))");
+	"INSERT INTO tbl VALUES (generate_series(31, 40))");
 
 $new_publisher->wait_for_catchup('sub');
 
 $result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
-is($result, qq(30), 'check changes are shipped to subscriber');
+is($result, qq(40), 'check changes are shipped to subscriber');
 
 done_testing();
-- 
2.27.0

#29Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: vignesh C (#27)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Vignesh,

Thank you for the comments. The new patchset is available in [1].

Thanks for the updated patch.
A few comments:
1) If the verbose option is enabled, we should print the new slot
information. We could add a function print_slot_infos, similar to
print_rel_infos, which could print the slot name and whether two_phase
is enabled.
+       end_progress_output();
+       check_ok();
+
+       /* update new_cluster info now that we have objects in the databases */
+       get_db_and_rel_infos(&new_cluster);
+}

I was not sure whether we should add the printing, because other objects like publications
and subscriptions do not seem to be printed, but I added it.
While implementing it, I found that calling get_db_and_rel_infos() again
was not efficient because free_db_and_rel_infos() would be called at that time, so I added
get_logical_slot_infos() instead.
Additionally, I added a check to check_new_cluster_is_empty() to ensure that
there are no logical slots on the new node.

2) Since we will be using this option with pg_upgrade, should we use
this along with the --binary-upgrade option only?
+       if (dopt.logical_slots_only && dopt.dataOnly)
+               pg_fatal("options --logical-replication-slots-only and -a/--data-only cannot be used together");
+       if (dopt.logical_slots_only && dopt.schemaOnly)
+               pg_fatal("options --logical-replication-slots-only and -s/--schema-only cannot be used together");

Right, I added the check.

3) Since two_phase is boolean, can we use the bool data type instead of a string:
+               slotinfo[i].dobj.objType = DO_LOGICAL_REPLICATION_SLOT;
+
+               slotinfo[i].dobj.catId.tableoid = InvalidOid;
+               slotinfo[i].dobj.catId.oid = InvalidOid;
+               AssignDumpId(&slotinfo[i].dobj);
+
+               slotinfo[i].dobj.name = pg_strdup(PQgetvalue(res, i, i_slotname));
+
+               slotinfo[i].plugin = pg_strdup(PQgetvalue(res, i, i_plugin));
+               slotinfo[i].twophase = pg_strdup(PQgetvalue(res, i, i_twophase));

We can change it to something like:
if (strcmp(PQgetvalue(res, i, i_twophase), "t") == 0)
	slotinfo[i].twophase = true;
else
	slotinfo[i].twophase = false;

Seems right, fixed.

4) The comments are inconsistent: some have terminating periods and
some don't. We can keep them consistent:
+# Can be changed to test the other modes.
+my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
+
+# Initialize old publisher node
+my $old_publisher = PostgreSQL::Test::Cluster->new('old_publisher');
+$old_publisher->init(allows_streaming => 'logical');
+$old_publisher->start;
+
+# Initialize subscriber node
+my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
+$subscriber->init(allows_streaming => 'logical');
+$subscriber->start;
+
+# Schema setup
+$old_publisher->safe_psql('postgres',
+       "CREATE TABLE tbl AS SELECT generate_series(1,10) AS a");
+$subscriber->safe_psql('postgres', "CREATE TABLE tbl (a int)");
+
+# Initialize new publisher node

Removed the terminating periods from all of them.

5) Should we use free instead of pfree, as used in other functions like
dumpForeignServer:
+               appendPQExpBuffer(query, ");");
+
+               ArchiveEntry(fout, slotinfo->dobj.catId, slotinfo->dobj.dumpId,
+                            ARCHIVE_OPTS(.tag = slotname,
+                                         .description = "REPLICATION SLOT",
+                                         .section = SECTION_POST_DATA,
+                                         .createStmt = query->data));
+
+               pfree(slotname);
+               destroyPQExpBuffer(query);
+       }
+}

Actually it works because, in frontend code, pfree() is just a wrapper around pg_free(),
but I agree that it should be fixed, so I did that.

[1]: /messages/by-id/TYAPR01MB58669413A5A2E3E50BD0B7E7F5679@TYAPR01MB5866.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#30Julien Rouhaud
rjuju123@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#28)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

Hi,

On Mon, Apr 24, 2023 at 12:03:05PM +0000, Hayato Kuroda (Fujitsu) wrote:

I think that this test should be different when just checking for the
prerequisites (live_check / --check) compared to actually doing the upgrade,
as it's almost guaranteed that the slots won't have sent everything when the
source server is up and running.

Hmm, you assumed that the user application is still running and data is coming in
continuously when doing --check, right? Personally I have thought that the
--check operation is executed just before the actual upgrade, therefore I'm not
sure your assumption is a real problem.

The checks are always executed before doing the upgrade, to prevent it if
something isn't right. But you can also just run those checks on a live
instance, so you can get a somewhat strong guarantee that the upgrade operation
will succeed before needing to stop all services and shut down postgres. It's
basically free to run those checks and it can avoid an unnecessary service
interruption, so I'm pretty sure people use it quite often.

And I could not find any other checks whose
behavior changes based on the --check option.

Yes, because other checks are things that you can actually fix while the
instance is running, like getting rid of tables with oids. The only
semi-exception is for 2PC, which can be continuously prepared and committed, but if
you hit that problem at least you know you have to cleanly stop your XA-like
application and make sure there are no prepared transactions left.

Yeah, if we support that case, checking pg_replication_slots.active may be sufficient.
Admittedly this cannot handle the case where pg_create_logical_replication_slot()
is executed just before upgrading, but I'm not sure it needs to be handled.

It shouldn't, same for any of the other checks. The live check can't predict
the future, it just tells you if there's anything that would prevent the
upgrade *at the moment it's executed*.

#31Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Hayato Kuroda (Fujitsu) (#28)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Hackers,

Thank you for the comments! PSA a new version.

Note that the current version does not work well on FreeBSD, maybe
because of a timing issue [1]. I'm now analyzing the reason and will post
a fixed version.

[1]: https://cirrus-ci.com/build/4676441267240960

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#32Peter Eisentraut
peter.eisentraut@enterprisedb.com
In reply to: Hayato Kuroda (Fujitsu) (#28)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On 24.04.23 14:03, Hayato Kuroda (Fujitsu) wrote:

so at least there's a good chance that they will still be at
shutdown, and will therefore send all the data to the subscribers? Having a
regression test for that scenario would also be a good idea. Having an
uncommitted write transaction should be enough to cover it.

I think background_psql() can be used for the purpose. Before doing pg_upgrade
--check, a transaction is opened and kept open. It means that the confirmed_flush has
not yet reached the current WAL position, but the check says OK
because all slots are active.

A suggestion: You could write some/most tests against test_decoding
rather than the publication/subscription system. That way, you can
avoid many timing issues in the tests and you can check more exactly
that the slots produce the output you want. This would also help ensure
that this new facility works for other logical decoding output plugins
besides the built-in one.

#33Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Peter Eisentraut (#32)
4 attachment(s)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Peter,

A suggestion: You could write some/most tests against test_decoding
rather than the publication/subscription system. That way, you can
avoid many timing issues in the tests and you can check more exactly
that the slots produce the output you want. This would also help ensure
that this new facility works for other logical decoding output plugins
besides the built-in one.

Good point. I think almost all tests except the --check part can be rewritten.
PSA the new patchset.
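
Just to illustrate the direction, a test_decoding-based check could look roughly
like the following. This is only a sketch and not the exact code in the attached
patch; the slot and table names are made up:

# Create a slot using the test_decoding output plugin and generate a change
$old_publisher->safe_psql('postgres',
	"SELECT pg_catalog.pg_create_logical_replication_slot('regress_slot', 'test_decoding')");
$old_publisher->safe_psql('postgres', "INSERT INTO tbl VALUES (100)");

# Consume the change so that confirmed_flush_lsn catches up before the shutdown
$old_publisher->safe_psql('postgres',
	"SELECT count(*) FROM pg_logical_slot_get_changes('regress_slot', NULL, NULL)");
$old_publisher->stop;

# ... run pg_upgrade with --include-logical-replication-slots here ...

# The slot must exist on the new node and must still be able to decode changes
$new_publisher->start;
$new_publisher->safe_psql('postgres', "INSERT INTO tbl VALUES (101)");
my $decoded = $new_publisher->safe_psql('postgres',
	"SELECT data FROM pg_logical_slot_get_changes('regress_slot', NULL, NULL)");
like($decoded, qr/INSERT/, 'copied slot decodes changes on the new node');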

Additionally, I fixed the following:

- Added initialization for slot_arr.*. This is needed in get_logical_slot_infos()
to check whether the entry has already been allocated.
Previously a double free occurred on some platforms.
- Fixed a condition in get_logical_slot_infos()
- Changed the expected size of the page header to the longer one (SizeOfXLogLongPHD).
If the WAL page is the first one in a WAL segment file, the long header is used;
see the sketch below for how the remaining gap can be inspected.
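
As a rough illustration (not code from the patch), the gap between the current WAL
insert position and each slot's confirmed_flush_lsn can be inspected on a running
node like this:

# A sketch only: list how far each slot's confirmed_flush_lsn lags behind the
# current WAL insert position. The check in the patch tolerates a small residual
# gap, i.e. the shutdown checkpoint record plus the (long) page header.
my $gap = $old_publisher->safe_psql('postgres',
	"SELECT slot_name, pg_current_wal_insert_lsn() - confirmed_flush_lsn AS gap "
	  . "FROM pg_catalog.pg_replication_slots WHERE temporary = false");
note($gap);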

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:

v10-0001-pg_upgrade-Add-include-logical-replication-slots.patchapplication/octet-stream; name=v10-0001-pg_upgrade-Add-include-logical-replication-slots.patchDownload
From eef7163f5b6b22043d4f8a9396df53965477554e Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Tue, 4 Apr 2023 05:49:34 +0000
Subject: [PATCH v10 1/4] pg_upgrade: Add --include-logical-replication-slots
 option

This commit introduces a new pg_upgrade option called "--include-logical-replication-slots".
This allows nodes with logical replication slots to be upgraded. The commit can
be divided into two parts: one for pg_dump and another for pg_upgrade.

For pg_dump this commit includes a new option called "--logical-replication-slots-only".
This option can be used to dump logical replication slots. When this option is
specified, the slot_name, plugin, and two_phase parameters are extracted from
pg_replication_slots. An SQL file is then generated which executes
pg_create_logical_replication_slot() with the extracted parameters.

For pg_upgrade, when '--include-logical-replication-slots' is specified, it executes
pg_dump with the new "--logical-replication-slots-only" option and restores from the
dump. Unlike the schema restore, pg_resetwal must not be called after restoring
replication slots. This is because the command discards WAL files and starts from a
new segment, even if those files are required by replication slots. This leads to an ERROR:
"requested WAL segment XXX has already been removed". To avoid this, replication slots
are restored at a different time than other objects, after running pg_resetwal.

The significant advantage of this commit is that it makes it easy to continue
logical replication even after upgrading the publisher node. Previously, pg_upgrade
allowed copying publications to a new node. With this new commit, adjusting the
connection string to the new publisher will cause the apply worker on the subscriber
to connect to the new publisher automatically. This enables seamless continuation
of logical replication, even after an upgrade.

Author: Hayato Kuroda
Reviewed-by: Peter Smith, Julien Rouhaud, Vignesh C
---
 doc/src/sgml/ref/pg_dump.sgml                 |  10 ++
 doc/src/sgml/ref/pgupgrade.sgml               |  11 ++
 src/bin/pg_dump/pg_backup.h                   |   1 +
 src/bin/pg_dump/pg_dump.c                     | 150 ++++++++++++++++++
 src/bin/pg_dump/pg_dump.h                     |  17 +-
 src/bin/pg_dump/pg_dump_sort.c                |  11 +-
 src/bin/pg_upgrade/check.c                    |  57 +++++++
 src/bin/pg_upgrade/dump.c                     |  24 +++
 src/bin/pg_upgrade/info.c                     | 125 ++++++++++++++-
 src/bin/pg_upgrade/meson.build                |   1 +
 src/bin/pg_upgrade/option.c                   |   7 +
 src/bin/pg_upgrade/pg_upgrade.c               |  61 +++++++
 src/bin/pg_upgrade/pg_upgrade.h               |  18 +++
 .../t/003_logical_replication_slots.pl        | 109 +++++++++++++
 src/tools/pgindent/typedefs.list              |   3 +
 15 files changed, 601 insertions(+), 4 deletions(-)
 create mode 100644 src/bin/pg_upgrade/t/003_logical_replication_slots.pl

diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index e81e35c13b..6e07f85281 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -1206,6 +1206,16 @@ PostgreSQL documentation
       </listitem>
      </varlistentry>
 
+     <varlistentry>
+      <term><option>--logical-replication-slots-only</option></term>
+      <listitem>
+       <para>
+        Dump only logical replication slots, not the schema (data definitions)
+        or data. This is mainly used when upgrading nodes.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
        <term><option>-?</option></term>
        <term><option>--help</option></term>
diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 7816b4c685..94e90ff506 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -240,6 +240,17 @@ PostgreSQL documentation
       </listitem>
      </varlistentry>
 
+     <varlistentry>
+      <term><option>--include-logical-replication-slots</option></term>
+      <listitem>
+       <para>
+        Upgrade logical replication slots. Only permanent replication slots
+        are included. Note that pg_upgrade does not check whether the required
+        output plugins are installed on the new cluster.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><option>-?</option></term>
       <term><option>--help</option></term>
diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index aba780ef4b..0a4e931f9b 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -187,6 +187,7 @@ typedef struct _dumpOptions
 	int			use_setsessauth;
 	int			enable_row_security;
 	int			load_via_partition_root;
+	int			logical_slots_only;
 
 	/* default, if no "inclusion" switches appear, is to dump everything */
 	bool		include_everything;
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 058244cd17..ed98572cce 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -328,6 +328,9 @@ static void setupDumpWorker(Archive *AH);
 static TableInfo *getRootTableInfo(const TableInfo *tbinfo);
 static bool forcePartitionRootLoad(const TableInfo *tbinfo);
 
+static void getLogicalReplicationSlots(Archive *fout);
+static void dumpLogicalReplicationSlot(Archive *fout,
+									   const LogicalReplicationSlotInfo *slotinfo);
 
 int
 main(int argc, char **argv)
@@ -431,6 +434,7 @@ main(int argc, char **argv)
 		{"table-and-children", required_argument, NULL, 12},
 		{"exclude-table-and-children", required_argument, NULL, 13},
 		{"exclude-table-data-and-children", required_argument, NULL, 14},
+		{"logical-replication-slots-only", no_argument, NULL, 15},
 
 		{NULL, 0, NULL, 0}
 	};
@@ -657,6 +661,10 @@ main(int argc, char **argv)
 										  optarg);
 				break;
 
+			case 15:			/* dump only replication slot(s) */
+				dopt.logical_slots_only = true;
+				break;
+
 			default:
 				/* getopt_long already emitted a complaint */
 				pg_log_error_hint("Try \"%s --help\" for more information.", progname);
@@ -714,6 +722,14 @@ main(int argc, char **argv)
 	if (dopt.do_nothing && dopt.dump_inserts == 0)
 		pg_fatal("option --on-conflict-do-nothing requires option --inserts, --rows-per-insert, or --column-inserts");
 
+	if (dopt.logical_slots_only && !dopt.binary_upgrade)
+		pg_fatal("option --logical-replication-slots-only requires option --binary-upgrade");
+
+	if (dopt.logical_slots_only && dopt.dataOnly)
+		pg_fatal("options --logical-replication-slots-only and -a/--data-only cannot be used together");
+	if (dopt.logical_slots_only && dopt.schemaOnly)
+		pg_fatal("options --logical-replication-slots-only and -s/--schema-only cannot be used together");
+
 	/* Identify archive format to emit */
 	archiveFormat = parseArchiveFormat(format, &archiveMode);
 
@@ -876,6 +892,16 @@ main(int argc, char **argv)
 			pg_fatal("no matching extensions were found");
 	}
 
+	/*
+	 * If --logical-replication-slots-only was requested, dump only the
+	 * replication slots and skip everything else.
+	 */
+	if (dopt.logical_slots_only)
+	{
+		getLogicalReplicationSlots(fout);
+		goto dump;
+	}
+
 	/*
 	 * Dumping LOs is the default for dumps where an inclusion switch is not
 	 * used (an "include everything" dump).  -B can be used to exclude LOs
@@ -936,6 +962,8 @@ main(int argc, char **argv)
 	if (!dopt.no_security_labels)
 		collectSecLabels(fout);
 
+dump:
+
 	/* Lastly, create dummy objects to represent the section boundaries */
 	boundaryObjs = createBoundaryObjects();
 
@@ -1109,6 +1137,8 @@ help(const char *progname)
 			 "                               servers matching PATTERN\n"));
 	printf(_("  --inserts                    dump data as INSERT commands, rather than COPY\n"));
 	printf(_("  --load-via-partition-root    load partitions via the root table\n"));
+	printf(_("  --logical-replication-slots-only\n"
+			 "                               dump only logical replication slots, no schema or data\n"));
 	printf(_("  --no-comments                do not dump comments\n"));
 	printf(_("  --no-publications            do not dump publications\n"));
 	printf(_("  --no-security-labels         do not dump security label assignments\n"));
@@ -10252,6 +10282,10 @@ dumpDumpableObject(Archive *fout, DumpableObject *dobj)
 		case DO_SUBSCRIPTION:
 			dumpSubscription(fout, (const SubscriptionInfo *) dobj);
 			break;
+		case DO_LOGICAL_REPLICATION_SLOT:
+			dumpLogicalReplicationSlot(fout,
+									   (const LogicalReplicationSlotInfo *) dobj);
+			break;
 		case DO_PRE_DATA_BOUNDARY:
 		case DO_POST_DATA_BOUNDARY:
 			/* never dumped, nothing to do */
@@ -18227,6 +18261,7 @@ addBoundaryDependencies(DumpableObject **dobjs, int numObjs,
 			case DO_PUBLICATION_REL:
 			case DO_PUBLICATION_TABLE_IN_SCHEMA:
 			case DO_SUBSCRIPTION:
+			case DO_LOGICAL_REPLICATION_SLOT:
 				/* Post-data objects: must come after the post-data boundary */
 				addObjectDependency(dobj, postDataBound->dumpId);
 				break;
@@ -18488,3 +18523,118 @@ appendReloptionsArrayAH(PQExpBuffer buffer, const char *reloptions,
 	if (!res)
 		pg_log_warning("could not parse %s array", "reloptions");
 }
+
+/*
+ * getLogicalReplicationSlots
+ *	  get information about replication slots
+ */
+static void
+getLogicalReplicationSlots(Archive *fout)
+{
+	PGresult   *res;
+	LogicalReplicationSlotInfo *slotinfo;
+	PQExpBuffer query;
+	DumpOptions *dopt = fout->dopt;
+
+	int			i_slotname;
+	int			i_plugin;
+	int			i_twophase;
+	int			i,
+				ntups;
+
+	/* Check whether we should dump or not */
+	if (fout->remoteVersion < 160000 || !dopt->logical_slots_only)
+		return;
+
+	query = createPQExpBuffer();
+
+	resetPQExpBuffer(query);
+
+	/*
+	 * Get replication slots.
+	 *
+	 * XXX: Which information must be extracted from old node? Currently three
+	 * attributes are extracted because they are used by
+	 * pg_create_logical_replication_slot().
+	 *
+	 * XXX: Do we have to support physical slots?
+	 */
+	appendPQExpBufferStr(query,
+						 "SELECT slot_name, plugin, two_phase "
+						 "FROM pg_catalog.pg_replication_slots "
+						 "WHERE database = current_database() AND temporary = false "
+						 "AND wal_status IN ('reserved', 'extended');");
+
+	res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+
+	ntups = PQntuples(res);
+
+	i_slotname = PQfnumber(res, "slot_name");
+	i_plugin = PQfnumber(res, "plugin");
+	i_twophase = PQfnumber(res, "two_phase");
+
+	slotinfo = pg_malloc(ntups * sizeof(LogicalReplicationSlotInfo));
+
+	for (i = 0; i < ntups; i++)
+	{
+		slotinfo[i].dobj.objType = DO_LOGICAL_REPLICATION_SLOT;
+
+		slotinfo[i].dobj.catId.tableoid = InvalidOid;
+		slotinfo[i].dobj.catId.oid = InvalidOid;
+		AssignDumpId(&slotinfo[i].dobj);
+
+		slotinfo[i].dobj.name = pg_strdup(PQgetvalue(res, i, i_slotname));
+
+		slotinfo[i].plugin = pg_strdup(PQgetvalue(res, i, i_plugin));
+		slotinfo[i].twophase = (strcmp(PQgetvalue(res, i, i_twophase), "t") == 0);
+
+		/*
+		 * Note: Currently we do not have any options to include/exclude slots
+		 * in dumping, so all the slots must be selected.
+		 */
+		slotinfo[i].dobj.dump = DUMP_COMPONENT_ALL;
+	}
+	PQclear(res);
+
+	destroyPQExpBuffer(query);
+}
+
+/*
+ * dumpLogicalReplicationSlot
+ *	  dump a creation command for the given logical replication slot
+ */
+static void
+dumpLogicalReplicationSlot(Archive *fout,
+						   const LogicalReplicationSlotInfo *slotinfo)
+{
+	DumpOptions *dopt = fout->dopt;
+
+	if (!dopt->logical_slots_only)
+		return;
+
+	if (slotinfo->dobj.dump & DUMP_COMPONENT_DEFINITION)
+	{
+		PQExpBuffer query = createPQExpBuffer();
+		char	   *slotname = pg_strdup(slotinfo->dobj.name);
+
+		/*
+		 * XXX: For simplification, pg_create_logical_replication_slot() is
+		 * used. Is it sufficient?
+		 */
+		appendPQExpBuffer(query, "SELECT pg_catalog.pg_create_logical_replication_slot(");
+		appendStringLiteralAH(query, slotname, fout);
+		appendPQExpBuffer(query, ", ");
+		appendStringLiteralAH(query, slotinfo->plugin, fout);
+		appendPQExpBuffer(query, ", false, %s);",
+						  slotinfo->twophase ? "true" : "false");
+
+		ArchiveEntry(fout, slotinfo->dobj.catId, slotinfo->dobj.dumpId,
+					 ARCHIVE_OPTS(.tag = slotname,
+								  .description = "REPLICATION SLOT",
+								  .section = SECTION_POST_DATA,
+								  .createStmt = query->data));
+
+		free(slotname);
+		destroyPQExpBuffer(query);
+	}
+}
diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h
index ed6ce41ad7..de081c35ae 100644
--- a/src/bin/pg_dump/pg_dump.h
+++ b/src/bin/pg_dump/pg_dump.h
@@ -82,7 +82,8 @@ typedef enum
 	DO_PUBLICATION,
 	DO_PUBLICATION_REL,
 	DO_PUBLICATION_TABLE_IN_SCHEMA,
-	DO_SUBSCRIPTION
+	DO_SUBSCRIPTION,
+	DO_LOGICAL_REPLICATION_SLOT
 } DumpableObjectType;
 
 /*
@@ -666,6 +667,20 @@ typedef struct _SubscriptionInfo
 	char	   *subpasswordrequired;
 } SubscriptionInfo;
 
+/*
+ * The LogicalReplicationSlotInfo struct is used to represent replication
+ * slots.
+ *
+ * XXX: add more attributes if needed
+ */
+typedef struct _LogicalReplicationSlotInfo
+{
+	DumpableObject dobj;
+	char	   *plugin;
+	char	   *slottype;
+	bool		twophase;
+} LogicalReplicationSlotInfo;
+
 /*
  *	common utility functions
  */
diff --git a/src/bin/pg_dump/pg_dump_sort.c b/src/bin/pg_dump/pg_dump_sort.c
index 745578d855..4e12e46dc5 100644
--- a/src/bin/pg_dump/pg_dump_sort.c
+++ b/src/bin/pg_dump/pg_dump_sort.c
@@ -93,6 +93,7 @@ enum dbObjectTypePriorities
 	PRIO_PUBLICATION_REL,
 	PRIO_PUBLICATION_TABLE_IN_SCHEMA,
 	PRIO_SUBSCRIPTION,
+	PRIO_LOGICAL_REPLICATION_SLOT,
 	PRIO_DEFAULT_ACL,			/* done in ACL pass */
 	PRIO_EVENT_TRIGGER,			/* must be next to last! */
 	PRIO_REFRESH_MATVIEW		/* must be last! */
@@ -146,10 +147,11 @@ static const int dbObjectTypePriority[] =
 	PRIO_PUBLICATION,			/* DO_PUBLICATION */
 	PRIO_PUBLICATION_REL,		/* DO_PUBLICATION_REL */
 	PRIO_PUBLICATION_TABLE_IN_SCHEMA,	/* DO_PUBLICATION_TABLE_IN_SCHEMA */
-	PRIO_SUBSCRIPTION			/* DO_SUBSCRIPTION */
+	PRIO_SUBSCRIPTION,			/* DO_SUBSCRIPTION */
+	PRIO_LOGICAL_REPLICATION_SLOT	/* DO_LOGICAL_REPLICATION_SLOT */
 };
 
-StaticAssertDecl(lengthof(dbObjectTypePriority) == (DO_SUBSCRIPTION + 1),
+StaticAssertDecl(lengthof(dbObjectTypePriority) == (DO_LOGICAL_REPLICATION_SLOT + 1),
 				 "array length mismatch");
 
 static DumpId preDataBoundId;
@@ -1498,6 +1500,11 @@ describeDumpableObject(DumpableObject *obj, char *buf, int bufsize)
 					 "SUBSCRIPTION (ID %d OID %u)",
 					 obj->dumpId, obj->catId.oid);
 			return;
+		case DO_LOGICAL_REPLICATION_SLOT:
+			snprintf(buf, bufsize,
+					 "LOGICAL REPLICATION SLOT (ID %d NAME %s)",
+					 obj->dumpId, obj->name);
+			return;
 		case DO_PRE_DATA_BOUNDARY:
 			snprintf(buf, bufsize,
 					 "PRE-DATA BOUNDARY  (ID %d)",
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index fea159689e..9674c1a2ce 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -30,6 +30,7 @@ static void check_for_jsonb_9_4_usage(ClusterInfo *cluster);
 static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(ClusterInfo *new_cluster);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
+static void check_for_parameter_settings(ClusterInfo *new_cluster);
 
 
 /*
@@ -88,6 +89,7 @@ check_and_dump_old_cluster(bool live_check)
 
 	/* Extract a list of databases and tables from the old cluster */
 	get_db_and_rel_infos(&old_cluster);
+	get_logical_slot_infos(&old_cluster);
 
 	init_tablespaces();
 
@@ -188,6 +190,7 @@ void
 check_new_cluster(void)
 {
 	get_db_and_rel_infos(&new_cluster);
+	get_logical_slot_infos(&new_cluster);
 
 	check_new_cluster_is_empty();
 
@@ -210,6 +213,9 @@ check_new_cluster(void)
 	check_for_prepared_transactions(&new_cluster);
 
 	check_for_new_tablespace_dir(&new_cluster);
+
+	if (user_opts.include_logical_slots)
+		check_for_parameter_settings(&new_cluster);
 }
 
 
@@ -364,6 +370,22 @@ check_new_cluster_is_empty(void)
 						 rel_arr->rels[relnum].nspname,
 						 rel_arr->rels[relnum].relname);
 		}
+
+		/*
+		 * If --include-logical-replication-slots is specified, check the
+		 * existence of slots
+		 */
+		if (user_opts.include_logical_slots)
+		{
+			LogicalSlotInfoArr *slot_arr = &new_cluster.dbarr.dbs[dbnum].slot_arr;
+
+			/* if nslots > 0, report just first entry and exit */
+			if (slot_arr->nslots)
+				pg_fatal("New cluster database \"%s\" is not empty: found logical replication slot \"%s\"",
+						 new_cluster.dbarr.dbs[dbnum].db_name,
+						 slot_arr->slots[0].slotname);
+		}
+
 	}
 }
 
@@ -1402,3 +1424,38 @@ check_for_user_defined_encoding_conversions(ClusterInfo *cluster)
 	else
 		check_ok();
 }
+
+/*
+ * Verify parameter settings for creating logical replication slots
+ */
+static void
+check_for_parameter_settings(ClusterInfo *new_cluster)
+{
+	PGresult   *res;
+	PGconn	   *conn = connectToServer(new_cluster, "template1");
+	int			max_replication_slots;
+	char	   *wal_level;
+
+	prep_status("Checking for logical replication slots");
+
+	res = executeQueryOrDie(conn, "SHOW max_replication_slots;");
+	max_replication_slots = atoi(PQgetvalue(res, 0, 0));
+
+	if (max_replication_slots == 0)
+		pg_fatal("max_replication_slots must be greater than 0");
+
+	PQclear(res);
+
+	res = executeQueryOrDie(conn, "SHOW wal_level;");
+	wal_level = PQgetvalue(res, 0, 0);
+
+	if (strcmp(wal_level, "logical") != 0)
+		pg_fatal("wal_level must be \"logical\", but set to \"%s\"",
+				 wal_level);
+
+	PQclear(res);
+
+	PQfinish(conn);
+
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/dump.c b/src/bin/pg_upgrade/dump.c
index 6c8c82dca8..e6b90864f5 100644
--- a/src/bin/pg_upgrade/dump.c
+++ b/src/bin/pg_upgrade/dump.c
@@ -59,6 +59,30 @@ generate_old_dump(void)
 						   log_opts.dumpdir,
 						   sql_file_name, escaped_connstr.data);
 
+		/*
+		 * Dump logical replication slots if needed.
+		 *
+		 * XXX We cannot dump replication slots at the same time as the schema
+		 * dump because we need to separate the timing of restoring
+		 * replication slots and other objects. Replication slots, in
+		 * particular, should not be restored before executing the pg_resetwal
+		 * command because it will remove WALs that are required by the slots.
+		 */
+		if (user_opts.include_logical_slots)
+		{
+			char		slots_file_name[MAXPGPATH];
+
+			snprintf(slots_file_name, sizeof(slots_file_name),
+					 DB_DUMP_LOGICAL_SLOTS_FILE_MASK, old_db->db_oid);
+			parallel_exec_prog(log_file_name, NULL,
+							   "\"%s/pg_dump\" %s --logical-replication-slots-only "
+							   "--quote-all-identifiers --binary-upgrade %s "
+							   "--file=\"%s/%s\" %s",
+							   new_cluster.bindir, cluster_conn_opts(&old_cluster),
+							   log_opts.verbose ? "--verbose" : "",
+							   log_opts.dumpdir,
+							   slots_file_name, escaped_connstr.data);
+		}
 		termPQExpBuffer(&escaped_connstr);
 	}
 
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index 85ed15ae4a..8cc2e3b4ae 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -23,10 +23,12 @@ static void free_db_and_rel_infos(DbInfoArr *db_arr);
 static void get_template0_info(ClusterInfo *cluster);
 static void get_db_infos(ClusterInfo *cluster);
 static void get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo);
+static void get_logical_slot_infos_per_db(ClusterInfo *cluster, DbInfo *dbinfo);
 static void free_rel_infos(RelInfoArr *rel_arr);
+static void free_logical_slot_infos(LogicalSlotInfoArr *slot_arr);
 static void print_db_infos(DbInfoArr *db_arr);
 static void print_rel_infos(RelInfoArr *rel_arr);
-
+static void print_slot_infos(LogicalSlotInfoArr *slot_arr);
 
 /*
  * gen_db_file_maps()
@@ -283,8 +285,17 @@ get_db_and_rel_infos(ClusterInfo *cluster)
 	get_db_infos(cluster);
 
 	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
+	{
 		get_rel_infos(cluster, &cluster->dbarr.dbs[dbnum]);
 
+		/*
+		 * Additionally, slot_arr must be initialized because it will be
+		 * checked later.
+		 */
+		cluster->dbarr.dbs[dbnum].slot_arr.nslots = 0;
+		cluster->dbarr.dbs[dbnum].slot_arr.slots = NULL;
+	}
+
 	if (cluster == &old_cluster)
 		pg_log(PG_VERBOSE, "\nsource databases:");
 	else
@@ -600,6 +611,93 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 	dbinfo->rel_arr.nrels = num_rels;
 }
 
+/*
+ * Higher level routine to generate LogicalSlotInfoArr for all databases.
+ */
+void
+get_logical_slot_infos(ClusterInfo *cluster)
+{
+	int			dbnum;
+
+	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
+	{
+		if (cluster->dbarr.dbs[dbnum].slot_arr.slots)
+			free_logical_slot_infos(&cluster->dbarr.dbs[dbnum].slot_arr);
+
+		get_logical_slot_infos_per_db(cluster, &cluster->dbarr.dbs[dbnum]);
+	}
+
+	if (cluster == &old_cluster)
+		pg_log(PG_VERBOSE, "\nsource databases:");
+	else
+		pg_log(PG_VERBOSE, "\ntarget databases:");
+
+	if (log_opts.verbose)
+	{
+		for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
+		{
+			pg_log(PG_VERBOSE, "Database: %s", cluster->dbarr.dbs[dbnum].db_name);
+			print_slot_infos(&cluster->dbarr.dbs[dbnum].slot_arr);
+		}
+	}
+}
+
+/*
+ * get_logical_slot_infos_per_db()
+ *
+ * gets the LogicalSlotInfos for all the logical replication slots of the database
+ * referred to by "dbinfo".
+ */
+static void
+get_logical_slot_infos_per_db(ClusterInfo *cluster, DbInfo *dbinfo)
+{
+	PGconn	   *conn = connectToServer(cluster,
+									   dbinfo->db_name);
+	PGresult   *res;
+	LogicalSlotInfo *slotinfos;
+
+	int			ntups;
+	int			slotnum;
+	int			num_slots = 0;
+
+	int			i_slotname;
+	int			i_plugin;
+	int			i_twophase;
+
+	char		query[QUERY_ALLOC];
+
+	query[0] = '\0';			/* initialize query string to empty */
+
+	snprintf(query + strlen(query), sizeof(query) - strlen(query),
+			 "SELECT slot_name, plugin, two_phase "
+			 "FROM pg_catalog.pg_replication_slots "
+			 "WHERE database = current_database() AND temporary = false "
+			 "AND wal_status IN ('reserved', 'extended');");
+
+	res = executeQueryOrDie(conn, "%s", query);
+
+	ntups = PQntuples(res);
+
+	slotinfos = (LogicalSlotInfo *) pg_malloc(sizeof(LogicalSlotInfo) * ntups);
+
+	i_slotname = PQfnumber(res, "slot_name");
+	i_plugin = PQfnumber(res, "plugin");
+	i_twophase = PQfnumber(res, "two_phase");
+
+	for (slotnum = 0; slotnum < ntups; slotnum++)
+	{
+		LogicalSlotInfo *curr = &slotinfos[num_slots++];
+
+		curr->slotname = pg_strdup(PQgetvalue(res, slotnum, i_slotname));
+		curr->plugin = pg_strdup(PQgetvalue(res, slotnum, i_plugin));
+		curr->two_phase = (strcmp(PQgetvalue(res, slotnum, i_twophase), "t") == 0);
+	}
+
+	PQfinish(conn);
+
+	dbinfo->slot_arr.slots = slotinfos;
+	dbinfo->slot_arr.nslots = num_slots;
+}
 
 static void
 free_db_and_rel_infos(DbInfoArr *db_arr)
@@ -634,6 +732,19 @@ free_rel_infos(RelInfoArr *rel_arr)
 	rel_arr->nrels = 0;
 }
 
+static void
+free_logical_slot_infos(LogicalSlotInfoArr *slot_arr)
+{
+	int			slotnum;
+
+	for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+	{
+		pg_free(slot_arr->slots[slotnum].slotname);
+		pg_free(slot_arr->slots[slotnum].plugin);
+	}
+	pg_free(slot_arr->slots);
+	slot_arr->nslots = 0;
+}
 
 static void
 print_db_infos(DbInfoArr *db_arr)
@@ -660,3 +771,15 @@ print_rel_infos(RelInfoArr *rel_arr)
 			   rel_arr->rels[relnum].reloid,
 			   rel_arr->rels[relnum].tablespace);
 }
+
+static void
+print_slot_infos(LogicalSlotInfoArr *slot_arr)
+{
+	int			slotnum;
+
+	for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		pg_log(PG_VERBOSE, "slotname: %s: plugin: %s: two_phase %d",
+			   slot_arr->slots[slotnum].slotname,
+			   slot_arr->slots[slotnum].plugin,
+			   slot_arr->slots[slotnum].two_phase);
+}
diff --git a/src/bin/pg_upgrade/meson.build b/src/bin/pg_upgrade/meson.build
index 12a97f84e2..228f29b688 100644
--- a/src/bin/pg_upgrade/meson.build
+++ b/src/bin/pg_upgrade/meson.build
@@ -42,6 +42,7 @@ tests += {
     'tests': [
       't/001_basic.pl',
       't/002_pg_upgrade.pl',
+      't/003_logical_replication_slots.pl',
     ],
     'test_kwargs': {'priority': 40}, # pg_upgrade tests are slow
   },
diff --git a/src/bin/pg_upgrade/option.c b/src/bin/pg_upgrade/option.c
index 8869b6b60d..f7b5ec9879 100644
--- a/src/bin/pg_upgrade/option.c
+++ b/src/bin/pg_upgrade/option.c
@@ -57,6 +57,7 @@ parseCommandLine(int argc, char *argv[])
 		{"verbose", no_argument, NULL, 'v'},
 		{"clone", no_argument, NULL, 1},
 		{"copy", no_argument, NULL, 2},
+		{"include-logical-replication-slots", no_argument, NULL, 3},
 
 		{NULL, 0, NULL, 0}
 	};
@@ -199,6 +200,10 @@ parseCommandLine(int argc, char *argv[])
 				user_opts.transfer_mode = TRANSFER_MODE_COPY;
 				break;
 
+			case 3:
+				user_opts.include_logical_slots = true;
+				break;
+
 			default:
 				fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
 						os_info.progname);
@@ -289,6 +294,8 @@ usage(void)
 	printf(_("  -V, --version                 display version information, then exit\n"));
 	printf(_("  --clone                       clone instead of copying files to new cluster\n"));
 	printf(_("  --copy                        copy files to new cluster (default)\n"));
+	printf(_("  --include-logical-replication-slots\n"
+			 "                                upgrade logical replication slots\n"));
 	printf(_("  -?, --help                    show this help, then exit\n"));
 	printf(_("\n"
 			 "Before running pg_upgrade you must:\n"
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 75bab0a04c..373a9ef490 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -59,6 +59,7 @@ static void copy_xact_xlog_xid(void);
 static void set_frozenxids(bool minmxid_only);
 static void make_outputdirs(char *pgdata);
 static void setup(char *argv0, bool *live_check);
+static void create_logical_replication_slots(void);
 
 ClusterInfo old_cluster,
 			new_cluster;
@@ -188,6 +189,19 @@ main(int argc, char **argv)
 			  new_cluster.pgdata);
 	check_ok();
 
+	/*
+	 * Create logical replication slots if requested.
+	 *
+	 * Note: This must be done after doing pg_resetwal command because the
+	 * command will remove required WALs.
+	 */
+	if (user_opts.include_logical_slots)
+	{
+		start_postmaster(&new_cluster, true);
+		create_logical_replication_slots();
+		stop_postmaster(false);
+	}
+
 	if (user_opts.do_sync)
 	{
 		prep_status("Sync data directory to disk");
@@ -860,3 +874,50 @@ set_frozenxids(bool minmxid_only)
 
 	check_ok();
 }
+
+/*
+ * create_logical_replication_slots()
+ *
+ * Similar to create_new_objects() but only restores logical replication slots.
+ */
+static void
+create_logical_replication_slots(void)
+{
+	int			dbnum;
+
+	prep_status_progress("Restoring logical replication slots in the new cluster");
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		char		slots_file_name[MAXPGPATH],
+					log_file_name[MAXPGPATH];
+		DbInfo	   *old_db = &old_cluster.dbarr.dbs[dbnum];
+
+		pg_log(PG_STATUS, "%s", old_db->db_name);
+
+		snprintf(slots_file_name, sizeof(slots_file_name),
+				 DB_DUMP_LOGICAL_SLOTS_FILE_MASK, old_db->db_oid);
+		snprintf(log_file_name, sizeof(log_file_name),
+				 DB_DUMP_LOG_FILE_MASK, old_db->db_oid);
+
+		parallel_exec_prog(log_file_name,
+						   NULL,
+						   "\"%s/psql\" %s --echo-queries --set ON_ERROR_STOP=on "
+						   "--no-psqlrc --dbname %s -f \"%s/%s\"",
+						   new_cluster.bindir,
+						   cluster_conn_opts(&new_cluster),
+						   old_db->db_name,
+						   log_opts.dumpdir,
+						   slots_file_name);
+	}
+
+	/* reap all children */
+	while (reap_child(true) == true)
+		;
+
+	end_progress_output();
+	check_ok();
+
+	/* update new_cluster info again */
+	get_logical_slot_infos(&new_cluster);
+}
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 3eea0139c7..16d529668c 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -29,6 +29,7 @@
 /* contains both global db information and CREATE DATABASE commands */
 #define GLOBALS_DUMP_FILE	"pg_upgrade_dump_globals.sql"
 #define DB_DUMP_FILE_MASK	"pg_upgrade_dump_%u.custom"
+#define DB_DUMP_LOGICAL_SLOTS_FILE_MASK	"pg_upgrade_dump_%u_logical_slots.sql"
 
 /*
  * Base directories that include all the files generated internally, from the
@@ -150,6 +151,19 @@ typedef struct
 	int			nrels;
 } RelInfoArr;
 
+typedef struct
+{
+	char	   *slotname;		/* slot name */
+	char	   *plugin;			/* plugin */
+	bool		two_phase;		/* Can the slot decode 2PC? */
+} LogicalSlotInfo;
+
+typedef struct
+{
+	LogicalSlotInfo *slots;
+	int			nslots;
+} LogicalSlotInfoArr;
+
 /*
  * The following structure represents a relation mapping.
  */
@@ -176,6 +190,7 @@ typedef struct
 	char		db_tablespace[MAXPGPATH];	/* database default tablespace
 											 * path */
 	RelInfoArr	rel_arr;		/* array of all user relinfos */
+	LogicalSlotInfoArr slot_arr;	/* array of all logicalslotinfos */
 } DbInfo;
 
 /*
@@ -304,6 +319,8 @@ typedef struct
 	transferMode transfer_mode; /* copy files or link them? */
 	int			jobs;			/* number of processes/threads to use */
 	char	   *socketdir;		/* directory to use for Unix sockets */
+	bool		include_logical_slots;	/* true -> dump and restore logical
+										 * replication slots */
 } UserOpts;
 
 typedef struct
@@ -400,6 +417,7 @@ FileNameMap *gen_db_file_maps(DbInfo *old_db,
 							  DbInfo *new_db, int *nmaps, const char *old_pgdata,
 							  const char *new_pgdata);
 void		get_db_and_rel_infos(ClusterInfo *cluster);
+void		get_logical_slot_infos(ClusterInfo *cluster);
 
 /* option.c */
 
diff --git a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
new file mode 100644
index 0000000000..6689a91fcc
--- /dev/null
+++ b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
@@ -0,0 +1,109 @@
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+# Tests for upgrading replication slots
+
+use strict;
+use warnings;
+
+use File::Path qw(rmtree);
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Can be changed to test the other modes
+my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
+
+# Initialize old node
+my $old_node = PostgreSQL::Test::Cluster->new('old_node');
+$old_node->init(allows_streaming => 'logical');
+$old_node->start;
+
+# Initialize new node
+my $new_node = PostgreSQL::Test::Cluster->new('new_node');
+$new_node->init(allows_streaming => 1);
+
+my $bindir = $new_node->config_data('--bindir');
+
+$old_node->stop;
+
+# Cause a failure at the start of pg_upgrade because wal_level is replica
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_node->data_dir,
+		'-D',         $new_node->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_node->host,
+		'-p',         $old_node->port,
+		'-P',         $new_node->port,
+		$mode,        '--include-logical-replication-slots',
+	],
+	'run of pg_upgrade of old node with wrong wal_level');
+ok( -d $new_node->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_node->data_dir . "/pg_upgrade_output.d");
+$new_node->append_conf('postgresql.conf', "wal_level = 'logical'");
+$new_node->append_conf('postgresql.conf', "max_replication_slots = 0");
+
+# Cause a failure at the start of pg_upgrade because max_replication_slots is 0
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_node->data_dir,
+		'-D',         $new_node->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_node->host,
+		'-p',         $old_node->port,
+		'-P',         $new_node->port,
+		$mode,        '--include-logical-replication-slots',
+	],
+	'run of pg_upgrade of old node with wrong max_replication_slots');
+ok( -d $new_node->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_node->data_dir . "/pg_upgrade_output.d");
+$new_node->append_conf('postgresql.conf', "max_replication_slots = 10");
+
+# Create a slot on old node, and generate WALs
+$old_node->start;
+$old_node->safe_psql(
+	'postgres', qq[
+	SELECT pg_create_logical_replication_slot('test_slot', 'test_decoding', false, true);
+	CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;
+]);
+
+my $result = $old_node->safe_psql('postgres',
+	"SELECT count (*) FROM pg_logical_slot_get_changes('test_slot', NULL, NULL)"
+);
+is($result, qq(12), 'ensure WALs are not consumed yet');
+$old_node->stop;
+
+# Actual run, pg_upgrade_output.d is removed at the end
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_node->data_dir,
+		'-D',         $new_node->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_node->host,
+		'-p',         $old_node->port,
+		'-P',         $new_node->port,
+		$mode,        '--include-logical-replication-slots'
+	],
+	'run of pg_upgrade of old node');
+ok( !-d $new_node->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ removed after pg_upgrade success");
+
+$new_node->start;
+$result = $new_node->safe_psql('postgres',
+	"SELECT slot_name, two_phase FROM pg_replication_slots");
+is($result, qq(test_slot|t), 'check the slot exists on new node');
+
+done_testing();
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index b4058b88c3..5944cb34ea 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1479,6 +1479,7 @@ LogicalRepBeginData
 LogicalRepCommitData
 LogicalRepCommitPreparedTxnData
 LogicalRepCtxStruct
+LogicalReplicationSlotInfo
 LogicalRepMode
 LogicalRepMsgType
 LogicalRepPartMapEntry
@@ -1492,6 +1493,8 @@ LogicalRepTupleData
 LogicalRepTyp
 LogicalRepWorker
 LogicalRewriteMappingData
+LogicalSlotInfo
+LogicalSlotInfoArr
 LogicalTape
 LogicalTapeSet
 LsnReadQueue
-- 
2.27.0

v10-0002-Always-persist-to-disk-logical-slots-during-a-sh.patchapplication/octet-stream; name=v10-0002-Always-persist-to-disk-logical-slots-during-a-sh.patchDownload
From ff0fb35aa1ce896f1a5a786733a98f81538c5254 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Fri, 14 Apr 2023 13:49:09 +0800
Subject: [PATCH v10 2/4] Always persist to disk logical slots during a
 shutdown checkpoint.

It's entirely possible for a logical slot to have a confirmed_flush_lsn higher
than the last value saved on disk while not being marked as dirty.  It's
currently not a problem to lose that value during a clean shutdown / restart
cycle, but a later patch adding support for pg_upgrade of publications and
logical slots will rely on that value being properly persisted to disk.

Author: Julien Rouhaud
Reviewed-by: FIXME
Discussion: FIXME
---
 src/backend/access/transam/xlog.c |  2 +-
 src/backend/replication/slot.c    | 26 ++++++++++++++++----------
 src/include/replication/slot.h    |  2 +-
 3 files changed, 18 insertions(+), 12 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 63481d826f..07d775cf33 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7011,7 +7011,7 @@ static void
 CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
 {
 	CheckPointRelationMap();
-	CheckPointReplicationSlots();
+	CheckPointReplicationSlots(flags & CHECKPOINT_IS_SHUTDOWN);
 	CheckPointSnapBuild();
 	CheckPointLogicalRewriteHeap();
 	CheckPointReplicationOrigin();
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 8021aaa0a8..aeea6ffd1f 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -109,7 +109,8 @@ static void ReplicationSlotDropPtr(ReplicationSlot *slot);
 /* internal persistency functions */
 static void RestoreSlotFromDisk(const char *name);
 static void CreateSlotOnDisk(ReplicationSlot *slot);
-static void SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel);
+static void SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel,
+						   bool is_shutdown);
 
 /*
  * Report shared-memory space needed by ReplicationSlotsShmemInit.
@@ -783,7 +784,7 @@ ReplicationSlotSave(void)
 	Assert(MyReplicationSlot != NULL);
 
 	sprintf(path, "pg_replslot/%s", NameStr(MyReplicationSlot->data.name));
-	SaveSlotToPath(MyReplicationSlot, path, ERROR);
+	SaveSlotToPath(MyReplicationSlot, path, ERROR, false);
 }
 
 /*
@@ -1565,11 +1566,10 @@ restart:
 /*
  * Flush all replication slots to disk.
  *
- * This needn't actually be part of a checkpoint, but it's a convenient
- * location.
+ * is_shutdown is true in case of a shutdown checkpoint.
  */
 void
-CheckPointReplicationSlots(void)
+CheckPointReplicationSlots(bool is_shutdown)
 {
 	int			i;
 
@@ -1594,7 +1594,7 @@ CheckPointReplicationSlots(void)
 
 		/* save the slot to disk, locking is handled in SaveSlotToPath() */
 		sprintf(path, "pg_replslot/%s", NameStr(s->data.name));
-		SaveSlotToPath(s, path, LOG);
+		SaveSlotToPath(s, path, LOG, is_shutdown);
 	}
 	LWLockRelease(ReplicationSlotAllocationLock);
 }
@@ -1700,7 +1700,7 @@ CreateSlotOnDisk(ReplicationSlot *slot)
 
 	/* Write the actual state file. */
 	slot->dirty = true;			/* signal that we really need to write */
-	SaveSlotToPath(slot, tmppath, ERROR);
+	SaveSlotToPath(slot, tmppath, ERROR, false);
 
 	/* Rename the directory into place. */
 	if (rename(tmppath, path) != 0)
@@ -1726,7 +1726,8 @@ CreateSlotOnDisk(ReplicationSlot *slot)
  * Shared functionality between saving and creating a replication slot.
  */
 static void
-SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel)
+SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel,
+			   bool is_shutdown)
 {
 	char		tmppath[MAXPGPATH];
 	char		path[MAXPGPATH];
@@ -1740,8 +1741,13 @@ SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel)
 	slot->just_dirtied = false;
 	SpinLockRelease(&slot->mutex);
 
-	/* and don't do anything if there's nothing to write */
-	if (!was_dirty)
+	/*
+	 * and don't do anything if there's nothing to write, unless this is
+	 * called for a logical slot during a shutdown checkpoint, as we want to
+	 * persist the confirmed_flush_lsn in that case, even if that's the only
+	 * modification.
+	 */
+	if (!was_dirty && !(is_shutdown && SlotIsLogical(slot)))
 		return;
 
 	LWLockAcquire(&slot->io_in_progress_lock, LW_EXCLUSIVE);
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index a8a89dc784..7ca37c9f70 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -241,7 +241,7 @@ extern void ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslo
 extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok);
 
 extern void StartupReplicationSlots(void);
-extern void CheckPointReplicationSlots(void);
+extern void CheckPointReplicationSlots(bool is_shutdown);
 
 extern void CheckSlotRequirements(void);
 extern void CheckSlotPermissions(void);
-- 
2.27.0

v10-0003-pg_upgrade-Add-check-function-for-include-logica.patchapplication/octet-stream; name=v10-0003-pg_upgrade-Add-check-function-for-include-logica.patchDownload
From d6cc01a8ad04536243f6e3072c0c13f8a3ca4ebf Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Fri, 14 Apr 2023 08:59:03 +0000
Subject: [PATCH v10 3/4] pg_upgrade: Add check function for
 --include-logical-replication-slots option

---
 src/bin/pg_upgrade/check.c                    | 80 ++++++++++++++++++-
 .../t/003_logical_replication_slots.pl        | 30 ++++++-
 2 files changed, 108 insertions(+), 2 deletions(-)

diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 9674c1a2ce..946431f059 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -9,7 +9,10 @@
 
 #include "postgres_fe.h"
 
+#include "access/xlogrecord.h"
+#include "access/xlog_internal.h"
 #include "catalog/pg_authid_d.h"
+#include "catalog/pg_control.h"
 #include "catalog/pg_collation.h"
 #include "fe_utils/string_utils.h"
 #include "mb/pg_wchar.h"
@@ -31,7 +34,7 @@ static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(ClusterInfo *new_cluster);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
 static void check_for_parameter_settings(ClusterInfo *new_cluster);
-
+static void check_for_confirmed_flush_lsn(ClusterInfo *cluster);
 
 /*
  * fix_path_separator
@@ -105,6 +108,8 @@ check_and_dump_old_cluster(bool live_check)
 	check_for_composite_data_type_usage(&old_cluster);
 	check_for_reg_data_type_usage(&old_cluster);
 	check_for_isn_and_int8_passing_mismatch(&old_cluster);
+	if (user_opts.include_logical_slots)
+		check_for_confirmed_flush_lsn(&old_cluster);
 
 	/*
 	 * PG 16 increased the size of the 'aclitem' type, which breaks the on-disk
@@ -1436,6 +1441,10 @@ check_for_parameter_settings(ClusterInfo *new_cluster)
 	int			max_replication_slots;
 	char	   *wal_level;
 
+	/* --include-logical-replication-slots can be used since PG16. */
+	if (GET_MAJOR_VERSION(new_cluster->major_version) < 1600)
+		return;
+
 	prep_status("Checking for logical replication slots");
 
 	res = executeQueryOrDie(conn, "SHOW max_replication_slots;");
@@ -1459,3 +1468,72 @@ check_for_parameter_settings(ClusterInfo *new_cluster)
 
 	check_ok();
 }
+
+/*
+ * Verify that all logical replication slots consumed all WALs, except a
+ * CHECKPOINT_SHUTDOWN record.
+ */
+static void
+check_for_confirmed_flush_lsn(ClusterInfo *cluster)
+{
+	int			i,
+				ntups,
+				i_slotname;
+	bool		is_error = false;
+	PGresult   *res;
+	DbInfo	   *active_db = &cluster->dbarr.dbs[0];
+	PGconn	   *conn = connectToServer(cluster, active_db->db_name);
+
+	Assert(user_opts.include_logical_slots);
+
+	/* --include-logical-replication-slots can be used since PG16. */
+	if (GET_MAJOR_VERSION(cluster->major_version) < 1600)
+		return;
+
+	prep_status("Checking for logical replication slots");
+
+	/*
+	 * Check that all logical replication slots have reached the current WAL
+	 * position, except for the CHECKPOINT_SHUTDOWN record. Even if all WALs
+	 * are consumed before shutting down the node, the checkpointer generates
+	 * a CHECKPOINT_SHUTDOWN record at shutdown, which cannot be consumed by
+	 * any slots. Therefore, we must allow for a difference between
+	 * pg_current_wal_insert_lsn() and confirmed_flush_lsn.
+	 */
+#define SHUTDOWN_RECORD_SIZE  (SizeOfXLogRecord + \
+							   SizeOfXLogRecordDataHeaderShort + \
+							   sizeof(CheckPoint))
+
+	res = executeQueryOrDie(conn,
+							"SELECT slot_name FROM pg_catalog.pg_replication_slots "
+							"WHERE (pg_catalog.pg_current_wal_insert_lsn() - confirmed_flush_lsn) > %d "
+							"AND temporary = false AND wal_status IN ('reserved', 'extended');",
+							(int) (SizeOfXLogLongPHD + SHUTDOWN_RECORD_SIZE));
+
+#undef SHUTDOWN_RECORD_SIZE
+
+	ntups = PQntuples(res);
+	i_slotname = PQfnumber(res, "slot_name");
+
+	for (i = 0; i < ntups; i++)
+	{
+		char	   *slotname;
+
+		is_error = true;
+
+		slotname = PQgetvalue(res, i, i_slotname);
+
+		pg_log(PG_WARNING,
+			   "\nWARNING: logical replication slot \"%s\" has not consumed WALs yet",
+			   slotname);
+	}
+
+	PQclear(res);
+	PQfinish(conn);
+
+	if (is_error)
+		pg_fatal("--include-logical-replication-slots requires that all "
+				 "logical replication slots have consumed all the WALs");
+
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
index 6689a91fcc..378e2bfb6e 100644
--- a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
+++ b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
@@ -79,11 +79,39 @@ $old_node->safe_psql(
 ]);
 
 my $result = $old_node->safe_psql('postgres',
-	"SELECT count (*) FROM pg_logical_slot_get_changes('test_slot', NULL, NULL)"
+	"SELECT count(*) FROM pg_logical_slot_peek_changes('test_slot', NULL, NULL)"
 );
+
 is($result, qq(12), 'ensure WALs are not consumed yet');
 $old_node->stop;
 
+# Cause a failure at the start of pg_upgrade because test_slot does not
+# finish consuming all the WALs
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_node->data_dir,
+		'-D',         $new_node->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_node->host,
+		'-p',         $old_node->port,
+		'-P',         $new_node->port,
+		$mode,        '--include-logical-replication-slots',
+	],
+	'run of pg_upgrade of old node with idle replication slots');
+ok( -d $new_node->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_node->data_dir . "/pg_upgrade_output.d");
+
+$old_node->start;
+$old_node->safe_psql('postgres',
+	"SELECT count (*) FROM pg_logical_slot_get_changes('test_slot', NULL, NULL)"
+);
+$old_node->stop;
+
 # Actual run, pg_upgrade_output.d is removed at the end
 command_ok(
 	[
-- 
2.27.0

v10-0004-Change-the-method-used-to-check-logical-replicat.patchapplication/octet-stream; name=v10-0004-Change-the-method-used-to-check-logical-replicat.patchDownload
From 077ab6ebe26e604873bc48a11990cecad6badf11 Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Mon, 24 Apr 2023 11:03:28 +0000
Subject: [PATCH v10 4/4] Change the method used to check logical replication
 slots during the live check

When a live check is requested, there is a possibility of additional changes
occurring, which may cause the current WAL position to exceed the confirmed_flush_lsn
of the slot. As a result, we check whether each logical slot is currently active
instead. This is sufficient because any remaining WAL records will be sent during
the publisher's shutdown.
---
 src/bin/pg_upgrade/check.c                    | 68 ++++++++++++++++++-
 .../t/003_logical_replication_slots.pl        | 66 +++++++++++++++++-
 2 files changed, 130 insertions(+), 4 deletions(-)

diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 946431f059..247f8cb5f3 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -35,6 +35,7 @@ static void check_for_new_tablespace_dir(ClusterInfo *new_cluster);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
 static void check_for_parameter_settings(ClusterInfo *new_cluster);
 static void check_for_confirmed_flush_lsn(ClusterInfo *cluster);
+static void check_are_logical_slots_active(ClusterInfo *cluster);
 
 /*
  * fix_path_separator
@@ -109,7 +110,19 @@ check_and_dump_old_cluster(bool live_check)
 	check_for_reg_data_type_usage(&old_cluster);
 	check_for_isn_and_int8_passing_mismatch(&old_cluster);
 	if (user_opts.include_logical_slots)
-		check_for_confirmed_flush_lsn(&old_cluster);
+	{
+		/*
+		 * The method used to check logical replication slots is dependent on
+		 * the value of the live_check parameter. This change was implemented
+		 * because, during a live check, it is possible for additional changes
+		 * to occur at the old node, which could cause the current WAL position
+		 * to exceed the confirmed_flush_lsn of the slot.
+		 */
+		if (live_check)
+			check_are_logical_slots_active(&old_cluster);
+		else
+			check_for_confirmed_flush_lsn(&old_cluster);
+	}
 
 	/*
 	 * PG 16 increased the size of the 'aclitem' type, which breaks the on-disk
@@ -1469,6 +1482,59 @@ check_for_parameter_settings(ClusterInfo *new_cluster)
 	check_ok();
 }
 
+/*
+ * Verify that all logical replication slots are active
+ */
+static void
+check_are_logical_slots_active(ClusterInfo *cluster)
+{
+	int			i,
+				ntups,
+				i_slotname;
+	bool		is_error = false;
+	PGresult   *res;
+	DbInfo	   *active_db = &cluster->dbarr.dbs[0];
+	PGconn	   *conn = connectToServer(cluster, active_db->db_name);
+
+	Assert(user_opts.include_logical_slots);
+
+	/* --include-logical-replication-slots can be used since PG16. */
+	if (GET_MAJOR_VERSION(cluster->major_version) < 1600)
+		return;
+
+	prep_status("Checking for logical replication slots");
+
+	res = executeQueryOrDie(conn,
+							"SELECT slot_name FROM pg_catalog.pg_replication_slots "
+							"WHERE active IS FALSE "
+							"AND temporary = false AND wal_status IN ('reserved', 'extended');");
+
+	ntups = PQntuples(res);
+	i_slotname = PQfnumber(res, "slot_name");
+
+	for (i = 0; i < ntups; i++)
+	{
+		char	   *slotname;
+
+		is_error = true;
+
+		slotname = PQgetvalue(res, i, i_slotname);
+
+		pg_log(PG_WARNING,
+			   "\nWARNING: logical replication slot \"%s\" is not active",
+			   slotname);
+	}
+
+	PQclear(res);
+	PQfinish(conn);
+
+	if (is_error)
+		pg_fatal("--include-logical-replication-slots with --check requires that "
+				 "all logical replication slots are active");
+
+	check_ok();
+}
+
 /*
  * Verify that all logical replication slots consumed all WALs, except a
  * CHECKPOINT_SHUTDOWN record.
diff --git a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
index 378e2bfb6e..aa52445b08 100644
--- a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
+++ b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
@@ -19,6 +19,10 @@ my $old_node = PostgreSQL::Test::Cluster->new('old_node');
 $old_node->init(allows_streaming => 'logical');
 $old_node->start;
 
+# Initialize subscriber, which will be used only for --check
+my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
+$subscriber->init(allows_streaming => 'logical');
+
 # Initialize new node
 my $new_node = PostgreSQL::Test::Cluster->new('new_node');
 $new_node->init(allows_streaming => 1);
@@ -70,15 +74,71 @@ ok( -d $new_node->data_dir . "/pg_upgrade_output.d",
 rmtree($new_node->data_dir . "/pg_upgrade_output.d");
 $new_node->append_conf('postgresql.conf', "max_replication_slots = 10");
 
-# Create a slot on old node, and generate WALs
+# Setup logical replication
 $old_node->start;
+$old_node->safe_psql('postgres',
+	"CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a");
+$old_node->safe_psql('postgres', "CREATE PUBLICATION pub FOR ALL TABLES");
+my $old_connstr = $old_node->connstr . ' dbname=postgres';
+
+$subscriber->start;
+$subscriber->safe_psql('postgres', "CREATE TABLE tbl (a int)");
+$subscriber->safe_psql('postgres',
+	"CREATE SUBSCRIPTION sub CONNECTION '$old_connstr' PUBLICATION pub WITH (copy_data = true)"
+);
+
+# Wait for initial table sync to finish
+$subscriber->wait_for_subscription_sync($old_node, 'sub');
+
+my $result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
+is($result, qq(10), 'check initial rows on subscriber');
+
+# Start a background session and open a transaction (not committed yet)
+my $bsession = $old_node->background_psql('postgres');
+$bsession->query_safe(
+	q{
+BEGIN;
+INSERT INTO tbl VALUES (generate_series(11, 20))
+});
+
+$result = $old_node->safe_psql('postgres',
+	"SELECT count(*) FROM pg_replication_slots WHERE pg_current_wal_insert_lsn() > confirmed_flush_lsn"
+);
+is($result, qq(1),
+	'check the current WAL position exceeds confirmed_flush_lsn');
+
+# Run pg_upgrade --check. The status of each logical slot will be checked, and
+# the command will succeed.
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_node->data_dir,
+		'-D',         $new_node->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_node->host,
+		'-p',         $old_node->port,
+		'-P',         $new_node->port,
+		$mode,        '--include-logical-replication-slots',
+		'--check'
+	],
+	'run of pg_upgrade of old node');
+ok( !-d $new_node->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ removed after pg_upgrade success");
+
+# Cleanup
+$bsession->query_safe("ABORT");
+$bsession->quit;
+$subscriber->safe_psql('postgres', "DROP SUBSCRIPTION sub");
+
+# Create a slot on old node, and generate WALs
 $old_node->safe_psql(
 	'postgres', qq[
 	SELECT pg_create_logical_replication_slot('test_slot', 'test_decoding', false, true);
-	CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;
+	INSERT INTO tbl VALUES (generate_series(11, 20));
 ]);
 
-my $result = $old_node->safe_psql('postgres',
+$result = $old_node->safe_psql('postgres',
 	"SELECT count(*) FROM pg_logical_slot_peek_changes('test_slot', NULL, NULL)"
 );
 
-- 
2.27.0

#34Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Julien Rouhaud (#8)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On 2023-Apr-07, Julien Rouhaud wrote:

That being said, I have a hard time believing that we could actually preserve
physical replication slots. I don't think that pg_upgrade final state is fully
reproducible: not all object oids are preserved, and the various pg_restore
are run in parallel so you're very likely to end up with small physical
differences that would be incompatible with physical replication. Even if we
could make it totally reproducible, it would probably be at the cost of making
pg_upgrade orders of magnitude slower. And since many people are already
complaining that it's too slow, that doesn't seem like something we would want.

A point on preserving physical replication slots: because we change WAL
format from one major version to the next (adding new messages or
changing format for other messages), we can't currently rely on physical
slots working across different major versions.

So IMO, for now don't bother with physical replication slot
preservation, but do keep the option name as specific to logical slots.
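
(Purely as an illustration of that distinction, and not part of the patch:
a query along the lines of

    SELECT slot_name, plugin, slot_type, temporary
    FROM pg_catalog.pg_replication_slots
    WHERE slot_type = 'logical';

shows roughly the set of slots such an option would need to consider, and
physical slots simply never match it.)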

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/

#35Julien Rouhaud
rjuju123@gmail.com
In reply to: Alvaro Herrera (#34)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

Hi,

On Tue, May 02, 2023 at 12:55:18PM +0200, Alvaro Herrera wrote:

On 2023-Apr-07, Julien Rouhaud wrote:

That being said, I have a hard time believing that we could actually preserve
physical replication slots. I don't think that pg_upgrade final state is fully
reproducible: not all object oids are preserved, and the various pg_restore
are run in parallel so you're very likely to end up with small physical
differences that would be incompatible with physical replication. Even if we
could make it totally reproducible, it would probably be at the cost of making
pg_upgrade orders of magnitude slower. And since many people are already
complaining that it's too slow, that doesn't seem like something we would want.

A point on preserving physical replication slots: because we change WAL
format from one major version to the next (adding new messages or
changing format for other messages), we can't currently rely on physical
slots working across different major versions.

I don't think anyone suggested to do physical replication over different major
versions. My understanding was that it would be used to pg_upgrade a
"physical cluster" (e.g. a primary and physical standby server) at the same
time, and then simply starting them up again would lead to a working physical
replication on the new version.

I guess one could try to keep using the slots for other needs (PITR backup with
pg_receivewal or something similar), and then you would indeed have to be aware
that you won't be able to do anything with the new WAL records until you do a
fresh base backup, but that's a problem that you can already face after a
normal pg_upgrade (although in most cases it's probably quite obvious for now
as the timeline isn't preserved).

#36Julien Rouhaud
rjuju123@gmail.com
In reply to: Julien Rouhaud (#35)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Tue, 2 May 2023, 19:43 Julien Rouhaud, <rjuju123@gmail.com> wrote:

Hi,

On Tue, May 02, 2023 at 12:55:18PM +0200, Alvaro Herrera wrote:

On 2023-Apr-07, Julien Rouhaud wrote:

That being said, I have a hard time believing that we could actually preserve
physical replication slots. I don't think that pg_upgrade final state is fully
reproducible: not all object oids are preserved, and the various pg_restore
are run in parallel so you're very likely to end up with small physical
differences that would be incompatible with physical replication. Even if we
could make it totally reproducible, it would probably be at the cost of making
pg_upgrade orders of magnitude slower. And since many people are already
complaining that it's too slow, that doesn't seem like something we would want.

A point on preserving physical replication slots: because we change WAL
format from one major version to the next (adding new messages or
changing format for other messages), we can't currently rely on physical
slots working across different major versions.

I don't think anyone suggested to do physical replication over different major
versions. My understanding was that it would be used to pg_upgrade a
"physical cluster" (e.g. a primary and physical standby server) at the same
time, and then simply starting them up again would lead to a working physical
replication on the new version.

I guess one could try to keep using the slots for other needs (PITR backup with
pg_receivewal or something similar), and then you would indeed have to be aware
that you won't be able to do anything with the new WAL records until you do a
fresh base backup, but that's a problem that you can already face after a
normal pg_upgrade (although in most cases it's probably quite obvious for now
as the timeline isn't preserved).

If what you meant is that the slot may have to send a record generated by an
older major version, then unless I'm missing something the same restriction
could be added to such a feature as what's being discussed in this thread for
the logical replication slots, so that only a final shutdown checkpoint record
would be present after the flushed WAL position. It may be possible to work
around that, if there weren't all the other problems I mentioned.

#37Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Julien Rouhaud (#35)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On 2023-May-02, Julien Rouhaud wrote:

On Tue, May 02, 2023 at 12:55:18PM +0200, Alvaro Herrera wrote:

A point on preserving physical replication slots: because we change WAL
format from one major version to the next (adding new messages or
changing format for other messages), we can't currently rely on physical
slots working across different major versions.

I don't think anyone suggested to do physical replication over different major
versions.

They didn't, but a man can dream. (Anyway, we agree on it not working
for various reasons.)

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"No es bueno caminar con un hombre muerto"

#38Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Alvaro Herrera (#34)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Alvaro,

Thanks for the suggestion!

A point on preserving physical replication slots: because we change WAL
format from one major version to the next (adding new messages or
changing format for other messages), we can't currently rely on physical
slots working across different major versions.

So IMO, for now don't bother with physical replication slot
preservation, but do keep the option name as specific to logical slots.

Based on Julien's advice, we have already decided not to include physical
slots in this patch, and the option name has been changed accordingly.
I think you have explicitly confirmed that we are going the correct way. Thanks!

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#39Peter Smith
smithpb2250@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#33)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

Hi Kuroda-san. Here are some review comments for the v10-0001 patch.

======

General.

1. pg_dump option is documented to the user.

I'm not sure about exposing the new pg_dump
--logical-replication-slots-only option to the user.

I thought this pg_dump option was intended only to be called
*internally* by the pg_upgrade.
But, this patch is also documenting the new option for the user (in
case they want to call it independently?)

Maybe exposing it is OK, but if you do that then I thought perhaps
there should also be some additional pg_dump tests just for this
option (i.e. tested independently of the pg_upgrade)
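
For example, since the generated file is expected to contain nothing more
than statements of the form

    SELECT pg_create_logical_replication_slot('sub1_slot', 'pgoutput', false, false);

(slot name and plugin here are only illustrative), a standalone pg_dump test
could create a slot, run pg_dump with the new option, and check the output
for such a statement.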

======
Commit message

2.
For pg_upgrade, when '--include-logical-replication-slots' is
specified, it executes
pg_dump with the new "--logical-replication-slots-only" option and
restores from the
dump. Apart from restoring schema, pg_resetwal must not be called
after restoring
replication slots. This is because the command discards WAL files and
starts from a
new segment, even if they are required by replication slots. This
leads to an ERROR:
"requested WAL segment XXX has already been removed". To avoid this,
replication slots
are restored at a different time than other objects, after running pg_resetwal.

~~

The "Apart from" sentence maybe could do with some rewording. I
noticed there is a code comment (below fragment) that says the same as
this, but more clearly. Maybe it is better to use that code-comment
wording in the commit message.

+ * XXX We cannot dump replication slots at the same time as the schema
+ * dump because we need to separate the timing of restoring
+ * replication slots and other objects. Replication slots, in
+ * particular, should not be restored before executing the pg_resetwal
+ * command because it will remove WALs that are required by the slots.

======
src/bin/pg_dump/pg_dump.c

3. main

+ if (dopt.logical_slots_only && !dopt.binary_upgrade)
+ pg_fatal("options --logical-replication-slots-only requires option
--binary-upgrade");
+
+ if (dopt.logical_slots_only && dopt.dataOnly)
+ pg_fatal("options --logical-replication-slots-only and
-a/--data-only cannot be used together");
+ if (dopt.logical_slots_only && dopt.schemaOnly)
+ pg_fatal("options --logical-replication-slots-only and
-s/--schema-only cannot be used together");
+

Consider if it might be simpler to combine together all those
dopt.logical_slots_only checks.

SUGGESTION

if (dopt.logical_slots_only)
{
    if (!dopt.binary_upgrade)
        pg_fatal("options --logical-replication-slots-only requires option --binary-upgrade");

    if (dopt.dataOnly)
        pg_fatal("options --logical-replication-slots-only and -a/--data-only cannot be used together");
    if (dopt.schemaOnly)
        pg_fatal("options --logical-replication-slots-only and -s/--schema-only cannot be used together");
}

~~~

4. getLogicalReplicationSlots

+ /* Check whether we should dump or not */
+ if (fout->remoteVersion < 160000 || !dopt->logical_slots_only)
+ return;

I'm not sure if this check is necessary. Given the way this function
is called, is it possible for this check to fail? Maybe that quick
exit would be better coded as an Assert?

~~~

5. dumpLogicalReplicationSlot

+dumpLogicalReplicationSlot(Archive *fout,
+    const LogicalReplicationSlotInfo *slotinfo)
+{
+ DumpOptions *dopt = fout->dopt;
+
+ if (!dopt->logical_slots_only)
+ return;

(Similar to the previous comment). Is it even possible to arrive here
when dopt->logical_slots_only is false. Maybe that quick exit would be
better coded as an Assert?

~

6.
+ PQExpBuffer query = createPQExpBuffer();
+ char    *slotname = pg_strdup(slotinfo->dobj.name);

I wondered if it was really necessary to strdup/free this slotname.
e.g. And if it is, then why don't you do this for the slotinfo->plugin
field?

======
src/bin/pg_upgrade/check.c

7. check_and_dump_old_cluster

/* Extract a list of databases and tables from the old cluster */
get_db_and_rel_infos(&old_cluster);
+ get_logical_slot_infos(&old_cluster);

Is it correct to associate this new call with that existing comment
about "databases and tables"?

~~~

8. check_new_cluster

@@ -188,6 +190,7 @@ void
check_new_cluster(void)
{
get_db_and_rel_infos(&new_cluster);
+ get_logical_slot_infos(&new_cluster);

check_new_cluster_is_empty();

@@ -210,6 +213,9 @@ check_new_cluster(void)
check_for_prepared_transactions(&new_cluster);

  check_for_new_tablespace_dir(&new_cluster);
+
+ if (user_opts.include_logical_slots)
+ check_for_parameter_settings(&new_cluster);

Can the get_logical_slot_infos() be done later, guarded by the same
condition if (user_opts.include_logical_slots)?

~~~

9. check_new_cluster_is_empty

+ * If --include-logical-replication-slots is required, check the
+ * existing of slots
+ */

Did you mean to say "check the existence of slots"?

~~~

10. check_for_parameter_settings

+ if (strcmp(wal_level, "logical") != 0)
+ pg_fatal("wal_level must be \"logical\", but set to \"%s\"",
+ wal_level);

/but set to/but is set to/

======
src/bin/pg_upgrade/info.c

11. get_db_and_rel_infos

+ {
get_rel_infos(cluster, &cluster->dbarr.dbs[dbnum]);

+ /*
+ * Additionally, slot_arr must be initialized because they will be
+ * checked later.
+ */
+ cluster->dbarr.dbs[dbnum].slot_arr.nslots = 0;
+ cluster->dbarr.dbs[dbnum].slot_arr.slots = NULL;
+ }

11a.
I think probably it would have been easier to just use 'pg_malloc0'
instead of 'pg_malloc' in the get_db_infos, then this code would not
be necessary.

~

11b.
BTW, shouldn't this function also be calling free_logical_slot_infos()
too? That will also have the same effect (initializing the slot_arr)
but without having to change anything else.

~~~

12. get_logical_slot_infos
+/*
+ * Higher level routine to generate LogicalSlotInfoArr for all databases.
+ */
+void
+get_logical_slot_infos(ClusterInfo *cluster)

To be consistent with the other nearby function headers it should have
another line saying just get_logical_slot_infos().

~~~

13. get_logical_slot_infos

+void
+get_logical_slot_infos(ClusterInfo *cluster)
+{
+ int dbnum;
+
+ for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
+ {
+ if (cluster->dbarr.dbs[dbnum].slot_arr.slots)
+ free_logical_slot_infos(&cluster->dbarr.dbs[dbnum].slot_arr);
+
+ get_logical_slot_infos_per_db(cluster, &cluster->dbarr.dbs[dbnum]);
+ }
+
+ if (cluster == &old_cluster)
+ pg_log(PG_VERBOSE, "\nsource databases:");
+ else
+ pg_log(PG_VERBOSE, "\ntarget databases:");
+
+ if (log_opts.verbose)
+ {
+ for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
+ {
+ pg_log(PG_VERBOSE, "Database: %s", cluster->dbarr.dbs[dbnum].db_name);
+ print_slot_infos(&cluster->dbarr.dbs[dbnum].slot_arr);
+ }
+ }
+}

I didn't see why there are 2 loops exactly the same. I think with some
minor refactoring these can both be done in the same loop can't they?

SUGGESTION 1:

if (cluster == &old_cluster)
pg_log(PG_VERBOSE, "\nsource databases:");
else
pg_log(PG_VERBOSE, "\ntarget databases:");

for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
{
if (cluster->dbarr.dbs[dbnum].slot_arr.slots)
free_logical_slot_infos(&cluster->dbarr.dbs[dbnum].slot_arr);

get_logical_slot_infos_per_db(cluster, &cluster->dbarr.dbs[dbnum]);

if (log_opts.verbose)
{
pg_log(PG_VERBOSE, "Database: %s", cluster->dbarr.dbs[dbnum].db_name);
print_slot_infos(&cluster->dbarr.dbs[dbnum].slot_arr);
}
}

~

I expected it could be simplified further still by using some variables

SUGGESTION 2:

if (cluster == &old_cluster)
    pg_log(PG_VERBOSE, "\nsource databases:");
else
    pg_log(PG_VERBOSE, "\ntarget databases:");

for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
{
    DbInfo *pDbInfo = &cluster->dbarr.dbs[dbnum];

    if (pDbInfo->slot_arr.slots)
        free_logical_slot_infos(&pDbInfo->slot_arr);

    get_logical_slot_infos_per_db(cluster, pDbInfo);

    if (log_opts.verbose)
    {
        pg_log(PG_VERBOSE, "Database: %s", pDbInfo->db_name);
        print_slot_infos(&pDbInfo->slot_arr);
    }
}

~~~

14. get_logical_slot_infos_per_db

+ char query[QUERY_ALLOC];
+
+ query[0] = '\0'; /* initialize query string to empty */
+
+ snprintf(query + strlen(query), sizeof(query) - strlen(query),
+ "SELECT slot_name, plugin, two_phase "
+ "FROM pg_catalog.pg_replication_slots "
+ "WHERE database = current_database() AND temporary = false "
+ "AND wal_status IN ('reserved', 'extended');");

I didn't understand the purpose of those calls to 'strlen(query)'
since the string was initialised to empty-string immediately above.

~~~

15.
+static void
+print_slot_infos(LogicalSlotInfoArr *slot_arr)
+{
+ int slotnum;
+
+ for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+ pg_log(PG_VERBOSE, "slotname: %s: plugin: %s: two_phase %d",
+    slot_arr->slots[slotnum].slotname,
+    slot_arr->slots[slotnum].plugin,
+    slot_arr->slots[slotnum].two_phase);
+}

IMO those colons don't make sense.

BEFORE
"slotname: %s: plugin: %s: two_phase %d"

SUGGESTION
"slotname: %s, plugin: %s, two_phase: %d"

======
src/bin/pg_upgrade/pg_upgrade.h

16. LogicalSlotInfo

+typedef struct
+{
+ char    *slotname; /* slot name */
+ char    *plugin; /* plugin */
+ bool two_phase; /* Can the slot decode 2PC? */
+} LogicalSlotInfo;

The RelInfo had a comment for the typedef struct, so I think the
LogicalSlotInfo struct also should have a comment.

~~~

17. DbInfo

RelInfoArr rel_arr; /* array of all user relinfos */
+ LogicalSlotInfoArr slot_arr; /* array of all logicalslotinfos */
} DbInfo;

Should the comment say "LogicalSlotInfo" instead of "logicalslotinfos"?

======
.../t/003_logical_replication_slots.pl

18. RESULTS

I run this by 'make check' in the src/bin/pg_upgrade folder.

For some reason, the test does not work for me. The results I get are:

# +++ tap check in src/bin/pg_upgrade +++
t/001_basic.pl ...................... ok
t/002_pg_upgrade.pl ................. ok
t/003_logical_replication_slots.pl .. 3/? # Tests were run but no plan
was declared and done_testing() was not seen.
t/003_logical_replication_slots.pl .. Dubious, test returned 29 (wstat
7424, 0x1d00)
All 4 subtests passed

Test Summary Report
-------------------
t/003_logical_replication_slots.pl (Wstat: 7424 Tests: 4 Failed: 0)
Non-zero exit status: 29
Parse errors: No plan found in TAP output
Files=3, Tests=27, 128 wallclock secs ( 0.04 usr 0.01 sys + 18.02
cusr 6.06 csys = 24.13 CPU)
Result: FAIL
make: *** [check] Error 1

~

And the log file
(tmp_check/log/003_logical_replication_slots_old_node.log) shows the
following ERROR:

2023-05-09 12:19:25.330 AEST [32572] 003_logical_replication_slots.pl
LOG: statement: SELECT
pg_create_logical_replication_slot('test_slot', 'test_decoding',
false, true);
2023-05-09 12:19:25.331 AEST [32572] 003_logical_replication_slots.pl
ERROR: could not access file "test_decoding": No such file or
directory
2023-05-09 12:19:25.331 AEST [32572] 003_logical_replication_slots.pl
STATEMENT: SELECT pg_create_logical_replication_slot('test_slot',
'test_decoding', false, true);
2023-05-09 12:19:25.335 AEST [32564] LOG: received immediate shutdown request
2023-05-09 12:19:25.337 AEST [32564] LOG: database system is shut down

~

Is it a bug? Or, if I am doing something wrong please let me know how
to run the test.

~~~

19.
+# Clean up
+rmtree($new_node->data_dir . "/pg_upgrade_output.d");
+$new_node->append_conf('postgresql.conf', "wal_level = 'logical'");
+$new_node->append_conf('postgresql.conf', "max_replication_slots = 0");

I think the last 2 lines are not "clean up". They are preparations for
the subsequent test, so maybe they should be commented as such.

~~~

20.
+# Clean up
+rmtree($new_node->data_dir . "/pg_upgrade_output.d");
+$new_node->append_conf('postgresql.conf', "max_replication_slots = 10");

I think the last line is not "clean up". It is preparation for the
subsequent test, so maybe it should be commented as such.

------
Kind Regards,
Peter Smith.
Fujitsu Australia

#40Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Peter Smith (#39)
4 attachment(s)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Peter,

Thank you for reviewing! PSA new version.

General.

1. pg_dump option is documented to the user.

I'm not sure about exposing the new pg_dump
--logical-replication-slots-only option to the user.

I thought this pg_dump option was intended only to be called
*internally* by the pg_upgrade.
But, this patch is also documenting the new option for the user (in
case they want to call it independently?)

Maybe exposing it is OK, but if you do that then I thought perhaps
there should also be some additional pg_dump tests just for this
option (i.e. tested independently of the pg_upgrade)

Right, I had written the documentation for the moment, but it should not be
documented if the option is not exposed. Removed it from the doc.

Commit message

2.
For pg_upgrade, when '--include-logical-replication-slots' is
specified, it executes
pg_dump with the new "--logical-replication-slots-only" option and
restores from the
dump. Apart from restoring schema, pg_resetwal must not be called
after restoring
replication slots. This is because the command discards WAL files and
starts from a
new segment, even if they are required by replication slots. This
leads to an ERROR:
"requested WAL segment XXX has already been removed". To avoid this,
replication slots
are restored at a different time than other objects, after running pg_resetwal.

~~

The "Apart from" sentence maybe could do with some rewording. I
noticed there is a code comment (below fragment) that says the same as
this, but more clearly. Maybe it is better to use that code-comment
wording in the commit message.

+ * XXX We cannot dump replication slots at the same time as the schema
+ * dump because we need to separate the timing of restoring
+ * replication slots and other objects. Replication slots, in
+ * particular, should not be restored before executing the pg_resetwal
+ * command because it will remove WALs that are required by the slots.

Changed.

src/bin/pg_dump/pg_dump.c

3. main

+ if (dopt.logical_slots_only && !dopt.binary_upgrade)
+ pg_fatal("options --logical-replication-slots-only requires option
--binary-upgrade");
+
+ if (dopt.logical_slots_only && dopt.dataOnly)
+ pg_fatal("options --logical-replication-slots-only and
-a/--data-only cannot be used together");
+ if (dopt.logical_slots_only && dopt.schemaOnly)
+ pg_fatal("options --logical-replication-slots-only and
-s/--schema-only cannot be used together");
+

Consider if it might be simpler to combine together all those
dopt.logical_slots_only checks.

SUGGESTION

if (dopt.logical_slots_only)
{
    if (!dopt.binary_upgrade)
        pg_fatal("options --logical-replication-slots-only requires option --binary-upgrade");

    if (dopt.dataOnly)
        pg_fatal("options --logical-replication-slots-only and -a/--data-only cannot be used together");
    if (dopt.schemaOnly)
        pg_fatal("options --logical-replication-slots-only and -s/--schema-only cannot be used together");
}

Right, fixed.

4. getLogicalReplicationSlots

+ /* Check whether we should dump or not */
+ if (fout->remoteVersion < 160000 || !dopt->logical_slots_only)
+ return;

I'm not sure if this check is necessary. Given the way this function
is called, is it possible for this check to fail? Maybe that quick
exit would be better coded as an Assert?

I think the version check is still needed because it is not done anywhere else yet.
(Actually I'm not sure the restriction is needed, but I will keep it for now.)
As for dopt->logical_slots_only, I agreed and removed that check.

5. dumpLogicalReplicationSlot

+dumpLogicalReplicationSlot(Archive *fout,
+    const LogicalReplicationSlotInfo *slotinfo)
+{
+ DumpOptions *dopt = fout->dopt;
+
+ if (!dopt->logical_slots_only)
+ return;

(Similar to the previous comment). Is it even possible to arrive here
when dopt->logical_slots_only is false. Maybe that quick exit would be
better coded as an Assert?

I think it is not possible, so changed to Assert().

6.
+ PQExpBuffer query = createPQExpBuffer();
+ char    *slotname = pg_strdup(slotinfo->dobj.name);

I wondered if it was really necessary to strdup/free this slotname.
e.g. And if it is, then why don't you do this for the slotinfo->plugin
field?

This was debris left over from my testing. Removed.

src/bin/pg_upgrade/check.c

7. check_and_dump_old_cluster

/* Extract a list of databases and tables from the old cluster */
get_db_and_rel_infos(&old_cluster);
+ get_logical_slot_infos(&old_cluster);

Is it correct to associate this new call with that existing comment
about "databases and tables"?

Added a comment.

8. check_new_cluster

@@ -188,6 +190,7 @@ void
check_new_cluster(void)
{
get_db_and_rel_infos(&new_cluster);
+ get_logical_slot_infos(&new_cluster);

check_new_cluster_is_empty();

@@ -210,6 +213,9 @@ check_new_cluster(void)
check_for_prepared_transactions(&new_cluster);

check_for_new_tablespace_dir(&new_cluster);
+
+ if (user_opts.include_logical_slots)
+ check_for_parameter_settings(&new_cluster);

Can the get_logical_slot_infos() be done later, guarded by the same
condition if (user_opts.include_logical_slots)?

Added.

9. check_new_cluster_is_empty

+ * If --include-logical-replication-slots is required, check the
+ * existing of slots
+ */

Did you mean to say "check the existence of slots"?

Yes, it is my typo. Fixed.

10. check_for_parameter_settings

+ if (strcmp(wal_level, "logical") != 0)
+ pg_fatal("wal_level must be \"logical\", but set to \"%s\"",
+ wal_level);

/but set to/but is set to/

Fixed.

src/bin/pg_upgrade/info.c

11. get_db_and_rel_infos

+ {
get_rel_infos(cluster, &cluster->dbarr.dbs[dbnum]);

+ /*
+ * Additionally, slot_arr must be initialized because they will be
+ * checked later.
+ */
+ cluster->dbarr.dbs[dbnum].slot_arr.nslots = 0;
+ cluster->dbarr.dbs[dbnum].slot_arr.slots = NULL;
+ }

11a.
I think probably it would have been easier to just use 'pg_malloc0'
instead of 'pg_malloc' in the get_db_infos, then this code would not
be necessary.

I was not sure whether it was OK to change it like that because of the
performance impact. But OK, fixed.

11b.
BTW, shouldn't this function also be calling free_logical_slot_infos()
too? That will also have the same effect (initializing the slot_arr)
but without having to change anything else.

~~~

12. get_logical_slot_infos
+/*
+ * Higher level routine to generate LogicalSlotInfoArr for all databases.
+ */
+void
+get_logical_slot_infos(ClusterInfo *cluster)

To be consistent with the other nearby function headers it should have
another line saying just get_logical_slot_infos().

Added.

13. get_logical_slot_infos

+void
+get_logical_slot_infos(ClusterInfo *cluster)
+{
+ int dbnum;
+
+ for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
+ {
+ if (cluster->dbarr.dbs[dbnum].slot_arr.slots)
+ free_logical_slot_infos(&cluster->dbarr.dbs[dbnum].slot_arr);
+
+ get_logical_slot_infos_per_db(cluster, &cluster->dbarr.dbs[dbnum]);
+ }
+
+ if (cluster == &old_cluster)
+ pg_log(PG_VERBOSE, "\nsource databases:");
+ else
+ pg_log(PG_VERBOSE, "\ntarget databases:");
+
+ if (log_opts.verbose)
+ {
+ for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
+ {
+ pg_log(PG_VERBOSE, "Database: %s", cluster->dbarr.dbs[dbnum].db_name);
+ print_slot_infos(&cluster->dbarr.dbs[dbnum].slot_arr);
+ }
+ }
+}

I didn't see why there are 2 loops exactly the same. I think with some
minor refactoring these can both be done in the same loop can't they?

The style follows get_db_and_rel_infos(), but...

SUGGESTION 1:

if (cluster == &old_cluster)
    pg_log(PG_VERBOSE, "\nsource databases:");
else
    pg_log(PG_VERBOSE, "\ntarget databases:");

for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
{
    if (cluster->dbarr.dbs[dbnum].slot_arr.slots)
        free_logical_slot_infos(&cluster->dbarr.dbs[dbnum].slot_arr);

    get_logical_slot_infos_per_db(cluster, &cluster->dbarr.dbs[dbnum]);

    if (log_opts.verbose)
    {
        pg_log(PG_VERBOSE, "Database: %s", cluster->dbarr.dbs[dbnum].db_name);
        print_slot_infos(&cluster->dbarr.dbs[dbnum].slot_arr);
    }
}

~

I expected it could be simplified further still by using some variables

SUGGESTION 2:

if (cluster == &old_cluster)
    pg_log(PG_VERBOSE, "\nsource databases:");
else
    pg_log(PG_VERBOSE, "\ntarget databases:");

for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
{
    DbInfo *pDbInfo = &cluster->dbarr.dbs[dbnum];

    if (pDbInfo->slot_arr.slots)
        free_logical_slot_infos(&pDbInfo->slot_arr);

    get_logical_slot_infos_per_db(cluster, pDbInfo);

    if (log_opts.verbose)
    {
        pg_log(PG_VERBOSE, "Database: %s", pDbInfo->db_name);
        print_slot_infos(&pDbInfo->slot_arr);
    }
}

I chose SUGGESTION 2.

14. get_logical_slot_infos_per_db

+ char query[QUERY_ALLOC];
+
+ query[0] = '\0'; /* initialize query string to empty */
+
+ snprintf(query + strlen(query), sizeof(query) - strlen(query),
+ "SELECT slot_name, plugin, two_phase "
+ "FROM pg_catalog.pg_replication_slots "
+ "WHERE database = current_database() AND temporary = false "
+ "AND wal_status IN ('reserved', 'extended');");

I didn't understand the purpose of those calls to 'strlen(query)'
since the string was initialised to empty-string immediately above.

Removed.

15.
+static void
+print_slot_infos(LogicalSlotInfoArr *slot_arr)
+{
+ int slotnum;
+
+ for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+ pg_log(PG_VERBOSE, "slotname: %s: plugin: %s: two_phase %d",
+    slot_arr->slots[slotnum].slotname,
+    slot_arr->slots[slotnum].plugin,
+    slot_arr->slots[slotnum].two_phase);
+}

IMO those colons don't make sense.

BEFORE
"slotname: %s: plugin: %s: two_phase %d"

SUGGESTION
"slotname: %s, plugin: %s, two_phase: %d"

Fixed. I followed print_rel_infos() style, but I prefer yours.

src/bin/pg_upgrade/pg_upgrade.h

16. LogicalSlotInfo

+typedef struct
+{
+ char    *slotname; /* slot name */
+ char    *plugin; /* plugin */
+ bool two_phase; /* Can the slot decode 2PC? */
+} LogicalSlotInfo;

The RelInfo had a comment for the typedef struct, so I think the
LogicalSlotInfo struct also should have a comment.

Added.

17. DbInfo

RelInfoArr rel_arr; /* array of all user relinfos */
+ LogicalSlotInfoArr slot_arr; /* array of all logicalslotinfos */
} DbInfo;

Should the comment say "LogicalSlotInfo" instead of "logicalslotinfos"?

Right, fixed.

.../t/003_logical_replication_slots.pl

18. RESULTS

I run this by 'make check' in the src/bin/pg_upgrade folder.

For some reason, the test does not work for me. The results I get are:

# +++ tap check in src/bin/pg_upgrade +++
t/001_basic.pl ...................... ok
t/002_pg_upgrade.pl ................. ok
t/003_logical_replication_slots.pl .. 3/? # Tests were run but no plan
was declared and done_testing() was not seen.
t/003_logical_replication_slots.pl .. Dubious, test returned 29 (wstat
7424, 0x1d00)
All 4 subtests passed

Test Summary Report
-------------------
t/003_logical_replication_slots.pl (Wstat: 7424 Tests: 4 Failed: 0)
Non-zero exit status: 29
Parse errors: No plan found in TAP output
Files=3, Tests=27, 128 wallclock secs ( 0.04 usr 0.01 sys + 18.02
cusr 6.06 csys = 24.13 CPU)
Result: FAIL
make: *** [check] Error 1

~

And the log file
(tmp_check/log/003_logical_replication_slots_old_node.log) shows the
following ERROR:

2023-05-09 12:19:25.330 AEST [32572] 003_logical_replication_slots.pl
LOG: statement: SELECT
pg_create_logical_replication_slot('test_slot', 'test_decoding',
false, true);
2023-05-09 12:19:25.331 AEST [32572] 003_logical_replication_slots.pl
ERROR: could not access file "test_decoding": No such file or
directory
2023-05-09 12:19:25.331 AEST [32572] 003_logical_replication_slots.pl
STATEMENT: SELECT pg_create_logical_replication_slot('test_slot',
'test_decoding', false, true);
2023-05-09 12:19:25.335 AEST [32564] LOG: received immediate shutdown
request
2023-05-09 12:19:25.337 AEST [32564] LOG: database system is shut down

~

Is it a bug? Or, if I am doing something wrong please let me know how
to run the test.

Good point. I could not find the problem because I used the meson build system.
When I used the traditional make, the ERROR could be reproduced.
IIUC the problem occurred because the dependency between pg_upgrade and test_decoding
was not declared in the Makefile. Hence, I added an EXTRA_INSTALL variable to the
Makefile to make the dependency explicit, following other directories such as pg_basebackup.
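
Concretely, the following was added to src/bin/pg_upgrade/Makefile:

# required for 003_logical_replication_slots.pl
EXTRA_INSTALL=contrib/test_decoding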

19.
+# Clean up
+rmtree($new_node->data_dir . "/pg_upgrade_output.d");
+$new_node->append_conf('postgresql.conf', "wal_level = 'logical'");
+$new_node->append_conf('postgresql.conf', "max_replication_slots = 0");

I think the last 2 lines are not "clean up". They are preparations for
the subsequent test, so maybe they should be commented as such.

Right, it is a preparation for the next test. Added a comment.

20.
+# Clean up
+rmtree($new_node->data_dir . "/pg_upgrade_output.d");
+$new_node->append_conf('postgresql.conf', "max_replication_slots = 10");

I think the last line is not "clean up". It is preparation for the
subsequent test, so maybe it should be commented as such.

Added a comment.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:

v11-0001-pg_upgrade-Add-include-logical-replication-slots.patchapplication/octet-stream; name=v11-0001-pg_upgrade-Add-include-logical-replication-slots.patchDownload
From c27ef8c9dc6288cb5833e4d4d4f36d0f9af9464d Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Tue, 4 Apr 2023 05:49:34 +0000
Subject: [PATCH v11 1/4] pg_upgrade: Add --include-logical-replication-slots
 option

This commit introduces a new pg_upgrade option called "--include-logical-replication-slots".
This allows nodes with logical replication slots to be upgraded. The commit can
be divided into two parts: one for pg_dump and another for pg_upgrade.

For pg_dump this commit includes a new option called "--logical-replication-slots-only".
This option can be used to dump logical replication slots. When this option is
specified, the slot_name, plugin, and two_phase parameters are extracted from
pg_replication_slots. An SQL file is then generated which executes
pg_create_logical_replication_slot() with the extracted parameters.

For pg_upgrade, when '--include-logical-replication-slots' is specified, it executes
pg_dump with the new "--logical-replication-slots-only" option and restores from the
dump. Note that we cannot dump replication slots at the same time as the schema
dump because we need to separate the timing of restoring replication slots and
other objects. Replication slots, in  particular, should not be restored before
executing the pg_resetwal command because it will remove WALs that are required
by the slots.

The significant advantage of this commit is that it makes it easy to continue
logical replication even after upgrading the publisher node. Previously, pg_upgrade
allowed copying publications to a new node. With this new commit, adjusting the
connection string to the new publisher will cause the apply worker on the subscriber
to connect to the new publisher automatically. This enables seamless continuation
of logical replication, even after an upgrade.

Author: Hayato Kuroda
Reviewed-by: Peter Smith, Julien Rouhaud, Vignesh C
---
 doc/src/sgml/ref/pgupgrade.sgml               |  11 ++
 src/bin/pg_dump/pg_backup.h                   |   1 +
 src/bin/pg_dump/pg_dump.c                     | 150 ++++++++++++++++++
 src/bin/pg_dump/pg_dump.h                     |  17 +-
 src/bin/pg_dump/pg_dump_sort.c                |  11 +-
 src/bin/pg_upgrade/Makefile                   |   3 +
 src/bin/pg_upgrade/check.c                    |  60 +++++++
 src/bin/pg_upgrade/dump.c                     |  24 +++
 src/bin/pg_upgrade/info.c                     | 119 +++++++++++++-
 src/bin/pg_upgrade/meson.build                |   1 +
 src/bin/pg_upgrade/option.c                   |   7 +
 src/bin/pg_upgrade/pg_upgrade.c               |  61 +++++++
 src/bin/pg_upgrade/pg_upgrade.h               |  21 +++
 .../t/003_logical_replication_slots.pl        | 115 ++++++++++++++
 src/tools/pgindent/typedefs.list              |   3 +
 15 files changed, 599 insertions(+), 5 deletions(-)
 create mode 100644 src/bin/pg_upgrade/t/003_logical_replication_slots.pl

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 7816b4c685..94e90ff506 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -240,6 +240,17 @@ PostgreSQL documentation
       </listitem>
      </varlistentry>
 
+     <varlistentry>
+      <term><option>--include-logical-replication-slots</option></term>
+      <listitem>
+       <para>
+        Upgrade logical replication slots. Only permanent replication slots
+        are included. Note that pg_upgrade does not check the installation of
+        plugins.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><option>-?</option></term>
       <term><option>--help</option></term>
diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index aba780ef4b..0a4e931f9b 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -187,6 +187,7 @@ typedef struct _dumpOptions
 	int			use_setsessauth;
 	int			enable_row_security;
 	int			load_via_partition_root;
+	int			logical_slots_only;
 
 	/* default, if no "inclusion" switches appear, is to dump everything */
 	bool		include_everything;
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 41a51ec5cd..906f9a9541 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -328,6 +328,9 @@ static void setupDumpWorker(Archive *AH);
 static TableInfo *getRootTableInfo(const TableInfo *tbinfo);
 static bool forcePartitionRootLoad(const TableInfo *tbinfo);
 
+static void getLogicalReplicationSlots(Archive *fout);
+static void dumpLogicalReplicationSlot(Archive *fout,
+									   const LogicalReplicationSlotInfo *slotinfo);
 
 int
 main(int argc, char **argv)
@@ -431,6 +434,7 @@ main(int argc, char **argv)
 		{"table-and-children", required_argument, NULL, 12},
 		{"exclude-table-and-children", required_argument, NULL, 13},
 		{"exclude-table-data-and-children", required_argument, NULL, 14},
+		{"logical-replication-slots-only", no_argument, NULL, 15},
 
 		{NULL, 0, NULL, 0}
 	};
@@ -657,6 +661,10 @@ main(int argc, char **argv)
 										  optarg);
 				break;
 
+			case 15:			/* dump only replication slot(s) */
+				dopt.logical_slots_only = true;
+				break;
+
 			default:
 				/* getopt_long already emitted a complaint */
 				pg_log_error_hint("Try \"%s --help\" for more information.", progname);
@@ -714,6 +722,18 @@ main(int argc, char **argv)
 	if (dopt.do_nothing && dopt.dump_inserts == 0)
 		pg_fatal("option --on-conflict-do-nothing requires option --inserts, --rows-per-insert, or --column-inserts");
 
+	if (dopt.logical_slots_only)
+	{
+		if (!dopt.binary_upgrade)
+			pg_fatal("options --logical-replication-slots-only requires option --binary-upgrade");
+
+		if (dopt.dataOnly)
+			pg_fatal("options --logical-replication-slots-only and -a/--data-only cannot be used together");
+
+		if (dopt.schemaOnly)
+			pg_fatal("options --logical-replication-slots-only and -s/--schema-only cannot be used together");
+	}
+
 	/* Identify archive format to emit */
 	archiveFormat = parseArchiveFormat(format, &archiveMode);
 
@@ -876,6 +896,16 @@ main(int argc, char **argv)
 			pg_fatal("no matching extensions were found");
 	}
 
+	/*
+	 * If dump logical-replication-slots-only was requested, dump only them
+	 * and skip everything else.
+	 */
+	if (dopt.logical_slots_only)
+	{
+		getLogicalReplicationSlots(fout);
+		goto dump;
+	}
+
 	/*
 	 * Dumping LOs is the default for dumps where an inclusion switch is not
 	 * used (an "include everything" dump).  -B can be used to exclude LOs
@@ -936,6 +966,8 @@ main(int argc, char **argv)
 	if (!dopt.no_security_labels)
 		collectSecLabels(fout);
 
+dump:
+
 	/* Lastly, create dummy objects to represent the section boundaries */
 	boundaryObjs = createBoundaryObjects();
 
@@ -1109,6 +1141,8 @@ help(const char *progname)
 			 "                               servers matching PATTERN\n"));
 	printf(_("  --inserts                    dump data as INSERT commands, rather than COPY\n"));
 	printf(_("  --load-via-partition-root    load partitions via the root table\n"));
+	printf(_("  --logical-replication-slots-only\n"
+			 "                               dump only logical replication slots, no schema or data\n"));
 	printf(_("  --no-comments                do not dump comments\n"));
 	printf(_("  --no-publications            do not dump publications\n"));
 	printf(_("  --no-security-labels         do not dump security label assignments\n"));
@@ -10252,6 +10286,10 @@ dumpDumpableObject(Archive *fout, DumpableObject *dobj)
 		case DO_SUBSCRIPTION:
 			dumpSubscription(fout, (const SubscriptionInfo *) dobj);
 			break;
+		case DO_LOGICAL_REPLICATION_SLOT:
+			dumpLogicalReplicationSlot(fout,
+									   (const LogicalReplicationSlotInfo *) dobj);
+			break;
 		case DO_PRE_DATA_BOUNDARY:
 		case DO_POST_DATA_BOUNDARY:
 			/* never dumped, nothing to do */
@@ -18227,6 +18265,7 @@ addBoundaryDependencies(DumpableObject **dobjs, int numObjs,
 			case DO_PUBLICATION_REL:
 			case DO_PUBLICATION_TABLE_IN_SCHEMA:
 			case DO_SUBSCRIPTION:
+			case DO_LOGICAL_REPLICATION_SLOT:
 				/* Post-data objects: must come after the post-data boundary */
 				addObjectDependency(dobj, postDataBound->dumpId);
 				break;
@@ -18488,3 +18527,114 @@ appendReloptionsArrayAH(PQExpBuffer buffer, const char *reloptions,
 	if (!res)
 		pg_log_warning("could not parse %s array", "reloptions");
 }
+
+/*
+ * getLogicalReplicationSlots
+ *	  get information about replication slots
+ */
+static void
+getLogicalReplicationSlots(Archive *fout)
+{
+	PGresult   *res;
+	LogicalReplicationSlotInfo *slotinfo;
+	PQExpBuffer query;
+
+	int			i_slotname;
+	int			i_plugin;
+	int			i_twophase;
+	int			i,
+				ntups;
+
+	/* Check whether we should dump or not */
+	if (fout->remoteVersion < 160000)
+		return;
+
+	Assert(fout->dopt->logical_slots_only);
+
+	query = createPQExpBuffer();
+
+	resetPQExpBuffer(query);
+
+	/*
+	 * Get replication slots.
+	 *
+	 * XXX: Which information must be extracted from old node? Currently three
+	 * attributes are extracted because they are used by
+	 * pg_create_logical_replication_slot().
+	 *
+	 * XXX: Do we have to support physical slots?
+	 */
+	appendPQExpBufferStr(query,
+						 "SELECT slot_name, plugin, two_phase "
+						 "FROM pg_catalog.pg_replication_slots "
+						 "WHERE database = current_database() AND temporary = false "
+						 "AND wal_status IN ('reserved', 'extended');");
+
+	res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+
+	ntups = PQntuples(res);
+
+	i_slotname = PQfnumber(res, "slot_name");
+	i_plugin = PQfnumber(res, "plugin");
+	i_twophase = PQfnumber(res, "two_phase");
+
+	slotinfo = pg_malloc(ntups * sizeof(LogicalReplicationSlotInfo));
+
+	for (i = 0; i < ntups; i++)
+	{
+		slotinfo[i].dobj.objType = DO_LOGICAL_REPLICATION_SLOT;
+
+		slotinfo[i].dobj.catId.tableoid = InvalidOid;
+		slotinfo[i].dobj.catId.oid = InvalidOid;
+		AssignDumpId(&slotinfo[i].dobj);
+
+		slotinfo[i].dobj.name = pg_strdup(PQgetvalue(res, i, i_slotname));
+
+		slotinfo[i].plugin = pg_strdup(PQgetvalue(res, i, i_plugin));
+		slotinfo[i].twophase = (strcmp(PQgetvalue(res, i, i_twophase), "t") == 0);
+
+		/*
+		 * Note: Currently we do not have any options to include/exclude slots
+		 * in dumping, so all the slots must be selected.
+		 */
+		slotinfo[i].dobj.dump = DUMP_COMPONENT_ALL;
+	}
+	PQclear(res);
+
+	destroyPQExpBuffer(query);
+}
+
+/*
+ * dumpLogicalReplicationSlot
+ *	  dump creation functions for the given logical replication slots
+ */
+static void
+dumpLogicalReplicationSlot(Archive *fout,
+						   const LogicalReplicationSlotInfo *slotinfo)
+{
+	Assert(fout->dopt->logical_slots_only);
+
+	if (slotinfo->dobj.dump & DUMP_COMPONENT_DEFINITION)
+	{
+		PQExpBuffer query = createPQExpBuffer();
+
+		/*
+		 * XXX: For simplification, pg_create_logical_replication_slot() is
+		 * used. Is it sufficient?
+		 */
+		appendPQExpBuffer(query, "SELECT pg_catalog.pg_create_logical_replication_slot(");
+		appendStringLiteralAH(query, slotinfo->dobj.name, fout);
+		appendPQExpBuffer(query, ", ");
+		appendStringLiteralAH(query, slotinfo->plugin, fout);
+		appendPQExpBuffer(query, ", false, %s);",
+						  slotinfo->twophase ? "true" : "false");
+
+		ArchiveEntry(fout, slotinfo->dobj.catId, slotinfo->dobj.dumpId,
+					 ARCHIVE_OPTS(.tag = slotinfo->dobj.name,
+								  .description = "REPLICATION SLOT",
+								  .section = SECTION_POST_DATA,
+								  .createStmt = query->data));
+
+		destroyPQExpBuffer(query);
+	}
+}
diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h
index ed6ce41ad7..de081c35ae 100644
--- a/src/bin/pg_dump/pg_dump.h
+++ b/src/bin/pg_dump/pg_dump.h
@@ -82,7 +82,8 @@ typedef enum
 	DO_PUBLICATION,
 	DO_PUBLICATION_REL,
 	DO_PUBLICATION_TABLE_IN_SCHEMA,
-	DO_SUBSCRIPTION
+	DO_SUBSCRIPTION,
+	DO_LOGICAL_REPLICATION_SLOT
 } DumpableObjectType;
 
 /*
@@ -666,6 +667,20 @@ typedef struct _SubscriptionInfo
 	char	   *subpasswordrequired;
 } SubscriptionInfo;
 
+/*
+ * The LogicalReplicationSlotInfo struct is used to represent replication
+ * slots.
+ *
+ * XXX: add more attributes if needed
+ */
+typedef struct _LogicalReplicationSlotInfo
+{
+	DumpableObject dobj;
+	char	   *plugin;
+	char	   *slottype;
+	bool		twophase;
+} LogicalReplicationSlotInfo;
+
 /*
  *	common utility functions
  */
diff --git a/src/bin/pg_dump/pg_dump_sort.c b/src/bin/pg_dump/pg_dump_sort.c
index 745578d855..4e12e46dc5 100644
--- a/src/bin/pg_dump/pg_dump_sort.c
+++ b/src/bin/pg_dump/pg_dump_sort.c
@@ -93,6 +93,7 @@ enum dbObjectTypePriorities
 	PRIO_PUBLICATION_REL,
 	PRIO_PUBLICATION_TABLE_IN_SCHEMA,
 	PRIO_SUBSCRIPTION,
+	PRIO_LOGICAL_REPLICATION_SLOT,
 	PRIO_DEFAULT_ACL,			/* done in ACL pass */
 	PRIO_EVENT_TRIGGER,			/* must be next to last! */
 	PRIO_REFRESH_MATVIEW		/* must be last! */
@@ -146,10 +147,11 @@ static const int dbObjectTypePriority[] =
 	PRIO_PUBLICATION,			/* DO_PUBLICATION */
 	PRIO_PUBLICATION_REL,		/* DO_PUBLICATION_REL */
 	PRIO_PUBLICATION_TABLE_IN_SCHEMA,	/* DO_PUBLICATION_TABLE_IN_SCHEMA */
-	PRIO_SUBSCRIPTION			/* DO_SUBSCRIPTION */
+	PRIO_SUBSCRIPTION,			/* DO_SUBSCRIPTION */
+	PRIO_LOGICAL_REPLICATION_SLOT	/* DO_LOGICAL_REPLICATION_SLOT */
 };
 
-StaticAssertDecl(lengthof(dbObjectTypePriority) == (DO_SUBSCRIPTION + 1),
+StaticAssertDecl(lengthof(dbObjectTypePriority) == (DO_LOGICAL_REPLICATION_SLOT + 1),
 				 "array length mismatch");
 
 static DumpId preDataBoundId;
@@ -1498,6 +1500,11 @@ describeDumpableObject(DumpableObject *obj, char *buf, int bufsize)
 					 "SUBSCRIPTION (ID %d OID %u)",
 					 obj->dumpId, obj->catId.oid);
 			return;
+		case DO_LOGICAL_REPLICATION_SLOT:
+			snprintf(buf, bufsize,
+					 "LOGICAL REPLICATION SLOT (ID %d NAME %s)",
+					 obj->dumpId, obj->name);
+			return;
 		case DO_PRE_DATA_BOUNDARY:
 			snprintf(buf, bufsize,
 					 "PRE-DATA BOUNDARY  (ID %d)",
diff --git a/src/bin/pg_upgrade/Makefile b/src/bin/pg_upgrade/Makefile
index 5834513add..815d1a7ca1 100644
--- a/src/bin/pg_upgrade/Makefile
+++ b/src/bin/pg_upgrade/Makefile
@@ -3,6 +3,9 @@
 PGFILEDESC = "pg_upgrade - an in-place binary upgrade utility"
 PGAPPICON = win32
 
+# required for 003_logical_replication_slots.pl
+EXTRA_INSTALL=contrib/test_decoding
+
 subdir = src/bin/pg_upgrade
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index fea159689e..1802a30fe6 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -30,6 +30,7 @@ static void check_for_jsonb_9_4_usage(ClusterInfo *cluster);
 static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(ClusterInfo *new_cluster);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
+static void check_for_parameter_settings(ClusterInfo *new_cluster);
 
 
 /*
@@ -89,6 +90,10 @@ check_and_dump_old_cluster(bool live_check)
 	/* Extract a list of databases and tables from the old cluster */
 	get_db_and_rel_infos(&old_cluster);
 
+	/* Additionally, extract a list of logical replication slots if required */
+	if (user_opts.include_logical_slots)
+		get_logical_slot_infos(&old_cluster);
+
 	init_tablespaces();
 
 	get_loadable_libraries();
@@ -188,6 +193,7 @@ void
 check_new_cluster(void)
 {
 	get_db_and_rel_infos(&new_cluster);
+	get_logical_slot_infos(&new_cluster);
 
 	check_new_cluster_is_empty();
 
@@ -210,6 +216,9 @@ check_new_cluster(void)
 	check_for_prepared_transactions(&new_cluster);
 
 	check_for_new_tablespace_dir(&new_cluster);
+
+	if (user_opts.include_logical_slots)
+		check_for_parameter_settings(&new_cluster);
 }
 
 
@@ -364,6 +373,22 @@ check_new_cluster_is_empty(void)
 						 rel_arr->rels[relnum].nspname,
 						 rel_arr->rels[relnum].relname);
 		}
+
+		/*
+		 * If --include-logical-replication-slots is required, check the
+		 * existence of slots
+		 */
+		if (user_opts.include_logical_slots)
+		{
+			LogicalSlotInfoArr *slot_arr = &new_cluster.dbarr.dbs[dbnum].slot_arr;
+
+			/* if nslots > 0, report just first entry and exit */
+			if (slot_arr->nslots)
+				pg_fatal("New cluster database \"%s\" is not empty: found logical replication slot \"%s\"",
+						 new_cluster.dbarr.dbs[dbnum].db_name,
+						 slot_arr->slots[0].slotname);
+		}
+
 	}
 }
 
@@ -1402,3 +1427,38 @@ check_for_user_defined_encoding_conversions(ClusterInfo *cluster)
 	else
 		check_ok();
 }
+
+/*
+ * Verify parameter settings for creating logical replication slots
+ */
+static void
+check_for_parameter_settings(ClusterInfo *new_cluster)
+{
+	PGresult   *res;
+	PGconn	   *conn = connectToServer(new_cluster, "template1");
+	int			max_replication_slots;
+	char	   *wal_level;
+
+	prep_status("Checking for logical replication slots");
+
+	res = executeQueryOrDie(conn, "SHOW max_replication_slots;");
+	max_replication_slots = atoi(PQgetvalue(res, 0, 0));
+
+	if (max_replication_slots == 0)
+		pg_fatal("max_replication_slots must be greater than 0");
+
+	PQclear(res);
+
+	res = executeQueryOrDie(conn, "SHOW wal_level;");
+	wal_level = PQgetvalue(res, 0, 0);
+
+	if (strcmp(wal_level, "logical") != 0)
+		pg_fatal("wal_level must be \"logical\", but is set to \"%s\"",
+				 wal_level);
+
+	PQclear(res);
+
+	PQfinish(conn);
+
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/dump.c b/src/bin/pg_upgrade/dump.c
index 6c8c82dca8..e6b90864f5 100644
--- a/src/bin/pg_upgrade/dump.c
+++ b/src/bin/pg_upgrade/dump.c
@@ -59,6 +59,30 @@ generate_old_dump(void)
 						   log_opts.dumpdir,
 						   sql_file_name, escaped_connstr.data);
 
+		/*
+		 * Dump logical replication slots if needed.
+		 *
+		 * XXX We cannot dump replication slots at the same time as the schema
+		 * dump because we need to separate the timing of restoring
+		 * replication slots and other objects. Replication slots, in
+		 * particular, should not be restored before executing the pg_resetwal
+		 * command because it will remove WALs that are required by the slots.
+		 */
+		if (user_opts.include_logical_slots)
+		{
+			char		slots_file_name[MAXPGPATH];
+
+			snprintf(slots_file_name, sizeof(slots_file_name),
+					 DB_DUMP_LOGICAL_SLOTS_FILE_MASK, old_db->db_oid);
+			parallel_exec_prog(log_file_name, NULL,
+							   "\"%s/pg_dump\" %s --logical-replication-slots-only "
+							   "--quote-all-identifiers --binary-upgrade %s "
+							   "--file=\"%s/%s\" %s",
+							   new_cluster.bindir, cluster_conn_opts(&old_cluster),
+							   log_opts.verbose ? "--verbose" : "",
+							   log_opts.dumpdir,
+							   slots_file_name, escaped_connstr.data);
+		}
 		termPQExpBuffer(&escaped_connstr);
 	}
 
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index 85ed15ae4a..9679941217 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -23,10 +23,12 @@ static void free_db_and_rel_infos(DbInfoArr *db_arr);
 static void get_template0_info(ClusterInfo *cluster);
 static void get_db_infos(ClusterInfo *cluster);
 static void get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo);
+static void get_logical_slot_infos_per_db(ClusterInfo *cluster, DbInfo *dbinfo);
 static void free_rel_infos(RelInfoArr *rel_arr);
+static void free_logical_slot_infos(LogicalSlotInfoArr *slot_arr);
 static void print_db_infos(DbInfoArr *db_arr);
 static void print_rel_infos(RelInfoArr *rel_arr);
-
+static void print_slot_infos(LogicalSlotInfoArr *slot_arr);
 
 /*
  * gen_db_file_maps()
@@ -394,7 +396,7 @@ get_db_infos(ClusterInfo *cluster)
 	i_spclocation = PQfnumber(res, "spclocation");
 
 	ntups = PQntuples(res);
-	dbinfos = (DbInfo *) pg_malloc(sizeof(DbInfo) * ntups);
+	dbinfos = (DbInfo *) pg_malloc0(sizeof(DbInfo) * ntups);
 
 	for (tupnum = 0; tupnum < ntups; tupnum++)
 	{
@@ -600,6 +602,94 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 	dbinfo->rel_arr.nrels = num_rels;
 }
 
+/*
+ * get_logical_slot_infos()
+ *
+ * Higher level routine to generate LogicalSlotInfoArr for all databases.
+ */
+void
+get_logical_slot_infos(ClusterInfo *cluster)
+{
+	int			dbnum;
+
+	if (cluster == &old_cluster)
+		pg_log(PG_VERBOSE, "\nsource databases:");
+	else
+		pg_log(PG_VERBOSE, "\ntarget databases:");
+
+	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
+	{
+		DbInfo *pDbInfo = &cluster->dbarr.dbs[dbnum];
+
+		if (pDbInfo->slot_arr.slots)
+			free_logical_slot_infos(&pDbInfo->slot_arr);
+
+		get_logical_slot_infos_per_db(cluster, pDbInfo);
+
+		if (log_opts.verbose)
+		{
+			pg_log(PG_VERBOSE, "Database: %s", pDbInfo->db_name);
+			print_slot_infos(&pDbInfo->slot_arr);
+		}
+	}
+}
+
+/*
+ * get_logical_slot_infos_per_db()
+ *
+ * gets the LogicalSlotInfos for all the logical replication slots of the database
+ * referred to by "dbinfo".
+ */
+static void
+get_logical_slot_infos_per_db(ClusterInfo *cluster, DbInfo *dbinfo)
+{
+	PGconn	   *conn = connectToServer(cluster,
+									   dbinfo->db_name);
+	PGresult   *res;
+	LogicalSlotInfo *slotinfos;
+
+	int			ntups;
+	int			slotnum;
+	int			num_slots = 0;
+
+	int			i_slotname;
+	int			i_plugin;
+	int			i_twophase;
+
+	char		query[QUERY_ALLOC];
+
+	query[0] = '\0';			/* initialize query string to empty */
+
+	snprintf(query, sizeof(query),
+			 "SELECT slot_name, plugin, two_phase "
+			 "FROM pg_catalog.pg_replication_slots "
+			 "WHERE database = current_database() AND temporary = false "
+			 "AND wal_status IN ('reserved', 'extended');");
+
+	res = executeQueryOrDie(conn, "%s", query);
+
+	ntups = PQntuples(res);
+
+	slotinfos = (LogicalSlotInfo *) pg_malloc(sizeof(LogicalSlotInfo) * ntups);
+
+	i_slotname = PQfnumber(res, "slot_name");
+	i_plugin = PQfnumber(res, "plugin");
+	i_twophase = PQfnumber(res, "two_phase");
+
+	for (slotnum = 0; slotnum < ntups; slotnum++)
+	{
+		LogicalSlotInfo *curr = &slotinfos[num_slots++];
+
+		curr->slotname = pg_strdup(PQgetvalue(res, slotnum, i_slotname));
+		curr->plugin = pg_strdup(PQgetvalue(res, slotnum, i_plugin));
+		curr->two_phase = (strcmp(PQgetvalue(res, slotnum, i_twophase), "t") == 0);
+	}
+
+	PQfinish(conn);
+
+	dbinfo->slot_arr.slots = slotinfos;
+	dbinfo->slot_arr.nslots = num_slots;
+}
 
 static void
 free_db_and_rel_infos(DbInfoArr *db_arr)
@@ -634,6 +724,19 @@ free_rel_infos(RelInfoArr *rel_arr)
 	rel_arr->nrels = 0;
 }
 
+static void
+free_logical_slot_infos(LogicalSlotInfoArr *slot_arr)
+{
+	int			slotnum;
+
+	for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+	{
+		pg_free(slot_arr->slots[slotnum].slotname);
+		pg_free(slot_arr->slots[slotnum].plugin);
+	}
+	pg_free(slot_arr->slots);
+	slot_arr->nslots = 0;
+}
 
 static void
 print_db_infos(DbInfoArr *db_arr)
@@ -660,3 +763,15 @@ print_rel_infos(RelInfoArr *rel_arr)
 			   rel_arr->rels[relnum].reloid,
 			   rel_arr->rels[relnum].tablespace);
 }
+
+static void
+print_slot_infos(LogicalSlotInfoArr *slot_arr)
+{
+	int			slotnum;
+
+	for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		pg_log(PG_VERBOSE, "slotname: %s, plugin: %s, two_phase: %d",
+			   slot_arr->slots[slotnum].slotname,
+			   slot_arr->slots[slotnum].plugin,
+			   slot_arr->slots[slotnum].two_phase);
+}
diff --git a/src/bin/pg_upgrade/meson.build b/src/bin/pg_upgrade/meson.build
index 12a97f84e2..228f29b688 100644
--- a/src/bin/pg_upgrade/meson.build
+++ b/src/bin/pg_upgrade/meson.build
@@ -42,6 +42,7 @@ tests += {
     'tests': [
       't/001_basic.pl',
       't/002_pg_upgrade.pl',
+      't/003_logical_replication_slots.pl',
     ],
     'test_kwargs': {'priority': 40}, # pg_upgrade tests are slow
   },
diff --git a/src/bin/pg_upgrade/option.c b/src/bin/pg_upgrade/option.c
index 640361009e..df66a5ffe6 100644
--- a/src/bin/pg_upgrade/option.c
+++ b/src/bin/pg_upgrade/option.c
@@ -57,6 +57,7 @@ parseCommandLine(int argc, char *argv[])
 		{"verbose", no_argument, NULL, 'v'},
 		{"clone", no_argument, NULL, 1},
 		{"copy", no_argument, NULL, 2},
+		{"include-logical-replication-slots", no_argument, NULL, 3},
 
 		{NULL, 0, NULL, 0}
 	};
@@ -199,6 +200,10 @@ parseCommandLine(int argc, char *argv[])
 				user_opts.transfer_mode = TRANSFER_MODE_COPY;
 				break;
 
+			case 3:
+				user_opts.include_logical_slots = true;
+				break;
+
 			default:
 				fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
 						os_info.progname);
@@ -289,6 +294,8 @@ usage(void)
 	printf(_("  -V, --version                 display version information, then exit\n"));
 	printf(_("  --clone                       clone instead of copying files to new cluster\n"));
 	printf(_("  --copy                        copy files to new cluster (default)\n"));
+	printf(_("  --include-logical-replication-slots\n"
+			 "                                upgrade logical replication slots\n"));
 	printf(_("  -?, --help                    show this help, then exit\n"));
 	printf(_("\n"
 			 "Before running pg_upgrade you must:\n"
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 75bab0a04c..373a9ef490 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -59,6 +59,7 @@ static void copy_xact_xlog_xid(void);
 static void set_frozenxids(bool minmxid_only);
 static void make_outputdirs(char *pgdata);
 static void setup(char *argv0, bool *live_check);
+static void create_logical_replication_slots(void);
 
 ClusterInfo old_cluster,
 			new_cluster;
@@ -188,6 +189,19 @@ main(int argc, char **argv)
 			  new_cluster.pgdata);
 	check_ok();
 
+	/*
+	 * Create logical replication slots if requested.
+	 *
+	 * Note: This must be done after doing pg_resetwal command because the
+	 * command will remove required WALs.
+	 */
+	if (user_opts.include_logical_slots)
+	{
+		start_postmaster(&new_cluster, true);
+		create_logical_replication_slots();
+		stop_postmaster(false);
+	}
+
 	if (user_opts.do_sync)
 	{
 		prep_status("Sync data directory to disk");
@@ -860,3 +874,50 @@ set_frozenxids(bool minmxid_only)
 
 	check_ok();
 }
+
+/*
+ * create_logical_replication_slots()
+ *
+ * Similar to create_new_objects() but only restores logical replication slots.
+ */
+static void
+create_logical_replication_slots(void)
+{
+	int			dbnum;
+
+	prep_status_progress("Restoring logical replication slots in the new cluster");
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		char		slots_file_name[MAXPGPATH],
+					log_file_name[MAXPGPATH];
+		DbInfo	   *old_db = &old_cluster.dbarr.dbs[dbnum];
+
+		pg_log(PG_STATUS, "%s", old_db->db_name);
+
+		snprintf(slots_file_name, sizeof(slots_file_name),
+				 DB_DUMP_LOGICAL_SLOTS_FILE_MASK, old_db->db_oid);
+		snprintf(log_file_name, sizeof(log_file_name),
+				 DB_DUMP_LOG_FILE_MASK, old_db->db_oid);
+
+		parallel_exec_prog(log_file_name,
+						   NULL,
+						   "\"%s/psql\" %s --echo-queries --set ON_ERROR_STOP=on "
+						   "--no-psqlrc --dbname %s -f \"%s/%s\"",
+						   new_cluster.bindir,
+						   cluster_conn_opts(&new_cluster),
+						   old_db->db_name,
+						   log_opts.dumpdir,
+						   slots_file_name);
+	}
+
+	/* reap all children */
+	while (reap_child(true) == true)
+		;
+
+	end_progress_output();
+	check_ok();
+
+	/* update new_cluster info again */
+	get_logical_slot_infos(&new_cluster);
+}
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 3eea0139c7..7adbb50807 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -29,6 +29,7 @@
 /* contains both global db information and CREATE DATABASE commands */
 #define GLOBALS_DUMP_FILE	"pg_upgrade_dump_globals.sql"
 #define DB_DUMP_FILE_MASK	"pg_upgrade_dump_%u.custom"
+#define DB_DUMP_LOGICAL_SLOTS_FILE_MASK	"pg_upgrade_dump_%u_logical_slots.sql"
 
 /*
  * Base directories that include all the files generated internally, from the
@@ -150,6 +151,22 @@ typedef struct
 	int			nrels;
 } RelInfoArr;
 
+/*
+ * Structure to store logical replication slot information
+ */
+typedef struct
+{
+	char	   *slotname;		/* slot name */
+	char	   *plugin;			/* plugin */
+	bool		two_phase;		/* Can the slot decode 2PC? */
+} LogicalSlotInfo;
+
+typedef struct
+{
+	LogicalSlotInfo *slots;
+	int			nslots;
+} LogicalSlotInfoArr;
+
 /*
  * The following structure represents a relation mapping.
  */
@@ -176,6 +193,7 @@ typedef struct
 	char		db_tablespace[MAXPGPATH];	/* database default tablespace
 											 * path */
 	RelInfoArr	rel_arr;		/* array of all user relinfos */
+	LogicalSlotInfoArr slot_arr;	/* array of all LogicalSlotInfo */
 } DbInfo;
 
 /*
@@ -304,6 +322,8 @@ typedef struct
 	transferMode transfer_mode; /* copy files or link them? */
 	int			jobs;			/* number of processes/threads to use */
 	char	   *socketdir;		/* directory to use for Unix sockets */
+	bool		include_logical_slots;	/* true -> dump and restore logical
+										 * replication slots */
 } UserOpts;
 
 typedef struct
@@ -400,6 +420,7 @@ FileNameMap *gen_db_file_maps(DbInfo *old_db,
 							  DbInfo *new_db, int *nmaps, const char *old_pgdata,
 							  const char *new_pgdata);
 void		get_db_and_rel_infos(ClusterInfo *cluster);
+void		get_logical_slot_infos(ClusterInfo *cluster);
 
 /* option.c */
 
diff --git a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
new file mode 100644
index 0000000000..3430c641aa
--- /dev/null
+++ b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
@@ -0,0 +1,115 @@
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+# Tests for upgrading replication slots
+
+use strict;
+use warnings;
+
+use File::Path qw(rmtree);
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Can be changed to test the other modes
+my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
+
+# Initialize old node
+my $old_node = PostgreSQL::Test::Cluster->new('old_node');
+$old_node->init(allows_streaming => 'logical');
+$old_node->start;
+
+# Initialize new node
+my $new_node = PostgreSQL::Test::Cluster->new('new_node');
+$new_node->init(allows_streaming => 1);
+
+my $bindir = $new_node->config_data('--bindir');
+
+$old_node->stop;
+
+# Cause a failure at the start of pg_upgrade because wal_level is replica
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_node->data_dir,
+		'-D',         $new_node->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_node->host,
+		'-p',         $old_node->port,
+		'-P',         $new_node->port,
+		$mode,        '--include-logical-replication-slots',
+	],
+	'run of pg_upgrade of old node with wrong wal_level');
+ok( -d $new_node->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_node->data_dir . "/pg_upgrade_output.d");
+
+# Preparations for the subsequent test. The case max_replication_slots is set
+# to 0 is prohibit.
+$new_node->append_conf('postgresql.conf', "wal_level = 'logical'");
+$new_node->append_conf('postgresql.conf', "max_replication_slots = 0");
+
+# Cause a failure at the start of pg_upgrade because max_replication_slots is 0
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_node->data_dir,
+		'-D',         $new_node->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_node->host,
+		'-p',         $old_node->port,
+		'-P',         $new_node->port,
+		$mode,        '--include-logical-replication-slots',
+	],
+	'run of pg_upgrade of old node with wrong max_replication_slots');
+ok( -d $new_node->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_node->data_dir . "/pg_upgrade_output.d");
+
+# Preparations for the subsequent test. max_replication_slots is set to
+# non-zero value to succeed the pg_upgrade
+$new_node->append_conf('postgresql.conf', "max_replication_slots = 10");
+
+# Create a slot on old node, and generate WALs
+$old_node->start;
+$old_node->safe_psql(
+	'postgres', qq[
+	SELECT pg_create_logical_replication_slot('test_slot', 'test_decoding', false, true);
+	CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;
+]);
+
+my $result = $old_node->safe_psql('postgres',
+	"SELECT count (*) FROM pg_logical_slot_get_changes('test_slot', NULL, NULL)"
+);
+is($result, qq(12), 'ensure WALs are not consumed yet');
+$old_node->stop;
+
+# Actual run, pg_upgrade_output.d is removed at the end
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_node->data_dir,
+		'-D',         $new_node->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_node->host,
+		'-p',         $old_node->port,
+		'-P',         $new_node->port,
+		$mode,        '--include-logical-replication-slots'
+	],
+	'run of pg_upgrade of old node');
+ok( !-d $new_node->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ removed after pg_upgrade success");
+
+$new_node->start;
+$result = $new_node->safe_psql('postgres',
+	"SELECT slot_name, two_phase FROM pg_replication_slots");
+is($result, qq(test_slot|t), 'check the slot exists on new node');
+
+done_testing();
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index b4058b88c3..5944cb34ea 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1479,6 +1479,7 @@ LogicalRepBeginData
 LogicalRepCommitData
 LogicalRepCommitPreparedTxnData
 LogicalRepCtxStruct
+LogicalReplicationSlotInfo
 LogicalRepMode
 LogicalRepMsgType
 LogicalRepPartMapEntry
@@ -1492,6 +1493,8 @@ LogicalRepTupleData
 LogicalRepTyp
 LogicalRepWorker
 LogicalRewriteMappingData
+LogicalSlotInfo
+LogicalSlotInfoArr
 LogicalTape
 LogicalTapeSet
 LsnReadQueue
-- 
2.27.0

v11-0002-Always-persist-to-disk-logical-slots-during-a-sh.patchapplication/octet-stream; name=v11-0002-Always-persist-to-disk-logical-slots-during-a-sh.patchDownload
From d388af3dc8aea721d27f16dd0c535004d5def0c8 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Fri, 14 Apr 2023 13:49:09 +0800
Subject: [PATCH v11 2/4] Always persist to disk logical slots during a
 shutdown checkpoint.

It's entirely possible for a logical slot to have a confirmed_flush_lsn higher
than the last value saved on disk while not being marked as dirty.  It's
currently not a problem to lose that value during a clean shutdown / restart
cycle, but a later patch adding support for pg_upgrade of publications and
logical slots will rely on that value being properly persisted to disk.

Author: Julien Rouhaud
Reviewed-by: FIXME
Discussion: FIXME
---
 src/backend/access/transam/xlog.c |  2 +-
 src/backend/replication/slot.c    | 26 ++++++++++++++++----------
 src/include/replication/slot.h    |  2 +-
 3 files changed, 18 insertions(+), 12 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index bc5a8e0569..78b4528f2c 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7011,7 +7011,7 @@ static void
 CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
 {
 	CheckPointRelationMap();
-	CheckPointReplicationSlots();
+	CheckPointReplicationSlots(flags & CHECKPOINT_IS_SHUTDOWN);
 	CheckPointSnapBuild();
 	CheckPointLogicalRewriteHeap();
 	CheckPointReplicationOrigin();
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 8021aaa0a8..aeea6ffd1f 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -109,7 +109,8 @@ static void ReplicationSlotDropPtr(ReplicationSlot *slot);
 /* internal persistency functions */
 static void RestoreSlotFromDisk(const char *name);
 static void CreateSlotOnDisk(ReplicationSlot *slot);
-static void SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel);
+static void SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel,
+						   bool is_shutdown);
 
 /*
  * Report shared-memory space needed by ReplicationSlotsShmemInit.
@@ -783,7 +784,7 @@ ReplicationSlotSave(void)
 	Assert(MyReplicationSlot != NULL);
 
 	sprintf(path, "pg_replslot/%s", NameStr(MyReplicationSlot->data.name));
-	SaveSlotToPath(MyReplicationSlot, path, ERROR);
+	SaveSlotToPath(MyReplicationSlot, path, ERROR, false);
 }
 
 /*
@@ -1565,11 +1566,10 @@ restart:
 /*
  * Flush all replication slots to disk.
  *
- * This needn't actually be part of a checkpoint, but it's a convenient
- * location.
+ * is_shutdown is true in case of a shutdown checkpoint.
  */
 void
-CheckPointReplicationSlots(void)
+CheckPointReplicationSlots(bool is_shutdown)
 {
 	int			i;
 
@@ -1594,7 +1594,7 @@ CheckPointReplicationSlots(void)
 
 		/* save the slot to disk, locking is handled in SaveSlotToPath() */
 		sprintf(path, "pg_replslot/%s", NameStr(s->data.name));
-		SaveSlotToPath(s, path, LOG);
+		SaveSlotToPath(s, path, LOG, is_shutdown);
 	}
 	LWLockRelease(ReplicationSlotAllocationLock);
 }
@@ -1700,7 +1700,7 @@ CreateSlotOnDisk(ReplicationSlot *slot)
 
 	/* Write the actual state file. */
 	slot->dirty = true;			/* signal that we really need to write */
-	SaveSlotToPath(slot, tmppath, ERROR);
+	SaveSlotToPath(slot, tmppath, ERROR, false);
 
 	/* Rename the directory into place. */
 	if (rename(tmppath, path) != 0)
@@ -1726,7 +1726,8 @@ CreateSlotOnDisk(ReplicationSlot *slot)
  * Shared functionality between saving and creating a replication slot.
  */
 static void
-SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel)
+SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel,
+			   bool is_shutdown)
 {
 	char		tmppath[MAXPGPATH];
 	char		path[MAXPGPATH];
@@ -1740,8 +1741,13 @@ SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel)
 	slot->just_dirtied = false;
 	SpinLockRelease(&slot->mutex);
 
-	/* and don't do anything if there's nothing to write */
-	if (!was_dirty)
+	/*
+	 * and don't do anything if there's nothing to write, unless this is
+	 * called for a logical slot during a shutdown checkpoint, as we want to
+	 * persist the confirmed_flush_lsn in that case, even if that's the only
+	 * modification.
+	 */
+	if (!was_dirty && !is_shutdown && !SlotIsLogical(slot))
 		return;
 
 	LWLockAcquire(&slot->io_in_progress_lock, LW_EXCLUSIVE);
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index a8a89dc784..7ca37c9f70 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -241,7 +241,7 @@ extern void ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslo
 extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok);
 
 extern void StartupReplicationSlots(void);
-extern void CheckPointReplicationSlots(void);
+extern void CheckPointReplicationSlots(bool is_shutdown);
 
 extern void CheckSlotRequirements(void);
 extern void CheckSlotPermissions(void);
-- 
2.27.0

v11-0003-pg_upgrade-Add-check-function-for-include-logica.patchapplication/octet-stream; name=v11-0003-pg_upgrade-Add-check-function-for-include-logica.patchDownload
From 369128a910d704294cf8a7bfabaa34b42671be8d Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Fri, 14 Apr 2023 08:59:03 +0000
Subject: [PATCH v11 3/4] pg_upgrade: Add check function for
 --include-logical-replication-slots option

---
 src/bin/pg_upgrade/check.c                    | 80 ++++++++++++++++++-
 .../t/003_logical_replication_slots.pl        | 30 ++++++-
 2 files changed, 108 insertions(+), 2 deletions(-)

diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 1802a30fe6..678867d64d 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -9,7 +9,10 @@
 
 #include "postgres_fe.h"
 
+#include "access/xlogrecord.h"
+#include "access/xlog_internal.h"
 #include "catalog/pg_authid_d.h"
+#include "catalog/pg_control.h"
 #include "catalog/pg_collation.h"
 #include "fe_utils/string_utils.h"
 #include "mb/pg_wchar.h"
@@ -31,7 +34,7 @@ static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(ClusterInfo *new_cluster);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
 static void check_for_parameter_settings(ClusterInfo *new_cluster);
-
+static void check_for_confirmed_flush_lsn(ClusterInfo *cluster);
 
 /*
  * fix_path_separator
@@ -108,6 +111,8 @@ check_and_dump_old_cluster(bool live_check)
 	check_for_composite_data_type_usage(&old_cluster);
 	check_for_reg_data_type_usage(&old_cluster);
 	check_for_isn_and_int8_passing_mismatch(&old_cluster);
+	if (user_opts.include_logical_slots)
+		check_for_confirmed_flush_lsn(&old_cluster);
 
 	/*
 	 * PG 16 increased the size of the 'aclitem' type, which breaks the on-disk
@@ -1439,6 +1444,10 @@ check_for_parameter_settings(ClusterInfo *new_cluster)
 	int			max_replication_slots;
 	char	   *wal_level;
 
+	/* --include-logical-replication-slots can be used since PG16. */
+	if (GET_MAJOR_VERSION(new_cluster->major_version) < 1600)
+		return;
+
 	prep_status("Checking for logical replication slots");
 
 	res = executeQueryOrDie(conn, "SHOW max_replication_slots;");
@@ -1462,3 +1471,72 @@ check_for_parameter_settings(ClusterInfo *new_cluster)
 
 	check_ok();
 }
+
+/*
+ * Verify that all logical replication slots consumed all WALs, except a
+ * CHECKPOINT_SHUTDOWN record.
+ */
+static void
+check_for_confirmed_flush_lsn(ClusterInfo *cluster)
+{
+	int			i,
+				ntups,
+				i_slotname;
+	bool		is_error = false;
+	PGresult   *res;
+	DbInfo	   *active_db = &cluster->dbarr.dbs[0];
+	PGconn	   *conn = connectToServer(cluster, active_db->db_name);
+
+	Assert(user_opts.include_logical_slots);
+
+	/* --include-logical-replication-slots can be used since PG16. */
+	if (GET_MAJOR_VERSION(cluster->major_version) < 1600)
+		return;
+
+	prep_status("Checking for logical replication slots");
+
+	/*
+	 * Check that all logical replication slots have reached the current WAL
+	 * position, except for the CHECKPOINT_SHUTDOWN record. Even if all WALs
+	 * are consumed before shutting down the node, the checkpointer generates
+	 * a CHECKPOINT_SHUTDOWN record at shutdown, which cannot be consumed by
+	 * any slots. Therefore, we must allow for a difference between
+	 * pg_current_wal_insert_lsn() and confirmed_flush_lsn.
+	 */
+#define SHUTDOWN_RECORD_SIZE  (SizeOfXLogRecord + \
+							   SizeOfXLogRecordDataHeaderShort + \
+							   sizeof(CheckPoint))
+
+	res = executeQueryOrDie(conn,
+							"SELECT slot_name FROM pg_catalog.pg_replication_slots "
+							"WHERE (pg_catalog.pg_current_wal_insert_lsn() - confirmed_flush_lsn) > %d "
+							"AND temporary = false AND wal_status IN ('reserved', 'extended');",
+							(int) (SizeOfXLogLongPHD + SHUTDOWN_RECORD_SIZE));
+
+#undef SHUTDOWN_RECORD_SIZE
+
+	ntups = PQntuples(res);
+	i_slotname = PQfnumber(res, "slot_name");
+
+	for (i = 0; i < ntups; i++)
+	{
+		char	   *slotname;
+
+		is_error = true;
+
+		slotname = PQgetvalue(res, i, i_slotname);
+
+		pg_log(PG_WARNING,
+			   "\nWARNING: logical replication slot \"%s\" has not consumed WALs yet",
+			   slotname);
+	}
+
+	PQclear(res);
+	PQfinish(conn);
+
+	if (is_error)
+		pg_fatal("--include-logical-replication-slots requires that all "
+				 "logical replication slots consumed all the WALs");
+
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
index 3430c641aa..e0e68df1ca 100644
--- a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
+++ b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
@@ -85,11 +85,39 @@ $old_node->safe_psql(
 ]);
 
 my $result = $old_node->safe_psql('postgres',
-	"SELECT count (*) FROM pg_logical_slot_get_changes('test_slot', NULL, NULL)"
+	"SELECT count(*) FROM pg_logical_slot_peek_changes('test_slot', NULL, NULL)"
 );
+
 is($result, qq(12), 'ensure WALs are not consumed yet');
 $old_node->stop;
 
+# Cause a failure at the start of pg_upgrade because test_slot does not
+# finish consuming all the WALs
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_node->data_dir,
+		'-D',         $new_node->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_node->host,
+		'-p',         $old_node->port,
+		'-P',         $new_node->port,
+		$mode,        '--include-logical-replication-slots',
+	],
+	'run of pg_upgrade of old node with idle replication slots');
+ok( -d $new_node->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_node->data_dir . "/pg_upgrade_output.d");
+
+$old_node->start;
+$old_node->safe_psql('postgres',
+	"SELECT count (*) FROM pg_logical_slot_get_changes('test_slot', NULL, NULL)"
+);
+$old_node->stop;
+
 # Actual run, pg_upgrade_output.d is removed at the end
 command_ok(
 	[
-- 
2.27.0

v11-0004-Change-the-method-used-to-check-logical-replicat.patchapplication/octet-stream; name=v11-0004-Change-the-method-used-to-check-logical-replicat.patchDownload
From fcfd691eb1a2b8621ee6ef9c0aab8a2d91ee5049 Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Mon, 24 Apr 2023 11:03:28 +0000
Subject: [PATCH v11 4/4] Change the method used to check logical replication
 slots during the live check

When a live check is requested, there is a possibility of additional changes
occurring, which may cause the current WAL position to exceed the confirmed_flush_lsn
of the slot. As a result, we check the confirmed_flush_lsn of each logical slot
instead. This is sufficient as all the WAL records will be sent during the publisher's
shutdown.
---
 src/bin/pg_upgrade/check.c                    | 68 ++++++++++++++++++-
 .../t/003_logical_replication_slots.pl        | 66 +++++++++++++++++-
 2 files changed, 130 insertions(+), 4 deletions(-)

diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 678867d64d..eef44706f4 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -35,6 +35,7 @@ static void check_for_new_tablespace_dir(ClusterInfo *new_cluster);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
 static void check_for_parameter_settings(ClusterInfo *new_cluster);
 static void check_for_confirmed_flush_lsn(ClusterInfo *cluster);
+static void check_are_logical_slots_active(ClusterInfo *cluster);
 
 /*
  * fix_path_separator
@@ -112,7 +113,19 @@ check_and_dump_old_cluster(bool live_check)
 	check_for_reg_data_type_usage(&old_cluster);
 	check_for_isn_and_int8_passing_mismatch(&old_cluster);
 	if (user_opts.include_logical_slots)
-		check_for_confirmed_flush_lsn(&old_cluster);
+	{
+		/*
+		 * The method used to check logical replication slots is dependent on
+		 * the value of the live_check parameter. This change was implemented
+		 * because, during a live check, it is possible for additional changes
+		 * to occur at the old node, which could cause the current WAL position
+		 * to exceed the confirmed_flush_lsn of the slot.
+		 */
+		if (live_check)
+			check_are_logical_slots_active(&old_cluster);
+		else
+			check_for_confirmed_flush_lsn(&old_cluster);
+	}
 
 	/*
 	 * PG 16 increased the size of the 'aclitem' type, which breaks the on-disk
@@ -1472,6 +1485,59 @@ check_for_parameter_settings(ClusterInfo *new_cluster)
 	check_ok();
 }
 
+/*
+ * Verify that all logical replication slots are active
+ */
+static void
+check_are_logical_slots_active(ClusterInfo *cluster)
+{
+	int			i,
+				ntups,
+				i_slotname;
+	bool		is_error = false;
+	PGresult   *res;
+	DbInfo	   *active_db = &cluster->dbarr.dbs[0];
+	PGconn	   *conn = connectToServer(cluster, active_db->db_name);
+
+	Assert(user_opts.include_logical_slots);
+
+	/* --include-logical-replication-slots can be used since PG16. */
+	if (GET_MAJOR_VERSION(cluster->major_version) < 1600)
+		return;
+
+	prep_status("Checking for logical replication slots");
+
+	res = executeQueryOrDie(conn,
+							"SELECT slot_name FROM pg_catalog.pg_replication_slots "
+							"WHERE active IS FALSE "
+							"AND temporary = false AND wal_status IN ('reserved', 'extended');");
+
+	ntups = PQntuples(res);
+	i_slotname = PQfnumber(res, "slot_name");
+
+	for (i = 0; i < ntups; i++)
+	{
+		char	   *slotname;
+
+		is_error = true;
+
+		slotname = PQgetvalue(res, i, i_slotname);
+
+		pg_log(PG_WARNING,
+			   "\nWARNING: logical replication slot \"%s\" is not active",
+			   slotname);
+	}
+
+	PQclear(res);
+	PQfinish(conn);
+
+	if (is_error)
+		pg_fatal("--include-logical-replication-slots with --check requires that "
+				 "all logical replication slots are active");
+
+	check_ok();
+}
+
 /*
  * Verify that all logical replication slots consumed all WALs, except a
  * CHECKPOINT_SHUTDOWN record.
diff --git a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
index e0e68df1ca..cf58f735f9 100644
--- a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
+++ b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
@@ -19,6 +19,10 @@ my $old_node = PostgreSQL::Test::Cluster->new('old_node');
 $old_node->init(allows_streaming => 'logical');
 $old_node->start;
 
+# Initialize subscriber, which will be used only for --check
+my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
+$subscriber->init(allows_streaming => 'logical');
+
 # Initialize new node
 my $new_node = PostgreSQL::Test::Cluster->new('new_node');
 $new_node->init(allows_streaming => 1);
@@ -76,15 +80,71 @@ rmtree($new_node->data_dir . "/pg_upgrade_output.d");
 # non-zero value to succeed the pg_upgrade
 $new_node->append_conf('postgresql.conf', "max_replication_slots = 10");
 
-# Create a slot on old node, and generate WALs
+# Setup logical replication
 $old_node->start;
+$old_node->safe_psql('postgres',
+	"CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a");
+$old_node->safe_psql('postgres', "CREATE PUBLICATION pub FOR ALL TABLES");
+my $old_connstr = $old_node->connstr . ' dbname=postgres';
+
+$subscriber->start;
+$subscriber->safe_psql('postgres', "CREATE TABLE tbl (a int)");
+$subscriber->safe_psql('postgres',
+	"CREATE SUBSCRIPTION sub CONNECTION '$old_connstr' PUBLICATION pub WITH (copy_data = true)"
+);
+
+# Wait for initial table sync to finish
+$subscriber->wait_for_subscription_sync($old_node, 'sub');
+
+my $result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
+is($result, qq(10), 'check initial rows on subscriber');
+
+# Start a background session and open a transaction (not committed yet)
+my $bsession = $old_node->background_psql('postgres');
+$bsession->query_safe(
+	q{
+BEGIN;
+INSERT INTO tbl VALUES (generate_series(11, 20))
+});
+
+$result = $old_node->safe_psql('postgres',
+	"SELECT count(*) FROM pg_replication_slots WHERE pg_current_wal_insert_lsn() > confirmed_flush_lsn"
+);
+is($result, qq(1),
+	'check the current WAL position exceeds confirmed_flush_lsn');
+
+# Run pg_upgrade --check. The status of each logical slot will be checked,
+# and the command should succeed.
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_node->data_dir,
+		'-D',         $new_node->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_node->host,
+		'-p',         $old_node->port,
+		'-P',         $new_node->port,
+		$mode,        '--include-logical-replication-slots',
+		'--check'
+	],
+	'run of pg_upgrade of old node');
+ok( !-d $old_node->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ removed after pg_upgrade success");
+
+# Cleanup
+$bsession->query_safe("ABORT");
+$bsession->quit;
+$subscriber->safe_psql('postgres', "DROP SUBSCRIPTION sub");
+
+# Create a slot on old node, and generate WALs
 $old_node->safe_psql(
 	'postgres', qq[
 	SELECT pg_create_logical_replication_slot('test_slot', 'test_decoding', false, true);
-	CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;
+	INSERT INTO tbl VALUES (generate_series(11, 20));
 ]);
 
-my $result = $old_node->safe_psql('postgres',
+$result = $old_node->safe_psql('postgres',
 	"SELECT count(*) FROM pg_logical_slot_peek_changes('test_slot', NULL, NULL)"
 );
 
-- 
2.27.0

#41Peter Smith
smithpb2250@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#40)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

Hi Kuroda-san. I checked again the v11-0001.

Here are a few more review comments.

======
src/bin/pg_dump/pg_dump.c

1. help

  printf(_("  --inserts                    dump data as INSERT
commands, rather than COPY\n"));
  printf(_("  --load-via-partition-root    load partitions via the
root table\n"));
+ printf(_("  --logical-replication-slots-only\n"
+ "                               dump only logical replication slots,
no schema or data\n"));
  printf(_("  --no-comments                do not dump comments\n"));

Now you removed the PG Docs for the internal pg_dump option based on
my previous review comment (see [2]Kuroda-san's reply to my v10 review - /messages/by-id/TYAPR01MB5866A537AC9AD46E49345A78F5769@TYAPR01MB5866.jpnprd01.prod.outlook.com#1). So should this "help"
text also be removed, so that this option is completely invisible to the
user? I am not sure, but if you do choose to remove this help then
probably a comment should be added here to explain why it is
deliberately not listed.
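
For example, something along these lines could be added there (the wording is
only a sketch):

/*
 * --logical-replication-slots-only is an internal option used by pg_upgrade,
 * so it is deliberately not exposed in the help output or the documentation.
 */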

======
src/bin/pg_upgrade/check.c

2. check_new_cluster

Although you wrote "Added", I don't think my previous comment ([1]My v10 review - /messages/by-id/CAHut+PtpQaKVfqr-8KUtGZqei1C9gWF0+Y8n1UafvAQeS4G_hg@mail.gmail.com#8)
was yet addressed.

What I meant to ask was: can that call to get_logical_slot_infos()
be done later, only when you know that option was specified?

e.g

BEFORE
get_logical_slot_infos(&new_cluster);
...
if (user_opts.include_logical_slots)
check_for_parameter_settings(&new_cluster);

SUGGESTION
if (user_opts.include_logical_slots)
{
get_logical_slot_infos(&new_cluster);
check_for_parameter_settings(&new_cluster);
}

======
src/bin/pg_upgrade/info.c

3. get_db_and_rel_infos

src/bin/pg_upgrade/info.c

11. get_db_and_rel_infos

+ {
get_rel_infos(cluster, &cluster->dbarr.dbs[dbnum]);

+ /*
+ * Additionally, slot_arr must be initialized because they will be
+ * checked later.
+ */
+ cluster->dbarr.dbs[dbnum].slot_arr.nslots = 0;
+ cluster->dbarr.dbs[dbnum].slot_arr.slots = NULL;
+ }

11a.
I think probably it would have been easier to just use 'pg_malloc0'
instead of 'pg_malloc' in the get_db_infos, then this code would not
be necessary.

I was not sure whether it is OK to change like that because of the
performance efficiency. But OK, fixed.

11b.
BTW, shouldn't this function also be calling free_logical_slot_infos()
too? That will also have the same effect (initializing the slot_arr)
but without having to change anything else.

~

Above is your reply ([2], 11a). If you were not sure about the malloc0,
then I think the suggestion ([1], #12b) achieves the same thing and
initializes those fields. You did not reply to 12b, so I wondered if
you accidentally missed that point.
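
To make that concrete, here is a minimal sketch of the malloc0 idea (just an
illustration; the variable names are assumed to match the existing allocation
in get_db_infos()):

	/*
	 * pg_malloc0 zero-fills the whole DbInfo array, so each
	 * dbs[dbnum].slot_arr starts out as {NULL, 0} and no extra
	 * initialization is needed in get_db_and_rel_infos().
	 */
	dbinfos = (DbInfo *) pg_malloc0(sizeof(DbInfo) * ntups);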

~~~

4. get_logical_slot_infos

+ for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
+ {
+ DbInfo *pDbInfo = &cluster->dbarr.dbs[dbnum];
+
+ if (pDbInfo->slot_arr.slots)
+ free_logical_slot_infos(&pDbInfo->slot_arr);

Maybe it is ok, but it seems unusual that this
get_logical_slot_infos() is also doing a free. I didn't notice this
same pattern with the other get_XXX functions. Why is it needed? Even
if pDbInfo->slot_arr.slots was not NULL, is the information stale or
will you just end up re-fetching the same info?

======
.../pg_upgrade/t/003_logical_replication_slots.pl

5.
+# Preparations for the subsequent test. The case max_replication_slots is set
+# to 0 is prohibit.

/prohibit/prohibited/

------
[1]: My v10 review - /messages/by-id/CAHut+PtpQaKVfqr-8KUtGZqei1C9gWF0+Y8n1UafvAQeS4G_hg@mail.gmail.com
[2]: Kuroda-san's reply to my v10 review - /messages/by-id/TYAPR01MB5866A537AC9AD46E49345A78F5769@TYAPR01MB5866.jpnprd01.prod.outlook.com

Kind Regards,
Peter Smith.
Fujitsu Australia

#42Wei Wang (Fujitsu)
wangw.fnst@fujitsu.com
In reply to: Peter Smith (#41)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

On Thu, May 11, 2023 at 10:12 AM Peter Smith <smithpb2250@gmail.com> wrote:

Hi Kuroda-san. I checked again the v11-0001.

Here are a few more review comments.

======
src/bin/pg_dump/pg_dump.c

1. help

printf(_("  --inserts                    dump data as INSERT
commands, rather than COPY\n"));
printf(_("  --load-via-partition-root    load partitions via the
root table\n"));
+ printf(_("  --logical-replication-slots-only\n"
+ "                               dump only logical replication slots,
no schema or data\n"));
printf(_("  --no-comments                do not dump comments\n"));

Now you have removed the PG Docs for the internal pg_dump option based on
my previous review comment (see [2]#1). So should this "help" text
also be removed, so that this option will be completely invisible to the
user? I am not sure, but if you do choose to remove this help then
probably a comment should be added here to explain why it is
deliberately not listed.

I'm not sure if there is any reason not to expose this new option. Do we have
concerns that users who use this new option by mistake may cause data
inconsistencies?

BTW, I think that all options of pg_dump (please see the array of long_options
in the main function of the pg_dump.c file) are currently exposed to the user.

Regards,
Wang wei

#43Wei Wang (Fujitsu)
wangw.fnst@fujitsu.com
In reply to: Hayato Kuroda (Fujitsu) (#40)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

On Tue, May 9, 2023 at 17:44 PM Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote:

Thank you for reviewing! PSA new version.

Thanks for your patches.
Here are some comments for 0001 patch:

1. In the function getLogicalReplicationSlots
```
+		/*
+		 * Note: Currently we do not have any options to include/exclude slots
+		 * in dumping, so all the slots must be selected.
+		 */
+		slotinfo[i].dobj.dump = DUMP_COMPONENT_ALL;
```
I think currently we are only dumping the definition of logical replication
slots. It seems better to set it as DUMP_COMPONENT_DEFINITION here.
2. In the function dumpLogicalReplicationSlot
```
+		ArchiveEntry(fout, slotinfo->dobj.catId, slotinfo->dobj.dumpId,
+					 ARCHIVE_OPTS(.tag = slotname,
+								  .description = "REPLICATION SLOT",
+								  .section = SECTION_POST_DATA,
+								  .createStmt = query->data));
```
I think if we do not set the member dropStmt in the ARCHIVE_OPTS macro here, then
when the option "--logical-replication-slots-only" and option "-c/--clean" are
specified together, "-c/--clean" will not work.

I think that we could use the function pg_drop_replication_slot to set this
member. Then, in the main function in the pg_dump.c file, we should add a check
to prevent specifying option "--logical-replication-slots-only" and
option "--if-exists" together.
Or, we could simply add a check to prevent specifying option
"--logical-replication-slots-only" and option "-c/--clean" together.
What do you think?
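
To be concrete about the simpler alternative, the check could be just something
like the following in the main function of pg_dump.c, after option parsing
(only a sketch; the flag names follow the existing dumpOptions fields):

	if (dopt.logical_slots_only && dopt.outputClean)
		pg_fatal("options --logical-replication-slots-only and -c/--clean cannot be used together");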

Regards,
Wang wei

#44Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Wei Wang (Fujitsu) (#42)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Wang,

I'm not sure if there is any reason not to expose this new option. Do we have
concerns that users who use this new option by mistake may cause data
inconsistencies?

BTW, I think that all options of pg_dump (please see the array of long_options
in the main function of the pg_dump.c file) are currently exposed to the user.

Unlike other database objects, --logical-replication-slots-only does not provide
a "perfect" copy. As you might know, some attributes like xmin and restart_lsn
are not copied; it just creates similar replication slots which have the same
name, plugin, and options. I think this may be confusing for users.
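
For reference, the dump only builds a pg_create_logical_replication_slot() call
per slot, roughly like below (simplified from the patch), so everything else
about the slot is re-established when it is created on the new node:

	/* Only the slot name, plugin and two_phase flag are carried over. */
	appendPQExpBuffer(query, "SELECT pg_catalog.pg_create_logical_replication_slot(");
	appendStringLiteralAH(query, slotinfo->dobj.name, fout);
	appendPQExpBuffer(query, ", ");
	appendStringLiteralAH(query, slotinfo->plugin, fout);
	appendPQExpBuffer(query, ", false, %s);",
					  slotinfo->twophase ? "true" : "false");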

Moreover, I cannot come up with use-case which DBAs use the option alone.
If there is a good one, I can decide to remove the limitation.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#45Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Wei Wang (Fujitsu) (#43)
4 attachment(s)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Wang,

Thank you for reviewing! PSA new version.

1. In the function getLogicalReplicationSlots
```
+		/*
+		 * Note: Currently we do not have any options to include/exclude slots
+		 * in dumping, so all the slots must be selected.
+		 */
+		slotinfo[i].dobj.dump = DUMP_COMPONENT_ALL;
```
I think currently we are only dumping the definition of logical replication
slots. It seems better to set it as DUMP_COMPONENT_DEFINITION here.

Right. Actually it was harmless because flags like DUMP_COMPONENT_DEFINITION
are not checked in dumpLogicalReplicationSlot(), but I have changed it anyway.

2. In the function dumpLogicalReplicationSlot
```
+		ArchiveEntry(fout, slotinfo->dobj.catId, slotinfo->dobj.dumpId,
+					 ARCHIVE_OPTS(.tag = slotname,
+								  .description = "REPLICATION SLOT",
+								  .section = SECTION_POST_DATA,
+								  .createStmt = query->data));
```
I think if we do not set the member dropStmt in the ARCHIVE_OPTS macro here, then
when the option "--logical-replication-slots-only" and option "-c/--clean" are
specified together, "-c/--clean" will not work.

I think that we could use the function pg_drop_replication_slot to set this
member. Then, in the main function in the pg_dump.c file, we should add a check
to prevent specifying option "--logical-replication-slots-only" and
option "--if-exists" together.
Or, we could simply add a check to prevent specifying option
"--logical-replication-slots-only" and option "-c/--clean" together.
What do you think?

I chose not to allow combining it with -c. Assuming that this option is used only
by pg_upgrade, it is ensured that the new node does not have any logical
replication slots, so the drop function is not needed.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:

v12-0001-pg_upgrade-Add-include-logical-replication-slots.patch (application/octet-stream)
From ec1981d9c2d6a5fe1e1924d4b8ee8d7447be0297 Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Tue, 4 Apr 2023 05:49:34 +0000
Subject: [PATCH v12 1/4] pg_upgrade: Add --include-logical-replication-slots
 option

This commit introduces a new pg_upgrade option called "--include-logical-replication-slots".
This allows nodes with logical replication slots to be upgraded. The commit can
be divided into two parts: one for pg_dump and another for pg_upgrade.

For pg_dump this commit includes a new option called "--logical-replication-slots-only".
This option can be used to dump logical replication slots. When this option is
specified, the slot_name, plugin, and two_phase parameters are extracted from
pg_replication_slots. An SQL file is then generated which executes
pg_create_logical_replication_slot() with the extracted parameters.

For pg_upgrade, when '--include-logical-replication-slots' is specified, it executes
pg_dump with the new "--logical-replication-slots-only" option and restores from the
dump. Note that we cannot dump replication slots at the same time as the schema
dump because we need to separate the timing of restoring replication slots and
other objects. Replication slots, in  particular, should not be restored before
executing the pg_resetwal command because it will remove WALs that are required
by the slots.

The significant advantage of this commit is that it makes it easy to continue
logical replication even after upgrading the publisher node. Previously, pg_upgrade
allowed copying publications to a new node. With this new commit, adjusting the
connection string to the new publisher will cause the apply worker on the subscriber
to connect to the new publisher automatically. This enables seamless continuation
of logical replication, even after an upgrade.

Author: Hayato Kuroda
Reviewed-by: Peter Smith, Julien Rouhaud, Vignesh C
---
 doc/src/sgml/ref/pgupgrade.sgml               |  11 ++
 src/bin/pg_dump/pg_backup.h                   |   1 +
 src/bin/pg_dump/pg_dump.c                     | 155 ++++++++++++++++++
 src/bin/pg_dump/pg_dump.h                     |  17 +-
 src/bin/pg_dump/pg_dump_sort.c                |  11 +-
 src/bin/pg_upgrade/Makefile                   |   3 +
 src/bin/pg_upgrade/check.c                    |  62 +++++++
 src/bin/pg_upgrade/dump.c                     |  24 +++
 src/bin/pg_upgrade/info.c                     | 108 +++++++++++-
 src/bin/pg_upgrade/meson.build                |   1 +
 src/bin/pg_upgrade/option.c                   |   7 +
 src/bin/pg_upgrade/pg_upgrade.c               |  61 +++++++
 src/bin/pg_upgrade/pg_upgrade.h               |  21 +++
 .../t/003_logical_replication_slots.pl        | 115 +++++++++++++
 src/tools/pgindent/typedefs.list              |   3 +
 15 files changed, 595 insertions(+), 5 deletions(-)
 create mode 100644 src/bin/pg_upgrade/t/003_logical_replication_slots.pl

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 7816b4c685..94e90ff506 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -240,6 +240,17 @@ PostgreSQL documentation
       </listitem>
      </varlistentry>
 
+     <varlistentry>
+      <term><option>--include-logical-replication-slots</option></term>
+      <listitem>
+       <para>
+        Upgrade logical replication slots. Only permanent replication slots
+        are included. Note that pg_upgrade does not check the installation of
+        plugins.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><option>-?</option></term>
       <term><option>--help</option></term>
diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index aba780ef4b..0a4e931f9b 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -187,6 +187,7 @@ typedef struct _dumpOptions
 	int			use_setsessauth;
 	int			enable_row_security;
 	int			load_via_partition_root;
+	int			logical_slots_only;
 
 	/* default, if no "inclusion" switches appear, is to dump everything */
 	bool		include_everything;
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 41a51ec5cd..0ff503a736 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -328,6 +328,9 @@ static void setupDumpWorker(Archive *AH);
 static TableInfo *getRootTableInfo(const TableInfo *tbinfo);
 static bool forcePartitionRootLoad(const TableInfo *tbinfo);
 
+static void getLogicalReplicationSlots(Archive *fout);
+static void dumpLogicalReplicationSlot(Archive *fout,
+									   const LogicalReplicationSlotInfo *slotinfo);
 
 int
 main(int argc, char **argv)
@@ -431,6 +434,7 @@ main(int argc, char **argv)
 		{"table-and-children", required_argument, NULL, 12},
 		{"exclude-table-and-children", required_argument, NULL, 13},
 		{"exclude-table-data-and-children", required_argument, NULL, 14},
+		{"logical-replication-slots-only", no_argument, NULL, 15},
 
 		{NULL, 0, NULL, 0}
 	};
@@ -657,6 +661,10 @@ main(int argc, char **argv)
 										  optarg);
 				break;
 
+			case 15:			/* dump only replication slot(s) */
+				dopt.logical_slots_only = true;
+				break;
+
 			default:
 				/* getopt_long already emitted a complaint */
 				pg_log_error_hint("Try \"%s --help\" for more information.", progname);
@@ -714,6 +722,21 @@ main(int argc, char **argv)
 	if (dopt.do_nothing && dopt.dump_inserts == 0)
 		pg_fatal("option --on-conflict-do-nothing requires option --inserts, --rows-per-insert, or --column-inserts");
 
+	if (dopt.logical_slots_only)
+	{
+		if (!dopt.binary_upgrade)
+			pg_fatal("options --logical-replication-slots-only requires option --binary-upgrade");
+
+		if (dopt.dataOnly)
+			pg_fatal("options --logical-replication-slots-only and -a/--data-only cannot be used together");
+
+		if (dopt.schemaOnly)
+			pg_fatal("options --logical-replication-slots-only and -s/--schema-only cannot be used together");
+
+		if (dopt.outputClean)
+			pg_fatal("options --logical-replication-slots-only and -c/--clean cannot be used together");
+	}
+
 	/* Identify archive format to emit */
 	archiveFormat = parseArchiveFormat(format, &archiveMode);
 
@@ -876,6 +899,16 @@ main(int argc, char **argv)
 			pg_fatal("no matching extensions were found");
 	}
 
+	/*
+	 * If dump logical-replication-slots-only was requested, dump only them
+	 * and skip everything else.
+	 */
+	if (dopt.logical_slots_only)
+	{
+		getLogicalReplicationSlots(fout);
+		goto dump;
+	}
+
 	/*
 	 * Dumping LOs is the default for dumps where an inclusion switch is not
 	 * used (an "include everything" dump).  -B can be used to exclude LOs
@@ -936,6 +969,8 @@ main(int argc, char **argv)
 	if (!dopt.no_security_labels)
 		collectSecLabels(fout);
 
+dump:
+
 	/* Lastly, create dummy objects to represent the section boundaries */
 	boundaryObjs = createBoundaryObjects();
 
@@ -1109,6 +1144,10 @@ help(const char *progname)
 			 "                               servers matching PATTERN\n"));
 	printf(_("  --inserts                    dump data as INSERT commands, rather than COPY\n"));
 	printf(_("  --load-via-partition-root    load partitions via the root table\n"));
+	/*
+	 * The option --logical-replication-slots-only is used only by pg_upgrade
+	 * and should not be used by users, which is why it is not listed.
+	 */
 	printf(_("  --no-comments                do not dump comments\n"));
 	printf(_("  --no-publications            do not dump publications\n"));
 	printf(_("  --no-security-labels         do not dump security label assignments\n"));
@@ -10252,6 +10291,10 @@ dumpDumpableObject(Archive *fout, DumpableObject *dobj)
 		case DO_SUBSCRIPTION:
 			dumpSubscription(fout, (const SubscriptionInfo *) dobj);
 			break;
+		case DO_LOGICAL_REPLICATION_SLOT:
+			dumpLogicalReplicationSlot(fout,
+									   (const LogicalReplicationSlotInfo *) dobj);
+			break;
 		case DO_PRE_DATA_BOUNDARY:
 		case DO_POST_DATA_BOUNDARY:
 			/* never dumped, nothing to do */
@@ -18227,6 +18270,7 @@ addBoundaryDependencies(DumpableObject **dobjs, int numObjs,
 			case DO_PUBLICATION_REL:
 			case DO_PUBLICATION_TABLE_IN_SCHEMA:
 			case DO_SUBSCRIPTION:
+			case DO_LOGICAL_REPLICATION_SLOT:
 				/* Post-data objects: must come after the post-data boundary */
 				addObjectDependency(dobj, postDataBound->dumpId);
 				break;
@@ -18488,3 +18532,114 @@ appendReloptionsArrayAH(PQExpBuffer buffer, const char *reloptions,
 	if (!res)
 		pg_log_warning("could not parse %s array", "reloptions");
 }
+
+/*
+ * getLogicalReplicationSlots
+ *	  get information about replication slots
+ */
+static void
+getLogicalReplicationSlots(Archive *fout)
+{
+	PGresult   *res;
+	LogicalReplicationSlotInfo *slotinfo;
+	PQExpBuffer query;
+
+	int			i_slotname;
+	int			i_plugin;
+	int			i_twophase;
+	int			i,
+				ntups;
+
+	/* Check whether we should dump or not */
+	if (fout->remoteVersion < 160000)
+		return;
+
+	Assert(fout->dopt->logical_slots_only);
+
+	query = createPQExpBuffer();
+
+	resetPQExpBuffer(query);
+
+	/*
+	 * Get replication slots.
+	 *
+	 * XXX: Which information must be extracted from old node? Currently three
+	 * attributes are extracted because they are used by
+	 * pg_create_logical_replication_slot().
+	 *
+	 * XXX: Do we have to support physical slots?
+	 */
+	appendPQExpBufferStr(query,
+						 "SELECT slot_name, plugin, two_phase "
+						 "FROM pg_catalog.pg_replication_slots "
+						 "WHERE database = current_database() AND temporary = false "
+						 "AND wal_status IN ('reserved', 'extended');");
+
+	res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+
+	ntups = PQntuples(res);
+
+	i_slotname = PQfnumber(res, "slot_name");
+	i_plugin = PQfnumber(res, "plugin");
+	i_twophase = PQfnumber(res, "two_phase");
+
+	slotinfo = pg_malloc(ntups * sizeof(LogicalReplicationSlotInfo));
+
+	for (i = 0; i < ntups; i++)
+	{
+		slotinfo[i].dobj.objType = DO_LOGICAL_REPLICATION_SLOT;
+
+		slotinfo[i].dobj.catId.tableoid = InvalidOid;
+		slotinfo[i].dobj.catId.oid = InvalidOid;
+		AssignDumpId(&slotinfo[i].dobj);
+
+		slotinfo[i].dobj.name = pg_strdup(PQgetvalue(res, i, i_slotname));
+
+		slotinfo[i].plugin = pg_strdup(PQgetvalue(res, i, i_plugin));
+		slotinfo[i].twophase = (strcmp(PQgetvalue(res, i, i_twophase), "t") == 0);
+
+		/*
+		 * Note: Currently we do not have any options to include/exclude slots
+		 * in dumping, so all the slots must be selected.
+		 */
+		slotinfo[i].dobj.dump = DUMP_COMPONENT_DEFINITION;
+	}
+	PQclear(res);
+
+	destroyPQExpBuffer(query);
+}
+
+/*
+ * dumpLogicalReplicationSlot
+ *	  dump creation functions for the given logical replication slots
+ */
+static void
+dumpLogicalReplicationSlot(Archive *fout,
+						   const LogicalReplicationSlotInfo *slotinfo)
+{
+	Assert(fout->dopt->logical_slots_only);
+
+	if (slotinfo->dobj.dump & DUMP_COMPONENT_DEFINITION)
+	{
+		PQExpBuffer query = createPQExpBuffer();
+
+		/*
+		 * XXX: For simplification, pg_create_logical_replication_slot() is
+		 * used. Is it sufficient?
+		 */
+		appendPQExpBuffer(query, "SELECT pg_catalog.pg_create_logical_replication_slot(");
+		appendStringLiteralAH(query, slotinfo->dobj.name, fout);
+		appendPQExpBuffer(query, ", ");
+		appendStringLiteralAH(query, slotinfo->plugin, fout);
+		appendPQExpBuffer(query, ", false, %s);",
+						  slotinfo->twophase ? "true" : "false");
+
+		ArchiveEntry(fout, slotinfo->dobj.catId, slotinfo->dobj.dumpId,
+					 ARCHIVE_OPTS(.tag = slotinfo->dobj.name,
+								  .description = "REPLICATION SLOT",
+								  .section = SECTION_POST_DATA,
+								  .createStmt = query->data));
+
+		destroyPQExpBuffer(query);
+	}
+}
diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h
index ed6ce41ad7..de081c35ae 100644
--- a/src/bin/pg_dump/pg_dump.h
+++ b/src/bin/pg_dump/pg_dump.h
@@ -82,7 +82,8 @@ typedef enum
 	DO_PUBLICATION,
 	DO_PUBLICATION_REL,
 	DO_PUBLICATION_TABLE_IN_SCHEMA,
-	DO_SUBSCRIPTION
+	DO_SUBSCRIPTION,
+	DO_LOGICAL_REPLICATION_SLOT
 } DumpableObjectType;
 
 /*
@@ -666,6 +667,20 @@ typedef struct _SubscriptionInfo
 	char	   *subpasswordrequired;
 } SubscriptionInfo;
 
+/*
+ * The LogicalReplicationSlotInfo struct is used to represent replication
+ * slots.
+ *
+ * XXX: add more attributes if needed
+ */
+typedef struct _LogicalReplicationSlotInfo
+{
+	DumpableObject dobj;
+	char	   *plugin;
+	char	   *slottype;
+	bool		twophase;
+} LogicalReplicationSlotInfo;
+
 /*
  *	common utility functions
  */
diff --git a/src/bin/pg_dump/pg_dump_sort.c b/src/bin/pg_dump/pg_dump_sort.c
index 745578d855..4e12e46dc5 100644
--- a/src/bin/pg_dump/pg_dump_sort.c
+++ b/src/bin/pg_dump/pg_dump_sort.c
@@ -93,6 +93,7 @@ enum dbObjectTypePriorities
 	PRIO_PUBLICATION_REL,
 	PRIO_PUBLICATION_TABLE_IN_SCHEMA,
 	PRIO_SUBSCRIPTION,
+	PRIO_LOGICAL_REPLICATION_SLOT,
 	PRIO_DEFAULT_ACL,			/* done in ACL pass */
 	PRIO_EVENT_TRIGGER,			/* must be next to last! */
 	PRIO_REFRESH_MATVIEW		/* must be last! */
@@ -146,10 +147,11 @@ static const int dbObjectTypePriority[] =
 	PRIO_PUBLICATION,			/* DO_PUBLICATION */
 	PRIO_PUBLICATION_REL,		/* DO_PUBLICATION_REL */
 	PRIO_PUBLICATION_TABLE_IN_SCHEMA,	/* DO_PUBLICATION_TABLE_IN_SCHEMA */
-	PRIO_SUBSCRIPTION			/* DO_SUBSCRIPTION */
+	PRIO_SUBSCRIPTION,			/* DO_SUBSCRIPTION */
+	PRIO_LOGICAL_REPLICATION_SLOT	/* DO_LOGICAL_REPLICATION_SLOT */
 };
 
-StaticAssertDecl(lengthof(dbObjectTypePriority) == (DO_SUBSCRIPTION + 1),
+StaticAssertDecl(lengthof(dbObjectTypePriority) == (DO_LOGICAL_REPLICATION_SLOT + 1),
 				 "array length mismatch");
 
 static DumpId preDataBoundId;
@@ -1498,6 +1500,11 @@ describeDumpableObject(DumpableObject *obj, char *buf, int bufsize)
 					 "SUBSCRIPTION (ID %d OID %u)",
 					 obj->dumpId, obj->catId.oid);
 			return;
+		case DO_LOGICAL_REPLICATION_SLOT:
+			snprintf(buf, bufsize,
+					 "LOGICAL REPLICATION SLOT (ID %d NAME %s)",
+					 obj->dumpId, obj->name);
+			return;
 		case DO_PRE_DATA_BOUNDARY:
 			snprintf(buf, bufsize,
 					 "PRE-DATA BOUNDARY  (ID %d)",
diff --git a/src/bin/pg_upgrade/Makefile b/src/bin/pg_upgrade/Makefile
index 5834513add..815d1a7ca1 100644
--- a/src/bin/pg_upgrade/Makefile
+++ b/src/bin/pg_upgrade/Makefile
@@ -3,6 +3,9 @@
 PGFILEDESC = "pg_upgrade - an in-place binary upgrade utility"
 PGAPPICON = win32
 
+# required for 003_logical_replication_slots.pl
+EXTRA_INSTALL=contrib/test_decoding
+
 subdir = src/bin/pg_upgrade
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index fea159689e..c7bed668ac 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -30,6 +30,7 @@ static void check_for_jsonb_9_4_usage(ClusterInfo *cluster);
 static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(ClusterInfo *new_cluster);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
+static void check_for_parameter_settings(ClusterInfo *new_cluster);
 
 
 /*
@@ -89,6 +90,10 @@ check_and_dump_old_cluster(bool live_check)
 	/* Extract a list of databases and tables from the old cluster */
 	get_db_and_rel_infos(&old_cluster);
 
+	/* Additionally, extract a list of logical replication slots if required */
+	if (user_opts.include_logical_slots)
+		get_logical_slot_infos(&old_cluster);
+
 	init_tablespaces();
 
 	get_loadable_libraries();
@@ -189,6 +194,12 @@ check_new_cluster(void)
 {
 	get_db_and_rel_infos(&new_cluster);
 
+	if (user_opts.include_logical_slots)
+	{
+		get_logical_slot_infos(&new_cluster);
+		check_for_parameter_settings(&new_cluster);
+	}
+
 	check_new_cluster_is_empty();
 
 	check_loadable_libraries();
@@ -364,6 +375,22 @@ check_new_cluster_is_empty(void)
 						 rel_arr->rels[relnum].nspname,
 						 rel_arr->rels[relnum].relname);
 		}
+
+		/*
+		 * If --include-logical-replication-slots is required, check the
+		 * existence of slots
+		 */
+		if (user_opts.include_logical_slots)
+		{
+			LogicalSlotInfoArr *slot_arr = &new_cluster.dbarr.dbs[dbnum].slot_arr;
+
+			/* if nslots > 0, report just first entry and exit */
+			if (slot_arr->nslots)
+				pg_fatal("New cluster database \"%s\" is not empty: found logical replication slot \"%s\"",
+						 new_cluster.dbarr.dbs[dbnum].db_name,
+						 slot_arr->slots[0].slotname);
+		}
+
 	}
 }
 
@@ -1402,3 +1429,38 @@ check_for_user_defined_encoding_conversions(ClusterInfo *cluster)
 	else
 		check_ok();
 }
+
+/*
+ * Verify parameter settings for creating logical replication slots
+ */
+static void
+check_for_parameter_settings(ClusterInfo *new_cluster)
+{
+	PGresult   *res;
+	PGconn	   *conn = connectToServer(new_cluster, "template1");
+	int			max_replication_slots;
+	char	   *wal_level;
+
+	prep_status("Checking for logical replication slots");
+
+	res = executeQueryOrDie(conn, "SHOW max_replication_slots;");
+	max_replication_slots = atoi(PQgetvalue(res, 0, 0));
+
+	if (max_replication_slots == 0)
+		pg_fatal("max_replication_slots must be greater than 0");
+
+	PQclear(res);
+
+	res = executeQueryOrDie(conn, "SHOW wal_level;");
+	wal_level = PQgetvalue(res, 0, 0);
+
+	if (strcmp(wal_level, "logical") != 0)
+		pg_fatal("wal_level must be \"logical\", but is set to \"%s\"",
+				 wal_level);
+
+	PQclear(res);
+
+	PQfinish(conn);
+
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/dump.c b/src/bin/pg_upgrade/dump.c
index 6c8c82dca8..e6b90864f5 100644
--- a/src/bin/pg_upgrade/dump.c
+++ b/src/bin/pg_upgrade/dump.c
@@ -59,6 +59,30 @@ generate_old_dump(void)
 						   log_opts.dumpdir,
 						   sql_file_name, escaped_connstr.data);
 
+		/*
+		 * Dump logical replication slots if needed.
+		 *
+		 * XXX We cannot dump replication slots at the same time as the schema
+		 * dump because we need to separate the timing of restoring
+		 * replication slots and other objects. Replication slots, in
+		 * particular, should not be restored before executing the pg_resetwal
+		 * command because it will remove WALs that are required by the slots.
+		 */
+		if (user_opts.include_logical_slots)
+		{
+			char		slots_file_name[MAXPGPATH];
+
+			snprintf(slots_file_name, sizeof(slots_file_name),
+					 DB_DUMP_LOGICAL_SLOTS_FILE_MASK, old_db->db_oid);
+			parallel_exec_prog(log_file_name, NULL,
+							   "\"%s/pg_dump\" %s --logical-replication-slots-only "
+							   "--quote-all-identifiers --binary-upgrade %s "
+							   "--file=\"%s/%s\" %s",
+							   new_cluster.bindir, cluster_conn_opts(&old_cluster),
+							   log_opts.verbose ? "--verbose" : "",
+							   log_opts.dumpdir,
+							   slots_file_name, escaped_connstr.data);
+		}
 		termPQExpBuffer(&escaped_connstr);
 	}
 
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index 85ed15ae4a..6916cb23b4 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -23,10 +23,11 @@ static void free_db_and_rel_infos(DbInfoArr *db_arr);
 static void get_template0_info(ClusterInfo *cluster);
 static void get_db_infos(ClusterInfo *cluster);
 static void get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo);
+static void get_logical_slot_infos_per_db(ClusterInfo *cluster, DbInfo *dbinfo);
 static void free_rel_infos(RelInfoArr *rel_arr);
 static void print_db_infos(DbInfoArr *db_arr);
 static void print_rel_infos(RelInfoArr *rel_arr);
-
+static void print_slot_infos(LogicalSlotInfoArr *slot_arr);
 
 /*
  * gen_db_file_maps()
@@ -600,6 +601,98 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 	dbinfo->rel_arr.nrels = num_rels;
 }
 
+/*
+ * get_logical_slot_infos()
+ *
+ * Higher level routine to generate LogicalSlotInfoArr for all databases.
+ */
+void
+get_logical_slot_infos(ClusterInfo *cluster)
+{
+	int			dbnum;
+
+	if (cluster == &old_cluster)
+		pg_log(PG_VERBOSE, "\nsource databases:");
+	else
+		pg_log(PG_VERBOSE, "\ntarget databases:");
+
+	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
+	{
+		DbInfo *pDbInfo = &cluster->dbarr.dbs[dbnum];
+
+		get_logical_slot_infos_per_db(cluster, pDbInfo);
+
+		if (log_opts.verbose)
+		{
+			pg_log(PG_VERBOSE, "Database: %s", pDbInfo->db_name);
+			print_slot_infos(&pDbInfo->slot_arr);
+		}
+	}
+}
+
+/*
+ * get_logical_slot_infos_per_db()
+ *
+ * gets the LogicalSlotInfos for all the logical replication slots of the database
+ * referred to by "dbinfo".
+ */
+static void
+get_logical_slot_infos_per_db(ClusterInfo *cluster, DbInfo *dbinfo)
+{
+	PGconn	   *conn = connectToServer(cluster,
+									   dbinfo->db_name);
+	PGresult   *res;
+	LogicalSlotInfo *slotinfos;
+
+	int			ntups;
+	int			slotnum;
+	int			num_slots = 0;
+
+	int			i_slotname;
+	int			i_plugin;
+	int			i_twophase;
+
+	char		query[QUERY_ALLOC];
+
+	query[0] = '\0';			/* initialize query string to empty */
+
+	snprintf(query, sizeof(query),
+			 "SELECT slot_name, plugin, two_phase "
+			 "FROM pg_catalog.pg_replication_slots "
+			 "WHERE database = current_database() AND temporary = false "
+			 "AND wal_status IN ('reserved', 'extended');");
+
+	res = executeQueryOrDie(conn, "%s", query);
+
+	ntups = PQntuples(res);
+
+	if (ntups)
+		slotinfos = (LogicalSlotInfo *) pg_malloc(sizeof(LogicalSlotInfo) * ntups);
+	else
+	{
+		slotinfos = NULL;
+		goto cleanup;
+	}
+
+	i_slotname = PQfnumber(res, "slot_name");
+	i_plugin = PQfnumber(res, "plugin");
+	i_twophase = PQfnumber(res, "two_phase");
+
+	for (slotnum = 0; slotnum < ntups; slotnum++)
+	{
+		LogicalSlotInfo *curr = &slotinfos[num_slots++];
+
+		curr->slotname = pg_strdup(PQgetvalue(res, slotnum, i_slotname));
+		curr->plugin = pg_strdup(PQgetvalue(res, slotnum, i_plugin));
+		curr->two_phase = (strcmp(PQgetvalue(res, slotnum, i_twophase), "t") == 0);
+	}
+
+cleanup:
+	PQfinish(conn);
+
+	dbinfo->slot_arr.slots = slotinfos;
+	dbinfo->slot_arr.nslots = num_slots;
+}
 
 static void
 free_db_and_rel_infos(DbInfoArr *db_arr)
@@ -634,7 +727,6 @@ free_rel_infos(RelInfoArr *rel_arr)
 	rel_arr->nrels = 0;
 }
 
-
 static void
 print_db_infos(DbInfoArr *db_arr)
 {
@@ -660,3 +752,15 @@ print_rel_infos(RelInfoArr *rel_arr)
 			   rel_arr->rels[relnum].reloid,
 			   rel_arr->rels[relnum].tablespace);
 }
+
+static void
+print_slot_infos(LogicalSlotInfoArr *slot_arr)
+{
+	int			slotnum;
+
+	for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		pg_log(PG_VERBOSE, "slotname: %s, plugin: %s, two_phase: %d",
+			   slot_arr->slots[slotnum].slotname,
+			   slot_arr->slots[slotnum].plugin,
+			   slot_arr->slots[slotnum].two_phase);
+}
diff --git a/src/bin/pg_upgrade/meson.build b/src/bin/pg_upgrade/meson.build
index 12a97f84e2..228f29b688 100644
--- a/src/bin/pg_upgrade/meson.build
+++ b/src/bin/pg_upgrade/meson.build
@@ -42,6 +42,7 @@ tests += {
     'tests': [
       't/001_basic.pl',
       't/002_pg_upgrade.pl',
+      't/003_logical_replication_slots.pl',
     ],
     'test_kwargs': {'priority': 40}, # pg_upgrade tests are slow
   },
diff --git a/src/bin/pg_upgrade/option.c b/src/bin/pg_upgrade/option.c
index 640361009e..df66a5ffe6 100644
--- a/src/bin/pg_upgrade/option.c
+++ b/src/bin/pg_upgrade/option.c
@@ -57,6 +57,7 @@ parseCommandLine(int argc, char *argv[])
 		{"verbose", no_argument, NULL, 'v'},
 		{"clone", no_argument, NULL, 1},
 		{"copy", no_argument, NULL, 2},
+		{"include-logical-replication-slots", no_argument, NULL, 3},
 
 		{NULL, 0, NULL, 0}
 	};
@@ -199,6 +200,10 @@ parseCommandLine(int argc, char *argv[])
 				user_opts.transfer_mode = TRANSFER_MODE_COPY;
 				break;
 
+			case 3:
+				user_opts.include_logical_slots = true;
+				break;
+
 			default:
 				fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
 						os_info.progname);
@@ -289,6 +294,8 @@ usage(void)
 	printf(_("  -V, --version                 display version information, then exit\n"));
 	printf(_("  --clone                       clone instead of copying files to new cluster\n"));
 	printf(_("  --copy                        copy files to new cluster (default)\n"));
+	printf(_("  --include-logical-replication-slots\n"
+			 "                                upgrade logical replication slots\n"));
 	printf(_("  -?, --help                    show this help, then exit\n"));
 	printf(_("\n"
 			 "Before running pg_upgrade you must:\n"
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 75bab0a04c..373a9ef490 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -59,6 +59,7 @@ static void copy_xact_xlog_xid(void);
 static void set_frozenxids(bool minmxid_only);
 static void make_outputdirs(char *pgdata);
 static void setup(char *argv0, bool *live_check);
+static void create_logical_replication_slots(void);
 
 ClusterInfo old_cluster,
 			new_cluster;
@@ -188,6 +189,19 @@ main(int argc, char **argv)
 			  new_cluster.pgdata);
 	check_ok();
 
+	/*
+	 * Create logical replication slots if requested.
+	 *
+	 * Note: This must be done after doing pg_resetwal command because the
+	 * command will remove required WALs.
+	 */
+	if (user_opts.include_logical_slots)
+	{
+		start_postmaster(&new_cluster, true);
+		create_logical_replication_slots();
+		stop_postmaster(false);
+	}
+
 	if (user_opts.do_sync)
 	{
 		prep_status("Sync data directory to disk");
@@ -860,3 +874,50 @@ set_frozenxids(bool minmxid_only)
 
 	check_ok();
 }
+
+/*
+ * create_logical_replication_slots()
+ *
+ * Similar to create_new_objects() but only restores logical replication slots.
+ */
+static void
+create_logical_replication_slots(void)
+{
+	int			dbnum;
+
+	prep_status_progress("Restoring logical replication slots in the new cluster");
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		char		slots_file_name[MAXPGPATH],
+					log_file_name[MAXPGPATH];
+		DbInfo	   *old_db = &old_cluster.dbarr.dbs[dbnum];
+
+		pg_log(PG_STATUS, "%s", old_db->db_name);
+
+		snprintf(slots_file_name, sizeof(slots_file_name),
+				 DB_DUMP_LOGICAL_SLOTS_FILE_MASK, old_db->db_oid);
+		snprintf(log_file_name, sizeof(log_file_name),
+				 DB_DUMP_LOG_FILE_MASK, old_db->db_oid);
+
+		parallel_exec_prog(log_file_name,
+						   NULL,
+						   "\"%s/psql\" %s --echo-queries --set ON_ERROR_STOP=on "
+						   "--no-psqlrc --dbname %s -f \"%s/%s\"",
+						   new_cluster.bindir,
+						   cluster_conn_opts(&new_cluster),
+						   old_db->db_name,
+						   log_opts.dumpdir,
+						   slots_file_name);
+	}
+
+	/* reap all children */
+	while (reap_child(true) == true)
+		;
+
+	end_progress_output();
+	check_ok();
+
+	/* update new_cluster info again */
+	get_logical_slot_infos(&new_cluster);
+}
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 3eea0139c7..7adbb50807 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -29,6 +29,7 @@
 /* contains both global db information and CREATE DATABASE commands */
 #define GLOBALS_DUMP_FILE	"pg_upgrade_dump_globals.sql"
 #define DB_DUMP_FILE_MASK	"pg_upgrade_dump_%u.custom"
+#define DB_DUMP_LOGICAL_SLOTS_FILE_MASK	"pg_upgrade_dump_%u_logical_slots.sql"
 
 /*
  * Base directories that include all the files generated internally, from the
@@ -150,6 +151,22 @@ typedef struct
 	int			nrels;
 } RelInfoArr;
 
+/*
+ * Structure to store logical replication slot information
+ */
+typedef struct
+{
+	char	   *slotname;		/* slot name */
+	char	   *plugin;			/* plugin */
+	bool		two_phase;		/* Can the slot decode 2PC? */
+} LogicalSlotInfo;
+
+typedef struct
+{
+	LogicalSlotInfo *slots;
+	int			nslots;
+} LogicalSlotInfoArr;
+
 /*
  * The following structure represents a relation mapping.
  */
@@ -176,6 +193,7 @@ typedef struct
 	char		db_tablespace[MAXPGPATH];	/* database default tablespace
 											 * path */
 	RelInfoArr	rel_arr;		/* array of all user relinfos */
+	LogicalSlotInfoArr slot_arr;	/* array of all LogicalSlotInfo */
 } DbInfo;
 
 /*
@@ -304,6 +322,8 @@ typedef struct
 	transferMode transfer_mode; /* copy files or link them? */
 	int			jobs;			/* number of processes/threads to use */
 	char	   *socketdir;		/* directory to use for Unix sockets */
+	bool		include_logical_slots;	/* true -> dump and restore logical
+										 * replication slots */
 } UserOpts;
 
 typedef struct
@@ -400,6 +420,7 @@ FileNameMap *gen_db_file_maps(DbInfo *old_db,
 							  DbInfo *new_db, int *nmaps, const char *old_pgdata,
 							  const char *new_pgdata);
 void		get_db_and_rel_infos(ClusterInfo *cluster);
+void		get_logical_slot_infos(ClusterInfo *cluster);
 
 /* option.c */
 
diff --git a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
new file mode 100644
index 0000000000..525a7704cf
--- /dev/null
+++ b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
@@ -0,0 +1,115 @@
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+# Tests for upgrading replication slots
+
+use strict;
+use warnings;
+
+use File::Path qw(rmtree);
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Can be changed to test the other modes
+my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
+
+# Initialize old node
+my $old_node = PostgreSQL::Test::Cluster->new('old_node');
+$old_node->init(allows_streaming => 'logical');
+$old_node->start;
+
+# Initialize new node
+my $new_node = PostgreSQL::Test::Cluster->new('new_node');
+$new_node->init(allows_streaming => 1);
+
+my $bindir = $new_node->config_data('--bindir');
+
+$old_node->stop;
+
+# Cause a failure at the start of pg_upgrade because wal_level is replica
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_node->data_dir,
+		'-D',         $new_node->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_node->host,
+		'-p',         $old_node->port,
+		'-P',         $new_node->port,
+		$mode,        '--include-logical-replication-slots',
+	],
+	'run of pg_upgrade of old node with wrong wal_level');
+ok( -d $new_node->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_node->data_dir . "/pg_upgrade_output.d");
+
+# Preparations for the subsequent test. The case max_replication_slots is set
+# to 0 is prohibited.
+$new_node->append_conf('postgresql.conf', "wal_level = 'logical'");
+$new_node->append_conf('postgresql.conf', "max_replication_slots = 0");
+
+# Cause a failure at the start of pg_upgrade because max_replication_slots is 0
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_node->data_dir,
+		'-D',         $new_node->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_node->host,
+		'-p',         $old_node->port,
+		'-P',         $new_node->port,
+		$mode,        '--include-logical-replication-slots',
+	],
+	'run of pg_upgrade of old node with wrong max_replication_slots');
+ok( -d $new_node->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_node->data_dir . "/pg_upgrade_output.d");
+
+# Preparations for the subsequent test. max_replication_slots is set to
+# non-zero value to succeed the pg_upgrade
+$new_node->append_conf('postgresql.conf', "max_replication_slots = 10");
+
+# Create a slot on old node, and generate WALs
+$old_node->start;
+$old_node->safe_psql(
+	'postgres', qq[
+	SELECT pg_create_logical_replication_slot('test_slot', 'test_decoding', false, true);
+	CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;
+]);
+
+my $result = $old_node->safe_psql('postgres',
+	"SELECT count (*) FROM pg_logical_slot_get_changes('test_slot', NULL, NULL)"
+);
+is($result, qq(12), 'ensure WALs are not consumed yet');
+$old_node->stop;
+
+# Actual run, pg_upgrade_output.d is removed at the end
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_node->data_dir,
+		'-D',         $new_node->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_node->host,
+		'-p',         $old_node->port,
+		'-P',         $new_node->port,
+		$mode,        '--include-logical-replication-slots'
+	],
+	'run of pg_upgrade of old node');
+ok( !-d $new_node->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ removed after pg_upgrade success");
+
+$new_node->start;
+$result = $new_node->safe_psql('postgres',
+	"SELECT slot_name, two_phase FROM pg_replication_slots");
+is($result, qq(test_slot|t), 'check the slot exists on new node');
+
+done_testing();
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index b4058b88c3..5944cb34ea 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1479,6 +1479,7 @@ LogicalRepBeginData
 LogicalRepCommitData
 LogicalRepCommitPreparedTxnData
 LogicalRepCtxStruct
+LogicalReplicationSlotInfo
 LogicalRepMode
 LogicalRepMsgType
 LogicalRepPartMapEntry
@@ -1492,6 +1493,8 @@ LogicalRepTupleData
 LogicalRepTyp
 LogicalRepWorker
 LogicalRewriteMappingData
+LogicalSlotInfo
+LogicalSlotInfoArr
 LogicalTape
 LogicalTapeSet
 LsnReadQueue
-- 
2.27.0

v12-0002-Always-persist-to-disk-logical-slots-during-a-sh.patch (application/octet-stream)
From 35bdeb9ba30f37c235fd9f592df494847921562f Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Fri, 14 Apr 2023 13:49:09 +0800
Subject: [PATCH v12 2/4] Always persist to disk logical slots during a
 shutdown checkpoint.

It's entirely possible for a logical slot to have a confirmed_flush_lsn higher
than the last value saved on disk while not being marked as dirty.  It's
currently not a problem to lose that value during a clean shutdown / restart
cycle, but a later patch adding support for pg_upgrade of publications and
logical slots will rely on that value being properly persisted to disk.

Author: Julien Rouhaud
Reviewed-by: FIXME
Discussion: FIXME
---
 src/backend/access/transam/xlog.c |  2 +-
 src/backend/replication/slot.c    | 26 ++++++++++++++++----------
 src/include/replication/slot.h    |  2 +-
 3 files changed, 18 insertions(+), 12 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index bc5a8e0569..78b4528f2c 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7011,7 +7011,7 @@ static void
 CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
 {
 	CheckPointRelationMap();
-	CheckPointReplicationSlots();
+	CheckPointReplicationSlots(flags & CHECKPOINT_IS_SHUTDOWN);
 	CheckPointSnapBuild();
 	CheckPointLogicalRewriteHeap();
 	CheckPointReplicationOrigin();
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 8021aaa0a8..aeea6ffd1f 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -109,7 +109,8 @@ static void ReplicationSlotDropPtr(ReplicationSlot *slot);
 /* internal persistency functions */
 static void RestoreSlotFromDisk(const char *name);
 static void CreateSlotOnDisk(ReplicationSlot *slot);
-static void SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel);
+static void SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel,
+						   bool is_shutdown);
 
 /*
  * Report shared-memory space needed by ReplicationSlotsShmemInit.
@@ -783,7 +784,7 @@ ReplicationSlotSave(void)
 	Assert(MyReplicationSlot != NULL);
 
 	sprintf(path, "pg_replslot/%s", NameStr(MyReplicationSlot->data.name));
-	SaveSlotToPath(MyReplicationSlot, path, ERROR);
+	SaveSlotToPath(MyReplicationSlot, path, ERROR, false);
 }
 
 /*
@@ -1565,11 +1566,10 @@ restart:
 /*
  * Flush all replication slots to disk.
  *
- * This needn't actually be part of a checkpoint, but it's a convenient
- * location.
+ * is_shutdown is true in case of a shutdown checkpoint.
  */
 void
-CheckPointReplicationSlots(void)
+CheckPointReplicationSlots(bool is_shutdown)
 {
 	int			i;
 
@@ -1594,7 +1594,7 @@ CheckPointReplicationSlots(void)
 
 		/* save the slot to disk, locking is handled in SaveSlotToPath() */
 		sprintf(path, "pg_replslot/%s", NameStr(s->data.name));
-		SaveSlotToPath(s, path, LOG);
+		SaveSlotToPath(s, path, LOG, is_shutdown);
 	}
 	LWLockRelease(ReplicationSlotAllocationLock);
 }
@@ -1700,7 +1700,7 @@ CreateSlotOnDisk(ReplicationSlot *slot)
 
 	/* Write the actual state file. */
 	slot->dirty = true;			/* signal that we really need to write */
-	SaveSlotToPath(slot, tmppath, ERROR);
+	SaveSlotToPath(slot, tmppath, ERROR, false);
 
 	/* Rename the directory into place. */
 	if (rename(tmppath, path) != 0)
@@ -1726,7 +1726,8 @@ CreateSlotOnDisk(ReplicationSlot *slot)
  * Shared functionality between saving and creating a replication slot.
  */
 static void
-SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel)
+SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel,
+			   bool is_shutdown)
 {
 	char		tmppath[MAXPGPATH];
 	char		path[MAXPGPATH];
@@ -1740,8 +1741,13 @@ SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel)
 	slot->just_dirtied = false;
 	SpinLockRelease(&slot->mutex);
 
-	/* and don't do anything if there's nothing to write */
-	if (!was_dirty)
+	/*
+	 * and don't do anything if there's nothing to write, unless this is
+	 * called for a logical slot during a shutdown checkpoint, as we want to
+	 * persist the confirmed_flush_lsn in that case, even if that's the only
+	 * modification.
+	 */
+	if (!was_dirty && !is_shutdown && !SlotIsLogical(slot))
 		return;
 
 	LWLockAcquire(&slot->io_in_progress_lock, LW_EXCLUSIVE);
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index a8a89dc784..7ca37c9f70 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -241,7 +241,7 @@ extern void ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslo
 extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok);
 
 extern void StartupReplicationSlots(void);
-extern void CheckPointReplicationSlots(void);
+extern void CheckPointReplicationSlots(bool is_shutdown);
 
 extern void CheckSlotRequirements(void);
 extern void CheckSlotPermissions(void);
-- 
2.27.0

v12-0003-pg_upgrade-Add-check-function-for-include-logica.patch (application/octet-stream)
From 1398560d8c9cb316576375b7ceeec7a66b706f28 Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Fri, 14 Apr 2023 08:59:03 +0000
Subject: [PATCH v12 3/4] pg_upgrade: Add check function for
 --include-logical-replication-slots option

---
 src/bin/pg_upgrade/check.c                    | 80 ++++++++++++++++++-
 .../t/003_logical_replication_slots.pl        | 30 ++++++-
 2 files changed, 108 insertions(+), 2 deletions(-)

diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index c7bed668ac..2e62eabe0a 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -9,7 +9,10 @@
 
 #include "postgres_fe.h"
 
+#include "access/xlogrecord.h"
+#include "access/xlog_internal.h"
 #include "catalog/pg_authid_d.h"
+#include "catalog/pg_control.h"
 #include "catalog/pg_collation.h"
 #include "fe_utils/string_utils.h"
 #include "mb/pg_wchar.h"
@@ -31,7 +34,7 @@ static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(ClusterInfo *new_cluster);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
 static void check_for_parameter_settings(ClusterInfo *new_cluster);
-
+static void check_for_confirmed_flush_lsn(ClusterInfo *cluster);
 
 /*
  * fix_path_separator
@@ -108,6 +111,8 @@ check_and_dump_old_cluster(bool live_check)
 	check_for_composite_data_type_usage(&old_cluster);
 	check_for_reg_data_type_usage(&old_cluster);
 	check_for_isn_and_int8_passing_mismatch(&old_cluster);
+	if (user_opts.include_logical_slots)
+		check_for_confirmed_flush_lsn(&old_cluster);
 
 	/*
 	 * PG 16 increased the size of the 'aclitem' type, which breaks the on-disk
@@ -1441,6 +1446,10 @@ check_for_parameter_settings(ClusterInfo *new_cluster)
 	int			max_replication_slots;
 	char	   *wal_level;
 
+	/* --include-logical-replication-slots can be used since PG16. */
+	if (GET_MAJOR_VERSION(new_cluster->major_version) < 1600)
+		return;
+
 	prep_status("Checking for logical replication slots");
 
 	res = executeQueryOrDie(conn, "SHOW max_replication_slots;");
@@ -1464,3 +1473,72 @@ check_for_parameter_settings(ClusterInfo *new_cluster)
 
 	check_ok();
 }
+
+/*
+ * Verify that all logical replication slots consumed all WALs, except a
+ * CHECKPOINT_SHUTDOWN record.
+ */
+static void
+check_for_confirmed_flush_lsn(ClusterInfo *cluster)
+{
+	int			i,
+				ntups,
+				i_slotname;
+	bool		is_error = false;
+	PGresult   *res;
+	DbInfo	   *active_db = &cluster->dbarr.dbs[0];
+	PGconn	   *conn = connectToServer(cluster, active_db->db_name);
+
+	Assert(user_opts.include_logical_slots);
+
+	/* --include-logical-replication-slots can be used since PG16. */
+	if (GET_MAJOR_VERSION(cluster->major_version) < 1600)
+		return;
+
+	prep_status("Checking for logical replication slots");
+
+	/*
+	 * Check that all logical replication slots have reached the current WAL
+	 * position, except for the CHECKPOINT_SHUTDOWN record. Even if all WALs
+	 * are consumed before shutting down the node, the checkpointer generates
+	 * a CHECKPOINT_SHUTDOWN record at shutdown, which cannot be consumed by
+	 * any slots. Therefore, we must allow for a difference between
+	 * pg_current_wal_insert_lsn() and confirmed_flush_lsn.
+	 */
+#define SHUTDOWN_RECORD_SIZE  (SizeOfXLogRecord + \
+							   SizeOfXLogRecordDataHeaderShort + \
+							   sizeof(CheckPoint))
+
+	res = executeQueryOrDie(conn,
+							"SELECT slot_name FROM pg_catalog.pg_replication_slots "
+							"WHERE (pg_catalog.pg_current_wal_insert_lsn() - confirmed_flush_lsn) > %d "
+							"AND temporary = false AND wal_status IN ('reserved', 'extended');",
+							(int) (SizeOfXLogLongPHD + SHUTDOWN_RECORD_SIZE));
+
+#undef SHUTDOWN_RECORD_SIZE
+
+	ntups = PQntuples(res);
+	i_slotname = PQfnumber(res, "slot_name");
+
+	for (i = 0; i < ntups; i++)
+	{
+		char	   *slotname;
+
+		is_error = true;
+
+		slotname = PQgetvalue(res, i, i_slotname);
+
+		pg_log(PG_WARNING,
+			   "\nWARNING: logical replication slot \"%s\" has not consumed WALs yet",
+			   slotname);
+	}
+
+	PQclear(res);
+	PQfinish(conn);
+
+	if (is_error)
+		pg_fatal("--include-logical-replication-slots requires that all "
+				 "logical replication slots consumed all the WALs");
+
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
index 525a7704cf..21fefca084 100644
--- a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
+++ b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
@@ -85,11 +85,39 @@ $old_node->safe_psql(
 ]);
 
 my $result = $old_node->safe_psql('postgres',
-	"SELECT count (*) FROM pg_logical_slot_get_changes('test_slot', NULL, NULL)"
+	"SELECT count(*) FROM pg_logical_slot_peek_changes('test_slot', NULL, NULL)"
 );
+
 is($result, qq(12), 'ensure WALs are not consumed yet');
 $old_node->stop;
 
+# Cause a failure at the start of pg_upgrade because test_slot does not
+# finish consuming all the WALs
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_node->data_dir,
+		'-D',         $new_node->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_node->host,
+		'-p',         $old_node->port,
+		'-P',         $new_node->port,
+		$mode,        '--include-logical-replication-slots',
+	],
+	'run of pg_upgrade of old node with idle replication slots');
+ok( -d $new_node->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_node->data_dir . "/pg_upgrade_output.d");
+
+$old_node->start;
+$old_node->safe_psql('postgres',
+	"SELECT count (*) FROM pg_logical_slot_get_changes('test_slot', NULL, NULL)"
+);
+$old_node->stop;
+
 # Actual run, pg_upgrade_output.d is removed at the end
 command_ok(
 	[
-- 
2.27.0

v12-0004-Change-the-method-used-to-check-logical-replicat.patch (application/octet-stream)
From 4a4ecb107f7794edd59986826b794917df6f3210 Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Mon, 24 Apr 2023 11:03:28 +0000
Subject: [PATCH v12 4/4] Change the method used to check logical replication
 slots during the live check

When a live check is requested, there is a possibility of additional changes
occurring, which may cause the current WAL position to exceed the confirmed_flush_lsn
of the slot. As a result, we check whether each logical slot is currently active
instead. This is sufficient as all the WAL records will be sent during the
publisher's shutdown.
---
 src/bin/pg_upgrade/check.c                    | 68 ++++++++++++++++++-
 .../t/003_logical_replication_slots.pl        | 66 +++++++++++++++++-
 2 files changed, 130 insertions(+), 4 deletions(-)

diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 2e62eabe0a..f589af9eba 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -35,6 +35,7 @@ static void check_for_new_tablespace_dir(ClusterInfo *new_cluster);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
 static void check_for_parameter_settings(ClusterInfo *new_cluster);
 static void check_for_confirmed_flush_lsn(ClusterInfo *cluster);
+static void check_are_logical_slots_active(ClusterInfo *cluster);
 
 /*
  * fix_path_separator
@@ -112,7 +113,19 @@ check_and_dump_old_cluster(bool live_check)
 	check_for_reg_data_type_usage(&old_cluster);
 	check_for_isn_and_int8_passing_mismatch(&old_cluster);
 	if (user_opts.include_logical_slots)
-		check_for_confirmed_flush_lsn(&old_cluster);
+	{
+		/*
+		 * The method used to check logical replication slots is dependent on
+		 * the value of the live_check parameter. This change was implemented
+		 * because, during a live check, it is possible for additional changes
+		 * to occur at the old node, which could cause the current WAL position
+		 * to exceed the confirmed_flush_lsn of the slot.
+		 */
+		if (live_check)
+			check_are_logical_slots_active(&old_cluster);
+		else
+			check_for_confirmed_flush_lsn(&old_cluster);
+	}
 
 	/*
 	 * PG 16 increased the size of the 'aclitem' type, which breaks the on-disk
@@ -1474,6 +1487,59 @@ check_for_parameter_settings(ClusterInfo *new_cluster)
 	check_ok();
 }
 
+/*
+ * Verify that all logical replication slots are active
+ */
+static void
+check_are_logical_slots_active(ClusterInfo *cluster)
+{
+	int			i,
+				ntups,
+				i_slotname;
+	bool		is_error = false;
+	PGresult   *res;
+	DbInfo	   *active_db = &cluster->dbarr.dbs[0];
+	PGconn	   *conn = connectToServer(cluster, active_db->db_name);
+
+	Assert(user_opts.include_logical_slots);
+
+	/* --include-logical-replication-slots can be used since PG16. */
+	if (GET_MAJOR_VERSION(cluster->major_version) < 1600)
+		return;
+
+	prep_status("Checking for logical replication slots");
+
+	res = executeQueryOrDie(conn,
+							"SELECT slot_name FROM pg_catalog.pg_replication_slots "
+							"WHERE active IS FALSE "
+							"AND temporary = false AND wal_status IN ('reserved', 'extended');");
+
+	ntups = PQntuples(res);
+	i_slotname = PQfnumber(res, "slot_name");
+
+	for (i = 0; i < ntups; i++)
+	{
+		char	   *slotname;
+
+		is_error = true;
+
+		slotname = PQgetvalue(res, i, i_slotname);
+
+		pg_log(PG_WARNING,
+			   "\nWARNING: logical replication slot \"%s\" is not active",
+			   slotname);
+	}
+
+	PQclear(res);
+	PQfinish(conn);
+
+	if (is_error)
+		pg_fatal("--include-logical-replication-slots with --check requires that "
+				 "all logical replication slots are active");
+
+	check_ok();
+}
+
 /*
  * Verify that all logical replication slots consumed all WALs, except a
  * CHECKPOINT_SHUTDOWN record.
diff --git a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
index 21fefca084..d7fd864bd7 100644
--- a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
+++ b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
@@ -19,6 +19,10 @@ my $old_node = PostgreSQL::Test::Cluster->new('old_node');
 $old_node->init(allows_streaming => 'logical');
 $old_node->start;
 
+# Initialize subscriber, which will be used only for --check
+my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
+$subscriber->init(allows_streaming => 'logical');
+
 # Initialize new node
 my $new_node = PostgreSQL::Test::Cluster->new('new_node');
 $new_node->init(allows_streaming => 1);
@@ -76,15 +80,71 @@ rmtree($new_node->data_dir . "/pg_upgrade_output.d");
 # non-zero value to succeed the pg_upgrade
 $new_node->append_conf('postgresql.conf', "max_replication_slots = 10");
 
-# Create a slot on old node, and generate WALs
+# Setup logical replication
 $old_node->start;
+$old_node->safe_psql('postgres',
+	"CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a");
+$old_node->safe_psql('postgres', "CREATE PUBLICATION pub FOR ALL TABLES");
+my $old_connstr = $old_node->connstr . ' dbname=postgres';
+
+$subscriber->start;
+$subscriber->safe_psql('postgres', "CREATE TABLE tbl (a int)");
+$subscriber->safe_psql('postgres',
+	"CREATE SUBSCRIPTION sub CONNECTION '$old_connstr' PUBLICATION pub WITH (copy_data = true)"
+);
+
+# Wait for initial table sync to finish
+$subscriber->wait_for_subscription_sync($old_node, 'sub');
+
+my $result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
+is($result, qq(10), 'check initial rows on subscriber');
+
+# Start a background session and open a transaction (not committed yet)
+my $bsession = $old_node->background_psql('postgres');
+$bsession->query_safe(
+	q{
+BEGIN;
+INSERT INTO tbl VALUES (generate_series(11, 20))
+});
+
+$result = $old_node->safe_psql('postgres',
+	"SELECT count(*) FROM pg_replication_slots WHERE pg_current_wal_insert_lsn() > confirmed_flush_lsn"
+);
+is($result, qq(1),
+	'check the current WAL position exceeds confirmed_flush_lsn');
+
+# Run pg_upgrade --check. The status of each logical slot will be checked,
+# and the command should succeed.
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_node->data_dir,
+		'-D',         $new_node->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_node->host,
+		'-p',         $old_node->port,
+		'-P',         $new_node->port,
+		$mode,        '--include-logical-replication-slots',
+		'--check'
+	],
+	'run of pg_upgrade of old node');
+ok( !-d $old_node->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ removed after pg_upgrade success");
+
+# Cleanup
+$bsession->query_safe("ABORT");
+$bsession->quit;
+$subscriber->safe_psql('postgres', "DROP SUBSCRIPTION sub");
+
+# Create a slot on old node, and generate WALs
 $old_node->safe_psql(
 	'postgres', qq[
 	SELECT pg_create_logical_replication_slot('test_slot', 'test_decoding', false, true);
-	CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;
+	INSERT INTO tbl VALUES (generate_series(11, 20));
 ]);
 
-my $result = $old_node->safe_psql('postgres',
+$result = $old_node->safe_psql('postgres',
 	"SELECT count(*) FROM pg_logical_slot_peek_changes('test_slot', NULL, NULL)"
 );
 
-- 
2.27.0

#46Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Peter Smith (#41)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Peter,

Thanks for reviewing! The new patch is available at [1].

1. help

printf(_("  --inserts                    dump data as INSERT
commands, rather than COPY\n"));
printf(_("  --load-via-partition-root    load partitions via the
root table\n"));
+ printf(_("  --logical-replication-slots-only\n"
+ "                               dump only logical replication slots,
no schema or data\n"));
printf(_("  --no-comments                do not dump comments\n"));

Now you removed the PG Docs for the internal pg_dump option based on
my previous review comment (see [2]#1). So does it mean this "help"
should also be removed so this option will be completely invisible to the
user? I am not sure, but if you do choose to remove this help then
probably a comment should be added here to explain why it is
deliberately not listed.

Removed it from the help output, and a comment was added instead.

2. check_new_cluster

Although you wrote "Added", I don't think my previous comment ([1]#8)
was yet addressed.

What I meant to ask was: can that call to get_logical_slot_infos()
be done later, only when you know that option was specified?

e.g

BEFORE
get_logical_slot_infos(&new_cluster);
...
if (user_opts.include_logical_slots)
check_for_parameter_settings(&new_cluster);

SUGGESTION
if (user_opts.include_logical_slots)
{
get_logical_slot_infos(&new_cluster);
check_for_parameter_settings(&new_cluster);
}

Sorry for missing your comment. But I think get_logical_slot_infos() cannot be
executed later. In check_new_cluster_is_empty(), we must verify that no
replication slots exist on the new node, because all of its WAL will be truncated.
The slot information is gathered by get_logical_slot_infos(), so it must be
executed before check_new_cluster_is_empty(). Another possibility is to execute
check_for_parameter_settings() earlier, which I tried. The style seems a little
strange, but it worked well. What do you think?
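
To illustrate the ordering (a rough sketch of what the attached version does
in check_new_cluster(); see the patch itself for the exact code):

static void
check_new_cluster(void)
{
	get_db_and_rel_infos(&new_cluster);

	/*
	 * Must run before check_new_cluster_is_empty(), because that function
	 * inspects the slot_arr filled in by get_logical_slot_infos().
	 */
	if (user_opts.include_logical_slots)
	{
		get_logical_slot_infos(&new_cluster);
		check_for_parameter_settings(&new_cluster);
	}

	check_new_cluster_is_empty();

	/* ... remaining checks unchanged ... */
}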

3. get_db_and_rel_infos

src/bin/pg_upgrade/info.c

11. get_db_and_rel_infos

+ {
get_rel_infos(cluster, &cluster->dbarr.dbs[dbnum]);

+ /*
+ * Additionally, slot_arr must be initialized because they will be
+ * checked later.
+ */
+ cluster->dbarr.dbs[dbnum].slot_arr.nslots = 0;
+ cluster->dbarr.dbs[dbnum].slot_arr.slots = NULL;
+ }

11a.
I think probably it would have been easier to just use 'pg_malloc0'
instead of 'pg_malloc' in the get_db_infos, then this code would not
be necessary.

I was not sure whether it was OK to change it like that because of the
performance impact. But OK, fixed.

11b.
BTW, shouldn't this function also be calling free_logical_slot_infos()
too? That will also have the same effect (initializing the slot_arr)
but without having to change anything else.

~

Above is your reply ([2]11a). If you were not sure about the malloc0
then I think the suggestion ([1]#12b) achieves the same thing and
initializes those fields. You did not reply to 12b, so I wondered if
you accidentally missed that point.

Sorry, this part is no longer needed. Please see below.

4. get_logical_slot_infos

+ for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
+ {
+ DbInfo *pDbInfo = &cluster->dbarr.dbs[dbnum];
+
+ if (pDbInfo->slot_arr.slots)
+ free_logical_slot_infos(&pDbInfo->slot_arr);

Maybe it is ok, but it seems unusual that this
get_logical_slot_infos() is also doing a free. I didn't notice this
same pattern with the other get_XXX functions. Why is it needed? Even
if pDbInfo->slot_arr.slots was not NULL, is the information stale or
will you just end up re-fetching the same info?

After considering it more, I decided to remove the free function.

The reason is that get_logical_slot_infos() for the new cluster is called
twice: once for the check in check_new_cluster(), and once for updating the
cluster info in create_logical_replication_slots().
At the first call, we assume that no logical slots exist on the new node, but
even in that case a small memory area is allocated by pg_malloc(0).
(If there are some slots, it is not called twice.)
But I noticed that this can be avoided by adding an if-statement, so I did.

Additionally, the pg_malloc0() in get_db_and_rel_infos() is no longer needed
because we do not have to check the uninitialized area.

.../pg_upgrade/t/003_logical_replication_slots.pl

5.
+# Preparations for the subsequent test. The case max_replication_slots is set
+# to 0 is prohibit.

/prohibit/prohibited/

Fixed.

[1]: /messages/by-id/TYAPR01MB5866A3B91F56056A803B94DAF5749@TYAPR01MB5866.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#47Peter Smith
smithpb2250@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#45)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

Hi Kuroda-san. Here are some comments for patch v12-0001.

======
src/bin/pg_upgrade/check.c

1. check_new_cluster

+ if (user_opts.include_logical_slots)
+ {
+ get_logical_slot_infos(&new_cluster);
+ check_for_parameter_settings(&new_cluster);
+ }
+
  check_new_cluster_is_empty();
~

The code is OK, but maybe your reply/explanation (see [2] #2) saying
get_logical_slot_infos() needs to be called before
check_new_cluster_is_empty() would be good to have in a comment here?

======
src/bin/pg_upgrade/info.c

2. get_logical_slot_infos

+ if (ntups)
+ slotinfos = (LogicalSlotInfo *) pg_malloc(sizeof(LogicalSlotInfo) * ntups);
+ else
+ {
+ slotinfos = NULL;
+ goto cleanup;
+ }
+
+ i_slotname = PQfnumber(res, "slot_name");
+ i_plugin = PQfnumber(res, "plugin");
+ i_twophase = PQfnumber(res, "two_phase");
+
+ for (slotnum = 0; slotnum < ntups; slotnum++)
+ {
+ LogicalSlotInfo *curr = &slotinfos[num_slots++];
+
+ curr->slotname = pg_strdup(PQgetvalue(res, slotnum, i_slotname));
+ curr->plugin = pg_strdup(PQgetvalue(res, slotnum, i_plugin));
+ curr->two_phase = (strcmp(PQgetvalue(res, slotnum, i_twophase), "t") == 0);
+ }
+
+cleanup:
+ PQfinish(conn);

IMO the goto/label coding is not warranted here - a simple if/else can
do the same thing.
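
For example, something like this (just a sketch of the shape I mean, not
tested; slotinfos, num_slots and ntups are the variables already declared
earlier in the function):

	if (ntups)
	{
		int			slotnum;
		int			i_slotname;
		int			i_plugin;
		int			i_twophase;

		slotinfos = (LogicalSlotInfo *) pg_malloc(sizeof(LogicalSlotInfo) * ntups);

		i_slotname = PQfnumber(res, "slot_name");
		i_plugin = PQfnumber(res, "plugin");
		i_twophase = PQfnumber(res, "two_phase");

		for (slotnum = 0; slotnum < ntups; slotnum++)
		{
			LogicalSlotInfo *curr = &slotinfos[num_slots++];

			curr->slotname = pg_strdup(PQgetvalue(res, slotnum, i_slotname));
			curr->plugin = pg_strdup(PQgetvalue(res, slotnum, i_plugin));
			curr->two_phase = (strcmp(PQgetvalue(res, slotnum, i_twophase), "t") == 0);
		}
	}

	PQclear(res);
	PQfinish(conn);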

~~~

3. free_db_and_rel_infos, free_logical_slot_infos

static void
free_db_and_rel_infos(DbInfoArr *db_arr)
{
int dbnum;

for (dbnum = 0; dbnum < db_arr->ndbs; dbnum++)
{
free_rel_infos(&db_arr->dbs[dbnum].rel_arr);
pg_free(db_arr->dbs[dbnum].db_name);
}
pg_free(db_arr->dbs);
db_arr->dbs = NULL;
db_arr->ndbs = 0;
}

~

In v12 you removed free_logical_slot_infos(). But isn't it better to
still call free_logical_slot_infos() from the above
free_db_and_rel_infos(), so the slot memory
(dbinfo->slot_arr.slots) won't stay lying around?
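
i.e. something like the below (sketch only, assuming the v11-style
free_logical_slot_infos() helper is kept around):

static void
free_db_and_rel_infos(DbInfoArr *db_arr)
{
	int			dbnum;

	for (dbnum = 0; dbnum < db_arr->ndbs; dbnum++)
	{
		free_rel_infos(&db_arr->dbs[dbnum].rel_arr);
		/* also release any slot information gathered for this database */
		free_logical_slot_infos(&db_arr->dbs[dbnum].slot_arr);
		pg_free(db_arr->dbs[dbnum].db_name);
	}
	pg_free(db_arr->dbs);
	db_arr->dbs = NULL;
	db_arr->ndbs = 0;
}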

~~~

4. get_logical_slot_infos, print_slot_infos

In another thread [1] I am posting some minor patch changes to the
VERBOSE logging (changes to double quotes and commas etc.). Please
keep a watch on that thread because if it gets pushed then this one will
be impacted, e.g. your logging here ought also to include the same
suggested double quotes.

------
[1]: pg_upgrade logs - /messages/by-id/CAHut+PuOB4bUwkYAjA_NkTrYaocKy6W3ZYK5Pin305R7mNSLgA@mail.gmail.com
[2]: Kuroda-san reply to my v11 review - /messages/by-id/TYAPR01MB5866BD618DEE62AF1836E612F5749@TYAPR01MB5866.jpnprd01.prod.outlook.com

Kind Regards,
Peter Smith.
Fujitsu Australia

#48Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Peter Smith (#47)
4 attachment(s)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Peter,

Thank you for reviewing! PSA new version.

1. check_new_cluster

+ if (user_opts.include_logical_slots)
+ {
+ get_logical_slot_infos(&new_cluster);
+ check_for_parameter_settings(&new_cluster);
+ }
+
check_new_cluster_is_empty();
~

The code is OK, but maybe your reply/explanation (see [2] #2) saying
get_logical_slot_infos() needs to be called before
check_new_cluster_is_empty() would be good to have in a comment here?

Indeed, added.

src/bin/pg_upgrade/info.c

2. get_logical_slot_infos

+ if (ntups)
+ slotinfos = (LogicalSlotInfo *) pg_malloc(sizeof(LogicalSlotInfo) * ntups);
+ else
+ {
+ slotinfos = NULL;
+ goto cleanup;
+ }
+
+ i_slotname = PQfnumber(res, "slot_name");
+ i_plugin = PQfnumber(res, "plugin");
+ i_twophase = PQfnumber(res, "two_phase");
+
+ for (slotnum = 0; slotnum < ntups; slotnum++)
+ {
+ LogicalSlotInfo *curr = &slotinfos[num_slots++];
+
+ curr->slotname = pg_strdup(PQgetvalue(res, slotnum, i_slotname));
+ curr->plugin = pg_strdup(PQgetvalue(res, slotnum, i_plugin));
+ curr->two_phase = (strcmp(PQgetvalue(res, slotnum, i_twophase), "t") == 0);
+ }
+
+cleanup:
+ PQfinish(conn);

IMO the goto/label coding is not warranted here - a simple if/else can
do the same thing.

Yeah, I could simplify it with an if-statement. Additionally, some variable
definitions were moved into the code block.

3. free_db_and_rel_infos, free_logical_slot_infos

static void
free_db_and_rel_infos(DbInfoArr *db_arr)
{
int dbnum;

for (dbnum = 0; dbnum < db_arr->ndbs; dbnum++)
{
free_rel_infos(&db_arr->dbs[dbnum].rel_arr);
pg_free(db_arr->dbs[dbnum].db_name);
}
pg_free(db_arr->dbs);
db_arr->dbs = NULL;
db_arr->ndbs = 0;
}

~

In v12 now you removed the free_logical_slot_infos(). But isn't it
better to still call free_logical_slot_infos() from the above
free_db_and_rel_infos() still so the slot memory
(dbinfo->slot_arr.slots) won't stay lying around?

free_db_and_rel_infos() is called in the restore phase, and slot_arr has malloc'd
members only when logical slots are defined on the new cluster. In that case a
FATAL error occurs in the checking phase, so there is no possibility of reaching
the restore phase.

4. get_logical_slot_infos, print_slot_infos

In another thread [1] I am posting some minor patch changes to the
VERBOSE logging (changes to double-quotes and commas etc.). Please
keep a watch on that thread because if gets pushed then this one will
be impacted. e.g. your logging here ought also to include the same
suggested double quotes.

I assumed it would be pushed soon, so I have already included the suggestion.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:

v13-0001-pg_upgrade-Add-include-logical-replication-slots.patchapplication/octet-stream; name=v13-0001-pg_upgrade-Add-include-logical-replication-slots.patchDownload
From 45230a4af7c2819be44b1227ef19f425a075d6a7 Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Tue, 4 Apr 2023 05:49:34 +0000
Subject: [PATCH v13 1/4] pg_upgrade: Add --include-logical-replication-slots
 option

This commit introduces a new pg_upgrade option called "--include-logical-replication-slots".
This allows nodes with logical replication slots to be upgraded. The commit can
be divided into two parts: one for pg_dump and another for pg_upgrade.

For pg_dump this commit includes a new option called "--logical-replication-slots-only".
This option can be used to dump logical replication slots. When this option is
specified, the slot_name, plugin, and two_phase parameters are extracted from
pg_replication_slots. An SQL file is then generated which executes
pg_create_logical_replication_slot() with the extracted parameters.

For pg_upgrade, when '--include-logical-replication-slots' is specified, it executes
pg_dump with the new "--logical-replication-slots-only" option and restores from the
dump. Note that we cannot dump replication slots at the same time as the schema
dump because we need to separate the timing of restoring replication slots and
other objects. Replication slots, in particular, should not be restored before
executing the pg_resetwal command because it will remove WALs that are required
by the slots.

The significant advantage of this commit is that it makes it easy to continue
logical replication even after upgrading the publisher node. Previously, pg_upgrade
allowed copying publications to a new node. With this new commit, adjusting the
connection string to the new publisher will cause the apply worker on the subscriber
to connect to the new publisher automatically. This enables seamless continuation
of logical replication, even after an upgrade.

Author: Hayato Kuroda
Reviewed-by: Peter Smith, Julien Rouhaud, Vignesh C
---
 doc/src/sgml/ref/pgupgrade.sgml               |  11 ++
 src/bin/pg_dump/pg_backup.h                   |   1 +
 src/bin/pg_dump/pg_dump.c                     | 155 ++++++++++++++++++
 src/bin/pg_dump/pg_dump.h                     |  17 +-
 src/bin/pg_dump/pg_dump_sort.c                |  11 +-
 src/bin/pg_upgrade/Makefile                   |   3 +
 src/bin/pg_upgrade/check.c                    |  67 ++++++++
 src/bin/pg_upgrade/dump.c                     |  24 +++
 src/bin/pg_upgrade/info.c                     | 112 ++++++++++++-
 src/bin/pg_upgrade/meson.build                |   1 +
 src/bin/pg_upgrade/option.c                   |   7 +
 src/bin/pg_upgrade/pg_upgrade.c               |  61 +++++++
 src/bin/pg_upgrade/pg_upgrade.h               |  21 +++
 .../t/003_logical_replication_slots.pl        | 115 +++++++++++++
 src/tools/pgindent/typedefs.list              |   3 +
 15 files changed, 605 insertions(+), 4 deletions(-)
 create mode 100644 src/bin/pg_upgrade/t/003_logical_replication_slots.pl

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 7816b4c685..94e90ff506 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -240,6 +240,17 @@ PostgreSQL documentation
       </listitem>
      </varlistentry>
 
+     <varlistentry>
+      <term><option>--include-logical-replication-slots</option></term>
+      <listitem>
+       <para>
+        Upgrade logical replication slots. Only permanent replication slots
+        are included. Note that pg_upgrade does not check the installation of
+        plugins.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><option>-?</option></term>
       <term><option>--help</option></term>
diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index aba780ef4b..0a4e931f9b 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -187,6 +187,7 @@ typedef struct _dumpOptions
 	int			use_setsessauth;
 	int			enable_row_security;
 	int			load_via_partition_root;
+	int			logical_slots_only;
 
 	/* default, if no "inclusion" switches appear, is to dump everything */
 	bool		include_everything;
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 41a51ec5cd..0ff503a736 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -328,6 +328,9 @@ static void setupDumpWorker(Archive *AH);
 static TableInfo *getRootTableInfo(const TableInfo *tbinfo);
 static bool forcePartitionRootLoad(const TableInfo *tbinfo);
 
+static void getLogicalReplicationSlots(Archive *fout);
+static void dumpLogicalReplicationSlot(Archive *fout,
+									   const LogicalReplicationSlotInfo *slotinfo);
 
 int
 main(int argc, char **argv)
@@ -431,6 +434,7 @@ main(int argc, char **argv)
 		{"table-and-children", required_argument, NULL, 12},
 		{"exclude-table-and-children", required_argument, NULL, 13},
 		{"exclude-table-data-and-children", required_argument, NULL, 14},
+		{"logical-replication-slots-only", no_argument, NULL, 15},
 
 		{NULL, 0, NULL, 0}
 	};
@@ -657,6 +661,10 @@ main(int argc, char **argv)
 										  optarg);
 				break;
 
+			case 15:			/* dump only replication slot(s) */
+				dopt.logical_slots_only = true;
+				break;
+
 			default:
 				/* getopt_long already emitted a complaint */
 				pg_log_error_hint("Try \"%s --help\" for more information.", progname);
@@ -714,6 +722,21 @@ main(int argc, char **argv)
 	if (dopt.do_nothing && dopt.dump_inserts == 0)
 		pg_fatal("option --on-conflict-do-nothing requires option --inserts, --rows-per-insert, or --column-inserts");
 
+	if (dopt.logical_slots_only)
+	{
+		if (!dopt.binary_upgrade)
+			pg_fatal("options --logical-replication-slots-only requires option --binary-upgrade");
+
+		if (dopt.dataOnly)
+			pg_fatal("options --logical-replication-slots-only and -a/--data-only cannot be used together");
+
+		if (dopt.schemaOnly)
+			pg_fatal("options --logical-replication-slots-only and -s/--schema-only cannot be used together");
+
+		if (dopt.outputClean)
+			pg_fatal("options --logical-replication-slots-only and -c/--clean cannot be used together");
+	}
+
 	/* Identify archive format to emit */
 	archiveFormat = parseArchiveFormat(format, &archiveMode);
 
@@ -876,6 +899,16 @@ main(int argc, char **argv)
 			pg_fatal("no matching extensions were found");
 	}
 
+	/*
+	 * If dump logical-replication-slots-only was requested, dump only them
+	 * and skip everything else.
+	 */
+	if (dopt.logical_slots_only)
+	{
+		getLogicalReplicationSlots(fout);
+		goto dump;
+	}
+
 	/*
 	 * Dumping LOs is the default for dumps where an inclusion switch is not
 	 * used (an "include everything" dump).  -B can be used to exclude LOs
@@ -936,6 +969,8 @@ main(int argc, char **argv)
 	if (!dopt.no_security_labels)
 		collectSecLabels(fout);
 
+dump:
+
 	/* Lastly, create dummy objects to represent the section boundaries */
 	boundaryObjs = createBoundaryObjects();
 
@@ -1109,6 +1144,10 @@ help(const char *progname)
 			 "                               servers matching PATTERN\n"));
 	printf(_("  --inserts                    dump data as INSERT commands, rather than COPY\n"));
 	printf(_("  --load-via-partition-root    load partitions via the root table\n"));
+	/*
+	 * The option --logical-replication-slots-only is used only by pg_upgrade
+	 * and should not be called by users, which is why it is not listed.
+	 */
 	printf(_("  --no-comments                do not dump comments\n"));
 	printf(_("  --no-publications            do not dump publications\n"));
 	printf(_("  --no-security-labels         do not dump security label assignments\n"));
@@ -10252,6 +10291,10 @@ dumpDumpableObject(Archive *fout, DumpableObject *dobj)
 		case DO_SUBSCRIPTION:
 			dumpSubscription(fout, (const SubscriptionInfo *) dobj);
 			break;
+		case DO_LOGICAL_REPLICATION_SLOT:
+			dumpLogicalReplicationSlot(fout,
+									   (const LogicalReplicationSlotInfo *) dobj);
+			break;
 		case DO_PRE_DATA_BOUNDARY:
 		case DO_POST_DATA_BOUNDARY:
 			/* never dumped, nothing to do */
@@ -18227,6 +18270,7 @@ addBoundaryDependencies(DumpableObject **dobjs, int numObjs,
 			case DO_PUBLICATION_REL:
 			case DO_PUBLICATION_TABLE_IN_SCHEMA:
 			case DO_SUBSCRIPTION:
+			case DO_LOGICAL_REPLICATION_SLOT:
 				/* Post-data objects: must come after the post-data boundary */
 				addObjectDependency(dobj, postDataBound->dumpId);
 				break;
@@ -18488,3 +18532,114 @@ appendReloptionsArrayAH(PQExpBuffer buffer, const char *reloptions,
 	if (!res)
 		pg_log_warning("could not parse %s array", "reloptions");
 }
+
+/*
+ * getLogicalReplicationSlots
+ *	  get information about replication slots
+ */
+static void
+getLogicalReplicationSlots(Archive *fout)
+{
+	PGresult   *res;
+	LogicalReplicationSlotInfo *slotinfo;
+	PQExpBuffer query;
+
+	int			i_slotname;
+	int			i_plugin;
+	int			i_twophase;
+	int			i,
+				ntups;
+
+	/* Check whether we should dump or not */
+	if (fout->remoteVersion < 160000)
+		return;
+
+	Assert(fout->dopt->logical_slots_only);
+
+	query = createPQExpBuffer();
+
+	resetPQExpBuffer(query);
+
+	/*
+	 * Get replication slots.
+	 *
+	 * XXX: Which information must be extracted from old node? Currently three
+	 * attributes are extracted because they are used by
+	 * pg_create_logical_replication_slot().
+	 *
+	 * XXX: Do we have to support physical slots?
+	 */
+	appendPQExpBufferStr(query,
+						 "SELECT slot_name, plugin, two_phase "
+						 "FROM pg_catalog.pg_replication_slots "
+						 "WHERE database = current_database() AND temporary = false "
+						 "AND wal_status IN ('reserved', 'extended');");
+
+	res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+
+	ntups = PQntuples(res);
+
+	i_slotname = PQfnumber(res, "slot_name");
+	i_plugin = PQfnumber(res, "plugin");
+	i_twophase = PQfnumber(res, "two_phase");
+
+	slotinfo = pg_malloc(ntups * sizeof(LogicalReplicationSlotInfo));
+
+	for (i = 0; i < ntups; i++)
+	{
+		slotinfo[i].dobj.objType = DO_LOGICAL_REPLICATION_SLOT;
+
+		slotinfo[i].dobj.catId.tableoid = InvalidOid;
+		slotinfo[i].dobj.catId.oid = InvalidOid;
+		AssignDumpId(&slotinfo[i].dobj);
+
+		slotinfo[i].dobj.name = pg_strdup(PQgetvalue(res, i, i_slotname));
+
+		slotinfo[i].plugin = pg_strdup(PQgetvalue(res, i, i_plugin));
+		slotinfo[i].twophase = (strcmp(PQgetvalue(res, i, i_twophase), "t") == 0);
+
+		/*
+		 * Note: Currently we do not have any options to include/exclude slots
+		 * in dumping, so all the slots must be selected.
+		 */
+		slotinfo[i].dobj.dump = DUMP_COMPONENT_DEFINITION;
+	}
+	PQclear(res);
+
+	destroyPQExpBuffer(query);
+}
+
+/*
+ * dumpLogicalReplicationSlot
+ *	  dump creation functions for the given logical replication slots
+ */
+static void
+dumpLogicalReplicationSlot(Archive *fout,
+						   const LogicalReplicationSlotInfo *slotinfo)
+{
+	Assert(fout->dopt->logical_slots_only);
+
+	if (slotinfo->dobj.dump & DUMP_COMPONENT_DEFINITION)
+	{
+		PQExpBuffer query = createPQExpBuffer();
+
+		/*
+		 * XXX: For simplification, pg_create_logical_replication_slot() is
+		 * used. Is it sufficient?
+		 */
+		appendPQExpBuffer(query, "SELECT pg_catalog.pg_create_logical_replication_slot(");
+		appendStringLiteralAH(query, slotinfo->dobj.name, fout);
+		appendPQExpBuffer(query, ", ");
+		appendStringLiteralAH(query, slotinfo->plugin, fout);
+		appendPQExpBuffer(query, ", false, %s);",
+						  slotinfo->twophase ? "true" : "false");
+
+		ArchiveEntry(fout, slotinfo->dobj.catId, slotinfo->dobj.dumpId,
+					 ARCHIVE_OPTS(.tag = slotinfo->dobj.name,
+								  .description = "REPLICATION SLOT",
+								  .section = SECTION_POST_DATA,
+								  .createStmt = query->data));
+
+		destroyPQExpBuffer(query);
+	}
+}
diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h
index ed6ce41ad7..de081c35ae 100644
--- a/src/bin/pg_dump/pg_dump.h
+++ b/src/bin/pg_dump/pg_dump.h
@@ -82,7 +82,8 @@ typedef enum
 	DO_PUBLICATION,
 	DO_PUBLICATION_REL,
 	DO_PUBLICATION_TABLE_IN_SCHEMA,
-	DO_SUBSCRIPTION
+	DO_SUBSCRIPTION,
+	DO_LOGICAL_REPLICATION_SLOT
 } DumpableObjectType;
 
 /*
@@ -666,6 +667,20 @@ typedef struct _SubscriptionInfo
 	char	   *subpasswordrequired;
 } SubscriptionInfo;
 
+/*
+ * The LogicalReplicationSlotInfo struct is used to represent replication
+ * slots.
+ *
+ * XXX: add more attributes if needed
+ */
+typedef struct _LogicalReplicationSlotInfo
+{
+	DumpableObject dobj;
+	char	   *plugin;
+	char	   *slottype;
+	bool		twophase;
+} LogicalReplicationSlotInfo;
+
 /*
  *	common utility functions
  */
diff --git a/src/bin/pg_dump/pg_dump_sort.c b/src/bin/pg_dump/pg_dump_sort.c
index 745578d855..4e12e46dc5 100644
--- a/src/bin/pg_dump/pg_dump_sort.c
+++ b/src/bin/pg_dump/pg_dump_sort.c
@@ -93,6 +93,7 @@ enum dbObjectTypePriorities
 	PRIO_PUBLICATION_REL,
 	PRIO_PUBLICATION_TABLE_IN_SCHEMA,
 	PRIO_SUBSCRIPTION,
+	PRIO_LOGICAL_REPLICATION_SLOT,
 	PRIO_DEFAULT_ACL,			/* done in ACL pass */
 	PRIO_EVENT_TRIGGER,			/* must be next to last! */
 	PRIO_REFRESH_MATVIEW		/* must be last! */
@@ -146,10 +147,11 @@ static const int dbObjectTypePriority[] =
 	PRIO_PUBLICATION,			/* DO_PUBLICATION */
 	PRIO_PUBLICATION_REL,		/* DO_PUBLICATION_REL */
 	PRIO_PUBLICATION_TABLE_IN_SCHEMA,	/* DO_PUBLICATION_TABLE_IN_SCHEMA */
-	PRIO_SUBSCRIPTION			/* DO_SUBSCRIPTION */
+	PRIO_SUBSCRIPTION,			/* DO_SUBSCRIPTION */
+	PRIO_LOGICAL_REPLICATION_SLOT	/* DO_LOGICAL_REPLICATION_SLOT */
 };
 
-StaticAssertDecl(lengthof(dbObjectTypePriority) == (DO_SUBSCRIPTION + 1),
+StaticAssertDecl(lengthof(dbObjectTypePriority) == (DO_LOGICAL_REPLICATION_SLOT + 1),
 				 "array length mismatch");
 
 static DumpId preDataBoundId;
@@ -1498,6 +1500,11 @@ describeDumpableObject(DumpableObject *obj, char *buf, int bufsize)
 					 "SUBSCRIPTION (ID %d OID %u)",
 					 obj->dumpId, obj->catId.oid);
 			return;
+		case DO_LOGICAL_REPLICATION_SLOT:
+			snprintf(buf, bufsize,
+					 "LOGICAL REPLICATION SLOT (ID %d NAME %s)",
+					 obj->dumpId, obj->name);
+			return;
 		case DO_PRE_DATA_BOUNDARY:
 			snprintf(buf, bufsize,
 					 "PRE-DATA BOUNDARY  (ID %d)",
diff --git a/src/bin/pg_upgrade/Makefile b/src/bin/pg_upgrade/Makefile
index 5834513add..815d1a7ca1 100644
--- a/src/bin/pg_upgrade/Makefile
+++ b/src/bin/pg_upgrade/Makefile
@@ -3,6 +3,9 @@
 PGFILEDESC = "pg_upgrade - an in-place binary upgrade utility"
 PGAPPICON = win32
 
+# required for 003_logical_replication_slots.pl
+EXTRA_INSTALL=contrib/test_decoding
+
 subdir = src/bin/pg_upgrade
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index fea159689e..f9cba8548e 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -30,6 +30,7 @@ static void check_for_jsonb_9_4_usage(ClusterInfo *cluster);
 static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(ClusterInfo *new_cluster);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
+static void check_for_parameter_settings(ClusterInfo *new_cluster);
 
 
 /*
@@ -89,6 +90,10 @@ check_and_dump_old_cluster(bool live_check)
 	/* Extract a list of databases and tables from the old cluster */
 	get_db_and_rel_infos(&old_cluster);
 
+	/* Additionally, extract a list of logical replication slots if required */
+	if (user_opts.include_logical_slots)
+		get_logical_slot_infos(&old_cluster);
+
 	init_tablespaces();
 
 	get_loadable_libraries();
@@ -189,6 +194,17 @@ check_new_cluster(void)
 {
 	get_db_and_rel_infos(&new_cluster);
 
+	/*
+	 * Do additional work if --include-logical-replication-slots is specified.
+	 * This must be done before check_new_cluster_is_empty() because the
+	 * slot_arr attribute of the new_cluster will be checked in that function.
+	 */
+	if (user_opts.include_logical_slots)
+	{
+		get_logical_slot_infos(&new_cluster);
+		check_for_parameter_settings(&new_cluster);
+	}
+
 	check_new_cluster_is_empty();
 
 	check_loadable_libraries();
@@ -364,6 +380,22 @@ check_new_cluster_is_empty(void)
 						 rel_arr->rels[relnum].nspname,
 						 rel_arr->rels[relnum].relname);
 		}
+
+		/*
+		 * If --include-logical-replication-slots is required, check the
+		 * existence of slots
+		 */
+		if (user_opts.include_logical_slots)
+		{
+			LogicalSlotInfoArr *slot_arr = &new_cluster.dbarr.dbs[dbnum].slot_arr;
+
+			/* if nslots > 0, report just first entry and exit */
+			if (slot_arr->nslots)
+				pg_fatal("New cluster database \"%s\" is not empty: found logical replication slot \"%s\"",
+						 new_cluster.dbarr.dbs[dbnum].db_name,
+						 slot_arr->slots[0].slotname);
+		}
+
 	}
 }
 
@@ -1402,3 +1434,38 @@ check_for_user_defined_encoding_conversions(ClusterInfo *cluster)
 	else
 		check_ok();
 }
+
+/*
+ * Verify parameter settings for creating logical replication slots
+ */
+static void
+check_for_parameter_settings(ClusterInfo *new_cluster)
+{
+	PGresult   *res;
+	PGconn	   *conn = connectToServer(new_cluster, "template1");
+	int			max_replication_slots;
+	char	   *wal_level;
+
+	prep_status("Checking for logical replication slots");
+
+	res = executeQueryOrDie(conn, "SHOW max_replication_slots;");
+	max_replication_slots = atoi(PQgetvalue(res, 0, 0));
+
+	if (max_replication_slots == 0)
+		pg_fatal("max_replication_slots must be greater than 0");
+
+	PQclear(res);
+
+	res = executeQueryOrDie(conn, "SHOW wal_level;");
+	wal_level = PQgetvalue(res, 0, 0);
+
+	if (strcmp(wal_level, "logical") != 0)
+		pg_fatal("wal_level must be \"logical\", but is set to \"%s\"",
+				 wal_level);
+
+	PQclear(res);
+
+	PQfinish(conn);
+
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/dump.c b/src/bin/pg_upgrade/dump.c
index 6c8c82dca8..e6b90864f5 100644
--- a/src/bin/pg_upgrade/dump.c
+++ b/src/bin/pg_upgrade/dump.c
@@ -59,6 +59,30 @@ generate_old_dump(void)
 						   log_opts.dumpdir,
 						   sql_file_name, escaped_connstr.data);
 
+		/*
+		 * Dump logical replication slots if needed.
+		 *
+		 * XXX We cannot dump replication slots at the same time as the schema
+		 * dump because we need to separate the timing of restoring
+		 * replication slots and other objects. Replication slots, in
+		 * particular, should not be restored before executing the pg_resetwal
+		 * command because it will remove WALs that are required by the slots.
+		 */
+		if (user_opts.include_logical_slots)
+		{
+			char		slots_file_name[MAXPGPATH];
+
+			snprintf(slots_file_name, sizeof(slots_file_name),
+					 DB_DUMP_LOGICAL_SLOTS_FILE_MASK, old_db->db_oid);
+			parallel_exec_prog(log_file_name, NULL,
+							   "\"%s/pg_dump\" %s --logical-replication-slots-only "
+							   "--quote-all-identifiers --binary-upgrade %s "
+							   "--file=\"%s/%s\" %s",
+							   new_cluster.bindir, cluster_conn_opts(&old_cluster),
+							   log_opts.verbose ? "--verbose" : "",
+							   log_opts.dumpdir,
+							   slots_file_name, escaped_connstr.data);
+		}
 		termPQExpBuffer(&escaped_connstr);
 	}
 
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index 85ed15ae4a..79b4738cb8 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -23,10 +23,11 @@ static void free_db_and_rel_infos(DbInfoArr *db_arr);
 static void get_template0_info(ClusterInfo *cluster);
 static void get_db_infos(ClusterInfo *cluster);
 static void get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo);
+static void get_logical_slot_infos_per_db(ClusterInfo *cluster, DbInfo *dbinfo);
 static void free_rel_infos(RelInfoArr *rel_arr);
 static void print_db_infos(DbInfoArr *db_arr);
 static void print_rel_infos(RelInfoArr *rel_arr);
-
+static void print_slot_infos(LogicalSlotInfoArr *slot_arr);
 
 /*
  * gen_db_file_maps()
@@ -600,6 +601,95 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 	dbinfo->rel_arr.nrels = num_rels;
 }
 
+/*
+ * get_logical_slot_infos()
+ *
+ * Higher level routine to generate LogicalSlotInfoArr for all databases.
+ */
+void
+get_logical_slot_infos(ClusterInfo *cluster)
+{
+	int			dbnum;
+
+	if (cluster == &old_cluster)
+		pg_log(PG_VERBOSE, "\nsource databases:");
+	else
+		pg_log(PG_VERBOSE, "\ntarget databases:");
+
+	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
+	{
+		DbInfo *pDbInfo = &cluster->dbarr.dbs[dbnum];
+
+		get_logical_slot_infos_per_db(cluster, pDbInfo);
+
+		if (log_opts.verbose)
+		{
+			pg_log(PG_VERBOSE, "Database: %s", pDbInfo->db_name);
+			print_slot_infos(&pDbInfo->slot_arr);
+		}
+	}
+}
+
+/*
+ * get_logical_slot_infos_per_db()
+ *
+ * gets the LogicalSlotInfos for all the logical replication slots of the database
+ * referred to by "dbinfo".
+ */
+static void
+get_logical_slot_infos_per_db(ClusterInfo *cluster, DbInfo *dbinfo)
+{
+	PGconn	   *conn = connectToServer(cluster,
+									   dbinfo->db_name);
+	PGresult   *res;
+	LogicalSlotInfo *slotinfos = NULL;
+
+	int			ntups;
+	int			num_slots = 0;
+
+	char		query[QUERY_ALLOC];
+
+	query[0] = '\0';			/* initialize query string to empty */
+
+	snprintf(query, sizeof(query),
+			 "SELECT slot_name, plugin, two_phase "
+			 "FROM pg_catalog.pg_replication_slots "
+			 "WHERE database = current_database() AND temporary = false "
+			 "AND wal_status IN ('reserved', 'extended');");
+
+	res = executeQueryOrDie(conn, "%s", query);
+
+	ntups = PQntuples(res);
+
+	if (ntups)
+	{
+		int			slotnum;
+		int			i_slotname;
+		int			i_plugin;
+		int			i_twophase;
+
+		slotinfos = (LogicalSlotInfo *) pg_malloc(sizeof(LogicalSlotInfo) * ntups);
+
+		i_slotname = PQfnumber(res, "slot_name");
+		i_plugin = PQfnumber(res, "plugin");
+		i_twophase = PQfnumber(res, "two_phase");
+
+		for (slotnum = 0; slotnum < ntups; slotnum++)
+		{
+			LogicalSlotInfo *curr = &slotinfos[num_slots++];
+
+			curr->slotname = pg_strdup(PQgetvalue(res, slotnum, i_slotname));
+			curr->plugin = pg_strdup(PQgetvalue(res, slotnum, i_plugin));
+			curr->two_phase = (strcmp(PQgetvalue(res, slotnum, i_twophase), "t") == 0);
+		}
+	}
+
+	PQclear(res);
+	PQfinish(conn);
+
+	dbinfo->slot_arr.slots = slotinfos;
+	dbinfo->slot_arr.nslots = num_slots;
+}
 
 static void
 free_db_and_rel_infos(DbInfoArr *db_arr)
@@ -610,6 +700,14 @@ free_db_and_rel_infos(DbInfoArr *db_arr)
 	{
 		free_rel_infos(&db_arr->dbs[dbnum].rel_arr);
 		pg_free(db_arr->dbs[dbnum].db_name);
+
+		/*
+		 * db_arr has an additional attribute, LogicalSlotInfoArr slot_arr, but
+		 * there is no need to free it. It has a valid member only when the
+		 * cluster had logical replication slots in the previous call. However,
+		 * in this case, a FATAL error is thrown, and we cannot reach this
+		 * point.
+		 */
 	}
 	pg_free(db_arr->dbs);
 	db_arr->dbs = NULL;
@@ -660,3 +758,15 @@ print_rel_infos(RelInfoArr *rel_arr)
 			   rel_arr->rels[relnum].reloid,
 			   rel_arr->rels[relnum].tablespace);
 }
+
+static void
+print_slot_infos(LogicalSlotInfoArr *slot_arr)
+{
+	int			slotnum;
+
+	for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\", two_phase: %d",
+			   slot_arr->slots[slotnum].slotname,
+			   slot_arr->slots[slotnum].plugin,
+			   slot_arr->slots[slotnum].two_phase);
+}
diff --git a/src/bin/pg_upgrade/meson.build b/src/bin/pg_upgrade/meson.build
index 12a97f84e2..228f29b688 100644
--- a/src/bin/pg_upgrade/meson.build
+++ b/src/bin/pg_upgrade/meson.build
@@ -42,6 +42,7 @@ tests += {
     'tests': [
       't/001_basic.pl',
       't/002_pg_upgrade.pl',
+      't/003_logical_replication_slots.pl',
     ],
     'test_kwargs': {'priority': 40}, # pg_upgrade tests are slow
   },
diff --git a/src/bin/pg_upgrade/option.c b/src/bin/pg_upgrade/option.c
index 640361009e..df66a5ffe6 100644
--- a/src/bin/pg_upgrade/option.c
+++ b/src/bin/pg_upgrade/option.c
@@ -57,6 +57,7 @@ parseCommandLine(int argc, char *argv[])
 		{"verbose", no_argument, NULL, 'v'},
 		{"clone", no_argument, NULL, 1},
 		{"copy", no_argument, NULL, 2},
+		{"include-logical-replication-slots", no_argument, NULL, 3},
 
 		{NULL, 0, NULL, 0}
 	};
@@ -199,6 +200,10 @@ parseCommandLine(int argc, char *argv[])
 				user_opts.transfer_mode = TRANSFER_MODE_COPY;
 				break;
 
+			case 3:
+				user_opts.include_logical_slots = true;
+				break;
+
 			default:
 				fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
 						os_info.progname);
@@ -289,6 +294,8 @@ usage(void)
 	printf(_("  -V, --version                 display version information, then exit\n"));
 	printf(_("  --clone                       clone instead of copying files to new cluster\n"));
 	printf(_("  --copy                        copy files to new cluster (default)\n"));
+	printf(_("  --include-logical-replication-slots\n"
+			 "                                upgrade logical replication slots\n"));
 	printf(_("  -?, --help                    show this help, then exit\n"));
 	printf(_("\n"
 			 "Before running pg_upgrade you must:\n"
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 75bab0a04c..373a9ef490 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -59,6 +59,7 @@ static void copy_xact_xlog_xid(void);
 static void set_frozenxids(bool minmxid_only);
 static void make_outputdirs(char *pgdata);
 static void setup(char *argv0, bool *live_check);
+static void create_logical_replication_slots(void);
 
 ClusterInfo old_cluster,
 			new_cluster;
@@ -188,6 +189,19 @@ main(int argc, char **argv)
 			  new_cluster.pgdata);
 	check_ok();
 
+	/*
+	 * Create logical replication slots if requested.
+	 *
+	 * Note: This must be done after doing pg_resetwal command because the
+	 * command will remove required WALs.
+	 */
+	if (user_opts.include_logical_slots)
+	{
+		start_postmaster(&new_cluster, true);
+		create_logical_replication_slots();
+		stop_postmaster(false);
+	}
+
 	if (user_opts.do_sync)
 	{
 		prep_status("Sync data directory to disk");
@@ -860,3 +874,50 @@ set_frozenxids(bool minmxid_only)
 
 	check_ok();
 }
+
+/*
+ * create_logical_replication_slots()
+ *
+ * Similar to create_new_objects() but only restores logical replication slots.
+ */
+static void
+create_logical_replication_slots(void)
+{
+	int			dbnum;
+
+	prep_status_progress("Restoring logical replication slots in the new cluster");
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		char		slots_file_name[MAXPGPATH],
+					log_file_name[MAXPGPATH];
+		DbInfo	   *old_db = &old_cluster.dbarr.dbs[dbnum];
+
+		pg_log(PG_STATUS, "%s", old_db->db_name);
+
+		snprintf(slots_file_name, sizeof(slots_file_name),
+				 DB_DUMP_LOGICAL_SLOTS_FILE_MASK, old_db->db_oid);
+		snprintf(log_file_name, sizeof(log_file_name),
+				 DB_DUMP_LOG_FILE_MASK, old_db->db_oid);
+
+		parallel_exec_prog(log_file_name,
+						   NULL,
+						   "\"%s/psql\" %s --echo-queries --set ON_ERROR_STOP=on "
+						   "--no-psqlrc --dbname %s -f \"%s/%s\"",
+						   new_cluster.bindir,
+						   cluster_conn_opts(&new_cluster),
+						   old_db->db_name,
+						   log_opts.dumpdir,
+						   slots_file_name);
+	}
+
+	/* reap all children */
+	while (reap_child(true) == true)
+		;
+
+	end_progress_output();
+	check_ok();
+
+	/* update new_cluster info again */
+	get_logical_slot_infos(&new_cluster);
+}
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 3eea0139c7..7adbb50807 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -29,6 +29,7 @@
 /* contains both global db information and CREATE DATABASE commands */
 #define GLOBALS_DUMP_FILE	"pg_upgrade_dump_globals.sql"
 #define DB_DUMP_FILE_MASK	"pg_upgrade_dump_%u.custom"
+#define DB_DUMP_LOGICAL_SLOTS_FILE_MASK	"pg_upgrade_dump_%u_logical_slots.sql"
 
 /*
  * Base directories that include all the files generated internally, from the
@@ -150,6 +151,22 @@ typedef struct
 	int			nrels;
 } RelInfoArr;
 
+/*
+ * Structure to store logical replication slot information
+ */
+typedef struct
+{
+	char	   *slotname;		/* slot name */
+	char	   *plugin;			/* plugin */
+	bool		two_phase;		/* Can the slot decode 2PC? */
+} LogicalSlotInfo;
+
+typedef struct
+{
+	LogicalSlotInfo *slots;
+	int			nslots;
+} LogicalSlotInfoArr;
+
 /*
  * The following structure represents a relation mapping.
  */
@@ -176,6 +193,7 @@ typedef struct
 	char		db_tablespace[MAXPGPATH];	/* database default tablespace
 											 * path */
 	RelInfoArr	rel_arr;		/* array of all user relinfos */
+	LogicalSlotInfoArr slot_arr;	/* array of all LogicalSlotInfo */
 } DbInfo;
 
 /*
@@ -304,6 +322,8 @@ typedef struct
 	transferMode transfer_mode; /* copy files or link them? */
 	int			jobs;			/* number of processes/threads to use */
 	char	   *socketdir;		/* directory to use for Unix sockets */
+	bool		include_logical_slots;	/* true -> dump and restore logical
+										 * replication slots */
 } UserOpts;
 
 typedef struct
@@ -400,6 +420,7 @@ FileNameMap *gen_db_file_maps(DbInfo *old_db,
 							  DbInfo *new_db, int *nmaps, const char *old_pgdata,
 							  const char *new_pgdata);
 void		get_db_and_rel_infos(ClusterInfo *cluster);
+void		get_logical_slot_infos(ClusterInfo *cluster);
 
 /* option.c */
 
diff --git a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
new file mode 100644
index 0000000000..525a7704cf
--- /dev/null
+++ b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
@@ -0,0 +1,115 @@
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+# Tests for upgrading replication slots
+
+use strict;
+use warnings;
+
+use File::Path qw(rmtree);
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Can be changed to test the other modes
+my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
+
+# Initialize old node
+my $old_node = PostgreSQL::Test::Cluster->new('old_node');
+$old_node->init(allows_streaming => 'logical');
+$old_node->start;
+
+# Initialize new node
+my $new_node = PostgreSQL::Test::Cluster->new('new_node');
+$new_node->init(allows_streaming => 1);
+
+my $bindir = $new_node->config_data('--bindir');
+
+$old_node->stop;
+
+# Cause a failure at the start of pg_upgrade because wal_level is replica
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_node->data_dir,
+		'-D',         $new_node->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_node->host,
+		'-p',         $old_node->port,
+		'-P',         $new_node->port,
+		$mode,        '--include-logical-replication-slots',
+	],
+	'run of pg_upgrade of old node with wrong wal_level');
+ok( -d $new_node->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_node->data_dir . "/pg_upgrade_output.d");
+
+# Preparations for the subsequent test. The case max_replication_slots is set
+# to 0 is prohibited.
+$new_node->append_conf('postgresql.conf', "wal_level = 'logical'");
+$new_node->append_conf('postgresql.conf', "max_replication_slots = 0");
+
+# Cause a failure at the start of pg_upgrade because max_replication_slots is 0
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_node->data_dir,
+		'-D',         $new_node->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_node->host,
+		'-p',         $old_node->port,
+		'-P',         $new_node->port,
+		$mode,        '--include-logical-replication-slots',
+	],
+	'run of pg_upgrade of old node with wrong max_replication_slots');
+ok( -d $new_node->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_node->data_dir . "/pg_upgrade_output.d");
+
+# Preparations for the subsequent test. max_replication_slots is set to
+# non-zero value to succeed the pg_upgrade
+$new_node->append_conf('postgresql.conf', "max_replication_slots = 10");
+
+# Create a slot on old node, and generate WALs
+$old_node->start;
+$old_node->safe_psql(
+	'postgres', qq[
+	SELECT pg_create_logical_replication_slot('test_slot', 'test_decoding', false, true);
+	CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;
+]);
+
+my $result = $old_node->safe_psql('postgres',
+	"SELECT count (*) FROM pg_logical_slot_get_changes('test_slot', NULL, NULL)"
+);
+is($result, qq(12), 'ensure WALs are not consumed yet');
+$old_node->stop;
+
+# Actual run, pg_upgrade_output.d is removed at the end
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_node->data_dir,
+		'-D',         $new_node->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_node->host,
+		'-p',         $old_node->port,
+		'-P',         $new_node->port,
+		$mode,        '--include-logical-replication-slots'
+	],
+	'run of pg_upgrade of old node');
+ok( !-d $new_node->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ removed after pg_upgrade success");
+
+$new_node->start;
+$result = $new_node->safe_psql('postgres',
+	"SELECT slot_name, two_phase FROM pg_replication_slots");
+is($result, qq(test_slot|t), 'check the slot exists on new node');
+
+done_testing();
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index b4058b88c3..5944cb34ea 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1479,6 +1479,7 @@ LogicalRepBeginData
 LogicalRepCommitData
 LogicalRepCommitPreparedTxnData
 LogicalRepCtxStruct
+LogicalReplicationSlotInfo
 LogicalRepMode
 LogicalRepMsgType
 LogicalRepPartMapEntry
@@ -1492,6 +1493,8 @@ LogicalRepTupleData
 LogicalRepTyp
 LogicalRepWorker
 LogicalRewriteMappingData
+LogicalSlotInfo
+LogicalSlotInfoArr
 LogicalTape
 LogicalTapeSet
 LsnReadQueue
-- 
2.27.0

v13-0002-Always-persist-to-disk-logical-slots-during-a-sh.patchapplication/octet-stream; name=v13-0002-Always-persist-to-disk-logical-slots-during-a-sh.patchDownload
From 582ae94c099a3d33fe51f0fb32eff89c5b273ed0 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Fri, 14 Apr 2023 13:49:09 +0800
Subject: [PATCH v13 2/4] Always persist to disk logical slots during a
 shutdown checkpoint.

It's entirely possible for a logical slot to have a confirmed_flush_lsn higher
than the last value saved on disk while not being marked as dirty.  It's
currently not a problem to lose that value during a clean shutdown / restart
cycle, but a later patch adding support for pg_upgrade of publications and
logical slots will rely on that value being properly persisted to disk.

Author: Julien Rouhaud
Reviewed-by: FIXME
Discussion: FIXME
---
 src/backend/access/transam/xlog.c |  2 +-
 src/backend/replication/slot.c    | 26 ++++++++++++++++----------
 src/include/replication/slot.h    |  2 +-
 3 files changed, 18 insertions(+), 12 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index bc5a8e0569..78b4528f2c 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7011,7 +7011,7 @@ static void
 CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
 {
 	CheckPointRelationMap();
-	CheckPointReplicationSlots();
+	CheckPointReplicationSlots(flags & CHECKPOINT_IS_SHUTDOWN);
 	CheckPointSnapBuild();
 	CheckPointLogicalRewriteHeap();
 	CheckPointReplicationOrigin();
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 8021aaa0a8..aeea6ffd1f 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -109,7 +109,8 @@ static void ReplicationSlotDropPtr(ReplicationSlot *slot);
 /* internal persistency functions */
 static void RestoreSlotFromDisk(const char *name);
 static void CreateSlotOnDisk(ReplicationSlot *slot);
-static void SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel);
+static void SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel,
+						   bool is_shutdown);
 
 /*
  * Report shared-memory space needed by ReplicationSlotsShmemInit.
@@ -783,7 +784,7 @@ ReplicationSlotSave(void)
 	Assert(MyReplicationSlot != NULL);
 
 	sprintf(path, "pg_replslot/%s", NameStr(MyReplicationSlot->data.name));
-	SaveSlotToPath(MyReplicationSlot, path, ERROR);
+	SaveSlotToPath(MyReplicationSlot, path, ERROR, false);
 }
 
 /*
@@ -1565,11 +1566,10 @@ restart:
 /*
  * Flush all replication slots to disk.
  *
- * This needn't actually be part of a checkpoint, but it's a convenient
- * location.
+ * is_shutdown is true in case of a shutdown checkpoint.
  */
 void
-CheckPointReplicationSlots(void)
+CheckPointReplicationSlots(bool is_shutdown)
 {
 	int			i;
 
@@ -1594,7 +1594,7 @@ CheckPointReplicationSlots(void)
 
 		/* save the slot to disk, locking is handled in SaveSlotToPath() */
 		sprintf(path, "pg_replslot/%s", NameStr(s->data.name));
-		SaveSlotToPath(s, path, LOG);
+		SaveSlotToPath(s, path, LOG, is_shutdown);
 	}
 	LWLockRelease(ReplicationSlotAllocationLock);
 }
@@ -1700,7 +1700,7 @@ CreateSlotOnDisk(ReplicationSlot *slot)
 
 	/* Write the actual state file. */
 	slot->dirty = true;			/* signal that we really need to write */
-	SaveSlotToPath(slot, tmppath, ERROR);
+	SaveSlotToPath(slot, tmppath, ERROR, false);
 
 	/* Rename the directory into place. */
 	if (rename(tmppath, path) != 0)
@@ -1726,7 +1726,8 @@ CreateSlotOnDisk(ReplicationSlot *slot)
  * Shared functionality between saving and creating a replication slot.
  */
 static void
-SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel)
+SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel,
+			   bool is_shutdown)
 {
 	char		tmppath[MAXPGPATH];
 	char		path[MAXPGPATH];
@@ -1740,8 +1741,13 @@ SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel)
 	slot->just_dirtied = false;
 	SpinLockRelease(&slot->mutex);
 
-	/* and don't do anything if there's nothing to write */
-	if (!was_dirty)
+	/*
+	 * and don't do anything if there's nothing to write, unless this is
+	 * called for a logical slot during a shutdown checkpoint, as we want to
+	 * persist the confirmed_flush_lsn in that case, even if that's the only
+	 * modification.
+	 */
+	if (!was_dirty && !(is_shutdown && SlotIsLogical(slot)))
 		return;
 
 	LWLockAcquire(&slot->io_in_progress_lock, LW_EXCLUSIVE);
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index a8a89dc784..7ca37c9f70 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -241,7 +241,7 @@ extern void ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslo
 extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok);
 
 extern void StartupReplicationSlots(void);
-extern void CheckPointReplicationSlots(void);
+extern void CheckPointReplicationSlots(bool is_shutdown);
 
 extern void CheckSlotRequirements(void);
 extern void CheckSlotPermissions(void);
-- 
2.27.0

v13-0003-pg_upgrade-Add-check-function-for-include-logica.patchapplication/octet-stream; name=v13-0003-pg_upgrade-Add-check-function-for-include-logica.patchDownload
From db7658c30223a2fe131d76c40f8b70011a650760 Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Fri, 14 Apr 2023 08:59:03 +0000
Subject: [PATCH v13 3/4] pg_upgrade: Add check function for
 --include-logical-replication-slots option

---
 src/bin/pg_upgrade/check.c                    | 80 ++++++++++++++++++-
 .../t/003_logical_replication_slots.pl        | 30 ++++++-
 2 files changed, 108 insertions(+), 2 deletions(-)

diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index f9cba8548e..61ca36a853 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -9,7 +9,10 @@
 
 #include "postgres_fe.h"
 
+#include "access/xlogrecord.h"
+#include "access/xlog_internal.h"
 #include "catalog/pg_authid_d.h"
+#include "catalog/pg_control.h"
 #include "catalog/pg_collation.h"
 #include "fe_utils/string_utils.h"
 #include "mb/pg_wchar.h"
@@ -31,7 +34,7 @@ static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(ClusterInfo *new_cluster);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
 static void check_for_parameter_settings(ClusterInfo *new_cluster);
-
+static void check_for_confirmed_flush_lsn(ClusterInfo *cluster);
 
 /*
  * fix_path_separator
@@ -108,6 +111,8 @@ check_and_dump_old_cluster(bool live_check)
 	check_for_composite_data_type_usage(&old_cluster);
 	check_for_reg_data_type_usage(&old_cluster);
 	check_for_isn_and_int8_passing_mismatch(&old_cluster);
+	if (user_opts.include_logical_slots)
+		check_for_confirmed_flush_lsn(&old_cluster);
 
 	/*
 	 * PG 16 increased the size of the 'aclitem' type, which breaks the on-disk
@@ -1446,6 +1451,10 @@ check_for_parameter_settings(ClusterInfo *new_cluster)
 	int			max_replication_slots;
 	char	   *wal_level;
 
+	/* --include-logical-replication-slots can be used since PG16. */
+	if (GET_MAJOR_VERSION(new_cluster->major_version) < 1600)
+		return;
+
 	prep_status("Checking for logical replication slots");
 
 	res = executeQueryOrDie(conn, "SHOW max_replication_slots;");
@@ -1469,3 +1478,72 @@ check_for_parameter_settings(ClusterInfo *new_cluster)
 
 	check_ok();
 }
+
+/*
+ * Verify that all logical replication slots consumed all WALs, except a
+ * CHECKPOINT_SHUTDOWN record.
+ */
+static void
+check_for_confirmed_flush_lsn(ClusterInfo *cluster)
+{
+	int			i,
+				ntups,
+				i_slotname;
+	bool		is_error = false;
+	PGresult   *res;
+	DbInfo	   *active_db = &cluster->dbarr.dbs[0];
+	PGconn	   *conn = connectToServer(cluster, active_db->db_name);
+
+	Assert(user_opts.include_logical_slots);
+
+	/* --include-logical-replication-slots can be used since PG16. */
+	if (GET_MAJOR_VERSION(cluster->major_version) < 1600)
+		return;
+
+	prep_status("Checking for logical replication slots");
+
+	/*
+	 * Check that all logical replication slots have reached the current WAL
+	 * position, except for the CHECKPOINT_SHUTDOWN record. Even if all WALs
+	 * are consumed before shutting down the node, the checkpointer generates
+	 * a CHECKPOINT_SHUTDOWN record at shutdown, which cannot be consumed by
+	 * any slots. Therefore, we must allow for a difference between
+	 * pg_current_wal_insert_lsn() and confirmed_flush_lsn.
+	 */
+#define SHUTDOWN_RECORD_SIZE  (SizeOfXLogRecord + \
+							   SizeOfXLogRecordDataHeaderShort + \
+							   sizeof(CheckPoint))
+
+	res = executeQueryOrDie(conn,
+							"SELECT slot_name FROM pg_catalog.pg_replication_slots "
+							"WHERE (pg_catalog.pg_current_wal_insert_lsn() - confirmed_flush_lsn) > %d "
+							"AND temporary = false AND wal_status IN ('reserved', 'extended');",
+							(int) (SizeOfXLogLongPHD + SHUTDOWN_RECORD_SIZE));
+
+#undef SHUTDOWN_RECORD_SIZE
+
+	ntups = PQntuples(res);
+	i_slotname = PQfnumber(res, "slot_name");
+
+	for (i = 0; i < ntups; i++)
+	{
+		char	   *slotname;
+
+		is_error = true;
+
+		slotname = PQgetvalue(res, i, i_slotname);
+
+		pg_log(PG_WARNING,
+			   "\nWARNING: logical replication slot \"%s\" has not consumed WALs yet",
+			   slotname);
+	}
+
+	PQclear(res);
+	PQfinish(conn);
+
+	if (is_error)
+		pg_fatal("--include-logical-replication-slots requires that all "
+				 "logical replication slots consumed all the WALs");
+
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
index 525a7704cf..21fefca084 100644
--- a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
+++ b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
@@ -85,11 +85,39 @@ $old_node->safe_psql(
 ]);
 
 my $result = $old_node->safe_psql('postgres',
-	"SELECT count (*) FROM pg_logical_slot_get_changes('test_slot', NULL, NULL)"
+	"SELECT count(*) FROM pg_logical_slot_peek_changes('test_slot', NULL, NULL)"
 );
+
 is($result, qq(12), 'ensure WALs are not consumed yet');
 $old_node->stop;
 
+# Cause a failure at the start of pg_upgrade because test_slot does not
+# finish consuming all the WALs
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_node->data_dir,
+		'-D',         $new_node->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_node->host,
+		'-p',         $old_node->port,
+		'-P',         $new_node->port,
+		$mode,        '--include-logical-replication-slots',
+	],
+	'run of pg_upgrade of old node with idle replication slots');
+ok( -d $new_node->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_node->data_dir . "/pg_upgrade_output.d");
+
+$old_node->start;
+$old_node->safe_psql('postgres',
+	"SELECT count (*) FROM pg_logical_slot_get_changes('test_slot', NULL, NULL)"
+);
+$old_node->stop;
+
 # Actual run, pg_upgrade_output.d is removed at the end
 command_ok(
 	[
-- 
2.27.0
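
For reference, the check added by this patch can be approximated by hand on a
running old cluster. The following is only a rough sketch (it ignores the
shutdown-checkpoint allowance that check.c computes from SizeOfXLogLongPHD and
the checkpoint record size); it shows, per logical slot, how far
confirmed_flush_lsn lags behind the current insert position:

-- Any slot whose lag exceeds the allowance makes
-- check_for_confirmed_flush_lsn() emit a warning and then fail the upgrade.
SELECT slot_name,
       pg_catalog.pg_current_wal_insert_lsn() - confirmed_flush_lsn AS lag_bytes
FROM pg_catalog.pg_replication_slots
WHERE slot_type = 'logical'
  AND temporary = false
  AND wal_status IN ('reserved', 'extended');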

Attachment: v13-0004-Change-the-method-used-to-check-logical-replicat.patch (application/octet-stream)
From 6c86780aadf28d67e2571095ffd9ef2b3e3370c5 Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Mon, 24 Apr 2023 11:03:28 +0000
Subject: [PATCH v13 4/4] Change the method used to check logical replication
 slots during the live check

When a live check is requested, additional changes may still be occurring, which
can cause the current WAL position to exceed the confirmed_flush_lsn of the slot.
In that case we check that each logical slot is active instead. This is sufficient
because all remaining WAL records will be sent during the publisher's
shutdown.
---
 src/bin/pg_upgrade/check.c                    | 68 ++++++++++++++++++-
 .../t/003_logical_replication_slots.pl        | 66 +++++++++++++++++-
 2 files changed, 130 insertions(+), 4 deletions(-)

diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 61ca36a853..528576b00e 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -35,6 +35,7 @@ static void check_for_new_tablespace_dir(ClusterInfo *new_cluster);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
 static void check_for_parameter_settings(ClusterInfo *new_cluster);
 static void check_for_confirmed_flush_lsn(ClusterInfo *cluster);
+static void check_are_logical_slots_active(ClusterInfo *cluster);
 
 /*
  * fix_path_separator
@@ -112,7 +113,19 @@ check_and_dump_old_cluster(bool live_check)
 	check_for_reg_data_type_usage(&old_cluster);
 	check_for_isn_and_int8_passing_mismatch(&old_cluster);
 	if (user_opts.include_logical_slots)
-		check_for_confirmed_flush_lsn(&old_cluster);
+	{
+		/*
+		 * The method used to check logical replication slots is dependent on
+		 * the value of the live_check parameter. This change was implemented
+		 * because, during a live check, it is possible for additional changes
+		 * to occur at the old node, which could cause the current WAL position
+		 * to exceed the confirmed_flush_lsn of the slot.
+		 */
+		if (live_check)
+			check_are_logical_slots_active(&old_cluster);
+		else
+			check_for_confirmed_flush_lsn(&old_cluster);
+	}
 
 	/*
 	 * PG 16 increased the size of the 'aclitem' type, which breaks the on-disk
@@ -1479,6 +1492,59 @@ check_for_parameter_settings(ClusterInfo *new_cluster)
 	check_ok();
 }
 
+/*
+ * Verify that all logical replication slots are active
+ */
+static void
+check_are_logical_slots_active(ClusterInfo *cluster)
+{
+	int			i,
+				ntups,
+				i_slotname;
+	bool		is_error = false;
+	PGresult   *res;
+	DbInfo	   *active_db = &cluster->dbarr.dbs[0];
+	PGconn	   *conn = connectToServer(cluster, active_db->db_name);
+
+	Assert(user_opts.include_logical_slots);
+
+	/* --include-logical-replication-slots can be used since PG16. */
+	if (GET_MAJOR_VERSION(cluster->major_version) < 1600)
+		return;
+
+	prep_status("Checking for logical replication slots");
+
+	res = executeQueryOrDie(conn,
+							"SELECT slot_name FROM pg_catalog.pg_replication_slots "
+							"WHERE active IS FALSE "
+							"AND temporary = false AND wal_status IN ('reserved', 'extended');");
+
+	ntups = PQntuples(res);
+	i_slotname = PQfnumber(res, "slot_name");
+
+	for (i = 0; i < ntups; i++)
+	{
+		char	   *slotname;
+
+		is_error = true;
+
+		slotname = PQgetvalue(res, i, i_slotname);
+
+		pg_log(PG_WARNING,
+			   "\nWARNING: logical replication slot \"%s\" is not active",
+			   slotname);
+	}
+
+	PQclear(res);
+	PQfinish(conn);
+
+	if (is_error)
+		pg_fatal("--include-logical-replication-slots with --check requires that "
+				 "all logical replication slots are active");
+
+	check_ok();
+}
+
 /*
  * Verify that all logical replication slots consumed all WALs, except a
  * CHECKPOINT_SHUTDOWN record.
diff --git a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
index 21fefca084..d7fd864bd7 100644
--- a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
+++ b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
@@ -19,6 +19,10 @@ my $old_node = PostgreSQL::Test::Cluster->new('old_node');
 $old_node->init(allows_streaming => 'logical');
 $old_node->start;
 
+# Initialize subscriber, which will be used only for --check
+my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
+$subscriber->init(allows_streaming => 'logical');
+
 # Initialize new node
 my $new_node = PostgreSQL::Test::Cluster->new('new_node');
 $new_node->init(allows_streaming => 1);
@@ -76,15 +80,71 @@ rmtree($new_node->data_dir . "/pg_upgrade_output.d");
 # non-zero value to succeed the pg_upgrade
 $new_node->append_conf('postgresql.conf', "max_replication_slots = 10");
 
-# Create a slot on old node, and generate WALs
+# Setup logical replication
 $old_node->start;
+$old_node->safe_psql('postgres',
+	"CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a");
+$old_node->safe_psql('postgres', "CREATE PUBLICATION pub FOR ALL TABLES");
+my $old_connstr = $old_node->connstr . ' dbname=postgres';
+
+$subscriber->start;
+$subscriber->safe_psql('postgres', "CREATE TABLE tbl (a int)");
+$subscriber->safe_psql('postgres',
+	"CREATE SUBSCRIPTION sub CONNECTION '$old_connstr' PUBLICATION pub WITH (copy_data = true)"
+);
+
+# Wait for initial table sync to finish
+$subscriber->wait_for_subscription_sync($old_node, 'sub');
+
+my $result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
+is($result, qq(10), 'check initial rows on subscriber');
+
+# Start a background session and open a transaction (not committed yet)
+my $bsession = $old_node->background_psql('postgres');
+$bsession->query_safe(
+	q{
+BEGIN;
+INSERT INTO tbl VALUES (generate_series(11, 20))
+});
+
+$result = $old_node->safe_psql('postgres',
+	"SELECT count(*) FROM pg_replication_slots WHERE pg_current_wal_insert_lsn() > confirmed_flush_lsn"
+);
+is($result, qq(1),
+	'check the current WAL position exceeds confirmed_flush_lsn');
+
+# Run pg_upgrade --check. The status of each logical slot is checked, so the
+# command should succeed.
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_node->data_dir,
+		'-D',         $new_node->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_node->host,
+		'-p',         $old_node->port,
+		'-P',         $new_node->port,
+		$mode,        '--include-logical-replication-slots',
+		'--check'
+	],
+	'run of pg_upgrade of old node');
+ok( !-d $old_node->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ removed after pg_upgrade success");
+
+# Cleanup
+$bsession->query_safe("ABORT");
+$bsession->quit;
+$subscriber->safe_psql('postgres', "DROP SUBSCRIPTION sub");
+
+# Create a slot on old node, and generate WALs
 $old_node->safe_psql(
 	'postgres', qq[
 	SELECT pg_create_logical_replication_slot('test_slot', 'test_decoding', false, true);
-	CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;
+	INSERT INTO tbl VALUES (generate_series(11, 20));
 ]);
 
-my $result = $old_node->safe_psql('postgres',
+$result = $old_node->safe_psql('postgres',
 	"SELECT count(*) FROM pg_logical_slot_peek_changes('test_slot', NULL, NULL)"
 );
 
-- 
2.27.0
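
The live-check variant added here reduces to the query below. As a rough
sketch run against a still-running old cluster, it lists the slots that would
make pg_upgrade --check fail under --include-logical-replication-slots:

-- check_are_logical_slots_active() warns about every slot returned here
-- (no walsender is currently consuming it) and then aborts.
SELECT slot_name
FROM pg_catalog.pg_replication_slots
WHERE active IS FALSE
  AND temporary = false
  AND wal_status IN ('reserved', 'extended');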

#49Peter Smith
smithpb2250@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#48)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

Hi Kuroda-san.

I looked at the latest patch v13-0001. Here are some minor comments.

======
src/bin/pg_upgrade/info.c

1. get_logical_slot_infos_per_db

I noticed that the way this is coded, 'ntups' and 'num_slots' seem to
have exactly the same meaning. IMO you can simplify this by removing
'ntups'.

BEFORE
+ int ntups;
+ int num_slots = 0;

SUGGESTION
+ int num_slots;

~

BEFORE
+ ntups = PQntuples(res);
+
+ if (ntups)
+ {
SUGGESTION
+ num_slots = PQntuples(res);
+
+ if (num_slots)
+ {

~

BEFORE
+ slotinfos = (LogicalSlotInfo *) pg_malloc(sizeof(LogicalSlotInfo) * ntups);

SUGGESTION
+ slotinfos = (LogicalSlotInfo *) pg_malloc(sizeof(LogicalSlotInfo) *
num_slots);

~

BEFORE
+ for (slotnum = 0; slotnum < ntups; slotnum++)
+ {
+ LogicalSlotInfo *curr = &slotinfos[num_slots++];
SUGGESTION
+ for (slotnum = 0; slotnum < ntups; slotnum++)
+ {
+ LogicalSlotInfo *curr = &slotinfos[slotnum];

======

2. get_logical_slot_infos, print_slot_infos

In another thread [1] I am posting some minor patch changes to the
VERBOSE logging (changes to double-quotes and commas etc.). Please
keep a watch on that thread because if it gets pushed then this one will be
be impacted. e.g. your logging here ought also to include the same
suggested double quotes.

I thought it would be pushed soon, so the suggestion was included.

OK, but I think you have accidentally missed adding similar new double
quotes to all other VERBOSE logging in your patch.

e.g. see get_logical_slot_infos:
pg_log(PG_VERBOSE, "Database: %s", pDbInfo->db_name);

------
Kind Regards,
Peter Smith.
Fujitsu Australia

#50Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Peter Smith (#49)
4 attachment(s)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Peter,

Thanks for reviewing! PSA new version patchset.

1. get_logical_slot_infos_per_db

I noticed that the way this is coded, 'ntups' and 'num_slots' seems to
have exactly the same meaning. IMO you can simplify this by removing
'ntups'.

BEFORE
+ int ntups;
+ int num_slots = 0;

SUGGESTION
+ int num_slots;

~

BEFORE
+ ntups = PQntuples(res);
+
+ if (ntups)
+ {
SUGGESTION
+ num_slots = PQntuples(res);
+
+ if (num_slots)
+ {

~

BEFORE
+ slotinfos = (LogicalSlotInfo *) pg_malloc(sizeof(LogicalSlotInfo) * ntups);

SUGGESTION
+ slotinfos = (LogicalSlotInfo *) pg_malloc(sizeof(LogicalSlotInfo) *
num_slots);

~

BEFORE
+ for (slotnum = 0; slotnum < ntups; slotnum++)
+ {
+ LogicalSlotInfo *curr = &slotinfos[num_slots++];
SUGGESTION
+ for (slotnum = 0; slotnum < ntups; slotnum++)
+ {
+ LogicalSlotInfo *curr = &slotinfos[slotnum];

Right, fixed.

2. get_logical_slot_infos, print_slot_infos

In another thread [1] I am posting some minor patch changes to the
VERBOSE logging (changes to double-quotes and commas etc.). Please
keep a watch on that thread because if it gets pushed then this one will be
be impacted. e.g. your logging here ought also to include the same
suggested double quotes.

I thought it would be pushed soon, so the suggestion was included.

OK, but I think you have accidentally missed adding similar new double
quotes to all other VERBOSE logging in your patch.

e.g. see get_logical_slot_infos:
pg_log(PG_VERBOSE, "Database: %s", pDbInfo->db_name);

Oh, I missed it. Fixed. I grepped the patches and could not find any other
lines that should be double-quoted.

In addition, I ran pgindent again for 0001.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:

Attachment: v14-0004-Change-the-method-used-to-check-logical-replicat.patch (application/octet-stream)
From bd94104de8a846ce8762b79fbdb4829691c57475 Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Mon, 24 Apr 2023 11:03:28 +0000
Subject: [PATCH v14 4/4] Change the method used to check logical replication
 slots during the live check

When a live check is requested, additional changes may still be occurring, which
can cause the current WAL position to exceed the confirmed_flush_lsn of the slot.
In that case we check that each logical slot is active instead. This is sufficient
because all remaining WAL records will be sent during the publisher's
shutdown.
---
 src/bin/pg_upgrade/check.c                    | 68 ++++++++++++++++++-
 .../t/003_logical_replication_slots.pl        | 66 +++++++++++++++++-
 2 files changed, 130 insertions(+), 4 deletions(-)

diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 61ca36a853..528576b00e 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -35,6 +35,7 @@ static void check_for_new_tablespace_dir(ClusterInfo *new_cluster);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
 static void check_for_parameter_settings(ClusterInfo *new_cluster);
 static void check_for_confirmed_flush_lsn(ClusterInfo *cluster);
+static void check_are_logical_slots_active(ClusterInfo *cluster);
 
 /*
  * fix_path_separator
@@ -112,7 +113,19 @@ check_and_dump_old_cluster(bool live_check)
 	check_for_reg_data_type_usage(&old_cluster);
 	check_for_isn_and_int8_passing_mismatch(&old_cluster);
 	if (user_opts.include_logical_slots)
-		check_for_confirmed_flush_lsn(&old_cluster);
+	{
+		/*
+		 * The method used to check logical replication slots is dependent on
+		 * the value of the live_check parameter. This change was implemented
+		 * because, during a live check, it is possible for additional changes
+		 * to occur at the old node, which could cause the current WAL position
+		 * to exceed the confirmed_flush_lsn of the slot.
+		 */
+		if (live_check)
+			check_are_logical_slots_active(&old_cluster);
+		else
+			check_for_confirmed_flush_lsn(&old_cluster);
+	}
 
 	/*
 	 * PG 16 increased the size of the 'aclitem' type, which breaks the on-disk
@@ -1479,6 +1492,59 @@ check_for_parameter_settings(ClusterInfo *new_cluster)
 	check_ok();
 }
 
+/*
+ * Verify that all logical replication slots are active
+ */
+static void
+check_are_logical_slots_active(ClusterInfo *cluster)
+{
+	int			i,
+				ntups,
+				i_slotname;
+	bool		is_error = false;
+	PGresult   *res;
+	DbInfo	   *active_db = &cluster->dbarr.dbs[0];
+	PGconn	   *conn = connectToServer(cluster, active_db->db_name);
+
+	Assert(user_opts.include_logical_slots);
+
+	/* --include-logical-replication-slots can be used since PG16. */
+	if (GET_MAJOR_VERSION(cluster->major_version) < 1600)
+		return;
+
+	prep_status("Checking for logical replication slots");
+
+	res = executeQueryOrDie(conn,
+							"SELECT slot_name FROM pg_catalog.pg_replication_slots "
+							"WHERE active IS FALSE "
+							"AND temporary = false AND wal_status IN ('reserved', 'extended');");
+
+	ntups = PQntuples(res);
+	i_slotname = PQfnumber(res, "slot_name");
+
+	for (i = 0; i < ntups; i++)
+	{
+		char	   *slotname;
+
+		is_error = true;
+
+		slotname = PQgetvalue(res, i, i_slotname);
+
+		pg_log(PG_WARNING,
+			   "\nWARNING: logical replication slot \"%s\" is not active",
+			   slotname);
+	}
+
+	PQclear(res);
+	PQfinish(conn);
+
+	if (is_error)
+		pg_fatal("--include-logical-replication-slots with --check requires that "
+				 "all logical replication slots are active");
+
+	check_ok();
+}
+
 /*
  * Verify that all logical replication slots consumed all WALs, except a
  * CHECKPOINT_SHUTDOWN record.
diff --git a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
index 21fefca084..d7fd864bd7 100644
--- a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
+++ b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
@@ -19,6 +19,10 @@ my $old_node = PostgreSQL::Test::Cluster->new('old_node');
 $old_node->init(allows_streaming => 'logical');
 $old_node->start;
 
+# Initialize subscriber, which will be used only for --check
+my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
+$subscriber->init(allows_streaming => 'logical');
+
 # Initialize new node
 my $new_node = PostgreSQL::Test::Cluster->new('new_node');
 $new_node->init(allows_streaming => 1);
@@ -76,15 +80,71 @@ rmtree($new_node->data_dir . "/pg_upgrade_output.d");
 # non-zero value to succeed the pg_upgrade
 $new_node->append_conf('postgresql.conf', "max_replication_slots = 10");
 
-# Create a slot on old node, and generate WALs
+# Setup logical replication
 $old_node->start;
+$old_node->safe_psql('postgres',
+	"CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a");
+$old_node->safe_psql('postgres', "CREATE PUBLICATION pub FOR ALL TABLES");
+my $old_connstr = $old_node->connstr . ' dbname=postgres';
+
+$subscriber->start;
+$subscriber->safe_psql('postgres', "CREATE TABLE tbl (a int)");
+$subscriber->safe_psql('postgres',
+	"CREATE SUBSCRIPTION sub CONNECTION '$old_connstr' PUBLICATION pub WITH (copy_data = true)"
+);
+
+# Wait for initial table sync to finish
+$subscriber->wait_for_subscription_sync($old_node, 'sub');
+
+my $result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
+is($result, qq(10), 'check initial rows on subscriber');
+
+# Start a background session and open a transaction (not committed yet)
+my $bsession = $old_node->background_psql('postgres');
+$bsession->query_safe(
+	q{
+BEGIN;
+INSERT INTO tbl VALUES (generate_series(11, 20))
+});
+
+$result = $old_node->safe_psql('postgres',
+	"SELECT count(*) FROM pg_replication_slots WHERE pg_current_wal_insert_lsn() > confirmed_flush_lsn"
+);
+is($result, qq(1),
+	'check the current WAL position exceeds confirmed_flush_lsn');
+
+# Run pg_upgrade --check. The status of each logical slot is checked, so the
+# command should succeed.
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_node->data_dir,
+		'-D',         $new_node->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_node->host,
+		'-p',         $old_node->port,
+		'-P',         $new_node->port,
+		$mode,        '--include-logical-replication-slots',
+		'--check'
+	],
+	'run of pg_upgrade of old node');
+ok( !-d $old_node->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ removed after pg_upgrade success");
+
+# Cleanup
+$bsession->query_safe("ABORT");
+$bsession->quit;
+$subscriber->safe_psql('postgres', "DROP SUBSCRIPTION sub");
+
+# Create a slot on old node, and generate WALs
 $old_node->safe_psql(
 	'postgres', qq[
 	SELECT pg_create_logical_replication_slot('test_slot', 'test_decoding', false, true);
-	CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;
+	INSERT INTO tbl VALUES (generate_series(11, 20));
 ]);
 
-my $result = $old_node->safe_psql('postgres',
+$result = $old_node->safe_psql('postgres',
 	"SELECT count(*) FROM pg_logical_slot_peek_changes('test_slot', NULL, NULL)"
 );
 
-- 
2.27.0
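
The updated TAP test above builds exactly that situation. A hedged SQL sketch
of the same scenario (table tbl, publication pub, and subscription sub as set
up in the test) is:

-- Session 1: keep a transaction open so the insert position moves ahead
-- of what the subscriber has confirmed.
BEGIN;
INSERT INTO tbl VALUES (generate_series(11, 20));
-- (not committed yet)

-- Session 2: the gap is visible, yet the slot is still active, so the
-- live check in pg_upgrade --check still passes.
SELECT count(*) FROM pg_replication_slots
WHERE pg_current_wal_insert_lsn() > confirmed_flush_lsn;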

Attachment: v14-0001-pg_upgrade-Add-include-logical-replication-slots.patch (application/octet-stream)
From 01fe267cfdf81c68d01905c1cb556f9897e76017 Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Tue, 4 Apr 2023 05:49:34 +0000
Subject: [PATCH v14 1/4] pg_upgrade: Add --include-logical-replication-slots
 option

This commit introduces a new pg_upgrade option called "--include-logical-replication-slots".
This allows nodes with logical replication slots to be upgraded. The commit can
be divided into two parts: one for pg_dump and another for pg_upgrade.

For pg_dump this commit includes a new option called "--logical-replication-slots-only".
This option can be used to dump logical replication slots. When this option is
specified, the slot_name, plugin, and two_phase parameters are extracted from
pg_replication_slots. An SQL file is then generated which executes
pg_create_logical_replication_slot() with the extracted parameters.

For pg_upgrade, when '--include-logical-replication-slots' is specified, it executes
pg_dump with the new "--logical-replication-slots-only" option and restores from the
dump. Note that we cannot dump replication slots at the same time as the schema
dump because we need to separate the timing of restoring replication slots and
other objects. Replication slots, in  particular, should not be restored before
executing the pg_resetwal command because it will remove WALs that are required
by the slots.

The significant advantage of this commit is that it makes it easy to continue
logical replication even after upgrading the publisher node. Previously, pg_upgrade
allowed copying publications to a new node. With this new commit, adjusting the
connection string to the new publisher will cause the apply worker on the subscriber
to connect to the new publisher automatically. This enables seamless continuation
of logical replication, even after an upgrade.

Author: Hayato Kuroda
Reviewed-by: Peter Smith, Julien Rouhaud, Vignesh C
---
 doc/src/sgml/ref/pgupgrade.sgml               |  11 ++
 src/bin/pg_dump/pg_backup.h                   |   1 +
 src/bin/pg_dump/pg_dump.c                     | 156 ++++++++++++++++++
 src/bin/pg_dump/pg_dump.h                     |  17 +-
 src/bin/pg_dump/pg_dump_sort.c                |  11 +-
 src/bin/pg_upgrade/Makefile                   |   3 +
 src/bin/pg_upgrade/check.c                    |  67 ++++++++
 src/bin/pg_upgrade/dump.c                     |  24 +++
 src/bin/pg_upgrade/info.c                     | 111 ++++++++++++-
 src/bin/pg_upgrade/meson.build                |   1 +
 src/bin/pg_upgrade/option.c                   |   7 +
 src/bin/pg_upgrade/pg_upgrade.c               |  61 +++++++
 src/bin/pg_upgrade/pg_upgrade.h               |  21 +++
 .../t/003_logical_replication_slots.pl        | 115 +++++++++++++
 src/tools/pgindent/typedefs.list              |   3 +
 15 files changed, 605 insertions(+), 4 deletions(-)
 create mode 100644 src/bin/pg_upgrade/t/003_logical_replication_slots.pl

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 7816b4c685..94e90ff506 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -240,6 +240,17 @@ PostgreSQL documentation
       </listitem>
      </varlistentry>
 
+     <varlistentry>
+      <term><option>--include-logical-replication-slots</option></term>
+      <listitem>
+       <para>
+        Upgrade logical replication slots. Only permanent replication slots
+        are included. Note that pg_upgrade does not check the installation of
+        plugins.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><option>-?</option></term>
       <term><option>--help</option></term>
diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index aba780ef4b..0a4e931f9b 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -187,6 +187,7 @@ typedef struct _dumpOptions
 	int			use_setsessauth;
 	int			enable_row_security;
 	int			load_via_partition_root;
+	int			logical_slots_only;
 
 	/* default, if no "inclusion" switches appear, is to dump everything */
 	bool		include_everything;
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 41a51ec5cd..880ef3bd1d 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -328,6 +328,9 @@ static void setupDumpWorker(Archive *AH);
 static TableInfo *getRootTableInfo(const TableInfo *tbinfo);
 static bool forcePartitionRootLoad(const TableInfo *tbinfo);
 
+static void getLogicalReplicationSlots(Archive *fout);
+static void dumpLogicalReplicationSlot(Archive *fout,
+									   const LogicalReplicationSlotInfo *slotinfo);
 
 int
 main(int argc, char **argv)
@@ -431,6 +434,7 @@ main(int argc, char **argv)
 		{"table-and-children", required_argument, NULL, 12},
 		{"exclude-table-and-children", required_argument, NULL, 13},
 		{"exclude-table-data-and-children", required_argument, NULL, 14},
+		{"logical-replication-slots-only", no_argument, NULL, 15},
 
 		{NULL, 0, NULL, 0}
 	};
@@ -657,6 +661,10 @@ main(int argc, char **argv)
 										  optarg);
 				break;
 
+			case 15:			/* dump only replication slot(s) */
+				dopt.logical_slots_only = true;
+				break;
+
 			default:
 				/* getopt_long already emitted a complaint */
 				pg_log_error_hint("Try \"%s --help\" for more information.", progname);
@@ -714,6 +722,21 @@ main(int argc, char **argv)
 	if (dopt.do_nothing && dopt.dump_inserts == 0)
 		pg_fatal("option --on-conflict-do-nothing requires option --inserts, --rows-per-insert, or --column-inserts");
 
+	if (dopt.logical_slots_only)
+	{
+		if (!dopt.binary_upgrade)
+			pg_fatal("options --logical-replication-slots-only requires option --binary-upgrade");
+
+		if (dopt.dataOnly)
+			pg_fatal("options --logical-replication-slots-only and -a/--data-only cannot be used together");
+
+		if (dopt.schemaOnly)
+			pg_fatal("options --logical-replication-slots-only and -s/--schema-only cannot be used together");
+
+		if (dopt.outputClean)
+			pg_fatal("options --logical-replication-slots-only and -c/--clean cannot be used together");
+	}
+
 	/* Identify archive format to emit */
 	archiveFormat = parseArchiveFormat(format, &archiveMode);
 
@@ -876,6 +899,16 @@ main(int argc, char **argv)
 			pg_fatal("no matching extensions were found");
 	}
 
+	/*
+	 * If --logical-replication-slots-only was requested, dump only the slots
+	 * and skip everything else.
+	 */
+	if (dopt.logical_slots_only)
+	{
+		getLogicalReplicationSlots(fout);
+		goto dump;
+	}
+
 	/*
 	 * Dumping LOs is the default for dumps where an inclusion switch is not
 	 * used (an "include everything" dump).  -B can be used to exclude LOs
@@ -936,6 +969,8 @@ main(int argc, char **argv)
 	if (!dopt.no_security_labels)
 		collectSecLabels(fout);
 
+dump:
+
 	/* Lastly, create dummy objects to represent the section boundaries */
 	boundaryObjs = createBoundaryObjects();
 
@@ -1109,6 +1144,11 @@ help(const char *progname)
 			 "                               servers matching PATTERN\n"));
 	printf(_("  --inserts                    dump data as INSERT commands, rather than COPY\n"));
 	printf(_("  --load-via-partition-root    load partitions via the root table\n"));
+
+	/*
+	 * The option --logical-replication-slots-only is used only by pg_upgrade
+	 * and is not intended for end users, which is why it is not listed.
+	 */
 	printf(_("  --no-comments                do not dump comments\n"));
 	printf(_("  --no-publications            do not dump publications\n"));
 	printf(_("  --no-security-labels         do not dump security label assignments\n"));
@@ -10252,6 +10292,10 @@ dumpDumpableObject(Archive *fout, DumpableObject *dobj)
 		case DO_SUBSCRIPTION:
 			dumpSubscription(fout, (const SubscriptionInfo *) dobj);
 			break;
+		case DO_LOGICAL_REPLICATION_SLOT:
+			dumpLogicalReplicationSlot(fout,
+									   (const LogicalReplicationSlotInfo *) dobj);
+			break;
 		case DO_PRE_DATA_BOUNDARY:
 		case DO_POST_DATA_BOUNDARY:
 			/* never dumped, nothing to do */
@@ -18227,6 +18271,7 @@ addBoundaryDependencies(DumpableObject **dobjs, int numObjs,
 			case DO_PUBLICATION_REL:
 			case DO_PUBLICATION_TABLE_IN_SCHEMA:
 			case DO_SUBSCRIPTION:
+			case DO_LOGICAL_REPLICATION_SLOT:
 				/* Post-data objects: must come after the post-data boundary */
 				addObjectDependency(dobj, postDataBound->dumpId);
 				break;
@@ -18488,3 +18533,114 @@ appendReloptionsArrayAH(PQExpBuffer buffer, const char *reloptions,
 	if (!res)
 		pg_log_warning("could not parse %s array", "reloptions");
 }
+
+/*
+ * getLogicalReplicationSlots
+ *	  get information about replication slots
+ */
+static void
+getLogicalReplicationSlots(Archive *fout)
+{
+	PGresult   *res;
+	LogicalReplicationSlotInfo *slotinfo;
+	PQExpBuffer query;
+
+	int			i_slotname;
+	int			i_plugin;
+	int			i_twophase;
+	int			i,
+				ntups;
+
+	/* Check whether we should dump or not */
+	if (fout->remoteVersion < 160000)
+		return;
+
+	Assert(fout->dopt->logical_slots_only);
+
+	query = createPQExpBuffer();
+
+	resetPQExpBuffer(query);
+
+	/*
+	 * Get replication slots.
+	 *
+	 * XXX: Which information must be extracted from old node? Currently three
+	 * attributes are extracted because they are used by
+	 * pg_create_logical_replication_slot().
+	 *
+	 * XXX: Do we have to support physical slots?
+	 */
+	appendPQExpBufferStr(query,
+						 "SELECT slot_name, plugin, two_phase "
+						 "FROM pg_catalog.pg_replication_slots "
+						 "WHERE database = current_database() AND temporary = false "
+						 "AND wal_status IN ('reserved', 'extended');");
+
+	res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+
+	ntups = PQntuples(res);
+
+	i_slotname = PQfnumber(res, "slot_name");
+	i_plugin = PQfnumber(res, "plugin");
+	i_twophase = PQfnumber(res, "two_phase");
+
+	slotinfo = pg_malloc(ntups * sizeof(LogicalReplicationSlotInfo));
+
+	for (i = 0; i < ntups; i++)
+	{
+		slotinfo[i].dobj.objType = DO_LOGICAL_REPLICATION_SLOT;
+
+		slotinfo[i].dobj.catId.tableoid = InvalidOid;
+		slotinfo[i].dobj.catId.oid = InvalidOid;
+		AssignDumpId(&slotinfo[i].dobj);
+
+		slotinfo[i].dobj.name = pg_strdup(PQgetvalue(res, i, i_slotname));
+
+		slotinfo[i].plugin = pg_strdup(PQgetvalue(res, i, i_plugin));
+		slotinfo[i].twophase = (strcmp(PQgetvalue(res, i, i_twophase), "t") == 0);
+
+		/*
+		 * Note: Currently we do not have any options to include/exclude slots
+		 * in dumping, so all the slots must be selected.
+		 */
+		slotinfo[i].dobj.dump = DUMP_COMPONENT_DEFINITION;
+	}
+	PQclear(res);
+
+	destroyPQExpBuffer(query);
+}
+
+/*
+ * dumpLogicalReplicationSlot
+ *	  dump the call that creates the given logical replication slot
+ */
+static void
+dumpLogicalReplicationSlot(Archive *fout,
+						   const LogicalReplicationSlotInfo *slotinfo)
+{
+	Assert(fout->dopt->logical_slots_only);
+
+	if (slotinfo->dobj.dump & DUMP_COMPONENT_DEFINITION)
+	{
+		PQExpBuffer query = createPQExpBuffer();
+
+		/*
+		 * XXX: For simplification, pg_create_logical_replication_slot() is
+		 * used. Is it sufficient?
+		 */
+		appendPQExpBuffer(query, "SELECT pg_catalog.pg_create_logical_replication_slot(");
+		appendStringLiteralAH(query, slotinfo->dobj.name, fout);
+		appendPQExpBuffer(query, ", ");
+		appendStringLiteralAH(query, slotinfo->plugin, fout);
+		appendPQExpBuffer(query, ", false, %s);",
+						  slotinfo->twophase ? "true" : "false");
+
+		ArchiveEntry(fout, slotinfo->dobj.catId, slotinfo->dobj.dumpId,
+					 ARCHIVE_OPTS(.tag = slotinfo->dobj.name,
+								  .description = "REPLICATION SLOT",
+								  .section = SECTION_POST_DATA,
+								  .createStmt = query->data));
+
+		destroyPQExpBuffer(query);
+	}
+}
diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h
index ed6ce41ad7..de081c35ae 100644
--- a/src/bin/pg_dump/pg_dump.h
+++ b/src/bin/pg_dump/pg_dump.h
@@ -82,7 +82,8 @@ typedef enum
 	DO_PUBLICATION,
 	DO_PUBLICATION_REL,
 	DO_PUBLICATION_TABLE_IN_SCHEMA,
-	DO_SUBSCRIPTION
+	DO_SUBSCRIPTION,
+	DO_LOGICAL_REPLICATION_SLOT
 } DumpableObjectType;
 
 /*
@@ -666,6 +667,20 @@ typedef struct _SubscriptionInfo
 	char	   *subpasswordrequired;
 } SubscriptionInfo;
 
+/*
+ * The LogicalReplicationSlotInfo struct is used to represent replication
+ * slots.
+ *
+ * XXX: add more attributes if needed
+ */
+typedef struct _LogicalReplicationSlotInfo
+{
+	DumpableObject dobj;
+	char	   *plugin;
+	char	   *slottype;
+	bool		twophase;
+} LogicalReplicationSlotInfo;
+
 /*
  *	common utility functions
  */
diff --git a/src/bin/pg_dump/pg_dump_sort.c b/src/bin/pg_dump/pg_dump_sort.c
index 745578d855..4e12e46dc5 100644
--- a/src/bin/pg_dump/pg_dump_sort.c
+++ b/src/bin/pg_dump/pg_dump_sort.c
@@ -93,6 +93,7 @@ enum dbObjectTypePriorities
 	PRIO_PUBLICATION_REL,
 	PRIO_PUBLICATION_TABLE_IN_SCHEMA,
 	PRIO_SUBSCRIPTION,
+	PRIO_LOGICAL_REPLICATION_SLOT,
 	PRIO_DEFAULT_ACL,			/* done in ACL pass */
 	PRIO_EVENT_TRIGGER,			/* must be next to last! */
 	PRIO_REFRESH_MATVIEW		/* must be last! */
@@ -146,10 +147,11 @@ static const int dbObjectTypePriority[] =
 	PRIO_PUBLICATION,			/* DO_PUBLICATION */
 	PRIO_PUBLICATION_REL,		/* DO_PUBLICATION_REL */
 	PRIO_PUBLICATION_TABLE_IN_SCHEMA,	/* DO_PUBLICATION_TABLE_IN_SCHEMA */
-	PRIO_SUBSCRIPTION			/* DO_SUBSCRIPTION */
+	PRIO_SUBSCRIPTION,			/* DO_SUBSCRIPTION */
+	PRIO_LOGICAL_REPLICATION_SLOT	/* DO_LOGICAL_REPLICATION_SLOT */
 };
 
-StaticAssertDecl(lengthof(dbObjectTypePriority) == (DO_SUBSCRIPTION + 1),
+StaticAssertDecl(lengthof(dbObjectTypePriority) == (DO_LOGICAL_REPLICATION_SLOT + 1),
 				 "array length mismatch");
 
 static DumpId preDataBoundId;
@@ -1498,6 +1500,11 @@ describeDumpableObject(DumpableObject *obj, char *buf, int bufsize)
 					 "SUBSCRIPTION (ID %d OID %u)",
 					 obj->dumpId, obj->catId.oid);
 			return;
+		case DO_LOGICAL_REPLICATION_SLOT:
+			snprintf(buf, bufsize,
+					 "LOGICAL REPLICATION SLOT (ID %d NAME %s)",
+					 obj->dumpId, obj->name);
+			return;
 		case DO_PRE_DATA_BOUNDARY:
 			snprintf(buf, bufsize,
 					 "PRE-DATA BOUNDARY  (ID %d)",
diff --git a/src/bin/pg_upgrade/Makefile b/src/bin/pg_upgrade/Makefile
index 5834513add..815d1a7ca1 100644
--- a/src/bin/pg_upgrade/Makefile
+++ b/src/bin/pg_upgrade/Makefile
@@ -3,6 +3,9 @@
 PGFILEDESC = "pg_upgrade - an in-place binary upgrade utility"
 PGAPPICON = win32
 
+# required for 003_logical_replication_slots.pl
+EXTRA_INSTALL=contrib/test_decoding
+
 subdir = src/bin/pg_upgrade
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index fea159689e..f9cba8548e 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -30,6 +30,7 @@ static void check_for_jsonb_9_4_usage(ClusterInfo *cluster);
 static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(ClusterInfo *new_cluster);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
+static void check_for_parameter_settings(ClusterInfo *new_cluster);
 
 
 /*
@@ -89,6 +90,10 @@ check_and_dump_old_cluster(bool live_check)
 	/* Extract a list of databases and tables from the old cluster */
 	get_db_and_rel_infos(&old_cluster);
 
+	/* Additionally, extract a list of logical replication slots if required */
+	if (user_opts.include_logical_slots)
+		get_logical_slot_infos(&old_cluster);
+
 	init_tablespaces();
 
 	get_loadable_libraries();
@@ -189,6 +194,17 @@ check_new_cluster(void)
 {
 	get_db_and_rel_infos(&new_cluster);
 
+	/*
+	 * Do additional work if --include-logical-replication-slots is specified.
+	 * These must be done before check_new_cluster_is_empty() because the
+	 * slot_arr attribute of the new_cluster will be checked in the function.
+	 */
+	if (user_opts.include_logical_slots)
+	{
+		get_logical_slot_infos(&new_cluster);
+		check_for_parameter_settings(&new_cluster);
+	}
+
 	check_new_cluster_is_empty();
 
 	check_loadable_libraries();
@@ -364,6 +380,22 @@ check_new_cluster_is_empty(void)
 						 rel_arr->rels[relnum].nspname,
 						 rel_arr->rels[relnum].relname);
 		}
+
+		/*
+		 * If --include-logical-replication-slots is specified, check the
+		 * existence of slots
+		 */
+		if (user_opts.include_logical_slots)
+		{
+			LogicalSlotInfoArr *slot_arr = &new_cluster.dbarr.dbs[dbnum].slot_arr;
+
+			/* if nslots > 0, report just first entry and exit */
+			if (slot_arr->nslots)
+				pg_fatal("New cluster database \"%s\" is not empty: found logical replication slot \"%s\"",
+						 new_cluster.dbarr.dbs[dbnum].db_name,
+						 slot_arr->slots[0].slotname);
+		}
+
 	}
 }
 
@@ -1402,3 +1434,38 @@ check_for_user_defined_encoding_conversions(ClusterInfo *cluster)
 	else
 		check_ok();
 }
+
+/*
+ * Verify parameter settings for creating logical replication slots
+ */
+static void
+check_for_parameter_settings(ClusterInfo *new_cluster)
+{
+	PGresult   *res;
+	PGconn	   *conn = connectToServer(new_cluster, "template1");
+	int			max_replication_slots;
+	char	   *wal_level;
+
+	prep_status("Checking for logical replication slots");
+
+	res = executeQueryOrDie(conn, "SHOW max_replication_slots;");
+	max_replication_slots = atoi(PQgetvalue(res, 0, 0));
+
+	if (max_replication_slots == 0)
+		pg_fatal("max_replication_slots must be greater than 0");
+
+	PQclear(res);
+
+	res = executeQueryOrDie(conn, "SHOW wal_level;");
+	wal_level = PQgetvalue(res, 0, 0);
+
+	if (strcmp(wal_level, "logical") != 0)
+		pg_fatal("wal_level must be \"logical\", but is set to \"%s\"",
+				 wal_level);
+
+	PQclear(res);
+
+	PQfinish(conn);
+
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/dump.c b/src/bin/pg_upgrade/dump.c
index 6c8c82dca8..e6b90864f5 100644
--- a/src/bin/pg_upgrade/dump.c
+++ b/src/bin/pg_upgrade/dump.c
@@ -59,6 +59,30 @@ generate_old_dump(void)
 						   log_opts.dumpdir,
 						   sql_file_name, escaped_connstr.data);
 
+		/*
+		 * Dump logical replication slots if needed.
+		 *
+		 * XXX We cannot dump replication slots at the same time as the schema
+		 * dump because we need to separate the timing of restoring
+		 * replication slots and other objects. Replication slots, in
+		 * particular, should not be restored before executing the pg_resetwal
+		 * command because it will remove WALs that are required by the slots.
+		 */
+		if (user_opts.include_logical_slots)
+		{
+			char		slots_file_name[MAXPGPATH];
+
+			snprintf(slots_file_name, sizeof(slots_file_name),
+					 DB_DUMP_LOGICAL_SLOTS_FILE_MASK, old_db->db_oid);
+			parallel_exec_prog(log_file_name, NULL,
+							   "\"%s/pg_dump\" %s --logical-replication-slots-only "
+							   "--quote-all-identifiers --binary-upgrade %s "
+							   "--file=\"%s/%s\" %s",
+							   new_cluster.bindir, cluster_conn_opts(&old_cluster),
+							   log_opts.verbose ? "--verbose" : "",
+							   log_opts.dumpdir,
+							   slots_file_name, escaped_connstr.data);
+		}
 		termPQExpBuffer(&escaped_connstr);
 	}
 
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index 85ed15ae4a..139c4b3cde 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -23,10 +23,11 @@ static void free_db_and_rel_infos(DbInfoArr *db_arr);
 static void get_template0_info(ClusterInfo *cluster);
 static void get_db_infos(ClusterInfo *cluster);
 static void get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo);
+static void get_logical_slot_infos_per_db(ClusterInfo *cluster, DbInfo *dbinfo);
 static void free_rel_infos(RelInfoArr *rel_arr);
 static void print_db_infos(DbInfoArr *db_arr);
 static void print_rel_infos(RelInfoArr *rel_arr);
-
+static void print_slot_infos(LogicalSlotInfoArr *slot_arr);
 
 /*
  * gen_db_file_maps()
@@ -600,6 +601,94 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 	dbinfo->rel_arr.nrels = num_rels;
 }
 
+/*
+ * get_logical_slot_infos()
+ *
+ * Higher level routine to generate LogicalSlotInfoArr for all databases.
+ */
+void
+get_logical_slot_infos(ClusterInfo *cluster)
+{
+	int			dbnum;
+
+	if (cluster == &old_cluster)
+		pg_log(PG_VERBOSE, "\nsource databases:");
+	else
+		pg_log(PG_VERBOSE, "\ntarget databases:");
+
+	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
+	{
+		DbInfo	   *pDbInfo = &cluster->dbarr.dbs[dbnum];
+
+		get_logical_slot_infos_per_db(cluster, pDbInfo);
+
+		if (log_opts.verbose)
+		{
+			pg_log(PG_VERBOSE, "Database: \"%s\"", pDbInfo->db_name);
+			print_slot_infos(&pDbInfo->slot_arr);
+		}
+	}
+}
+
+/*
+ * get_logical_slot_infos_per_db()
+ *
+ * gets the LogicalSlotInfos for all the logical replication slots of the database
+ * referred to by "dbinfo".
+ */
+static void
+get_logical_slot_infos_per_db(ClusterInfo *cluster, DbInfo *dbinfo)
+{
+	PGconn	   *conn = connectToServer(cluster,
+									   dbinfo->db_name);
+	PGresult   *res;
+	LogicalSlotInfo *slotinfos = NULL;
+
+	int			num_slots;
+
+	char		query[QUERY_ALLOC];
+
+	query[0] = '\0';			/* initialize query string to empty */
+
+	snprintf(query, sizeof(query),
+			 "SELECT slot_name, plugin, two_phase "
+			 "FROM pg_catalog.pg_replication_slots "
+			 "WHERE database = current_database() AND temporary = false "
+			 "AND wal_status IN ('reserved', 'extended');");
+
+	res = executeQueryOrDie(conn, "%s", query);
+
+	num_slots = PQntuples(res);
+
+	if (num_slots)
+	{
+		int			slotnum;
+		int			i_slotname;
+		int			i_plugin;
+		int			i_twophase;
+
+		slotinfos = (LogicalSlotInfo *) pg_malloc(sizeof(LogicalSlotInfo) * num_slots);
+
+		i_slotname = PQfnumber(res, "slot_name");
+		i_plugin = PQfnumber(res, "plugin");
+		i_twophase = PQfnumber(res, "two_phase");
+
+		for (slotnum = 0; slotnum < num_slots; slotnum++)
+		{
+			LogicalSlotInfo *curr = &slotinfos[slotnum];
+
+			curr->slotname = pg_strdup(PQgetvalue(res, slotnum, i_slotname));
+			curr->plugin = pg_strdup(PQgetvalue(res, slotnum, i_plugin));
+			curr->two_phase = (strcmp(PQgetvalue(res, slotnum, i_twophase), "t") == 0);
+		}
+	}
+
+	PQclear(res);
+	PQfinish(conn);
+
+	dbinfo->slot_arr.slots = slotinfos;
+	dbinfo->slot_arr.nslots = num_slots;
+}
 
 static void
 free_db_and_rel_infos(DbInfoArr *db_arr)
@@ -610,6 +699,14 @@ free_db_and_rel_infos(DbInfoArr *db_arr)
 	{
 		free_rel_infos(&db_arr->dbs[dbnum].rel_arr);
 		pg_free(db_arr->dbs[dbnum].db_name);
+
+		/*
+		 * db_arr has an additional attribute, LogicalSlotInfoArr slot_arr,
+		 * but there is no need to free it. It has a valid member only when
+		 * the cluster had logical replication slots in the previous call.
+		 * However, in this case, a FATAL error is thrown, and we cannot reach
+		 * this point.
+		 */
 	}
 	pg_free(db_arr->dbs);
 	db_arr->dbs = NULL;
@@ -660,3 +757,15 @@ print_rel_infos(RelInfoArr *rel_arr)
 			   rel_arr->rels[relnum].reloid,
 			   rel_arr->rels[relnum].tablespace);
 }
+
+static void
+print_slot_infos(LogicalSlotInfoArr *slot_arr)
+{
+	int			slotnum;
+
+	for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\", two_phase: %d",
+			   slot_arr->slots[slotnum].slotname,
+			   slot_arr->slots[slotnum].plugin,
+			   slot_arr->slots[slotnum].two_phase);
+}
diff --git a/src/bin/pg_upgrade/meson.build b/src/bin/pg_upgrade/meson.build
index 12a97f84e2..228f29b688 100644
--- a/src/bin/pg_upgrade/meson.build
+++ b/src/bin/pg_upgrade/meson.build
@@ -42,6 +42,7 @@ tests += {
     'tests': [
       't/001_basic.pl',
       't/002_pg_upgrade.pl',
+      't/003_logical_replication_slots.pl',
     ],
     'test_kwargs': {'priority': 40}, # pg_upgrade tests are slow
   },
diff --git a/src/bin/pg_upgrade/option.c b/src/bin/pg_upgrade/option.c
index 640361009e..df66a5ffe6 100644
--- a/src/bin/pg_upgrade/option.c
+++ b/src/bin/pg_upgrade/option.c
@@ -57,6 +57,7 @@ parseCommandLine(int argc, char *argv[])
 		{"verbose", no_argument, NULL, 'v'},
 		{"clone", no_argument, NULL, 1},
 		{"copy", no_argument, NULL, 2},
+		{"include-logical-replication-slots", no_argument, NULL, 3},
 
 		{NULL, 0, NULL, 0}
 	};
@@ -199,6 +200,10 @@ parseCommandLine(int argc, char *argv[])
 				user_opts.transfer_mode = TRANSFER_MODE_COPY;
 				break;
 
+			case 3:
+				user_opts.include_logical_slots = true;
+				break;
+
 			default:
 				fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
 						os_info.progname);
@@ -289,6 +294,8 @@ usage(void)
 	printf(_("  -V, --version                 display version information, then exit\n"));
 	printf(_("  --clone                       clone instead of copying files to new cluster\n"));
 	printf(_("  --copy                        copy files to new cluster (default)\n"));
+	printf(_("  --include-logical-replication-slots\n"
+			 "                                upgrade logical replication slots\n"));
 	printf(_("  -?, --help                    show this help, then exit\n"));
 	printf(_("\n"
 			 "Before running pg_upgrade you must:\n"
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 75bab0a04c..373a9ef490 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -59,6 +59,7 @@ static void copy_xact_xlog_xid(void);
 static void set_frozenxids(bool minmxid_only);
 static void make_outputdirs(char *pgdata);
 static void setup(char *argv0, bool *live_check);
+static void create_logical_replication_slots(void);
 
 ClusterInfo old_cluster,
 			new_cluster;
@@ -188,6 +189,19 @@ main(int argc, char **argv)
 			  new_cluster.pgdata);
 	check_ok();
 
+	/*
+	 * Create logical replication slots if requested.
+	 *
+	 * Note: This must be done after running the pg_resetwal command because the
+	 * command will remove required WALs.
+	 */
+	if (user_opts.include_logical_slots)
+	{
+		start_postmaster(&new_cluster, true);
+		create_logical_replication_slots();
+		stop_postmaster(false);
+	}
+
 	if (user_opts.do_sync)
 	{
 		prep_status("Sync data directory to disk");
@@ -860,3 +874,50 @@ set_frozenxids(bool minmxid_only)
 
 	check_ok();
 }
+
+/*
+ * create_logical_replication_slots()
+ *
+ * Similar to create_new_objects() but only restores logical replication slots.
+ */
+static void
+create_logical_replication_slots(void)
+{
+	int			dbnum;
+
+	prep_status_progress("Restoring logical replication slots in the new cluster");
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		char		slots_file_name[MAXPGPATH],
+					log_file_name[MAXPGPATH];
+		DbInfo	   *old_db = &old_cluster.dbarr.dbs[dbnum];
+
+		pg_log(PG_STATUS, "%s", old_db->db_name);
+
+		snprintf(slots_file_name, sizeof(slots_file_name),
+				 DB_DUMP_LOGICAL_SLOTS_FILE_MASK, old_db->db_oid);
+		snprintf(log_file_name, sizeof(log_file_name),
+				 DB_DUMP_LOG_FILE_MASK, old_db->db_oid);
+
+		parallel_exec_prog(log_file_name,
+						   NULL,
+						   "\"%s/psql\" %s --echo-queries --set ON_ERROR_STOP=on "
+						   "--no-psqlrc --dbname %s -f \"%s/%s\"",
+						   new_cluster.bindir,
+						   cluster_conn_opts(&new_cluster),
+						   old_db->db_name,
+						   log_opts.dumpdir,
+						   slots_file_name);
+	}
+
+	/* reap all children */
+	while (reap_child(true) == true)
+		;
+
+	end_progress_output();
+	check_ok();
+
+	/* update new_cluster info again */
+	get_logical_slot_infos(&new_cluster);
+}
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 3eea0139c7..7adbb50807 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -29,6 +29,7 @@
 /* contains both global db information and CREATE DATABASE commands */
 #define GLOBALS_DUMP_FILE	"pg_upgrade_dump_globals.sql"
 #define DB_DUMP_FILE_MASK	"pg_upgrade_dump_%u.custom"
+#define DB_DUMP_LOGICAL_SLOTS_FILE_MASK	"pg_upgrade_dump_%u_logical_slots.sql"
 
 /*
  * Base directories that include all the files generated internally, from the
@@ -150,6 +151,22 @@ typedef struct
 	int			nrels;
 } RelInfoArr;
 
+/*
+ * Structure to store logical replication slot information
+ */
+typedef struct
+{
+	char	   *slotname;		/* slot name */
+	char	   *plugin;			/* plugin */
+	bool		two_phase;		/* Can the slot decode 2PC? */
+} LogicalSlotInfo;
+
+typedef struct
+{
+	LogicalSlotInfo *slots;
+	int			nslots;
+} LogicalSlotInfoArr;
+
 /*
  * The following structure represents a relation mapping.
  */
@@ -176,6 +193,7 @@ typedef struct
 	char		db_tablespace[MAXPGPATH];	/* database default tablespace
 											 * path */
 	RelInfoArr	rel_arr;		/* array of all user relinfos */
+	LogicalSlotInfoArr slot_arr;	/* array of all LogicalSlotInfo */
 } DbInfo;
 
 /*
@@ -304,6 +322,8 @@ typedef struct
 	transferMode transfer_mode; /* copy files or link them? */
 	int			jobs;			/* number of processes/threads to use */
 	char	   *socketdir;		/* directory to use for Unix sockets */
+	bool		include_logical_slots;	/* true -> dump and restore logical
+										 * replication slots */
 } UserOpts;
 
 typedef struct
@@ -400,6 +420,7 @@ FileNameMap *gen_db_file_maps(DbInfo *old_db,
 							  DbInfo *new_db, int *nmaps, const char *old_pgdata,
 							  const char *new_pgdata);
 void		get_db_and_rel_infos(ClusterInfo *cluster);
+void		get_logical_slot_infos(ClusterInfo *cluster);
 
 /* option.c */
 
diff --git a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
new file mode 100644
index 0000000000..525a7704cf
--- /dev/null
+++ b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
@@ -0,0 +1,115 @@
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+# Tests for upgrading replication slots
+
+use strict;
+use warnings;
+
+use File::Path qw(rmtree);
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Can be changed to test the other modes
+my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
+
+# Initialize old node
+my $old_node = PostgreSQL::Test::Cluster->new('old_node');
+$old_node->init(allows_streaming => 'logical');
+$old_node->start;
+
+# Initialize new node
+my $new_node = PostgreSQL::Test::Cluster->new('new_node');
+$new_node->init(allows_streaming => 1);
+
+my $bindir = $new_node->config_data('--bindir');
+
+$old_node->stop;
+
+# Cause a failure at the start of pg_upgrade because wal_level is replica
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_node->data_dir,
+		'-D',         $new_node->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_node->host,
+		'-p',         $old_node->port,
+		'-P',         $new_node->port,
+		$mode,        '--include-logical-replication-slots',
+	],
+	'run of pg_upgrade of old node with wrong wal_level');
+ok( -d $new_node->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_node->data_dir . "/pg_upgrade_output.d");
+
+# Preparations for the subsequent test. The case where max_replication_slots is
+# set to 0 is prohibited.
+$new_node->append_conf('postgresql.conf', "wal_level = 'logical'");
+$new_node->append_conf('postgresql.conf', "max_replication_slots = 0");
+
+# Cause a failure at the start of pg_upgrade because max_replication_slots is 0
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_node->data_dir,
+		'-D',         $new_node->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_node->host,
+		'-p',         $old_node->port,
+		'-P',         $new_node->port,
+		$mode,        '--include-logical-replication-slots',
+	],
+	'run of pg_upgrade of old node with wrong max_replication_slots');
+ok( -d $new_node->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_node->data_dir . "/pg_upgrade_output.d");
+
+# Preparations for the subsequent test. max_replication_slots is set to
+# non-zero value to succeed the pg_upgrade
+$new_node->append_conf('postgresql.conf', "max_replication_slots = 10");
+
+# Create a slot on old node, and generate WALs
+$old_node->start;
+$old_node->safe_psql(
+	'postgres', qq[
+	SELECT pg_create_logical_replication_slot('test_slot', 'test_decoding', false, true);
+	CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;
+]);
+
+my $result = $old_node->safe_psql('postgres',
+	"SELECT count (*) FROM pg_logical_slot_get_changes('test_slot', NULL, NULL)"
+);
+is($result, qq(12), 'ensure WALs are not consumed yet');
+$old_node->stop;
+
+# Actual run, pg_upgrade_output.d is removed at the end
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_node->data_dir,
+		'-D',         $new_node->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_node->host,
+		'-p',         $old_node->port,
+		'-P',         $new_node->port,
+		$mode,        '--include-logical-replication-slots'
+	],
+	'run of pg_upgrade of old node');
+ok( !-d $new_node->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ removed after pg_upgrade success");
+
+$new_node->start;
+$result = $new_node->safe_psql('postgres',
+	"SELECT slot_name, two_phase FROM pg_replication_slots");
+is($result, qq(test_slot|t), 'check the slot exists on new node');
+
+done_testing();
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index b4058b88c3..5944cb34ea 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1479,6 +1479,7 @@ LogicalRepBeginData
 LogicalRepCommitData
 LogicalRepCommitPreparedTxnData
 LogicalRepCtxStruct
+LogicalReplicationSlotInfo
 LogicalRepMode
 LogicalRepMsgType
 LogicalRepPartMapEntry
@@ -1492,6 +1493,8 @@ LogicalRepTupleData
 LogicalRepTyp
 LogicalRepWorker
 LogicalRewriteMappingData
+LogicalSlotInfo
+LogicalSlotInfoArr
 LogicalTape
 LogicalTapeSet
 LsnReadQueue
-- 
2.27.0
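
To make the mechanism in 0001 concrete: for each database, pg_dump
--logical-replication-slots-only writes a pg_upgrade_dump_<oid>_logical_slots.sql
file containing one statement per slot of the following shape (the values shown
are simply the ones used by the TAP test), and pg_upgrade replays it with psql
only after pg_resetwal has run:

-- Emitted by dumpLogicalReplicationSlot(); arguments are
-- (slot_name, plugin, temporary = false, twophase).
SELECT pg_catalog.pg_create_logical_replication_slot('test_slot',
                                                     'test_decoding',
                                                     false, true);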

Attachment: v14-0002-Always-persist-to-disk-logical-slots-during-a-sh.patch (application/octet-stream)
From add3123cdab69dae07b2bd81645ee98c75acabd2 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Fri, 14 Apr 2023 13:49:09 +0800
Subject: [PATCH v14 2/4] Always persist to disk logical slots during a
 shutdown checkpoint.

It's entirely possible for a logical slot to have a confirmed_flush_lsn higher
than the last value saved on disk while not being marked as dirty.  It's
currently not a problem to lose that value during a clean shutdown / restart
cycle, but a later patch adding support for pg_upgrade of publications and
logical slots will rely on that value being properly persisted to disk.

Author: Julien Rouhaud
Reviewed-by: FIXME
Discussion: FIXME
---
 src/backend/access/transam/xlog.c |  2 +-
 src/backend/replication/slot.c    | 26 ++++++++++++++++----------
 src/include/replication/slot.h    |  2 +-
 3 files changed, 18 insertions(+), 12 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index bc5a8e0569..78b4528f2c 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7011,7 +7011,7 @@ static void
 CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
 {
 	CheckPointRelationMap();
-	CheckPointReplicationSlots();
+	CheckPointReplicationSlots(flags & CHECKPOINT_IS_SHUTDOWN);
 	CheckPointSnapBuild();
 	CheckPointLogicalRewriteHeap();
 	CheckPointReplicationOrigin();
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 8021aaa0a8..aeea6ffd1f 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -109,7 +109,8 @@ static void ReplicationSlotDropPtr(ReplicationSlot *slot);
 /* internal persistency functions */
 static void RestoreSlotFromDisk(const char *name);
 static void CreateSlotOnDisk(ReplicationSlot *slot);
-static void SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel);
+static void SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel,
+						   bool is_shutdown);
 
 /*
  * Report shared-memory space needed by ReplicationSlotsShmemInit.
@@ -783,7 +784,7 @@ ReplicationSlotSave(void)
 	Assert(MyReplicationSlot != NULL);
 
 	sprintf(path, "pg_replslot/%s", NameStr(MyReplicationSlot->data.name));
-	SaveSlotToPath(MyReplicationSlot, path, ERROR);
+	SaveSlotToPath(MyReplicationSlot, path, ERROR, false);
 }
 
 /*
@@ -1565,11 +1566,10 @@ restart:
 /*
  * Flush all replication slots to disk.
  *
- * This needn't actually be part of a checkpoint, but it's a convenient
- * location.
+ * is_shutdown is true in case of a shutdown checkpoint.
  */
 void
-CheckPointReplicationSlots(void)
+CheckPointReplicationSlots(bool is_shutdown)
 {
 	int			i;
 
@@ -1594,7 +1594,7 @@ CheckPointReplicationSlots(void)
 
 		/* save the slot to disk, locking is handled in SaveSlotToPath() */
 		sprintf(path, "pg_replslot/%s", NameStr(s->data.name));
-		SaveSlotToPath(s, path, LOG);
+		SaveSlotToPath(s, path, LOG, is_shutdown);
 	}
 	LWLockRelease(ReplicationSlotAllocationLock);
 }
@@ -1700,7 +1700,7 @@ CreateSlotOnDisk(ReplicationSlot *slot)
 
 	/* Write the actual state file. */
 	slot->dirty = true;			/* signal that we really need to write */
-	SaveSlotToPath(slot, tmppath, ERROR);
+	SaveSlotToPath(slot, tmppath, ERROR, false);
 
 	/* Rename the directory into place. */
 	if (rename(tmppath, path) != 0)
@@ -1726,7 +1726,8 @@ CreateSlotOnDisk(ReplicationSlot *slot)
  * Shared functionality between saving and creating a replication slot.
  */
 static void
-SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel)
+SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel,
+			   bool is_shutdown)
 {
 	char		tmppath[MAXPGPATH];
 	char		path[MAXPGPATH];
@@ -1740,8 +1741,13 @@ SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel)
 	slot->just_dirtied = false;
 	SpinLockRelease(&slot->mutex);
 
-	/* and don't do anything if there's nothing to write */
-	if (!was_dirty)
+	/*
+	 * and don't do anything if there's nothing to write, unless this is
+	 * called for a logical slot during a shutdown checkpoint, as we want to
+	 * persist the confirmed_flush_lsn in that case, even if that's the only
+	 * modification.
+	 */
+	if (!was_dirty && !(is_shutdown && SlotIsLogical(slot)))
 		return;
 
 	LWLockAcquire(&slot->io_in_progress_lock, LW_EXCLUSIVE);
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index a8a89dc784..7ca37c9f70 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -241,7 +241,7 @@ extern void ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslo
 extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok);
 
 extern void StartupReplicationSlots(void);
-extern void CheckPointReplicationSlots(void);
+extern void CheckPointReplicationSlots(bool is_shutdown);
 
 extern void CheckSlotRequirements(void);
 extern void CheckSlotPermissions(void);
-- 
2.27.0

v14-0003-pg_upgrade-Add-check-function-for-include-logica.patchapplication/octet-stream; name=v14-0003-pg_upgrade-Add-check-function-for-include-logica.patchDownload
From cd3e33315d823b1202fda003622dbaac1b8ad9fa Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Fri, 14 Apr 2023 08:59:03 +0000
Subject: [PATCH v14 3/4] pg_upgrade: Add check function for
 --include-logical-replication-slots option

---
 src/bin/pg_upgrade/check.c                    | 80 ++++++++++++++++++-
 .../t/003_logical_replication_slots.pl        | 30 ++++++-
 2 files changed, 108 insertions(+), 2 deletions(-)

diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index f9cba8548e..61ca36a853 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -9,7 +9,10 @@
 
 #include "postgres_fe.h"
 
+#include "access/xlogrecord.h"
+#include "access/xlog_internal.h"
 #include "catalog/pg_authid_d.h"
+#include "catalog/pg_control.h"
 #include "catalog/pg_collation.h"
 #include "fe_utils/string_utils.h"
 #include "mb/pg_wchar.h"
@@ -31,7 +34,7 @@ static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(ClusterInfo *new_cluster);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
 static void check_for_parameter_settings(ClusterInfo *new_cluster);
-
+static void check_for_confirmed_flush_lsn(ClusterInfo *cluster);
 
 /*
  * fix_path_separator
@@ -108,6 +111,8 @@ check_and_dump_old_cluster(bool live_check)
 	check_for_composite_data_type_usage(&old_cluster);
 	check_for_reg_data_type_usage(&old_cluster);
 	check_for_isn_and_int8_passing_mismatch(&old_cluster);
+	if (user_opts.include_logical_slots)
+		check_for_confirmed_flush_lsn(&old_cluster);
 
 	/*
 	 * PG 16 increased the size of the 'aclitem' type, which breaks the on-disk
@@ -1446,6 +1451,10 @@ check_for_parameter_settings(ClusterInfo *new_cluster)
 	int			max_replication_slots;
 	char	   *wal_level;
 
+	/* --include-logical-replication-slots can be used since PG16. */
+	if (GET_MAJOR_VERSION(new_cluster->major_version < 1600))
+		return;
+
 	prep_status("Checking for logical replication slots");
 
 	res = executeQueryOrDie(conn, "SHOW max_replication_slots;");
@@ -1469,3 +1478,72 @@ check_for_parameter_settings(ClusterInfo *new_cluster)
 
 	check_ok();
 }
+
+/*
+ * Verify that all logical replication slots consumed all WALs, except a
+ * CHECKPOINT_SHUTDOWN record.
+ */
+static void
+check_for_confirmed_flush_lsn(ClusterInfo *cluster)
+{
+	int			i,
+				ntups,
+				i_slotname;
+	bool		is_error = false;
+	PGresult   *res;
+	DbInfo	   *active_db = &cluster->dbarr.dbs[0];
+	PGconn	   *conn = connectToServer(cluster, active_db->db_name);
+
+	Assert(user_opts.include_logical_slots);
+
+	/* --include-logical-replication-slots can be used since PG16. */
+	if (GET_MAJOR_VERSION(cluster->major_version < 1600))
+		return;
+
+	prep_status("Checking for logical replication slots");
+
+	/*
+	 * Check that all logical replication slots have reached the current WAL
+	 * position, except for the CHECKPOINT_SHUTDOWN record. Even if all WALs
+	 * are consumed before shutting down the node, the checkpointer generates
+	 * a CHECKPOINT_SHUTDOWN record at shutdown, which cannot be consumed by
+	 * any slots. Therefore, we must allow for a difference between
+	 * pg_current_wal_insert_lsn() and confirmed_flush_lsn.
+	 */
+#define SHUTDOWN_RECORD_SIZE  (SizeOfXLogRecord + \
+							   SizeOfXLogRecordDataHeaderShort + \
+							   sizeof(CheckPoint))
+
+	res = executeQueryOrDie(conn,
+							"SELECT slot_name FROM pg_catalog.pg_replication_slots "
+							"WHERE (pg_catalog.pg_current_wal_insert_lsn() - confirmed_flush_lsn) > %d "
+							"AND temporary = false AND wal_status IN ('reserved', 'extended');",
+							(int) (SizeOfXLogLongPHD + SHUTDOWN_RECORD_SIZE));
+
+#undef SHUTDOWN_RECORD_SIZE
+
+	ntups = PQntuples(res);
+	i_slotname = PQfnumber(res, "slot_name");
+
+	for (i = 0; i < ntups; i++)
+	{
+		char	   *slotname;
+
+		is_error = true;
+
+		slotname = PQgetvalue(res, i, i_slotname);
+
+		pg_log(PG_WARNING,
+			   "\nWARNING: logical replication slot \"%s\" has not consumed WALs yet",
+			   slotname);
+	}
+
+	PQclear(res);
+	PQfinish(conn);
+
+	if (is_error)
+		pg_fatal("--include-logical-replication-slots requires that all "
+				 "logical replication slots consumed all the WALs");
+
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
index 525a7704cf..21fefca084 100644
--- a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
+++ b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
@@ -85,11 +85,39 @@ $old_node->safe_psql(
 ]);
 
 my $result = $old_node->safe_psql('postgres',
-	"SELECT count (*) FROM pg_logical_slot_get_changes('test_slot', NULL, NULL)"
+	"SELECT count(*) FROM pg_logical_slot_peek_changes('test_slot', NULL, NULL)"
 );
+
 is($result, qq(12), 'ensure WALs are not consumed yet');
 $old_node->stop;
 
+# Cause a failure at the start of pg_upgrade because test_slot does not
+# finish consuming all the WALs
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_node->data_dir,
+		'-D',         $new_node->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_node->host,
+		'-p',         $old_node->port,
+		'-P',         $new_node->port,
+		$mode,        '--include-logical-replication-slots',
+	],
+	'run of pg_upgrade of old node with idle replication slots');
+ok( -d $new_node->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_node->data_dir . "/pg_upgrade_output.d");
+
+$old_node->start;
+$old_node->safe_psql('postgres',
+	"SELECT count (*) FROM pg_logical_slot_get_changes('test_slot', NULL, NULL)"
+);
+$old_node->stop;
+
 # Actual run, pg_upgrade_output.d is removed at the end
 command_ok(
 	[
-- 
2.27.0

#51Peter Smith
smithpb2250@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#50)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

Hi Kuroda-san,

I confirmed the patch changes from v13-0001 to v14-0001 have addressed
the comments from my previous post, and the cfbot is passing OK, so I
don't have any more review comments at this time.

------
Kind Regards,
Peter Smith.
Fujitsu Australia

#52Wei Wang (Fujitsu)
wangw.fnst@fujitsu.com
In reply to: Hayato Kuroda (Fujitsu) (#50)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

On Tues, May 16, 2023 at 14:15 PM Kuroda, Hayato/黒田 隼人 <kuroda.hayato@fujitsu.com> wrote:

Dear Peter,

Thanks for reviewing! PSA new version patchset.

Thanks for updating the patch set.

Here are some comments:
===
For patches 0001

1. The latest patch set fails to apply because of the new commit (0245f8d) in HEAD.

~~~

2. In file pg_dump.h.
```
+/*
+ * The LogicalReplicationSlotInfo struct is used to represent replication
+ * slots.
+ *
+ * XXX: add more attributes if needed
+ */
+typedef struct _LogicalReplicationSlotInfo
+{
+	DumpableObject dobj;
+	char	   *plugin;
+	char	   *slottype;
+	bool		twophase;
+} LogicalReplicationSlotInfo;
```

Do we need the structure member "slottype"? It seems we do not use "slottype"
because we only dump logical replication slots.

===
For patch 0002

3. In the function SaveSlotToPath
```
-	/* and don't do anything if there's nothing to write */
-	if (!was_dirty)
+	/*
+	 * and don't do anything if there's nothing to write, unless it's this is
+	 * called for a logical slot during a shutdown checkpoint, as we want to
+	 * persist the confirmed_flush_lsn in that case, even if that's the only
+	 * modification.
+	 */
+	if (!was_dirty && !is_shutdown && !SlotIsLogical(slot))
```
It seems that the code isn't consistent with our expectation.
If this is called for a physical slot during a shutdown checkpoint and there's
nothing to write, I think it will also persist the physical slot to disk.

===
For patch 0003

4. In the function check_for_parameter_settings
```
+	/* --include-logical-replication-slots can be used since PG16. */
+	if (GET_MAJOR_VERSION(new_cluster->major_version < 1600))
+		return;
```
It seems that there is a slight mistake (the input of GET_MAJOR_VERSION) in the
if-condition:
GET_MAJOR_VERSION(new_cluster->major_version < 1600)
->
GET_MAJOR_VERSION(new_cluster->major_version) <= 1500

Please also check the similar if-conditions in the below two functions
check_for_confirmed_flush_lsn (in 0003 patch)
check_are_logical_slots_active (in 0004 patch)

Regards,
Wang wei

#53Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Wei Wang (Fujitsu) (#52)
4 attachment(s)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Wang,

Thank you for reviewing! PSA new version.

For patches 0001

1. The latest patch set fails to apply because of the new commit (0245f8d) in HEAD.

I didn't notice that. Thanks, fixed.

2. In file pg_dump.h.
```
+/*
+ * The LogicalReplicationSlotInfo struct is used to represent replication
+ * slots.
+ *
+ * XXX: add more attributes if needed
+ */
+typedef struct _LogicalReplicationSlotInfo
+{
+	DumpableObject dobj;
+	char	   *plugin;
+	char	   *slottype;
+	bool		twophase;
+} LogicalReplicationSlotInfo;
```

Do we need the structure member "slottype"? It seems we do not use "slottype"
because we only dump logical replication slots.

As you said, this attribute is not needed; it is a leftover from an earlier version
of the patch. Removed.

For patch 0002

3. In the function SaveSlotToPath
```
-	/* and don't do anything if there's nothing to write */
-	if (!was_dirty)
+	/*
+	 * and don't do anything if there's nothing to write, unless it's this is
+	 * called for a logical slot during a shutdown checkpoint, as we want to
+	 * persist the confirmed_flush_lsn in that case, even if that's the only
+	 * modification.
+	 */
+	if (!was_dirty && !is_shutdown && !SlotIsLogical(slot))
```
It seems that the code isn't consistent with our expectation.
If this is called for a physical slot during a shutdown checkpoint and there's
nothing to write, I think it will also persist the physical slot to disk.

You mean that we should not change the handling for the physical-slot case, right?
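
For reference, here is a minimal standalone sketch (not the server code) of the
adjusted early-return predicate as it ends up in the attached v15-0002: the write
is skipped only when the slot is clean and we are not persisting a logical slot at
a shutdown checkpoint.

```
#include <stdbool.h>
#include <stdio.h>

/* Stand-in for the predicate used by SaveSlotToPath() after v15-0002. */
static bool
skip_save(bool was_dirty, bool is_shutdown, bool slot_is_logical)
{
	/* equivalent to: !was_dirty && (SlotIsPhysical(slot) || !is_shutdown) */
	return !was_dirty && (!slot_is_logical || !is_shutdown);
}

int
main(void)
{
	/* clean physical slot at shutdown: still skipped (pre-existing behavior) */
	printf("%d\n", skip_save(false, true, false));	/* prints 1 */

	/* clean logical slot at shutdown: written, to persist confirmed_flush_lsn */
	printf("%d\n", skip_save(false, true, true));	/* prints 0 */

	return 0;
}
```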

For patch 0003

4. In the function check_for_parameter_settings
```
+	/* --include-logical-replication-slots can be used since PG16. */
+	if (GET_MAJOR_VERSION(new_cluster->major_version < 1600))
+		return;
```
It seems that there is a slight mistake (the input of GET_MAJOR_VERSION) in the
if-condition:
GET_MAJOR_VERSION(new_cluster->major_version < 1600)
->
GET_MAJOR_VERSION(new_cluster->major_version) <= 1500

Please also check the similar if-conditions in the below two functions
check_for_confirmed_flush_lsn (in 0003 patch)
check_are_logical_slots_active (in 0004 patch)

Done. I grepped for GET_MAJOR_VERSION and confirmed that all of them were fixed.
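
Just to illustrate the difference, here is a tiny self-contained example using a
simplified stand-in for pg_upgrade's GET_MAJOR_VERSION macro (illustration only,
not the real header): the comparison has to be applied to the macro's result, not
inside its argument.

```
#include <stdio.h>

/* Simplified stand-in for pg_upgrade's GET_MAJOR_VERSION, for illustration. */
#define GET_MAJOR_VERSION(v)	((v) / 100)

int
main(void)
{
	int		major_version = 150002;		/* e.g. an old PG 15.2 cluster */

	/* buggy form: the comparison is evaluated first, so the macro only sees 0 */
	printf("%d\n", GET_MAJOR_VERSION(major_version < 1600));	/* prints 0 */

	/* corrected form used in v15: apply the macro, then compare */
	printf("%d\n", GET_MAJOR_VERSION(major_version) <= 1500);	/* prints 1 */

	return 0;
}
```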

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:

v15-0001-pg_upgrade-Add-include-logical-replication-slots.patchapplication/octet-stream; name=v15-0001-pg_upgrade-Add-include-logical-replication-slots.patchDownload
From 6979c1bb7454a495fdea6ae99a99df6500e88c81 Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Tue, 4 Apr 2023 05:49:34 +0000
Subject: [PATCH v15 1/4] pg_upgrade: Add --include-logical-replication-slots
 option

This commit introduces a new pg_upgrade option called "--include-logical-replication-slots".
This allows nodes with logical replication slots to be upgraded. The commit can
be divided into two parts: one for pg_dump and another for pg_upgrade.

For pg_dump this commit includes a new option called "--logical-replication-slots-only".
This option can be used to dump logical replication slots. When this option is
specified, the slot_name, plugin, and two_phase parameters are extracted from
pg_replication_slots. An SQL file is then generated which executes
pg_create_logical_replication_slot() with the extracted parameters.

For pg_upgrade, when '--include-logical-replication-slots' is specified, it executes
pg_dump with the new "--logical-replication-slots-only" option and restores from the
dump. Note that we cannot dump replication slots at the same time as the schema
dump because we need to separate the timing of restoring replication slots and
other objects. Replication slots, in  particular, should not be restored before
executing the pg_resetwal command because it will remove WALs that are required
by the slots.

The significant advantage of this commit is that it makes it easy to continue
logical replication even after upgrading the publisher node. Previously, pg_upgrade
allowed copying publications to a new node. With this new commit, adjusting the
connection string to the new publisher will cause the apply worker on the subscriber
to connect to the new publisher automatically. This enables seamless continuation
of logical replication, even after an upgrade.

Author: Hayato Kuroda
Reviewed-by: Peter Smith, Julien Rouhaud, Vignesh C, Wang Wei
---
 doc/src/sgml/ref/pgupgrade.sgml               |  11 ++
 src/bin/pg_dump/pg_backup.h                   |   1 +
 src/bin/pg_dump/pg_dump.c                     | 156 ++++++++++++++++++
 src/bin/pg_dump/pg_dump.h                     |  16 +-
 src/bin/pg_dump/pg_dump_sort.c                |  11 +-
 src/bin/pg_upgrade/Makefile                   |   3 +
 src/bin/pg_upgrade/check.c                    |  67 ++++++++
 src/bin/pg_upgrade/dump.c                     |  24 +++
 src/bin/pg_upgrade/info.c                     | 111 ++++++++++++-
 src/bin/pg_upgrade/meson.build                |   1 +
 src/bin/pg_upgrade/option.c                   |   7 +
 src/bin/pg_upgrade/pg_upgrade.c               |  61 +++++++
 src/bin/pg_upgrade/pg_upgrade.h               |  21 +++
 .../t/003_logical_replication_slots.pl        | 115 +++++++++++++
 src/tools/pgindent/typedefs.list              |   3 +
 15 files changed, 604 insertions(+), 4 deletions(-)
 create mode 100644 src/bin/pg_upgrade/t/003_logical_replication_slots.pl

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 7816b4c685..94e90ff506 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -240,6 +240,17 @@ PostgreSQL documentation
       </listitem>
      </varlistentry>
 
+     <varlistentry>
+      <term><option>--include-logical-replication-slots</option></term>
+      <listitem>
+       <para>
+        Upgrade logical replication slots. Only permanent replication slots
+        are included. Note that pg_upgrade does not check the installation of
+        plugins.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><option>-?</option></term>
       <term><option>--help</option></term>
diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index aba780ef4b..0a4e931f9b 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -187,6 +187,7 @@ typedef struct _dumpOptions
 	int			use_setsessauth;
 	int			enable_row_security;
 	int			load_via_partition_root;
+	int			logical_slots_only;
 
 	/* default, if no "inclusion" switches appear, is to dump everything */
 	bool		include_everything;
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 3af97a6039..3c6d2462d4 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -328,6 +328,9 @@ static void setupDumpWorker(Archive *AH);
 static TableInfo *getRootTableInfo(const TableInfo *tbinfo);
 static bool forcePartitionRootLoad(const TableInfo *tbinfo);
 
+static void getLogicalReplicationSlots(Archive *fout);
+static void dumpLogicalReplicationSlot(Archive *fout,
+									   const LogicalReplicationSlotInfo *slotinfo);
 
 int
 main(int argc, char **argv)
@@ -431,6 +434,7 @@ main(int argc, char **argv)
 		{"table-and-children", required_argument, NULL, 12},
 		{"exclude-table-and-children", required_argument, NULL, 13},
 		{"exclude-table-data-and-children", required_argument, NULL, 14},
+		{"logical-replication-slots-only", no_argument, NULL, 15},
 
 		{NULL, 0, NULL, 0}
 	};
@@ -657,6 +661,10 @@ main(int argc, char **argv)
 										  optarg);
 				break;
 
+			case 15:			/* dump only replication slot(s) */
+				dopt.logical_slots_only = true;
+				break;
+
 			default:
 				/* getopt_long already emitted a complaint */
 				pg_log_error_hint("Try \"%s --help\" for more information.", progname);
@@ -714,6 +722,21 @@ main(int argc, char **argv)
 	if (dopt.do_nothing && dopt.dump_inserts == 0)
 		pg_fatal("option --on-conflict-do-nothing requires option --inserts, --rows-per-insert, or --column-inserts");
 
+	if (dopt.logical_slots_only)
+	{
+		if (!dopt.binary_upgrade)
+			pg_fatal("options --logical-replication-slots-only requires option --binary-upgrade");
+
+		if (dopt.dataOnly)
+			pg_fatal("options --logical-replication-slots-only and -a/--data-only cannot be used together");
+
+		if (dopt.schemaOnly)
+			pg_fatal("options --logical-replication-slots-only and -s/--schema-only cannot be used together");
+
+		if (dopt.outputClean)
+			pg_fatal("options --logical-replication-slots-only and -c/--clean cannot be used together");
+	}
+
 	/* Identify archive format to emit */
 	archiveFormat = parseArchiveFormat(format, &archiveMode);
 
@@ -876,6 +899,16 @@ main(int argc, char **argv)
 			pg_fatal("no matching extensions were found");
 	}
 
+	/*
+	 * If dump logical-replication-slots-only was requested, dump only them
+	 * and skip everything else.
+	 */
+	if (dopt.logical_slots_only)
+	{
+		getLogicalReplicationSlots(fout);
+		goto dump;
+	}
+
 	/*
 	 * Dumping LOs is the default for dumps where an inclusion switch is not
 	 * used (an "include everything" dump).  -B can be used to exclude LOs
@@ -936,6 +969,8 @@ main(int argc, char **argv)
 	if (!dopt.no_security_labels)
 		collectSecLabels(fout);
 
+dump:
+
 	/* Lastly, create dummy objects to represent the section boundaries */
 	boundaryObjs = createBoundaryObjects();
 
@@ -1109,6 +1144,11 @@ help(const char *progname)
 			 "                               servers matching PATTERN\n"));
 	printf(_("  --inserts                    dump data as INSERT commands, rather than COPY\n"));
 	printf(_("  --load-via-partition-root    load partitions via the root table\n"));
+
+	/*
+	 * The option --logical-replication-slots-only is used only by pg_upgrade
+	 * and should not be called by users, which is why it is not listed.
+	 */
 	printf(_("  --no-comments                do not dump comments\n"));
 	printf(_("  --no-publications            do not dump publications\n"));
 	printf(_("  --no-security-labels         do not dump security label assignments\n"));
@@ -10235,6 +10275,10 @@ dumpDumpableObject(Archive *fout, DumpableObject *dobj)
 		case DO_SUBSCRIPTION:
 			dumpSubscription(fout, (const SubscriptionInfo *) dobj);
 			break;
+		case DO_LOGICAL_REPLICATION_SLOT:
+			dumpLogicalReplicationSlot(fout,
+									   (const LogicalReplicationSlotInfo *) dobj);
+			break;
 		case DO_PRE_DATA_BOUNDARY:
 		case DO_POST_DATA_BOUNDARY:
 			/* never dumped, nothing to do */
@@ -18215,6 +18259,7 @@ addBoundaryDependencies(DumpableObject **dobjs, int numObjs,
 			case DO_PUBLICATION_REL:
 			case DO_PUBLICATION_TABLE_IN_SCHEMA:
 			case DO_SUBSCRIPTION:
+			case DO_LOGICAL_REPLICATION_SLOT:
 				/* Post-data objects: must come after the post-data boundary */
 				addObjectDependency(dobj, postDataBound->dumpId);
 				break;
@@ -18476,3 +18521,114 @@ appendReloptionsArrayAH(PQExpBuffer buffer, const char *reloptions,
 	if (!res)
 		pg_log_warning("could not parse %s array", "reloptions");
 }
+
+/*
+ * getLogicalReplicationSlots
+ *	  get information about replication slots
+ */
+static void
+getLogicalReplicationSlots(Archive *fout)
+{
+	PGresult   *res;
+	LogicalReplicationSlotInfo *slotinfo;
+	PQExpBuffer query;
+
+	int			i_slotname;
+	int			i_plugin;
+	int			i_twophase;
+	int			i,
+				ntups;
+
+	/* Check whether we should dump or not */
+	if (fout->remoteVersion < 160000)
+		return;
+
+	Assert(fout->dopt->logical_slots_only);
+
+	query = createPQExpBuffer();
+
+	resetPQExpBuffer(query);
+
+	/*
+	 * Get replication slots.
+	 *
+	 * XXX: Which information must be extracted from old node? Currently three
+	 * attributes are extracted because they are used by
+	 * pg_create_logical_replication_slot().
+	 *
+	 * XXX: Do we have to support physical slots?
+	 */
+	appendPQExpBufferStr(query,
+						 "SELECT slot_name, plugin, two_phase "
+						 "FROM pg_catalog.pg_replication_slots "
+						 "WHERE database = current_database() AND temporary = false "
+						 "AND wal_status IN ('reserved', 'extended');");
+
+	res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+
+	ntups = PQntuples(res);
+
+	i_slotname = PQfnumber(res, "slot_name");
+	i_plugin = PQfnumber(res, "plugin");
+	i_twophase = PQfnumber(res, "two_phase");
+
+	slotinfo = pg_malloc(ntups * sizeof(LogicalReplicationSlotInfo));
+
+	for (i = 0; i < ntups; i++)
+	{
+		slotinfo[i].dobj.objType = DO_LOGICAL_REPLICATION_SLOT;
+
+		slotinfo[i].dobj.catId.tableoid = InvalidOid;
+		slotinfo[i].dobj.catId.oid = InvalidOid;
+		AssignDumpId(&slotinfo[i].dobj);
+
+		slotinfo[i].dobj.name = pg_strdup(PQgetvalue(res, i, i_slotname));
+
+		slotinfo[i].plugin = pg_strdup(PQgetvalue(res, i, i_plugin));
+		slotinfo[i].twophase = (strcmp(PQgetvalue(res, i, i_twophase), "t") == 0);
+
+		/*
+		 * Note: Currently we do not have any options to include/exclude slots
+		 * in dumping, so all the slots must be selected.
+		 */
+		slotinfo[i].dobj.dump = DUMP_COMPONENT_DEFINITION;
+	}
+	PQclear(res);
+
+	destroyPQExpBuffer(query);
+}
+
+/*
+ * dumpLogicalReplicationSlot
+ *	  dump creation functions for the given logical replication slots
+ */
+static void
+dumpLogicalReplicationSlot(Archive *fout,
+						   const LogicalReplicationSlotInfo *slotinfo)
+{
+	Assert(fout->dopt->logical_slots_only);
+
+	if (slotinfo->dobj.dump & DUMP_COMPONENT_DEFINITION)
+	{
+		PQExpBuffer query = createPQExpBuffer();
+
+		/*
+		 * XXX: For simplification, pg_create_logical_replication_slot() is
+		 * used. Is it sufficient?
+		 */
+		appendPQExpBuffer(query, "SELECT pg_catalog.pg_create_logical_replication_slot(");
+		appendStringLiteralAH(query, slotinfo->dobj.name, fout);
+		appendPQExpBuffer(query, ", ");
+		appendStringLiteralAH(query, slotinfo->plugin, fout);
+		appendPQExpBuffer(query, ", false, %s);",
+						  slotinfo->twophase ? "true" : "false");
+
+		ArchiveEntry(fout, slotinfo->dobj.catId, slotinfo->dobj.dumpId,
+					 ARCHIVE_OPTS(.tag = slotinfo->dobj.name,
+								  .description = "REPLICATION SLOT",
+								  .section = SECTION_POST_DATA,
+								  .createStmt = query->data));
+
+		destroyPQExpBuffer(query);
+	}
+}
diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h
index ed6ce41ad7..0582cd84b1 100644
--- a/src/bin/pg_dump/pg_dump.h
+++ b/src/bin/pg_dump/pg_dump.h
@@ -82,7 +82,8 @@ typedef enum
 	DO_PUBLICATION,
 	DO_PUBLICATION_REL,
 	DO_PUBLICATION_TABLE_IN_SCHEMA,
-	DO_SUBSCRIPTION
+	DO_SUBSCRIPTION,
+	DO_LOGICAL_REPLICATION_SLOT
 } DumpableObjectType;
 
 /*
@@ -666,6 +667,19 @@ typedef struct _SubscriptionInfo
 	char	   *subpasswordrequired;
 } SubscriptionInfo;
 
+/*
+ * The LogicalReplicationSlotInfo struct is used to represent replication
+ * slots.
+ *
+ * XXX: add more attributes if needed
+ */
+typedef struct _LogicalReplicationSlotInfo
+{
+	DumpableObject dobj;
+	char	   *plugin;
+	bool		twophase;
+} LogicalReplicationSlotInfo;
+
 /*
  *	common utility functions
  */
diff --git a/src/bin/pg_dump/pg_dump_sort.c b/src/bin/pg_dump/pg_dump_sort.c
index 745578d855..4e12e46dc5 100644
--- a/src/bin/pg_dump/pg_dump_sort.c
+++ b/src/bin/pg_dump/pg_dump_sort.c
@@ -93,6 +93,7 @@ enum dbObjectTypePriorities
 	PRIO_PUBLICATION_REL,
 	PRIO_PUBLICATION_TABLE_IN_SCHEMA,
 	PRIO_SUBSCRIPTION,
+	PRIO_LOGICAL_REPLICATION_SLOT,
 	PRIO_DEFAULT_ACL,			/* done in ACL pass */
 	PRIO_EVENT_TRIGGER,			/* must be next to last! */
 	PRIO_REFRESH_MATVIEW		/* must be last! */
@@ -146,10 +147,11 @@ static const int dbObjectTypePriority[] =
 	PRIO_PUBLICATION,			/* DO_PUBLICATION */
 	PRIO_PUBLICATION_REL,		/* DO_PUBLICATION_REL */
 	PRIO_PUBLICATION_TABLE_IN_SCHEMA,	/* DO_PUBLICATION_TABLE_IN_SCHEMA */
-	PRIO_SUBSCRIPTION			/* DO_SUBSCRIPTION */
+	PRIO_SUBSCRIPTION,			/* DO_SUBSCRIPTION */
+	PRIO_LOGICAL_REPLICATION_SLOT	/* DO_LOGICAL_REPLICATION_SLOT */
 };
 
-StaticAssertDecl(lengthof(dbObjectTypePriority) == (DO_SUBSCRIPTION + 1),
+StaticAssertDecl(lengthof(dbObjectTypePriority) == (DO_LOGICAL_REPLICATION_SLOT + 1),
 				 "array length mismatch");
 
 static DumpId preDataBoundId;
@@ -1498,6 +1500,11 @@ describeDumpableObject(DumpableObject *obj, char *buf, int bufsize)
 					 "SUBSCRIPTION (ID %d OID %u)",
 					 obj->dumpId, obj->catId.oid);
 			return;
+		case DO_LOGICAL_REPLICATION_SLOT:
+			snprintf(buf, bufsize,
+					 "LOGICAL REPLICATION SLOT (ID %d NAME %s)",
+					 obj->dumpId, obj->name);
+			return;
 		case DO_PRE_DATA_BOUNDARY:
 			snprintf(buf, bufsize,
 					 "PRE-DATA BOUNDARY  (ID %d)",
diff --git a/src/bin/pg_upgrade/Makefile b/src/bin/pg_upgrade/Makefile
index 5834513add..815d1a7ca1 100644
--- a/src/bin/pg_upgrade/Makefile
+++ b/src/bin/pg_upgrade/Makefile
@@ -3,6 +3,9 @@
 PGFILEDESC = "pg_upgrade - an in-place binary upgrade utility"
 PGAPPICON = win32
 
+# required for 003_logical_replication_slots.pl
+EXTRA_INSTALL=contrib/test_decoding
+
 subdir = src/bin/pg_upgrade
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 64024e3b9e..f08fcc7705 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -30,6 +30,7 @@ static void check_for_jsonb_9_4_usage(ClusterInfo *cluster);
 static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(ClusterInfo *new_cluster);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
+static void check_for_parameter_settings(ClusterInfo *new_cluster);
 
 
 /*
@@ -89,6 +90,10 @@ check_and_dump_old_cluster(bool live_check)
 	/* Extract a list of databases and tables from the old cluster */
 	get_db_and_rel_infos(&old_cluster);
 
+	/* Additionally, extract a list of logical replication slots if required */
+	if (user_opts.include_logical_slots)
+		get_logical_slot_infos(&old_cluster);
+
 	init_tablespaces();
 
 	get_loadable_libraries();
@@ -189,6 +194,17 @@ check_new_cluster(void)
 {
 	get_db_and_rel_infos(&new_cluster);
 
+	/*
+	 * Do additional works if --include-logical-replication-slots is required.
+	 * These must be done before check_new_cluster_is_empty() because the
+	 * slot_arr attribute of the new_cluster will be checked in the function.
+	 */
+	if (user_opts.include_logical_slots)
+	{
+		get_logical_slot_infos(&new_cluster);
+		check_for_parameter_settings(&new_cluster);
+	}
+
 	check_new_cluster_is_empty();
 
 	check_loadable_libraries();
@@ -364,6 +380,22 @@ check_new_cluster_is_empty(void)
 						 rel_arr->rels[relnum].nspname,
 						 rel_arr->rels[relnum].relname);
 		}
+
+		/*
+		 * If --include-logical-replication-slots is required, check the
+		 * existence of slots
+		 */
+		if (user_opts.include_logical_slots)
+		{
+			LogicalSlotInfoArr *slot_arr = &new_cluster.dbarr.dbs[dbnum].slot_arr;
+
+			/* if nslots > 0, report just first entry and exit */
+			if (slot_arr->nslots)
+				pg_fatal("New cluster database \"%s\" is not empty: found logical replication slot \"%s\"",
+						 new_cluster.dbarr.dbs[dbnum].db_name,
+						 slot_arr->slots[0].slotname);
+		}
+
 	}
 }
 
@@ -1402,3 +1434,38 @@ check_for_user_defined_encoding_conversions(ClusterInfo *cluster)
 	else
 		check_ok();
 }
+
+/*
+ * Verify parameter settings for creating logical replication slots
+ */
+static void
+check_for_parameter_settings(ClusterInfo *new_cluster)
+{
+	PGresult   *res;
+	PGconn	   *conn = connectToServer(new_cluster, "template1");
+	int			max_replication_slots;
+	char	   *wal_level;
+
+	prep_status("Checking for logical replication slots");
+
+	res = executeQueryOrDie(conn, "SHOW max_replication_slots;");
+	max_replication_slots = atoi(PQgetvalue(res, 0, 0));
+
+	if (max_replication_slots == 0)
+		pg_fatal("max_replication_slots must be greater than 0");
+
+	PQclear(res);
+
+	res = executeQueryOrDie(conn, "SHOW wal_level;");
+	wal_level = PQgetvalue(res, 0, 0);
+
+	if (strcmp(wal_level, "logical") != 0)
+		pg_fatal("wal_level must be \"logical\", but is set to \"%s\"",
+				 wal_level);
+
+	PQclear(res);
+
+	PQfinish(conn);
+
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/dump.c b/src/bin/pg_upgrade/dump.c
index 6c8c82dca8..e6b90864f5 100644
--- a/src/bin/pg_upgrade/dump.c
+++ b/src/bin/pg_upgrade/dump.c
@@ -59,6 +59,30 @@ generate_old_dump(void)
 						   log_opts.dumpdir,
 						   sql_file_name, escaped_connstr.data);
 
+		/*
+		 * Dump logical replication slots if needed.
+		 *
+		 * XXX We cannot dump replication slots at the same time as the schema
+		 * dump because we need to separate the timing of restoring
+		 * replication slots and other objects. Replication slots, in
+		 * particular, should not be restored before executing the pg_resetwal
+		 * command because it will remove WALs that are required by the slots.
+		 */
+		if (user_opts.include_logical_slots)
+		{
+			char		slots_file_name[MAXPGPATH];
+
+			snprintf(slots_file_name, sizeof(slots_file_name),
+					 DB_DUMP_LOGICAL_SLOTS_FILE_MASK, old_db->db_oid);
+			parallel_exec_prog(log_file_name, NULL,
+							   "\"%s/pg_dump\" %s --logical-replication-slots-only "
+							   "--quote-all-identifiers --binary-upgrade %s "
+							   "--file=\"%s/%s\" %s",
+							   new_cluster.bindir, cluster_conn_opts(&old_cluster),
+							   log_opts.verbose ? "--verbose" : "",
+							   log_opts.dumpdir,
+							   slots_file_name, escaped_connstr.data);
+		}
 		termPQExpBuffer(&escaped_connstr);
 	}
 
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index a9988abfe1..a605056bbe 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -23,10 +23,11 @@ static void free_db_and_rel_infos(DbInfoArr *db_arr);
 static void get_template0_info(ClusterInfo *cluster);
 static void get_db_infos(ClusterInfo *cluster);
 static void get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo);
+static void get_logical_slot_infos_per_db(ClusterInfo *cluster, DbInfo *dbinfo);
 static void free_rel_infos(RelInfoArr *rel_arr);
 static void print_db_infos(DbInfoArr *db_arr);
 static void print_rel_infos(RelInfoArr *rel_arr);
-
+static void print_slot_infos(LogicalSlotInfoArr *slot_arr);
 
 /*
  * gen_db_file_maps()
@@ -600,6 +601,94 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 	dbinfo->rel_arr.nrels = num_rels;
 }
 
+/*
+ * get_logical_slot_infos()
+ *
+ * Higher level routine to generate LogicalSlotInfoArr for all databases.
+ */
+void
+get_logical_slot_infos(ClusterInfo *cluster)
+{
+	int			dbnum;
+
+	if (cluster == &old_cluster)
+		pg_log(PG_VERBOSE, "\nsource databases:");
+	else
+		pg_log(PG_VERBOSE, "\ntarget databases:");
+
+	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
+	{
+		DbInfo	   *pDbInfo = &cluster->dbarr.dbs[dbnum];
+
+		get_logical_slot_infos_per_db(cluster, pDbInfo);
+
+		if (log_opts.verbose)
+		{
+			pg_log(PG_VERBOSE, "Database: \"%s\"", pDbInfo->db_name);
+			print_slot_infos(&pDbInfo->slot_arr);
+		}
+	}
+}
+
+/*
+ * get_logical_slot_infos_per_db()
+ *
+ * gets the LogicalSlotInfos for all the logical replication slots of the database
+ * referred to by "dbinfo".
+ */
+static void
+get_logical_slot_infos_per_db(ClusterInfo *cluster, DbInfo *dbinfo)
+{
+	PGconn	   *conn = connectToServer(cluster,
+									   dbinfo->db_name);
+	PGresult   *res;
+	LogicalSlotInfo *slotinfos = NULL;
+
+	int			num_slots;
+
+	char		query[QUERY_ALLOC];
+
+	query[0] = '\0';			/* initialize query string to empty */
+
+	snprintf(query, sizeof(query),
+			 "SELECT slot_name, plugin, two_phase "
+			 "FROM pg_catalog.pg_replication_slots "
+			 "WHERE database = current_database() AND temporary = false "
+			 "AND wal_status IN ('reserved', 'extended');");
+
+	res = executeQueryOrDie(conn, "%s", query);
+
+	num_slots = PQntuples(res);
+
+	if (num_slots)
+	{
+		int			slotnum;
+		int			i_slotname;
+		int			i_plugin;
+		int			i_twophase;
+
+		slotinfos = (LogicalSlotInfo *) pg_malloc(sizeof(LogicalSlotInfo) * num_slots);
+
+		i_slotname = PQfnumber(res, "slot_name");
+		i_plugin = PQfnumber(res, "plugin");
+		i_twophase = PQfnumber(res, "two_phase");
+
+		for (slotnum = 0; slotnum < num_slots; slotnum++)
+		{
+			LogicalSlotInfo *curr = &slotinfos[slotnum];
+
+			curr->slotname = pg_strdup(PQgetvalue(res, slotnum, i_slotname));
+			curr->plugin = pg_strdup(PQgetvalue(res, slotnum, i_plugin));
+			curr->two_phase = (strcmp(PQgetvalue(res, slotnum, i_twophase), "t") == 0);
+		}
+	}
+
+	PQclear(res);
+	PQfinish(conn);
+
+	dbinfo->slot_arr.slots = slotinfos;
+	dbinfo->slot_arr.nslots = num_slots;
+}
 
 static void
 free_db_and_rel_infos(DbInfoArr *db_arr)
@@ -610,6 +699,14 @@ free_db_and_rel_infos(DbInfoArr *db_arr)
 	{
 		free_rel_infos(&db_arr->dbs[dbnum].rel_arr);
 		pg_free(db_arr->dbs[dbnum].db_name);
+
+		/*
+		 * db_arr has an additional attribute, LogicalSlotInfoArr slot_arr,
+		 * but there is no need to free it. It has a valid member only when
+		 * the cluster had logical replication slots in the previous call.
+		 * However, in this case, a FATAL error is thrown, and we cannot reach
+		 * this point.
+		 */
 	}
 	pg_free(db_arr->dbs);
 	db_arr->dbs = NULL;
@@ -660,3 +757,15 @@ print_rel_infos(RelInfoArr *rel_arr)
 			   rel_arr->rels[relnum].reloid,
 			   rel_arr->rels[relnum].tablespace);
 }
+
+static void
+print_slot_infos(LogicalSlotInfoArr *slot_arr)
+{
+	int			slotnum;
+
+	for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\", two_phase: %d",
+			   slot_arr->slots[slotnum].slotname,
+			   slot_arr->slots[slotnum].plugin,
+			   slot_arr->slots[slotnum].two_phase);
+}
diff --git a/src/bin/pg_upgrade/meson.build b/src/bin/pg_upgrade/meson.build
index 12a97f84e2..228f29b688 100644
--- a/src/bin/pg_upgrade/meson.build
+++ b/src/bin/pg_upgrade/meson.build
@@ -42,6 +42,7 @@ tests += {
     'tests': [
       't/001_basic.pl',
       't/002_pg_upgrade.pl',
+      't/003_logical_replication_slots.pl',
     ],
     'test_kwargs': {'priority': 40}, # pg_upgrade tests are slow
   },
diff --git a/src/bin/pg_upgrade/option.c b/src/bin/pg_upgrade/option.c
index 640361009e..df66a5ffe6 100644
--- a/src/bin/pg_upgrade/option.c
+++ b/src/bin/pg_upgrade/option.c
@@ -57,6 +57,7 @@ parseCommandLine(int argc, char *argv[])
 		{"verbose", no_argument, NULL, 'v'},
 		{"clone", no_argument, NULL, 1},
 		{"copy", no_argument, NULL, 2},
+		{"include-logical-replication-slots", no_argument, NULL, 3},
 
 		{NULL, 0, NULL, 0}
 	};
@@ -199,6 +200,10 @@ parseCommandLine(int argc, char *argv[])
 				user_opts.transfer_mode = TRANSFER_MODE_COPY;
 				break;
 
+			case 3:
+				user_opts.include_logical_slots = true;
+				break;
+
 			default:
 				fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
 						os_info.progname);
@@ -289,6 +294,8 @@ usage(void)
 	printf(_("  -V, --version                 display version information, then exit\n"));
 	printf(_("  --clone                       clone instead of copying files to new cluster\n"));
 	printf(_("  --copy                        copy files to new cluster (default)\n"));
+	printf(_("  --include-logical-replication-slots\n"
+			 "                                upgrade logical replication slots\n"));
 	printf(_("  -?, --help                    show this help, then exit\n"));
 	printf(_("\n"
 			 "Before running pg_upgrade you must:\n"
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 4562dafcff..49e16234ed 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -59,6 +59,7 @@ static void copy_xact_xlog_xid(void);
 static void set_frozenxids(bool minmxid_only);
 static void make_outputdirs(char *pgdata);
 static void setup(char *argv0, bool *live_check);
+static void create_logical_replication_slots(void);
 
 ClusterInfo old_cluster,
 			new_cluster;
@@ -188,6 +189,19 @@ main(int argc, char **argv)
 			  new_cluster.pgdata);
 	check_ok();
 
+	/*
+	 * Create logical replication slots if requested.
+	 *
+	 * Note: This must be done after doing pg_resetwal command because the
+	 * command will remove required WALs.
+	 */
+	if (user_opts.include_logical_slots)
+	{
+		start_postmaster(&new_cluster, true);
+		create_logical_replication_slots();
+		stop_postmaster(false);
+	}
+
 	if (user_opts.do_sync)
 	{
 		prep_status("Sync data directory to disk");
@@ -860,3 +874,50 @@ set_frozenxids(bool minmxid_only)
 
 	check_ok();
 }
+
+/*
+ * create_logical_replication_slots()
+ *
+ * Similar to create_new_objects() but only restores logical replication slots.
+ */
+static void
+create_logical_replication_slots(void)
+{
+	int			dbnum;
+
+	prep_status_progress("Restoring logical replication slots in the new cluster");
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		char		slots_file_name[MAXPGPATH],
+					log_file_name[MAXPGPATH];
+		DbInfo	   *old_db = &old_cluster.dbarr.dbs[dbnum];
+
+		pg_log(PG_STATUS, "%s", old_db->db_name);
+
+		snprintf(slots_file_name, sizeof(slots_file_name),
+				 DB_DUMP_LOGICAL_SLOTS_FILE_MASK, old_db->db_oid);
+		snprintf(log_file_name, sizeof(log_file_name),
+				 DB_DUMP_LOG_FILE_MASK, old_db->db_oid);
+
+		parallel_exec_prog(log_file_name,
+						   NULL,
+						   "\"%s/psql\" %s --echo-queries --set ON_ERROR_STOP=on "
+						   "--no-psqlrc --dbname %s -f \"%s/%s\"",
+						   new_cluster.bindir,
+						   cluster_conn_opts(&new_cluster),
+						   old_db->db_name,
+						   log_opts.dumpdir,
+						   slots_file_name);
+	}
+
+	/* reap all children */
+	while (reap_child(true) == true)
+		;
+
+	end_progress_output();
+	check_ok();
+
+	/* update new_cluster info again */
+	get_logical_slot_infos(&new_cluster);
+}
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 3eea0139c7..7adbb50807 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -29,6 +29,7 @@
 /* contains both global db information and CREATE DATABASE commands */
 #define GLOBALS_DUMP_FILE	"pg_upgrade_dump_globals.sql"
 #define DB_DUMP_FILE_MASK	"pg_upgrade_dump_%u.custom"
+#define DB_DUMP_LOGICAL_SLOTS_FILE_MASK	"pg_upgrade_dump_%u_logical_slots.sql"
 
 /*
  * Base directories that include all the files generated internally, from the
@@ -150,6 +151,22 @@ typedef struct
 	int			nrels;
 } RelInfoArr;
 
+/*
+ * Structure to store logical replication slot information
+ */
+typedef struct
+{
+	char	   *slotname;		/* slot name */
+	char	   *plugin;			/* plugin */
+	bool		two_phase;		/* Can the slot decode 2PC? */
+} LogicalSlotInfo;
+
+typedef struct
+{
+	LogicalSlotInfo *slots;
+	int			nslots;
+} LogicalSlotInfoArr;
+
 /*
  * The following structure represents a relation mapping.
  */
@@ -176,6 +193,7 @@ typedef struct
 	char		db_tablespace[MAXPGPATH];	/* database default tablespace
 											 * path */
 	RelInfoArr	rel_arr;		/* array of all user relinfos */
+	LogicalSlotInfoArr slot_arr;	/* array of all LogicalSlotInfo */
 } DbInfo;
 
 /*
@@ -304,6 +322,8 @@ typedef struct
 	transferMode transfer_mode; /* copy files or link them? */
 	int			jobs;			/* number of processes/threads to use */
 	char	   *socketdir;		/* directory to use for Unix sockets */
+	bool		include_logical_slots;	/* true -> dump and restore logical
+										 * replication slots */
 } UserOpts;
 
 typedef struct
@@ -400,6 +420,7 @@ FileNameMap *gen_db_file_maps(DbInfo *old_db,
 							  DbInfo *new_db, int *nmaps, const char *old_pgdata,
 							  const char *new_pgdata);
 void		get_db_and_rel_infos(ClusterInfo *cluster);
+void		get_logical_slot_infos(ClusterInfo *cluster);
 
 /* option.c */
 
diff --git a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
new file mode 100644
index 0000000000..525a7704cf
--- /dev/null
+++ b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
@@ -0,0 +1,115 @@
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+# Tests for upgrading replication slots
+
+use strict;
+use warnings;
+
+use File::Path qw(rmtree);
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Can be changed to test the other modes
+my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
+
+# Initialize old node
+my $old_node = PostgreSQL::Test::Cluster->new('old_node');
+$old_node->init(allows_streaming => 'logical');
+$old_node->start;
+
+# Initialize new node
+my $new_node = PostgreSQL::Test::Cluster->new('new_node');
+$new_node->init(allows_streaming => 1);
+
+my $bindir = $new_node->config_data('--bindir');
+
+$old_node->stop;
+
+# Cause a failure at the start of pg_upgrade because wal_level is replica
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_node->data_dir,
+		'-D',         $new_node->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_node->host,
+		'-p',         $old_node->port,
+		'-P',         $new_node->port,
+		$mode,        '--include-logical-replication-slots',
+	],
+	'run of pg_upgrade of old node with wrong wal_level');
+ok( -d $new_node->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_node->data_dir . "/pg_upgrade_output.d");
+
+# Preparations for the subsequent test. The case max_replication_slots is set
+# to 0 is prohibited.
+$new_node->append_conf('postgresql.conf', "wal_level = 'logical'");
+$new_node->append_conf('postgresql.conf', "max_replication_slots = 0");
+
+# Cause a failure at the start of pg_upgrade because max_replication_slots is 0
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_node->data_dir,
+		'-D',         $new_node->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_node->host,
+		'-p',         $old_node->port,
+		'-P',         $new_node->port,
+		$mode,        '--include-logical-replication-slots',
+	],
+	'run of pg_upgrade of old node with wrong max_replication_slots');
+ok( -d $new_node->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_node->data_dir . "/pg_upgrade_output.d");
+
+# Preparations for the subsequent test. max_replication_slots is set to
+# non-zero value to succeed the pg_upgrade
+$new_node->append_conf('postgresql.conf', "max_replication_slots = 10");
+
+# Create a slot on old node, and generate WALs
+$old_node->start;
+$old_node->safe_psql(
+	'postgres', qq[
+	SELECT pg_create_logical_replication_slot('test_slot', 'test_decoding', false, true);
+	CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;
+]);
+
+my $result = $old_node->safe_psql('postgres',
+	"SELECT count (*) FROM pg_logical_slot_get_changes('test_slot', NULL, NULL)"
+);
+is($result, qq(12), 'ensure WALs are not consumed yet');
+$old_node->stop;
+
+# Actual run, pg_upgrade_output.d is removed at the end
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_node->data_dir,
+		'-D',         $new_node->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_node->host,
+		'-p',         $old_node->port,
+		'-P',         $new_node->port,
+		$mode,        '--include-logical-replication-slots'
+	],
+	'run of pg_upgrade of old node');
+ok( !-d $new_node->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ removed after pg_upgrade success");
+
+$new_node->start;
+$result = $new_node->safe_psql('postgres',
+	"SELECT slot_name, two_phase FROM pg_replication_slots");
+is($result, qq(test_slot|t), 'check the slot exists on new node');
+
+done_testing();
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 260854747b..965219524d 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1487,6 +1487,7 @@ LogicalRepBeginData
 LogicalRepCommitData
 LogicalRepCommitPreparedTxnData
 LogicalRepCtxStruct
+LogicalReplicationSlotInfo
 LogicalRepMsgType
 LogicalRepPartMapEntry
 LogicalRepPreparedTxnData
@@ -1499,6 +1500,8 @@ LogicalRepTupleData
 LogicalRepTyp
 LogicalRepWorker
 LogicalRewriteMappingData
+LogicalSlotInfo
+LogicalSlotInfoArr
 LogicalTape
 LogicalTapeSet
 LsnReadQueue
-- 
2.27.0

v15-0002-Always-persist-to-disk-logical-slots-during-a-sh.patchapplication/octet-stream; name=v15-0002-Always-persist-to-disk-logical-slots-during-a-sh.patchDownload
From 874ceca95db1e9d0818ac39c2169bacf9a3fe975 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Fri, 14 Apr 2023 13:49:09 +0800
Subject: [PATCH v15 2/4] Always persist to disk logical slots during a
 shutdown checkpoint.

It's entirely possible for a logical slot to have a confirmed_flush_lsn higher
than the last value saved on disk while not being marked as dirty.  It's
currently not a problem to lose that value during a clean shutdown / restart
cycle, but a later patch adding support for pg_upgrade of publications and
logical slots will rely on that value being properly persisted to disk.

Author: Julien Rouhaud
Reviewed-by: Wang Wei
Discussion: FIXME
---
 src/backend/access/transam/xlog.c |  2 +-
 src/backend/replication/slot.c    | 26 ++++++++++++++++----------
 src/include/replication/slot.h    |  2 +-
 3 files changed, 18 insertions(+), 12 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index b2430f617c..384245c157 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7011,7 +7011,7 @@ static void
 CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
 {
 	CheckPointRelationMap();
-	CheckPointReplicationSlots();
+	CheckPointReplicationSlots(flags & CHECKPOINT_IS_SHUTDOWN);
 	CheckPointSnapBuild();
 	CheckPointLogicalRewriteHeap();
 	CheckPointReplicationOrigin();
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 8021aaa0a8..526323e87b 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -109,7 +109,8 @@ static void ReplicationSlotDropPtr(ReplicationSlot *slot);
 /* internal persistency functions */
 static void RestoreSlotFromDisk(const char *name);
 static void CreateSlotOnDisk(ReplicationSlot *slot);
-static void SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel);
+static void SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel,
+						   bool is_shutdown);
 
 /*
  * Report shared-memory space needed by ReplicationSlotsShmemInit.
@@ -783,7 +784,7 @@ ReplicationSlotSave(void)
 	Assert(MyReplicationSlot != NULL);
 
 	sprintf(path, "pg_replslot/%s", NameStr(MyReplicationSlot->data.name));
-	SaveSlotToPath(MyReplicationSlot, path, ERROR);
+	SaveSlotToPath(MyReplicationSlot, path, ERROR, false);
 }
 
 /*
@@ -1565,11 +1566,10 @@ restart:
 /*
  * Flush all replication slots to disk.
  *
- * This needn't actually be part of a checkpoint, but it's a convenient
- * location.
+ * is_shutdown is true in case of a shutdown checkpoint.
  */
 void
-CheckPointReplicationSlots(void)
+CheckPointReplicationSlots(bool is_shutdown)
 {
 	int			i;
 
@@ -1594,7 +1594,7 @@ CheckPointReplicationSlots(void)
 
 		/* save the slot to disk, locking is handled in SaveSlotToPath() */
 		sprintf(path, "pg_replslot/%s", NameStr(s->data.name));
-		SaveSlotToPath(s, path, LOG);
+		SaveSlotToPath(s, path, LOG, is_shutdown);
 	}
 	LWLockRelease(ReplicationSlotAllocationLock);
 }
@@ -1700,7 +1700,7 @@ CreateSlotOnDisk(ReplicationSlot *slot)
 
 	/* Write the actual state file. */
 	slot->dirty = true;			/* signal that we really need to write */
-	SaveSlotToPath(slot, tmppath, ERROR);
+	SaveSlotToPath(slot, tmppath, ERROR, false);
 
 	/* Rename the directory into place. */
 	if (rename(tmppath, path) != 0)
@@ -1726,7 +1726,8 @@ CreateSlotOnDisk(ReplicationSlot *slot)
  * Shared functionality between saving and creating a replication slot.
  */
 static void
-SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel)
+SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel,
+			   bool is_shutdown)
 {
 	char		tmppath[MAXPGPATH];
 	char		path[MAXPGPATH];
@@ -1740,8 +1741,13 @@ SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel)
 	slot->just_dirtied = false;
 	SpinLockRelease(&slot->mutex);
 
-	/* and don't do anything if there's nothing to write */
-	if (!was_dirty)
+	/*
+	 * and don't do anything if there's nothing to write, unless it's this is
+	 * called for a logical slot during a shutdown checkpoint, as we want to
+	 * persist the confirmed_flush_lsn in that case, even if that's the only
+	 * modification.
+	 */
+	if (!was_dirty && (SlotIsPhysical(slot) || !is_shutdown))
 		return;
 
 	LWLockAcquire(&slot->io_in_progress_lock, LW_EXCLUSIVE);
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index a8a89dc784..7ca37c9f70 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -241,7 +241,7 @@ extern void ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslo
 extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok);
 
 extern void StartupReplicationSlots(void);
-extern void CheckPointReplicationSlots(void);
+extern void CheckPointReplicationSlots(bool is_shutdown);
 
 extern void CheckSlotRequirements(void);
 extern void CheckSlotPermissions(void);
-- 
2.27.0

v15-0003-pg_upgrade-Add-check-function-for-include-logica.patchapplication/octet-stream; name=v15-0003-pg_upgrade-Add-check-function-for-include-logica.patchDownload
From 08e2bd69b941c9bedfe9f5a927d7339a15ad3e12 Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Fri, 14 Apr 2023 08:59:03 +0000
Subject: [PATCH v15 3/4] pg_upgrade: Add check function for
 --include-logical-replication-slots option

Author: Hayato Kuroda
Reviewed-by: Wang Wei
---
 src/bin/pg_upgrade/check.c                    | 80 ++++++++++++++++++-
 .../t/003_logical_replication_slots.pl        | 30 ++++++-
 2 files changed, 108 insertions(+), 2 deletions(-)

diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index f08fcc7705..391144570f 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -9,7 +9,10 @@
 
 #include "postgres_fe.h"
 
+#include "access/xlogrecord.h"
+#include "access/xlog_internal.h"
 #include "catalog/pg_authid_d.h"
+#include "catalog/pg_control.h"
 #include "catalog/pg_collation.h"
 #include "fe_utils/string_utils.h"
 #include "mb/pg_wchar.h"
@@ -31,7 +34,7 @@ static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(ClusterInfo *new_cluster);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
 static void check_for_parameter_settings(ClusterInfo *new_cluster);
-
+static void check_for_confirmed_flush_lsn(ClusterInfo *cluster);
 
 /*
  * fix_path_separator
@@ -108,6 +111,8 @@ check_and_dump_old_cluster(bool live_check)
 	check_for_composite_data_type_usage(&old_cluster);
 	check_for_reg_data_type_usage(&old_cluster);
 	check_for_isn_and_int8_passing_mismatch(&old_cluster);
+	if (user_opts.include_logical_slots)
+		check_for_confirmed_flush_lsn(&old_cluster);
 
 	/*
 	 * PG 16 increased the size of the 'aclitem' type, which breaks the
@@ -1446,6 +1451,10 @@ check_for_parameter_settings(ClusterInfo *new_cluster)
 	int			max_replication_slots;
 	char	   *wal_level;
 
+	/* --include-logical-replication-slots can be used since PG16. */
+	if (GET_MAJOR_VERSION(new_cluster->major_version) <= 1500)
+		return;
+
 	prep_status("Checking for logical replication slots");
 
 	res = executeQueryOrDie(conn, "SHOW max_replication_slots;");
@@ -1469,3 +1478,72 @@ check_for_parameter_settings(ClusterInfo *new_cluster)
 
 	check_ok();
 }
+
+/*
+ * Verify that all logical replication slots consumed all WALs, except a
+ * CHECKPOINT_SHUTDOWN record.
+ */
+static void
+check_for_confirmed_flush_lsn(ClusterInfo *cluster)
+{
+	int			i,
+				ntups,
+				i_slotname;
+	bool		is_error = false;
+	PGresult   *res;
+	DbInfo	   *active_db = &cluster->dbarr.dbs[0];
+	PGconn	   *conn = connectToServer(cluster, active_db->db_name);
+
+	Assert(user_opts.include_logical_slots);
+
+	/* --include-logical-replication-slots can be used since PG16. */
+	if (GET_MAJOR_VERSION(cluster->major_version) <= 1500)
+		return;
+
+	prep_status("Checking for logical replication slots");
+
+	/*
+	 * Check that all logical replication slots have reached the current WAL
+	 * position, except for the CHECKPOINT_SHUTDOWN record. Even if all WALs
+	 * are consumed before shutting down the node, the checkpointer generates
+	 * a CHECKPOINT_SHUTDOWN record at shutdown, which cannot be consumed by
+	 * any slots. Therefore, we must allow for a difference between
+	 * pg_current_wal_insert_lsn() and confirmed_flush_lsn.
+	 */
+#define SHUTDOWN_RECORD_SIZE  (SizeOfXLogRecord + \
+							   SizeOfXLogRecordDataHeaderShort + \
+							   sizeof(CheckPoint))
+
+	res = executeQueryOrDie(conn,
+							"SELECT slot_name FROM pg_catalog.pg_replication_slots "
+							"WHERE (pg_catalog.pg_current_wal_insert_lsn() - confirmed_flush_lsn) > %d "
+							"AND temporary = false AND wal_status IN ('reserved', 'extended');",
+							(int) (SizeOfXLogLongPHD + SHUTDOWN_RECORD_SIZE));
+
+#undef SHUTDOWN_RECORD_SIZE
+
+	ntups = PQntuples(res);
+	i_slotname = PQfnumber(res, "slot_name");
+
+	for (i = 0; i < ntups; i++)
+	{
+		char	   *slotname;
+
+		is_error = true;
+
+		slotname = PQgetvalue(res, i, i_slotname);
+
+		pg_log(PG_WARNING,
+			   "\nWARNING: logical replication slot \"%s\" has not consumed WALs yet",
+			   slotname);
+	}
+
+	PQclear(res);
+	PQfinish(conn);
+
+	if (is_error)
+		pg_fatal("--include-logical-replication-slots requires that all "
+				 "logical replication slots consumed all the WALs");
+
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
index 525a7704cf..21fefca084 100644
--- a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
+++ b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
@@ -85,11 +85,39 @@ $old_node->safe_psql(
 ]);
 
 my $result = $old_node->safe_psql('postgres',
-	"SELECT count (*) FROM pg_logical_slot_get_changes('test_slot', NULL, NULL)"
+	"SELECT count(*) FROM pg_logical_slot_peek_changes('test_slot', NULL, NULL)"
 );
+
 is($result, qq(12), 'ensure WALs are not consumed yet');
 $old_node->stop;
 
+# Cause a failure at the start of pg_upgrade because test_slot does not
+# finish consuming all the WALs
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_node->data_dir,
+		'-D',         $new_node->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_node->host,
+		'-p',         $old_node->port,
+		'-P',         $new_node->port,
+		$mode,        '--include-logical-replication-slots',
+	],
+	'run of pg_upgrade of old node with idle replication slots');
+ok( -d $new_node->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_node->data_dir . "/pg_upgrade_output.d");
+
+$old_node->start;
+$old_node->safe_psql('postgres',
+	"SELECT count (*) FROM pg_logical_slot_get_changes('test_slot', NULL, NULL)"
+);
+$old_node->stop;
+
 # Actual run, pg_upgrade_output.d is removed at the end
 command_ok(
 	[
-- 
2.27.0

v15-0004-Change-the-method-used-to-check-logical-replicat.patchapplication/octet-stream; name=v15-0004-Change-the-method-used-to-check-logical-replicat.patchDownload
From 0258212ea524a5e0250939407f705121e0597a10 Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Mon, 24 Apr 2023 11:03:28 +0000
Subject: [PATCH v15 4/4] Change the method used to check logical replication
 slots during the live check

When a live check is requested, there is a possibility of additional changes
occurring, which may cause the current WAL position to exceed the confirmed_flush_lsn
of the slot. As a result, we check that each logical replication slot is active
instead of comparing its confirmed_flush_lsn against the current WAL position. This
is sufficient as all the WAL records will be sent during the publisher's shutdown.

Author: Hayato Kuroda
Reviewed-by: Wang Wei
---
 src/bin/pg_upgrade/check.c                    | 68 ++++++++++++++++++-
 .../t/003_logical_replication_slots.pl        | 66 +++++++++++++++++-
 2 files changed, 130 insertions(+), 4 deletions(-)

diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 391144570f..b3da2e9193 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -35,6 +35,7 @@ static void check_for_new_tablespace_dir(ClusterInfo *new_cluster);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
 static void check_for_parameter_settings(ClusterInfo *new_cluster);
 static void check_for_confirmed_flush_lsn(ClusterInfo *cluster);
+static void check_are_logical_slots_active(ClusterInfo *cluster);
 
 /*
  * fix_path_separator
@@ -112,7 +113,19 @@ check_and_dump_old_cluster(bool live_check)
 	check_for_reg_data_type_usage(&old_cluster);
 	check_for_isn_and_int8_passing_mismatch(&old_cluster);
 	if (user_opts.include_logical_slots)
-		check_for_confirmed_flush_lsn(&old_cluster);
+	{
+		/*
+		 * The method used to check logical replication slots is dependent on
+		 * the value of the live_check parameter. This change was implemented
+		 * because, during a live check, it is possible for additional changes
+		 * to occur at the old node, which could cause the current WAL position
+		 * to exceed the confirmed_flush_lsn of the slot.
+		 */
+		if (live_check)
+			check_are_logical_slots_active(&old_cluster);
+		else
+			check_for_confirmed_flush_lsn(&old_cluster);
+	}
 
 	/*
 	 * PG 16 increased the size of the 'aclitem' type, which breaks the
@@ -1479,6 +1492,59 @@ check_for_parameter_settings(ClusterInfo *new_cluster)
 	check_ok();
 }
 
+/*
+ * Verify that all logical replication slots are active
+ */
+static void
+check_are_logical_slots_active(ClusterInfo *cluster)
+{
+	int			i,
+				ntups,
+				i_slotname;
+	bool		is_error = false;
+	PGresult   *res;
+	DbInfo	   *active_db = &cluster->dbarr.dbs[0];
+	PGconn	   *conn = connectToServer(cluster, active_db->db_name);
+
+	Assert(user_opts.include_logical_slots);
+
+	/* --include-logical-replication-slots can be used since PG16. */
+	if (GET_MAJOR_VERSION(cluster->major_version) <= 1500)
+		return;
+
+	prep_status("Checking for logical replication slots");
+
+	res = executeQueryOrDie(conn,
+							"SELECT slot_name FROM pg_catalog.pg_replication_slots "
+							"WHERE active IS FALSE "
+							"AND temporary = false AND wal_status IN ('reserved', 'extended');");
+
+	ntups = PQntuples(res);
+	i_slotname = PQfnumber(res, "slot_name");
+
+	for (i = 0; i < ntups; i++)
+	{
+		char	   *slotname;
+
+		is_error = true;
+
+		slotname = PQgetvalue(res, i, i_slotname);
+
+		pg_log(PG_WARNING,
+			   "\nWARNING: logical replication slot \"%s\" is not active",
+			   slotname);
+	}
+
+	PQclear(res);
+	PQfinish(conn);
+
+	if (is_error)
+		pg_fatal("--include-logical-replication-slots with --check requires that "
+				 "all logical replication slots are active");
+
+	check_ok();
+}
+
 /*
  * Verify that all logical replication slots consumed all WALs, except a
  * CHECKPOINT_SHUTDOWN record.
diff --git a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
index 21fefca084..d7fd864bd7 100644
--- a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
+++ b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
@@ -19,6 +19,10 @@ my $old_node = PostgreSQL::Test::Cluster->new('old_node');
 $old_node->init(allows_streaming => 'logical');
 $old_node->start;
 
+# Initialize subscriber, which will be used only for --check
+my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
+$subscriber->init(allows_streaming => 'logical');
+
 # Initialize new node
 my $new_node = PostgreSQL::Test::Cluster->new('new_node');
 $new_node->init(allows_streaming => 1);
@@ -76,15 +80,71 @@ rmtree($new_node->data_dir . "/pg_upgrade_output.d");
 # non-zero value to succeed the pg_upgrade
 $new_node->append_conf('postgresql.conf', "max_replication_slots = 10");
 
-# Create a slot on old node, and generate WALs
+# Setup logical replication
 $old_node->start;
+$old_node->safe_psql('postgres',
+	"CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a");
+$old_node->safe_psql('postgres', "CREATE PUBLICATION pub FOR ALL TABLES");
+my $old_connstr = $old_node->connstr . ' dbname=postgres';
+
+$subscriber->start;
+$subscriber->safe_psql('postgres', "CREATE TABLE tbl (a int)");
+$subscriber->safe_psql('postgres',
+	"CREATE SUBSCRIPTION sub CONNECTION '$old_connstr' PUBLICATION pub WITH (copy_data = true)"
+);
+
+# Wait for initial table sync to finish
+$subscriber->wait_for_subscription_sync($old_node, 'sub');
+
+my $result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
+is($result, qq(10), 'check initial rows on subscriber');
+
+# Start a background session and open a transaction (not committed yet)
+my $bsession = $old_node->background_psql('postgres');
+$bsession->query_safe(
+	q{
+BEGIN;
+INSERT INTO tbl VALUES (generate_series(11, 20))
+});
+
+$result = $old_node->safe_psql('postgres',
+	"SELECT count(*) FROM pg_replication_slots WHERE pg_current_wal_insert_lsn() > confirmed_flush_lsn"
+);
+is($result, qq(1),
+	'check the current WAL position exceeds confirmed_flush_lsn');
+
+# Run pg_upgrade --check. The status of each logical replication slot will
+# be checked, and the command should succeed.
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_node->data_dir,
+		'-D',         $new_node->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_node->host,
+		'-p',         $old_node->port,
+		'-P',         $new_node->port,
+		$mode,        '--include-logical-replication-slots',
+		'--check'
+	],
+	'run of pg_upgrade of old node');
+ok( !-d $old_node->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ removed after pg_upgrade success");
+
+# Cleanup
+$bsession->query_safe("ABORT");
+$bsession->quit;
+$subscriber->safe_psql('postgres', "DROP SUBSCRIPTION sub");
+
+# Create a slot on old node, and generate WALs
 $old_node->safe_psql(
 	'postgres', qq[
 	SELECT pg_create_logical_replication_slot('test_slot', 'test_decoding', false, true);
-	CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;
+	INSERT INTO tbl VALUES (generate_series(11, 20));
 ]);
 
-my $result = $old_node->safe_psql('postgres',
+$result = $old_node->safe_psql('postgres',
 	"SELECT count(*) FROM pg_logical_slot_peek_changes('test_slot', NULL, NULL)"
 );
 
-- 
2.27.0

#54vignesh C
vignesh21@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#53)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Mon, 22 May 2023 at 15:50, Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

Dear Wang,

Thank you for reviewing! PSA new version.

Thanks for the updated patch. A few comments:
1) The check_for_parameter_settings, check_for_confirmed_flush_lsn and
check_are_logical_slots_active functions all emit the same status message;
we can make them unique so that it is easier for users to interpret:
+check_for_parameter_settings(ClusterInfo *new_cluster)
+{
+       PGresult   *res;
+       PGconn     *conn = connectToServer(new_cluster, "template1");
+       int                     max_replication_slots;
+       char       *wal_level;
+
+       prep_status("Checking for logical replication slots");
+
+       res = executeQueryOrDie(conn, "SHOW max_replication_slots;");
+check_for_confirmed_flush_lsn(ClusterInfo *cluster)
+{
+       int                     i,
+                               ntups,
+                               i_slotname;
+       bool            is_error = false;
+       PGresult   *res;
+       DbInfo     *active_db = &cluster->dbarr.dbs[0];
+       PGconn     *conn = connectToServer(cluster, active_db->db_name);
+
+       Assert(user_opts.include_logical_slots);
+
+       /* --include-logical-replication-slots can be used since PG16. */
+       if (GET_MAJOR_VERSION(cluster->major_version) <= 1500)
+               return;
+
+       prep_status("Checking for logical replication slots");
+check_are_logical_slots_active(ClusterInfo *cluster)
+{
+       int                     i,
+                               ntups,
+                               i_slotname;
+       bool            is_error = false;
+       PGresult   *res;
+       DbInfo     *active_db = &cluster->dbarr.dbs[0];
+       PGconn     *conn = connectToServer(cluster, active_db->db_name);
+
+       Assert(user_opts.include_logical_slots);
+
+       /* --include-logical-replication-slots can be used since PG16. */
+       if (GET_MAJOR_VERSION(cluster->major_version) <= 1500)
+               return;
+
+       prep_status("Checking for logical replication slots");
2) This function can be placed above get_logical_slot_infos and the
prototype from this file can be removed:
+/*
+ * get_logical_slot_infos_per_db()
+ *
+ * gets the LogicalSlotInfos for all the logical replication slots of
the database
+ * referred to by "dbinfo".
+ */
+static void
+get_logical_slot_infos_per_db(ClusterInfo *cluster, DbInfo *dbinfo)
+{
+       PGconn     *conn = connectToServer(cluster,
+
    dbinfo->db_name);

3) LogicalReplicationSlotInfo should be placed after LogicalRepWorker
to keep the order consistent:
LogicalRepCommitPreparedTxnData
LogicalRepCtxStruct
+LogicalReplicationSlotInfo
LogicalRepMsgType

4) "existence of slots" be changed to "existence of slots."
+               /*
+                * If --include-logical-replication-slots is required, check the
+                * existence of slots
+                */
5) This comment can be removed:
+ *
+ * XXX: add more attributes if needed
+ */

Regards,
Vignesh

#55vignesh C
vignesh21@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#53)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Mon, 22 May 2023 at 15:50, Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

Dear Wang,

Thank you for reviewing! PSA new version.

For patches 0001

1. The latest patch set fails to apply because of the new commit (0245f8d) in HEAD.

I didn't notice that. Thanks, fixed.

2. In file pg_dump.h.
```
+/*
+ * The LogicalReplicationSlotInfo struct is used to represent replication
+ * slots.
+ *
+ * XXX: add more attributes if needed
+ */
+typedef struct _LogicalReplicationSlotInfo
+{
+     DumpableObject dobj;
+     char       *plugin;
+     char       *slottype;
+     bool            twophase;
+} LogicalReplicationSlotInfo;
```

Do we need the structure member "slottype"? It seems we do not use "slottype"
because we only dump logical replication slot.

As you said, this attribute is not needed. It is leftover from previous efforts.
Removed.

For patch 0002

3. In the function SaveSlotToPath
```
-     /* and don't do anything if there's nothing to write */
-     if (!was_dirty)
+     /*
+      * and don't do anything if there's nothing to write, unless it's this is
+      * called for a logical slot during a shutdown checkpoint, as we want to
+      * persist the confirmed_flush_lsn in that case, even if that's the only
+      * modification.
+      */
+     if (!was_dirty && !is_shutdown && !SlotIsLogical(slot))
```
It seems that the code isn't consistent with our expectation.
If this is called for a physical slot during a shutdown checkpoint and there's
nothing to write, I think it will also persist physical slots to disk.

You meant to say that we should not change the handling for the physical case, right?

For patch 0003

4. In the function check_for_parameter_settings
```
+     /* --include-logical-replication-slots can be used since PG     16. */
+     if (GET_MAJOR_VERSION(new_cluster->major_version < 1600))
+             return;
```
It seems that there is a slight mistake (the input of GET_MAJOR_VERSION) in the
if-condition:
GET_MAJOR_VERSION(new_cluster->major_version < 1600)
->
GET_MAJOR_VERSION(new_cluster->major_version) <= 1500

Please also check the similar if-conditions in the below two functions
check_for_confirmed_flush_lsn (in 0003 patch)
check_are_logical_slots_active (in 0004 patch)

Done. I grepped with GET_MAJOR_VERSION, and confirmed they were fixed.

Few minor comments:
1) we could remove the variable slotname from the below code by using
PQgetvalue directly in pg_log:
+       for (i = 0; i < ntups; i++)
+       {
+               char       *slotname;
+
+               is_error = true;
+
+               slotname = PQgetvalue(res, i, i_slotname);
+
+               pg_log(PG_WARNING,
+                          "\nWARNING: logical replication slot \"%s\"
is not active",
+                          slotname);
+       }

2) This include "catalog/pg_control.h" should be after inclusion pg_collation.h
#include "catalog/pg_authid_d.h"
+#include "catalog/pg_control.h"
#include "catalog/pg_collation.h"

3) This spurious blank-line addition might not be required in this patch:
 --- a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
+++ b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
@@ -85,11 +85,39 @@ $old_node->safe_psql(
 ]);

my $result = $old_node->safe_psql('postgres',
- "SELECT count (*) FROM
pg_logical_slot_get_changes('test_slot', NULL, NULL)"
+ "SELECT count(*) FROM
pg_logical_slot_peek_changes('test_slot', NULL, NULL)"
);
+
is($result, qq(12), 'ensure WALs are not consumed yet');
$old_node->stop;

4) This inclusion "#include "access/xlogrecord.h" is not required:
#include "postgres_fe.h"

+#include "access/xlogrecord.h"
+#include "access/xlog_internal.h"
 #include "catalog/pg_authid_d.h"

5)"thepublisher's" should be "the publisher's"
When a live check is requested, there is a possibility of additional changes
occurring, which may cause the current WAL position to exceed the
confirmed_flush_lsn
of the slot. As a result, we check the confirmed_flush_lsn of each logical slot
instead. This is sufficient as all the WAL records will be sent during
thepublisher's
shutdown.

Regards,
Vignesh

#56Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: vignesh C (#54)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Vignesh,

Thank you for reviewing! The new version will be attached to the next post.

A few comments:
1) The check_for_parameter_settings, check_for_confirmed_flush_lsn and
check_are_logical_slots_active functions all emit the same status message;
we can make them unique so that it is easier for users to interpret:
+check_for_parameter_settings(ClusterInfo *new_cluster)
+{
+       PGresult   *res;
+       PGconn     *conn = connectToServer(new_cluster, "template1");
+       int                     max_replication_slots;
+       char       *wal_level;
+
+       prep_status("Checking for logical replication slots");
+
+       res = executeQueryOrDie(conn, "SHOW max_replication_slots;");
+check_for_confirmed_flush_lsn(ClusterInfo *cluster)
+{
+       int                     i,
+                               ntups,
+                               i_slotname;
+       bool            is_error = false;
+       PGresult   *res;
+       DbInfo     *active_db = &cluster->dbarr.dbs[0];
+       PGconn     *conn = connectToServer(cluster, active_db->db_name);
+
+       Assert(user_opts.include_logical_slots);
+
+       /* --include-logical-replication-slots can be used since PG16. */
+       if (GET_MAJOR_VERSION(cluster->major_version) <= 1500)
+               return;
+
+       prep_status("Checking for logical replication slots");
+check_are_logical_slots_active(ClusterInfo *cluster)
+{
+       int                     i,
+                               ntups,
+                               i_slotname;
+       bool            is_error = false;
+       PGresult   *res;
+       DbInfo     *active_db = &cluster->dbarr.dbs[0];
+       PGconn     *conn = connectToServer(cluster, active_db->db_name);
+
+       Assert(user_opts.include_logical_slots);
+
+       /* --include-logical-replication-slots can be used since PG16. */
+       if (GET_MAJOR_VERSION(cluster->major_version) <= 1500)
+               return;
+
+       prep_status("Checking for logical replication slots");

Changed. What do you think?
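
For illustration only, a minimal sketch of what distinct status messages could
look like; the first wording below matches the v16-0001 patch, while the other
two are hypothetical and may differ from what is actually attached:

	/* check_for_parameter_settings(): wording taken from the v16-0001 patch */
	prep_status("Checking parameter settings for logical replication slots");

	/* check_for_confirmed_flush_lsn(): hypothetical wording */
	prep_status("Checking confirmed_flush_lsn of logical replication slots");

	/* check_are_logical_slots_active(): hypothetical wording */
	prep_status("Checking whether logical replication slots are active");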

2) This function can be placed above get_logical_slot_infos and the
prototype from this file can be removed:
+/*
+ * get_logical_slot_infos_per_db()
+ *
+ * gets the LogicalSlotInfos for all the logical replication slots of
the database
+ * referred to by "dbinfo".
+ */
+static void
+get_logical_slot_infos_per_db(ClusterInfo *cluster, DbInfo *dbinfo)
+{
+       PGconn     *conn = connectToServer(cluster,
+
dbinfo->db_name);

Removed.

3) LogicalReplicationSlotInfo should be placed after LogicalRepWorker
to keep the order consistent:
LogicalRepCommitPreparedTxnData
LogicalRepCtxStruct
+LogicalReplicationSlotInfo
LogicalRepMsgType

Indeed, fixed.

4) "existence of slots" be changed to "existence of slots."
+               /*
+                * If --include-logical-replication-slots is required, check the
+                * existence of slots
+                */

The period was added.

5) This comment can be removed:
+ *
+ * XXX: add more attributes if needed
+ */

Removed. Additionally, another XXX comment that mentioned physical slots was also removed.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#57Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: vignesh C (#55)
4 attachment(s)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Vignesh,

Thank you for reviewing! PSA new version patch set.

Few minor comments:
1) we could remove the variable slotname from the below code by using
PQgetvalue directly in pg_log:
+       for (i = 0; i < ntups; i++)
+       {
+               char       *slotname;
+
+               is_error = true;
+
+               slotname = PQgetvalue(res, i, i_slotname);
+
+               pg_log(PG_WARNING,
+                          "\nWARNING: logical replication slot \"%s\"
is not active",
+                          slotname);
+       }

Removed. Such code was in two functions, and both of them were fixed.
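
For reference, a minimal sketch of the simplified loop (the exact code in the
attached v16 patches may differ slightly):

	for (i = 0; i < ntups; i++)
	{
		is_error = true;

		/* Report the offending slot, reading its name directly from the result set */
		pg_log(PG_WARNING,
			   "\nWARNING: logical replication slot \"%s\" is not active",
			   PQgetvalue(res, i, i_slotname));
	}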

2) This include "catalog/pg_control.h" should be after inclusion pg_collation.h
#include "catalog/pg_authid_d.h"
+#include "catalog/pg_control.h"
#include "catalog/pg_collation.h"

Moved.

3) This spurious blank-line addition might not be required in this patch:
--- a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
+++ b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
@@ -85,11 +85,39 @@ $old_node->safe_psql(
]);

my $result = $old_node->safe_psql('postgres',
- "SELECT count (*) FROM
pg_logical_slot_get_changes('test_slot', NULL, NULL)"
+ "SELECT count(*) FROM
pg_logical_slot_peek_changes('test_slot', NULL, NULL)"
);
+
is($result, qq(12), 'ensure WALs are not consumed yet');
$old_node->stop;

I removed the line.
What I wanted to check here was that pg_upgrade fails when the WALs have not
been consumed yet. If pg_logical_slot_get_changes() were called here, all of the
WALs would be consumed at that point (the call advances the slot's
confirmed_flush_lsn) and the subsequent command would succeed, which is not what
the test intends. That is why I changed it to pg_logical_slot_peek_changes(),
which reads the changes without consuming them. After considering it more,
though, calling the function is not mandatory at all because no one needs its
output, so I removed it.
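
As a rough illustration only (not taken from the patches, and assuming a libpq
connection "conn" and a PGresult "res" as in the other pg_upgrade checks), the
difference between the two functions is:

	/* Peek: reads the pending changes but does not advance confirmed_flush_lsn */
	res = executeQueryOrDie(conn,
							"SELECT count(*) FROM pg_logical_slot_peek_changes('test_slot', NULL, NULL);");

	/* Get: consumes the changes, advancing confirmed_flush_lsn past them */
	res = executeQueryOrDie(conn,
							"SELECT count(*) FROM pg_logical_slot_get_changes('test_slot', NULL, NULL);");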

4) This inclusion "#include "access/xlogrecord.h" is not required:
#include "postgres_fe.h"

+#include "access/xlogrecord.h"
+#include "access/xlog_internal.h"
#include "catalog/pg_authid_d.h"

Removed.

5)"thepublisher's" should be "the publisher's"
When a live check is requested, there is a possibility of additional changes
occurring, which may cause the current WAL position to exceed the
confirmed_flush_lsn
of the slot. As a result, we check the confirmed_flush_lsn of each logical slot
instead. This is sufficient as all the WAL records will be sent during
thepublisher's
shutdown.

Fixed.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:

v16-0001-pg_upgrade-Add-include-logical-replication-slots.patch (application/octet-stream)
From 9c80fc25dc9d922ef5d793457c7b84d4b5713e60 Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Tue, 4 Apr 2023 05:49:34 +0000
Subject: [PATCH v16 1/4] pg_upgrade: Add --include-logical-replication-slots
 option

This commit introduces a new pg_upgrade option called "--include-logical-replication-slots".
This allows nodes with logical replication slots to be upgraded. The commit can
be divided into two parts: one for pg_dump and another for pg_upgrade.

For pg_dump this commit includes a new option called "--logical-replication-slots-only".
This option can be used to dump logical replication slots. When this option is
specified, the slot_name, plugin, and two_phase parameters are extracted from
pg_replication_slots. An SQL file is then generated which executes
pg_create_logical_replication_slot() with the extracted parameters.

For pg_upgrade, when '--include-logical-replication-slots' is specified, it executes
pg_dump with the new "--logical-replication-slots-only" option and restores from the
dump. Note that we cannot dump replication slots at the same time as the schema
dump because we need to separate the timing of restoring replication slots and
other objects. Replication slots, in particular, should not be restored before
executing the pg_resetwal command because it will remove WALs that are required
by the slots.

The significant advantage of this commit is that it makes it easy to continue
logical replication even after upgrading the publisher node. Previously, pg_upgrade
allowed copying publications to a new node. With this new commit, adjusting the
connection string to the new publisher will cause the apply worker on the subscriber
to connect to the new publisher automatically. This enables seamless continuation
of logical replication, even after an upgrade.

Author: Hayato Kuroda
Reviewed-by: Peter Smith, Julien Rouhaud, Vignesh C, Wang Wei
---
 doc/src/sgml/ref/pgupgrade.sgml               |  11 ++
 src/bin/pg_dump/pg_backup.h                   |   1 +
 src/bin/pg_dump/pg_dump.c                     | 154 ++++++++++++++++++
 src/bin/pg_dump/pg_dump.h                     |  14 +-
 src/bin/pg_dump/pg_dump_sort.c                |  11 +-
 src/bin/pg_upgrade/Makefile                   |   3 +
 src/bin/pg_upgrade/check.c                    |  71 ++++++++
 src/bin/pg_upgrade/dump.c                     |  24 +++
 src/bin/pg_upgrade/info.c                     | 110 ++++++++++++-
 src/bin/pg_upgrade/meson.build                |   1 +
 src/bin/pg_upgrade/option.c                   |   7 +
 src/bin/pg_upgrade/pg_upgrade.c               |  61 +++++++
 src/bin/pg_upgrade/pg_upgrade.h               |  21 +++
 .../t/003_logical_replication_slots.pl        | 114 +++++++++++++
 src/tools/pgindent/typedefs.list              |   3 +
 15 files changed, 602 insertions(+), 4 deletions(-)
 create mode 100644 src/bin/pg_upgrade/t/003_logical_replication_slots.pl

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 7816b4c685..94e90ff506 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -240,6 +240,17 @@ PostgreSQL documentation
       </listitem>
      </varlistentry>
 
+     <varlistentry>
+      <term><option>--include-logical-replication-slots</option></term>
+      <listitem>
+       <para>
+        Upgrade logical replication slots. Only permanent replication slots
+        are included. Note that pg_upgrade does not check the installation of
+        plugins.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><option>-?</option></term>
       <term><option>--help</option></term>
diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index aba780ef4b..0a4e931f9b 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -187,6 +187,7 @@ typedef struct _dumpOptions
 	int			use_setsessauth;
 	int			enable_row_security;
 	int			load_via_partition_root;
+	int			logical_slots_only;
 
 	/* default, if no "inclusion" switches appear, is to dump everything */
 	bool		include_everything;
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 5dab1ba9ea..898ce42005 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -328,6 +328,9 @@ static void setupDumpWorker(Archive *AH);
 static TableInfo *getRootTableInfo(const TableInfo *tbinfo);
 static bool forcePartitionRootLoad(const TableInfo *tbinfo);
 
+static void getLogicalReplicationSlots(Archive *fout);
+static void dumpLogicalReplicationSlot(Archive *fout,
+									   const LogicalReplicationSlotInfo *slotinfo);
 
 int
 main(int argc, char **argv)
@@ -431,6 +434,7 @@ main(int argc, char **argv)
 		{"table-and-children", required_argument, NULL, 12},
 		{"exclude-table-and-children", required_argument, NULL, 13},
 		{"exclude-table-data-and-children", required_argument, NULL, 14},
+		{"logical-replication-slots-only", no_argument, NULL, 15},
 
 		{NULL, 0, NULL, 0}
 	};
@@ -657,6 +661,10 @@ main(int argc, char **argv)
 										  optarg);
 				break;
 
+			case 15:			/* dump only replication slot(s) */
+				dopt.logical_slots_only = true;
+				break;
+
 			default:
 				/* getopt_long already emitted a complaint */
 				pg_log_error_hint("Try \"%s --help\" for more information.", progname);
@@ -714,6 +722,21 @@ main(int argc, char **argv)
 	if (dopt.do_nothing && dopt.dump_inserts == 0)
 		pg_fatal("option --on-conflict-do-nothing requires option --inserts, --rows-per-insert, or --column-inserts");
 
+	if (dopt.logical_slots_only)
+	{
+		if (!dopt.binary_upgrade)
+			pg_fatal("options --logical-replication-slots-only requires option --binary-upgrade");
+
+		if (dopt.dataOnly)
+			pg_fatal("options --logical-replication-slots-only and -a/--data-only cannot be used together");
+
+		if (dopt.schemaOnly)
+			pg_fatal("options --logical-replication-slots-only and -s/--schema-only cannot be used together");
+
+		if (dopt.outputClean)
+			pg_fatal("options --logical-replication-slots-only and -c/--clean cannot be used together");
+	}
+
 	/* Identify archive format to emit */
 	archiveFormat = parseArchiveFormat(format, &archiveMode);
 
@@ -876,6 +899,16 @@ main(int argc, char **argv)
 			pg_fatal("no matching extensions were found");
 	}
 
+	/*
+	 * If dump logical-replication-slots-only was requested, dump only them
+	 * and skip everything else.
+	 */
+	if (dopt.logical_slots_only)
+	{
+		getLogicalReplicationSlots(fout);
+		goto dump;
+	}
+
 	/*
 	 * Dumping LOs is the default for dumps where an inclusion switch is not
 	 * used (an "include everything" dump).  -B can be used to exclude LOs
@@ -936,6 +969,8 @@ main(int argc, char **argv)
 	if (!dopt.no_security_labels)
 		collectSecLabels(fout);
 
+dump:
+
 	/* Lastly, create dummy objects to represent the section boundaries */
 	boundaryObjs = createBoundaryObjects();
 
@@ -1109,6 +1144,11 @@ help(const char *progname)
 			 "                               servers matching PATTERN\n"));
 	printf(_("  --inserts                    dump data as INSERT commands, rather than COPY\n"));
 	printf(_("  --load-via-partition-root    load partitions via the root table\n"));
+
+	/*
+	 * The option --logical-replication-slots-only is used only by pg_upgrade
+	 * and should not be called by users, which is why it is not listed.
+	 */
 	printf(_("  --no-comments                do not dump comments\n"));
 	printf(_("  --no-publications            do not dump publications\n"));
 	printf(_("  --no-security-labels         do not dump security label assignments\n"));
@@ -10237,6 +10277,10 @@ dumpDumpableObject(Archive *fout, DumpableObject *dobj)
 		case DO_SUBSCRIPTION:
 			dumpSubscription(fout, (const SubscriptionInfo *) dobj);
 			break;
+		case DO_LOGICAL_REPLICATION_SLOT:
+			dumpLogicalReplicationSlot(fout,
+									   (const LogicalReplicationSlotInfo *) dobj);
+			break;
 		case DO_PRE_DATA_BOUNDARY:
 		case DO_POST_DATA_BOUNDARY:
 			/* never dumped, nothing to do */
@@ -18218,6 +18262,7 @@ addBoundaryDependencies(DumpableObject **dobjs, int numObjs,
 			case DO_PUBLICATION_REL:
 			case DO_PUBLICATION_TABLE_IN_SCHEMA:
 			case DO_SUBSCRIPTION:
+			case DO_LOGICAL_REPLICATION_SLOT:
 				/* Post-data objects: must come after the post-data boundary */
 				addObjectDependency(dobj, postDataBound->dumpId);
 				break;
@@ -18479,3 +18524,112 @@ appendReloptionsArrayAH(PQExpBuffer buffer, const char *reloptions,
 	if (!res)
 		pg_log_warning("could not parse %s array", "reloptions");
 }
+
+/*
+ * getLogicalReplicationSlots
+ *	  get information about replication slots
+ */
+static void
+getLogicalReplicationSlots(Archive *fout)
+{
+	PGresult   *res;
+	LogicalReplicationSlotInfo *slotinfo;
+	PQExpBuffer query;
+
+	int			i_slotname;
+	int			i_plugin;
+	int			i_twophase;
+	int			i,
+				ntups;
+
+	/* Check whether we should dump or not */
+	if (fout->remoteVersion < 160000)
+		return;
+
+	Assert(fout->dopt->logical_slots_only);
+
+	query = createPQExpBuffer();
+
+	resetPQExpBuffer(query);
+
+	/*
+	 * Get replication slots.
+	 *
+	 * XXX: Which information must be extracted from old node? Currently three
+	 * attributes are extracted because they are used by
+	 * pg_create_logical_replication_slot().
+	 */
+	appendPQExpBufferStr(query,
+						 "SELECT slot_name, plugin, two_phase "
+						 "FROM pg_catalog.pg_replication_slots "
+						 "WHERE database = current_database() AND temporary = false "
+						 "AND wal_status IN ('reserved', 'extended');");
+
+	res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+
+	ntups = PQntuples(res);
+
+	i_slotname = PQfnumber(res, "slot_name");
+	i_plugin = PQfnumber(res, "plugin");
+	i_twophase = PQfnumber(res, "two_phase");
+
+	slotinfo = pg_malloc(ntups * sizeof(LogicalReplicationSlotInfo));
+
+	for (i = 0; i < ntups; i++)
+	{
+		slotinfo[i].dobj.objType = DO_LOGICAL_REPLICATION_SLOT;
+
+		slotinfo[i].dobj.catId.tableoid = InvalidOid;
+		slotinfo[i].dobj.catId.oid = InvalidOid;
+		AssignDumpId(&slotinfo[i].dobj);
+
+		slotinfo[i].dobj.name = pg_strdup(PQgetvalue(res, i, i_slotname));
+
+		slotinfo[i].plugin = pg_strdup(PQgetvalue(res, i, i_plugin));
+		slotinfo[i].twophase = (strcmp(PQgetvalue(res, i, i_twophase), "t") == 0);
+
+		/*
+		 * Note: Currently we do not have any options to include/exclude slots
+		 * in dumping, so all the slots must be selected.
+		 */
+		slotinfo[i].dobj.dump = DUMP_COMPONENT_DEFINITION;
+	}
+	PQclear(res);
+
+	destroyPQExpBuffer(query);
+}
+
+/*
+ * dumpLogicalReplicationSlot
+ *	  dump creation functions for the given logical replication slots
+ */
+static void
+dumpLogicalReplicationSlot(Archive *fout,
+						   const LogicalReplicationSlotInfo *slotinfo)
+{
+	Assert(fout->dopt->logical_slots_only);
+
+	if (slotinfo->dobj.dump & DUMP_COMPONENT_DEFINITION)
+	{
+		PQExpBuffer query = createPQExpBuffer();
+
+		/*
+		 * XXX: For simplification, pg_create_logical_replication_slot() is
+		 * used. Is it sufficient?
+		 */
+		appendPQExpBuffer(query, "SELECT pg_catalog.pg_create_logical_replication_slot(");
+		appendStringLiteralAH(query, slotinfo->dobj.name, fout);
+		appendPQExpBuffer(query, ", ");
+		appendStringLiteralAH(query, slotinfo->plugin, fout);
+		appendPQExpBuffer(query, ", false, %s);",
+						  slotinfo->twophase ? "true" : "false");
+
+		ArchiveEntry(fout, slotinfo->dobj.catId, slotinfo->dobj.dumpId,
+					 ARCHIVE_OPTS(.tag = slotinfo->dobj.name,
+								  .description = "REPLICATION SLOT",
+								  .section = SECTION_POST_DATA,
+								  .createStmt = query->data));
+
+		destroyPQExpBuffer(query);
+	}
+}
diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h
index bc8f2ec36d..ed1866d9ab 100644
--- a/src/bin/pg_dump/pg_dump.h
+++ b/src/bin/pg_dump/pg_dump.h
@@ -82,7 +82,8 @@ typedef enum
 	DO_PUBLICATION,
 	DO_PUBLICATION_REL,
 	DO_PUBLICATION_TABLE_IN_SCHEMA,
-	DO_SUBSCRIPTION
+	DO_SUBSCRIPTION,
+	DO_LOGICAL_REPLICATION_SLOT
 } DumpableObjectType;
 
 /*
@@ -667,6 +668,17 @@ typedef struct _SubscriptionInfo
 	char	   *subpasswordrequired;
 } SubscriptionInfo;
 
+/*
+ * The LogicalReplicationSlotInfo struct is used to represent replication
+ * slots.
+ */
+typedef struct _LogicalReplicationSlotInfo
+{
+	DumpableObject dobj;
+	char	   *plugin;
+	bool		twophase;
+} LogicalReplicationSlotInfo;
+
 /*
  *	common utility functions
  */
diff --git a/src/bin/pg_dump/pg_dump_sort.c b/src/bin/pg_dump/pg_dump_sort.c
index 523a19c155..ae65443228 100644
--- a/src/bin/pg_dump/pg_dump_sort.c
+++ b/src/bin/pg_dump/pg_dump_sort.c
@@ -93,6 +93,7 @@ enum dbObjectTypePriorities
 	PRIO_PUBLICATION_REL,
 	PRIO_PUBLICATION_TABLE_IN_SCHEMA,
 	PRIO_SUBSCRIPTION,
+	PRIO_LOGICAL_REPLICATION_SLOT,
 	PRIO_DEFAULT_ACL,			/* done in ACL pass */
 	PRIO_EVENT_TRIGGER,			/* must be next to last! */
 	PRIO_REFRESH_MATVIEW		/* must be last! */
@@ -146,10 +147,11 @@ static const int dbObjectTypePriority[] =
 	PRIO_PUBLICATION,			/* DO_PUBLICATION */
 	PRIO_PUBLICATION_REL,		/* DO_PUBLICATION_REL */
 	PRIO_PUBLICATION_TABLE_IN_SCHEMA,	/* DO_PUBLICATION_TABLE_IN_SCHEMA */
-	PRIO_SUBSCRIPTION			/* DO_SUBSCRIPTION */
+	PRIO_SUBSCRIPTION,			/* DO_SUBSCRIPTION */
+	PRIO_LOGICAL_REPLICATION_SLOT	/* DO_LOGICAL_REPLICATION_SLOT */
 };
 
-StaticAssertDecl(lengthof(dbObjectTypePriority) == (DO_SUBSCRIPTION + 1),
+StaticAssertDecl(lengthof(dbObjectTypePriority) == (DO_LOGICAL_REPLICATION_SLOT + 1),
 				 "array length mismatch");
 
 static DumpId preDataBoundId;
@@ -1542,6 +1544,11 @@ describeDumpableObject(DumpableObject *obj, char *buf, int bufsize)
 					 "SUBSCRIPTION (ID %d OID %u)",
 					 obj->dumpId, obj->catId.oid);
 			return;
+		case DO_LOGICAL_REPLICATION_SLOT:
+			snprintf(buf, bufsize,
+					 "LOGICAL REPLICATION SLOT (ID %d NAME %s)",
+					 obj->dumpId, obj->name);
+			return;
 		case DO_PRE_DATA_BOUNDARY:
 			snprintf(buf, bufsize,
 					 "PRE-DATA BOUNDARY  (ID %d)",
diff --git a/src/bin/pg_upgrade/Makefile b/src/bin/pg_upgrade/Makefile
index 5834513add..815d1a7ca1 100644
--- a/src/bin/pg_upgrade/Makefile
+++ b/src/bin/pg_upgrade/Makefile
@@ -3,6 +3,9 @@
 PGFILEDESC = "pg_upgrade - an in-place binary upgrade utility"
 PGAPPICON = win32
 
+# required for 003_logical_replication_slots.pl
+EXTRA_INSTALL=contrib/test_decoding
+
 subdir = src/bin/pg_upgrade
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 64024e3b9e..e16ba70a54 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -30,6 +30,7 @@ static void check_for_jsonb_9_4_usage(ClusterInfo *cluster);
 static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(ClusterInfo *new_cluster);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
+static void check_for_parameter_settings(ClusterInfo *new_cluster);
 
 
 /*
@@ -89,6 +90,10 @@ check_and_dump_old_cluster(bool live_check)
 	/* Extract a list of databases and tables from the old cluster */
 	get_db_and_rel_infos(&old_cluster);
 
+	/* Additionally, extract a list of logical replication slots if required */
+	if (user_opts.include_logical_slots)
+		get_logical_slot_infos(&old_cluster);
+
 	init_tablespaces();
 
 	get_loadable_libraries();
@@ -189,6 +194,17 @@ check_new_cluster(void)
 {
 	get_db_and_rel_infos(&new_cluster);
 
+	/*
+	 * Do additional works if --include-logical-replication-slots is required.
+	 * These must be done before check_new_cluster_is_empty() because the
+	 * slot_arr attribute of the new_cluster will be checked in the function.
+	 */
+	if (user_opts.include_logical_slots)
+	{
+		get_logical_slot_infos(&new_cluster);
+		check_for_parameter_settings(&new_cluster);
+	}
+
 	check_new_cluster_is_empty();
 
 	check_loadable_libraries();
@@ -364,6 +380,22 @@ check_new_cluster_is_empty(void)
 						 rel_arr->rels[relnum].nspname,
 						 rel_arr->rels[relnum].relname);
 		}
+
+		/*
+		 * If --include-logical-replication-slots is required, check the
+		 * existence of slots.
+		 */
+		if (user_opts.include_logical_slots)
+		{
+			LogicalSlotInfoArr *slot_arr = &new_cluster.dbarr.dbs[dbnum].slot_arr;
+
+			/* if nslots > 0, report just first entry and exit */
+			if (slot_arr->nslots)
+				pg_fatal("New cluster database \"%s\" is not empty: found logical replication slot \"%s\"",
+						 new_cluster.dbarr.dbs[dbnum].db_name,
+						 slot_arr->slots[0].slotname);
+		}
+
 	}
 }
 
@@ -1402,3 +1434,42 @@ check_for_user_defined_encoding_conversions(ClusterInfo *cluster)
 	else
 		check_ok();
 }
+
+/*
+ * Verify parameter settings for creating logical replication slots
+ */
+static void
+check_for_parameter_settings(ClusterInfo *new_cluster)
+{
+	PGresult   *res;
+	PGconn	   *conn = connectToServer(new_cluster, "template1");
+	int			max_replication_slots;
+	char	   *wal_level;
+
+	/* --include-logical-replication-slots can be used since PG16. */
+	if (GET_MAJOR_VERSION(new_cluster->major_version) <= 1500)
+		return;
+
+	prep_status("Checking parameter settings for logical replication slots");
+
+	res = executeQueryOrDie(conn, "SHOW max_replication_slots;");
+	max_replication_slots = atoi(PQgetvalue(res, 0, 0));
+
+	if (max_replication_slots == 0)
+		pg_fatal("max_replication_slots must be greater than 0");
+
+	PQclear(res);
+
+	res = executeQueryOrDie(conn, "SHOW wal_level;");
+	wal_level = PQgetvalue(res, 0, 0);
+
+	if (strcmp(wal_level, "logical") != 0)
+		pg_fatal("wal_level must be \"logical\", but is set to \"%s\"",
+				 wal_level);
+
+	PQclear(res);
+
+	PQfinish(conn);
+
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/dump.c b/src/bin/pg_upgrade/dump.c
index 6c8c82dca8..e6b90864f5 100644
--- a/src/bin/pg_upgrade/dump.c
+++ b/src/bin/pg_upgrade/dump.c
@@ -59,6 +59,30 @@ generate_old_dump(void)
 						   log_opts.dumpdir,
 						   sql_file_name, escaped_connstr.data);
 
+		/*
+		 * Dump logical replication slots if needed.
+		 *
+		 * XXX We cannot dump replication slots at the same time as the schema
+		 * dump because we need to separate the timing of restoring
+		 * replication slots and other objects. Replication slots, in
+		 * particular, should not be restored before executing the pg_resetwal
+		 * command because it will remove WALs that are required by the slots.
+		 */
+		if (user_opts.include_logical_slots)
+		{
+			char		slots_file_name[MAXPGPATH];
+
+			snprintf(slots_file_name, sizeof(slots_file_name),
+					 DB_DUMP_LOGICAL_SLOTS_FILE_MASK, old_db->db_oid);
+			parallel_exec_prog(log_file_name, NULL,
+							   "\"%s/pg_dump\" %s --logical-replication-slots-only "
+							   "--quote-all-identifiers --binary-upgrade %s "
+							   "--file=\"%s/%s\" %s",
+							   new_cluster.bindir, cluster_conn_opts(&old_cluster),
+							   log_opts.verbose ? "--verbose" : "",
+							   log_opts.dumpdir,
+							   slots_file_name, escaped_connstr.data);
+		}
 		termPQExpBuffer(&escaped_connstr);
 	}
 
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index a9988abfe1..0b5d5d870b 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -26,7 +26,7 @@ static void get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo);
 static void free_rel_infos(RelInfoArr *rel_arr);
 static void print_db_infos(DbInfoArr *db_arr);
 static void print_rel_infos(RelInfoArr *rel_arr);
-
+static void print_slot_infos(LogicalSlotInfoArr *slot_arr);
 
 /*
  * gen_db_file_maps()
@@ -600,6 +600,94 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 	dbinfo->rel_arr.nrels = num_rels;
 }
 
+/*
+ * get_logical_slot_infos_per_db()
+ *
+ * gets the LogicalSlotInfos for all the logical replication slots of the database
+ * referred to by "dbinfo".
+ */
+static void
+get_logical_slot_infos_per_db(ClusterInfo *cluster, DbInfo *dbinfo)
+{
+	PGconn	   *conn = connectToServer(cluster,
+									   dbinfo->db_name);
+	PGresult   *res;
+	LogicalSlotInfo *slotinfos = NULL;
+
+	int			num_slots;
+
+	char		query[QUERY_ALLOC];
+
+	query[0] = '\0';			/* initialize query string to empty */
+
+	snprintf(query, sizeof(query),
+			 "SELECT slot_name, plugin, two_phase "
+			 "FROM pg_catalog.pg_replication_slots "
+			 "WHERE database = current_database() AND temporary = false "
+			 "AND wal_status IN ('reserved', 'extended');");
+
+	res = executeQueryOrDie(conn, "%s", query);
+
+	num_slots = PQntuples(res);
+
+	if (num_slots)
+	{
+		int			slotnum;
+		int			i_slotname;
+		int			i_plugin;
+		int			i_twophase;
+
+		slotinfos = (LogicalSlotInfo *) pg_malloc(sizeof(LogicalSlotInfo) * num_slots);
+
+		i_slotname = PQfnumber(res, "slot_name");
+		i_plugin = PQfnumber(res, "plugin");
+		i_twophase = PQfnumber(res, "two_phase");
+
+		for (slotnum = 0; slotnum < num_slots; slotnum++)
+		{
+			LogicalSlotInfo *curr = &slotinfos[slotnum];
+
+			curr->slotname = pg_strdup(PQgetvalue(res, slotnum, i_slotname));
+			curr->plugin = pg_strdup(PQgetvalue(res, slotnum, i_plugin));
+			curr->two_phase = (strcmp(PQgetvalue(res, slotnum, i_twophase), "t") == 0);
+		}
+	}
+
+	PQclear(res);
+	PQfinish(conn);
+
+	dbinfo->slot_arr.slots = slotinfos;
+	dbinfo->slot_arr.nslots = num_slots;
+}
+
+/*
+ * get_logical_slot_infos()
+ *
+ * Higher level routine to generate LogicalSlotInfoArr for all databases.
+ */
+void
+get_logical_slot_infos(ClusterInfo *cluster)
+{
+	int			dbnum;
+
+	if (cluster == &old_cluster)
+		pg_log(PG_VERBOSE, "\nsource databases:");
+	else
+		pg_log(PG_VERBOSE, "\ntarget databases:");
+
+	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
+	{
+		DbInfo	   *pDbInfo = &cluster->dbarr.dbs[dbnum];
+
+		get_logical_slot_infos_per_db(cluster, pDbInfo);
+
+		if (log_opts.verbose)
+		{
+			pg_log(PG_VERBOSE, "Database: \"%s\"", pDbInfo->db_name);
+			print_slot_infos(&pDbInfo->slot_arr);
+		}
+	}
+}
 
 static void
 free_db_and_rel_infos(DbInfoArr *db_arr)
@@ -610,6 +698,14 @@ free_db_and_rel_infos(DbInfoArr *db_arr)
 	{
 		free_rel_infos(&db_arr->dbs[dbnum].rel_arr);
 		pg_free(db_arr->dbs[dbnum].db_name);
+
+		/*
+		 * db_arr has an additional attribute, LogicalSlotInfoArr slot_arr,
+		 * but there is no need to free it. It has a valid member only when
+		 * the cluster had logical replication slots in the previous call.
+		 * However, in this case, a FATAL error is thrown, and we cannot reach
+		 * this point.
+		 */
 	}
 	pg_free(db_arr->dbs);
 	db_arr->dbs = NULL;
@@ -660,3 +756,15 @@ print_rel_infos(RelInfoArr *rel_arr)
 			   rel_arr->rels[relnum].reloid,
 			   rel_arr->rels[relnum].tablespace);
 }
+
+static void
+print_slot_infos(LogicalSlotInfoArr *slot_arr)
+{
+	int			slotnum;
+
+	for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\", two_phase: %d",
+			   slot_arr->slots[slotnum].slotname,
+			   slot_arr->slots[slotnum].plugin,
+			   slot_arr->slots[slotnum].two_phase);
+}
diff --git a/src/bin/pg_upgrade/meson.build b/src/bin/pg_upgrade/meson.build
index 12a97f84e2..228f29b688 100644
--- a/src/bin/pg_upgrade/meson.build
+++ b/src/bin/pg_upgrade/meson.build
@@ -42,6 +42,7 @@ tests += {
     'tests': [
       't/001_basic.pl',
       't/002_pg_upgrade.pl',
+      't/003_logical_replication_slots.pl',
     ],
     'test_kwargs': {'priority': 40}, # pg_upgrade tests are slow
   },
diff --git a/src/bin/pg_upgrade/option.c b/src/bin/pg_upgrade/option.c
index 640361009e..df66a5ffe6 100644
--- a/src/bin/pg_upgrade/option.c
+++ b/src/bin/pg_upgrade/option.c
@@ -57,6 +57,7 @@ parseCommandLine(int argc, char *argv[])
 		{"verbose", no_argument, NULL, 'v'},
 		{"clone", no_argument, NULL, 1},
 		{"copy", no_argument, NULL, 2},
+		{"include-logical-replication-slots", no_argument, NULL, 3},
 
 		{NULL, 0, NULL, 0}
 	};
@@ -199,6 +200,10 @@ parseCommandLine(int argc, char *argv[])
 				user_opts.transfer_mode = TRANSFER_MODE_COPY;
 				break;
 
+			case 3:
+				user_opts.include_logical_slots = true;
+				break;
+
 			default:
 				fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
 						os_info.progname);
@@ -289,6 +294,8 @@ usage(void)
 	printf(_("  -V, --version                 display version information, then exit\n"));
 	printf(_("  --clone                       clone instead of copying files to new cluster\n"));
 	printf(_("  --copy                        copy files to new cluster (default)\n"));
+	printf(_("  --include-logical-replication-slots\n"
+			 "                                upgrade logical replication slots\n"));
 	printf(_("  -?, --help                    show this help, then exit\n"));
 	printf(_("\n"
 			 "Before running pg_upgrade you must:\n"
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 4562dafcff..49e16234ed 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -59,6 +59,7 @@ static void copy_xact_xlog_xid(void);
 static void set_frozenxids(bool minmxid_only);
 static void make_outputdirs(char *pgdata);
 static void setup(char *argv0, bool *live_check);
+static void create_logical_replication_slots(void);
 
 ClusterInfo old_cluster,
 			new_cluster;
@@ -188,6 +189,19 @@ main(int argc, char **argv)
 			  new_cluster.pgdata);
 	check_ok();
 
+	/*
+	 * Create logical replication slots if requested.
+	 *
+	 * Note: This must be done after doing pg_resetwal command because the
+	 * command will remove required WALs.
+	 */
+	if (user_opts.include_logical_slots)
+	{
+		start_postmaster(&new_cluster, true);
+		create_logical_replication_slots();
+		stop_postmaster(false);
+	}
+
 	if (user_opts.do_sync)
 	{
 		prep_status("Sync data directory to disk");
@@ -860,3 +874,50 @@ set_frozenxids(bool minmxid_only)
 
 	check_ok();
 }
+
+/*
+ * create_logical_replication_slots()
+ *
+ * Similar to create_new_objects() but only restores logical replication slots.
+ */
+static void
+create_logical_replication_slots(void)
+{
+	int			dbnum;
+
+	prep_status_progress("Restoring logical replication slots in the new cluster");
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		char		slots_file_name[MAXPGPATH],
+					log_file_name[MAXPGPATH];
+		DbInfo	   *old_db = &old_cluster.dbarr.dbs[dbnum];
+
+		pg_log(PG_STATUS, "%s", old_db->db_name);
+
+		snprintf(slots_file_name, sizeof(slots_file_name),
+				 DB_DUMP_LOGICAL_SLOTS_FILE_MASK, old_db->db_oid);
+		snprintf(log_file_name, sizeof(log_file_name),
+				 DB_DUMP_LOG_FILE_MASK, old_db->db_oid);
+
+		parallel_exec_prog(log_file_name,
+						   NULL,
+						   "\"%s/psql\" %s --echo-queries --set ON_ERROR_STOP=on "
+						   "--no-psqlrc --dbname %s -f \"%s/%s\"",
+						   new_cluster.bindir,
+						   cluster_conn_opts(&new_cluster),
+						   old_db->db_name,
+						   log_opts.dumpdir,
+						   slots_file_name);
+	}
+
+	/* reap all children */
+	while (reap_child(true) == true)
+		;
+
+	end_progress_output();
+	check_ok();
+
+	/* update new_cluster info again */
+	get_logical_slot_infos(&new_cluster);
+}
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 3eea0139c7..7adbb50807 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -29,6 +29,7 @@
 /* contains both global db information and CREATE DATABASE commands */
 #define GLOBALS_DUMP_FILE	"pg_upgrade_dump_globals.sql"
 #define DB_DUMP_FILE_MASK	"pg_upgrade_dump_%u.custom"
+#define DB_DUMP_LOGICAL_SLOTS_FILE_MASK	"pg_upgrade_dump_%u_logical_slots.sql"
 
 /*
  * Base directories that include all the files generated internally, from the
@@ -150,6 +151,22 @@ typedef struct
 	int			nrels;
 } RelInfoArr;
 
+/*
+ * Structure to store logical replication slot information
+ */
+typedef struct
+{
+	char	   *slotname;		/* slot name */
+	char	   *plugin;			/* plugin */
+	bool		two_phase;		/* Can the slot decode 2PC? */
+} LogicalSlotInfo;
+
+typedef struct
+{
+	LogicalSlotInfo *slots;
+	int			nslots;
+} LogicalSlotInfoArr;
+
 /*
  * The following structure represents a relation mapping.
  */
@@ -176,6 +193,7 @@ typedef struct
 	char		db_tablespace[MAXPGPATH];	/* database default tablespace
 											 * path */
 	RelInfoArr	rel_arr;		/* array of all user relinfos */
+	LogicalSlotInfoArr slot_arr;	/* array of all LogicalSlotInfo */
 } DbInfo;
 
 /*
@@ -304,6 +322,8 @@ typedef struct
 	transferMode transfer_mode; /* copy files or link them? */
 	int			jobs;			/* number of processes/threads to use */
 	char	   *socketdir;		/* directory to use for Unix sockets */
+	bool		include_logical_slots;	/* true -> dump and restore logical
+										 * replication slots */
 } UserOpts;
 
 typedef struct
@@ -400,6 +420,7 @@ FileNameMap *gen_db_file_maps(DbInfo *old_db,
 							  DbInfo *new_db, int *nmaps, const char *old_pgdata,
 							  const char *new_pgdata);
 void		get_db_and_rel_infos(ClusterInfo *cluster);
+void		get_logical_slot_infos(ClusterInfo *cluster);
 
 /* option.c */
 
diff --git a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
new file mode 100644
index 0000000000..b0f8efec78
--- /dev/null
+++ b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
@@ -0,0 +1,114 @@
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+# Tests for upgrading replication slots
+
+use strict;
+use warnings;
+
+use File::Path qw(rmtree);
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Can be changed to test the other modes
+my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
+
+# Initialize old node
+my $old_node = PostgreSQL::Test::Cluster->new('old_node');
+$old_node->init(allows_streaming => 'logical');
+$old_node->start;
+
+# Initialize new node
+my $new_node = PostgreSQL::Test::Cluster->new('new_node');
+$new_node->init(allows_streaming => 1);
+
+my $bindir = $new_node->config_data('--bindir');
+
+$old_node->stop;
+
+# Cause a failure at the start of pg_upgrade because wal_level is replica
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_node->data_dir,
+		'-D',         $new_node->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_node->host,
+		'-p',         $old_node->port,
+		'-P',         $new_node->port,
+		$mode,        '--include-logical-replication-slots',
+	],
+	'run of pg_upgrade of old node with wrong wal_level');
+ok( -d $new_node->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_node->data_dir . "/pg_upgrade_output.d");
+
+# Preparations for the subsequent test. The case where max_replication_slots
+# is set to 0 is prohibited.
+$new_node->append_conf('postgresql.conf', "wal_level = 'logical'");
+$new_node->append_conf('postgresql.conf', "max_replication_slots = 0");
+
+# Cause a failure at the start of pg_upgrade because max_replication_slots is 0
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_node->data_dir,
+		'-D',         $new_node->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_node->host,
+		'-p',         $old_node->port,
+		'-P',         $new_node->port,
+		$mode,        '--include-logical-replication-slots',
+	],
+	'run of pg_upgrade of old node with wrong max_replication_slots');
+ok( -d $new_node->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_node->data_dir . "/pg_upgrade_output.d");
+
+# Preparations for the subsequent test. max_replication_slots is set to a
+# non-zero value so that pg_upgrade can succeed.
+$new_node->append_conf('postgresql.conf', "max_replication_slots = 10");
+
+# Create a slot on old node, and generate WALs
+$old_node->start;
+$old_node->safe_psql(
+	'postgres', qq[
+	SELECT pg_create_logical_replication_slot('test_slot', 'test_decoding', false, true);
+	CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;
+]);
+
+$old_node->safe_psql('postgres',
+	"SELECT count(*) FROM pg_logical_slot_get_changes('test_slot', NULL, NULL)"
+);
+$old_node->stop;
+
+# Actual run, pg_upgrade_output.d is removed at the end
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_node->data_dir,
+		'-D',         $new_node->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_node->host,
+		'-p',         $old_node->port,
+		'-P',         $new_node->port,
+		$mode,        '--include-logical-replication-slots'
+	],
+	'run of pg_upgrade of old node');
+ok( !-d $new_node->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ removed after pg_upgrade success");
+
+$new_node->start;
+my $result = $new_node->safe_psql('postgres',
+	"SELECT slot_name, two_phase FROM pg_replication_slots");
+is($result, qq(test_slot|t), 'check the slot exists on new node');
+
+done_testing();
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 260854747b..ba0a933767 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1498,7 +1498,10 @@ LogicalRepStreamAbortData
 LogicalRepTupleData
 LogicalRepTyp
 LogicalRepWorker
+LogicalReplicationSlotInfo
 LogicalRewriteMappingData
+LogicalSlotInfo
+LogicalSlotInfoArr
 LogicalTape
 LogicalTapeSet
 LsnReadQueue
-- 
2.27.0

v16-0002-Always-persist-to-disk-logical-slots-during-a-sh.patch (application/octet-stream)
From 2f4486024dad55ab094f9ab2d1e5812e939f3218 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Fri, 14 Apr 2023 13:49:09 +0800
Subject: [PATCH v16 2/4] Always persist to disk logical slots during a
 shutdown checkpoint.

It's entirely possible for a logical slot to have a confirmed_flush_lsn higher
than the last value saved on disk while not being marked as dirty.  It's
currently not a problem to lose that value during a clean shutdown / restart
cycle, but a later patch adding support for pg_upgrade of publications and
logical slots will rely on that value being properly persisted to disk.

Author: Julien Rouhaud
Reviewed-by: Wang Wei
Discussion: FIXME
---
 src/backend/access/transam/xlog.c |  2 +-
 src/backend/replication/slot.c    | 26 ++++++++++++++++----------
 src/include/replication/slot.h    |  2 +-
 3 files changed, 18 insertions(+), 12 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index b2430f617c..384245c157 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7011,7 +7011,7 @@ static void
 CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
 {
 	CheckPointRelationMap();
-	CheckPointReplicationSlots();
+	CheckPointReplicationSlots(flags & CHECKPOINT_IS_SHUTDOWN);
 	CheckPointSnapBuild();
 	CheckPointLogicalRewriteHeap();
 	CheckPointReplicationOrigin();
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 8021aaa0a8..526323e87b 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -109,7 +109,8 @@ static void ReplicationSlotDropPtr(ReplicationSlot *slot);
 /* internal persistency functions */
 static void RestoreSlotFromDisk(const char *name);
 static void CreateSlotOnDisk(ReplicationSlot *slot);
-static void SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel);
+static void SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel,
+						   bool is_shutdown);
 
 /*
  * Report shared-memory space needed by ReplicationSlotsShmemInit.
@@ -783,7 +784,7 @@ ReplicationSlotSave(void)
 	Assert(MyReplicationSlot != NULL);
 
 	sprintf(path, "pg_replslot/%s", NameStr(MyReplicationSlot->data.name));
-	SaveSlotToPath(MyReplicationSlot, path, ERROR);
+	SaveSlotToPath(MyReplicationSlot, path, ERROR, false);
 }
 
 /*
@@ -1565,11 +1566,10 @@ restart:
 /*
  * Flush all replication slots to disk.
  *
- * This needn't actually be part of a checkpoint, but it's a convenient
- * location.
+ * is_shutdown is true in case of a shutdown checkpoint.
  */
 void
-CheckPointReplicationSlots(void)
+CheckPointReplicationSlots(bool is_shutdown)
 {
 	int			i;
 
@@ -1594,7 +1594,7 @@ CheckPointReplicationSlots(void)
 
 		/* save the slot to disk, locking is handled in SaveSlotToPath() */
 		sprintf(path, "pg_replslot/%s", NameStr(s->data.name));
-		SaveSlotToPath(s, path, LOG);
+		SaveSlotToPath(s, path, LOG, is_shutdown);
 	}
 	LWLockRelease(ReplicationSlotAllocationLock);
 }
@@ -1700,7 +1700,7 @@ CreateSlotOnDisk(ReplicationSlot *slot)
 
 	/* Write the actual state file. */
 	slot->dirty = true;			/* signal that we really need to write */
-	SaveSlotToPath(slot, tmppath, ERROR);
+	SaveSlotToPath(slot, tmppath, ERROR, false);
 
 	/* Rename the directory into place. */
 	if (rename(tmppath, path) != 0)
@@ -1726,7 +1726,8 @@ CreateSlotOnDisk(ReplicationSlot *slot)
  * Shared functionality between saving and creating a replication slot.
  */
 static void
-SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel)
+SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel,
+			   bool is_shutdown)
 {
 	char		tmppath[MAXPGPATH];
 	char		path[MAXPGPATH];
@@ -1740,8 +1741,13 @@ SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel)
 	slot->just_dirtied = false;
 	SpinLockRelease(&slot->mutex);
 
-	/* and don't do anything if there's nothing to write */
-	if (!was_dirty)
+	/*
+	 * and don't do anything if there's nothing to write, unless this is
+	 * called for a logical slot during a shutdown checkpoint, as we want to
+	 * persist the confirmed_flush_lsn in that case, even if that's the only
+	 * modification.
+	 */
+	if (!was_dirty && (SlotIsPhysical(slot) || !is_shutdown))
 		return;
 
 	LWLockAcquire(&slot->io_in_progress_lock, LW_EXCLUSIVE);
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index a8a89dc784..7ca37c9f70 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -241,7 +241,7 @@ extern void ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslo
 extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok);
 
 extern void StartupReplicationSlots(void);
-extern void CheckPointReplicationSlots(void);
+extern void CheckPointReplicationSlots(bool is_shutdown);
 
 extern void CheckSlotRequirements(void);
 extern void CheckSlotPermissions(void);
-- 
2.27.0

v16-0003-pg_upgrade-Add-check-function-for-include-logica.patchapplication/octet-stream; name=v16-0003-pg_upgrade-Add-check-function-for-include-logica.patchDownload
From 77aef40053ccd2e2d7451ac9d819e11006e9e84b Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Fri, 14 Apr 2023 08:59:03 +0000
Subject: [PATCH v16 3/4] pg_upgrade: Add check function for
 --include-logical-replication-slots option

Author: Hayato Kuroda
Reviewed-by: Wang Wei, Vignesh C
---
 src/bin/pg_upgrade/check.c                    | 71 ++++++++++++++++++-
 .../t/003_logical_replication_slots.pl        | 24 +++++++
 2 files changed, 94 insertions(+), 1 deletion(-)

diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index e16ba70a54..aaa348037f 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -9,8 +9,10 @@
 
 #include "postgres_fe.h"
 
+#include "access/xlog_internal.h"
 #include "catalog/pg_authid_d.h"
 #include "catalog/pg_collation.h"
+#include "catalog/pg_control.h"
 #include "fe_utils/string_utils.h"
 #include "mb/pg_wchar.h"
 #include "pg_upgrade.h"
@@ -31,7 +33,7 @@ static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(ClusterInfo *new_cluster);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
 static void check_for_parameter_settings(ClusterInfo *new_cluster);
-
+static void check_for_confirmed_flush_lsn(ClusterInfo *cluster);
 
 /*
  * fix_path_separator
@@ -108,6 +110,8 @@ check_and_dump_old_cluster(bool live_check)
 	check_for_composite_data_type_usage(&old_cluster);
 	check_for_reg_data_type_usage(&old_cluster);
 	check_for_isn_and_int8_passing_mismatch(&old_cluster);
+	if (user_opts.include_logical_slots)
+		check_for_confirmed_flush_lsn(&old_cluster);
 
 	/*
 	 * PG 16 increased the size of the 'aclitem' type, which breaks the
@@ -1473,3 +1477,68 @@ check_for_parameter_settings(ClusterInfo *new_cluster)
 
 	check_ok();
 }
+
+/*
+ * Verify that all logical replication slots consumed all WALs, except a
+ * CHECKPOINT_SHUTDOWN record.
+ */
+static void
+check_for_confirmed_flush_lsn(ClusterInfo *cluster)
+{
+	int			i,
+				ntups,
+				i_slotname;
+	bool		is_error = false;
+	PGresult   *res;
+	DbInfo	   *active_db = &cluster->dbarr.dbs[0];
+	PGconn	   *conn = connectToServer(cluster, active_db->db_name);
+
+	Assert(user_opts.include_logical_slots);
+
+	/* --include-logical-replication-slots can be used since PG16. */
+	if (GET_MAJOR_VERSION(cluster->major_version) <= 1500)
+		return;
+
+	prep_status("Checking confirmed_flush_lsn for logical replication slots");
+
+	/*
+	 * Check that all logical replication slots have reached the current WAL
+	 * position, except for the CHECKPOINT_SHUTDOWN record. Even if all WALs
+	 * are consumed before shutting down the node, the checkpointer generates
+	 * a CHECKPOINT_SHUTDOWN record at shutdown, which cannot be consumed by
+	 * any slots. Therefore, we must allow for a difference between
+	 * pg_current_wal_insert_lsn() and confirmed_flush_lsn.
+	 */
+#define SHUTDOWN_RECORD_SIZE  (SizeOfXLogRecord + \
+							   SizeOfXLogRecordDataHeaderShort + \
+							   sizeof(CheckPoint))
+
+	res = executeQueryOrDie(conn,
+							"SELECT slot_name FROM pg_catalog.pg_replication_slots "
+							"WHERE (pg_catalog.pg_current_wal_insert_lsn() - confirmed_flush_lsn) > %d "
+							"AND temporary = false AND wal_status IN ('reserved', 'extended');",
+							(int) (SizeOfXLogLongPHD + SHUTDOWN_RECORD_SIZE));
+
+#undef SHUTDOWN_RECORD_SIZE
+
+	ntups = PQntuples(res);
+	i_slotname = PQfnumber(res, "slot_name");
+
+	for (i = 0; i < ntups; i++)
+	{
+		is_error = true;
+
+		pg_log(PG_WARNING,
+			   "\nWARNING: logical replication slot \"%s\" has not consumed WALs yet",
+			   PQgetvalue(res, i, i_slotname));
+	}
+
+	PQclear(res);
+	PQfinish(conn);
+
+	if (is_error)
+		pg_fatal("--include-logical-replication-slots requires that all "
+				 "logical replication slots consumed all the WALs");
+
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
index b0f8efec78..a2bb0b5e35 100644
--- a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
+++ b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
@@ -84,6 +84,30 @@ $old_node->safe_psql(
 	CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;
 ]);
 
+$old_node->stop;
+
+# Cause a failure at the start of pg_upgrade because test_slot does not
+# finish consuming all the WALs
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_node->data_dir,
+		'-D',         $new_node->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_node->host,
+		'-p',         $old_node->port,
+		'-P',         $new_node->port,
+		$mode,        '--include-logical-replication-slots',
+	],
+	'run of pg_upgrade of old node with idle replication slots');
+ok( -d $new_node->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_node->data_dir . "/pg_upgrade_output.d");
+
+$old_node->start;
 $old_node->safe_psql('postgres',
 	"SELECT count(*) FROM pg_logical_slot_get_changes('test_slot', NULL, NULL)"
 );
-- 
2.27.0

v16-0004-Change-the-method-used-to-check-logical-replicat.patchapplication/octet-stream; name=v16-0004-Change-the-method-used-to-check-logical-replicat.patchDownload
From 919b2933bf6559a6cdfda835c52442fa0e23462b Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Mon, 24 Apr 2023 11:03:28 +0000
Subject: [PATCH v16 4/4] Change the method used to check logical replication
 slots during the live check

When a live check is requested, there is a possibility of additional changes
occurring, which may cause the current WAL position to exceed the confirmed_flush_lsn
of the slot. As a result, we check the confirmed_flush_lsn of each logical slot
instead. This is sufficient as all the WAL records will be sent during the publisher's
shutdown.

Author: Hayato Kuroda
Reviewed-by: Wang Wei, Vignesh C
---
 src/bin/pg_upgrade/check.c                    | 64 ++++++++++++++++++-
 .../t/003_logical_replication_slots.pl        | 64 ++++++++++++++++++-
 2 files changed, 125 insertions(+), 3 deletions(-)

diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index aaa348037f..1e2ffdb60d 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -34,6 +34,7 @@ static void check_for_new_tablespace_dir(ClusterInfo *new_cluster);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
 static void check_for_parameter_settings(ClusterInfo *new_cluster);
 static void check_for_confirmed_flush_lsn(ClusterInfo *cluster);
+static void check_are_logical_slots_active(ClusterInfo *cluster);
 
 /*
  * fix_path_separator
@@ -111,7 +112,19 @@ check_and_dump_old_cluster(bool live_check)
 	check_for_reg_data_type_usage(&old_cluster);
 	check_for_isn_and_int8_passing_mismatch(&old_cluster);
 	if (user_opts.include_logical_slots)
-		check_for_confirmed_flush_lsn(&old_cluster);
+	{
+		/*
+		 * The method used to check logical replication slots is dependent on
+		 * the value of the live_check parameter. This change was implemented
+		 * because, during a live check, it is possible for additional changes
+		 * to occur at the old node, which could cause the current WAL position
+		 * to exceed the confirmed_flush_lsn of the slot.
+		 */
+		if (live_check)
+			check_are_logical_slots_active(&old_cluster);
+		else
+			check_for_confirmed_flush_lsn(&old_cluster);
+	}
 
 	/*
 	 * PG 16 increased the size of the 'aclitem' type, which breaks the
@@ -1478,6 +1491,55 @@ check_for_parameter_settings(ClusterInfo *new_cluster)
 	check_ok();
 }
 
+/*
+ * Verify that all logical replication slots are active
+ */
+static void
+check_are_logical_slots_active(ClusterInfo *cluster)
+{
+	int			i,
+				ntups,
+				i_slotname;
+	bool		is_error = false;
+	PGresult   *res;
+	DbInfo	   *active_db = &cluster->dbarr.dbs[0];
+	PGconn	   *conn = connectToServer(cluster, active_db->db_name);
+
+	Assert(user_opts.include_logical_slots);
+
+	/* --include-logical-replication-slots can be used since PG16. */
+	if (GET_MAJOR_VERSION(cluster->major_version) <= 1500)
+		return;
+
+	prep_status("Checking the status of subscriptions for logical replication slots");
+
+	res = executeQueryOrDie(conn,
+							"SELECT slot_name FROM pg_catalog.pg_replication_slots "
+							"WHERE active IS FALSE "
+							"AND temporary = false AND wal_status IN ('reserved', 'extended');");
+
+	ntups = PQntuples(res);
+	i_slotname = PQfnumber(res, "slot_name");
+
+	for (i = 0; i < ntups; i++)
+	{
+		is_error = true;
+
+		pg_log(PG_WARNING,
+			   "\nWARNING: logical replication slot \"%s\" is not active",
+			   PQgetvalue(res, i, i_slotname));
+	}
+
+	PQclear(res);
+	PQfinish(conn);
+
+	if (is_error)
+		pg_fatal("--include-logical-replication-slots with --check requires that "
+				 "all logical replication slots are active");
+
+	check_ok();
+}
+
 /*
  * Verify that all logical replication slots consumed all WALs, except a
  * CHECKPOINT_SHUTDOWN record.
diff --git a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
index a2bb0b5e35..a46c838b60 100644
--- a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
+++ b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
@@ -19,6 +19,10 @@ my $old_node = PostgreSQL::Test::Cluster->new('old_node');
 $old_node->init(allows_streaming => 'logical');
 $old_node->start;
 
+# Initialize subscriber, which will be used only for --check
+my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
+$subscriber->init(allows_streaming => 'logical');
+
 # Initialize new node
 my $new_node = PostgreSQL::Test::Cluster->new('new_node');
 $new_node->init(allows_streaming => 1);
@@ -76,12 +80,68 @@ rmtree($new_node->data_dir . "/pg_upgrade_output.d");
 # non-zero value to succeed the pg_upgrade
 $new_node->append_conf('postgresql.conf', "max_replication_slots = 10");
 
-# Create a slot on old node, and generate WALs
+# Setup logical replication
 $old_node->start;
+$old_node->safe_psql('postgres',
+	"CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a");
+$old_node->safe_psql('postgres', "CREATE PUBLICATION pub FOR ALL TABLES");
+my $old_connstr = $old_node->connstr . ' dbname=postgres';
+
+$subscriber->start;
+$subscriber->safe_psql('postgres', "CREATE TABLE tbl (a int)");
+$subscriber->safe_psql('postgres',
+	"CREATE SUBSCRIPTION sub CONNECTION '$old_connstr' PUBLICATION pub WITH (copy_data = true)"
+);
+
+# Wait for initial table sync to finish
+$subscriber->wait_for_subscription_sync($old_node, 'sub');
+
+my $result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
+is($result, qq(10), 'check initial rows on subscriber');
+
+# Start a background session and open a transaction (not committed yet)
+my $bsession = $old_node->background_psql('postgres');
+$bsession->query_safe(
+	q{
+BEGIN;
+INSERT INTO tbl VALUES (generate_series(11, 20))
+});
+
+$result = $old_node->safe_psql('postgres',
+	"SELECT count(*) FROM pg_replication_slots WHERE pg_current_wal_insert_lsn() > confirmed_flush_lsn"
+);
+is($result, qq(1),
+	'check the current WAL position exceeds confirmed_flush_lsn');
+
+# Run pg_upgrade --check. In the command the status of each logical slots will
+# be checked and then this will be succeeded.
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_node->data_dir,
+		'-D',         $new_node->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_node->host,
+		'-p',         $old_node->port,
+		'-P',         $new_node->port,
+		$mode,        '--include-logical-replication-slots',
+		'--check'
+	],
+	'run of pg_upgrade of old node');
+ok( !-d $old_node->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ removed after pg_upgrade success");
+
+# Cleanup
+$bsession->query_safe("ABORT");
+$bsession->quit;
+$subscriber->safe_psql('postgres', "DROP SUBSCRIPTION sub");
+
+# Create a slot on old node, and generate WALs
 $old_node->safe_psql(
 	'postgres', qq[
 	SELECT pg_create_logical_replication_slot('test_slot', 'test_decoding', false, true);
-	CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;
+	INSERT INTO tbl VALUES (generate_series(11, 20));
 ]);
 
 $old_node->stop;
-- 
2.27.0

#58Amit Kapila
amit.kapila16@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#21)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Fri, Apr 14, 2023 at 4:00 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

Sorry for the delay, I didn't have time to come back to it until this afternoon.

No issues, everyone is busy:-).

I don't think that your analysis is correct. Slots are guaranteed to be
stopped after all the normal backends have been stopped, exactly to avoid such
extraneous records.

What is happening here is that the slot's confirmed_flush_lsn is properly
updated in memory and ends up being the same as the current LSN before the
shutdown. But as it's a logical slot and those records aren't decoded, the
slot isn't marked as dirty and therefore isn't saved to disk. You don't see
that behavior when doing a manual checkpoint before (per your script comment),
as in that case the checkpoint also tries to save the slot to disk but then
finds a slot that was marked as dirty and therefore saves it.

Here, why is the behavior different for manual and non-manual checkpoints?

In your script's scenario, when you restart the server the previous slot data
is restored and the confirmed_flush_lsn goes backward, which explains those
extraneous records.

So you meant to say that the key point was that some records which are not sent
to the subscriber do not mark the slot as dirty, and hence the updated confirmed_flush
was not written into the slot file. Is that right? LogicalConfirmReceivedLocation()
is called by the walsender when it gets a reply from the apply worker, so your
analysis seems correct.

Can you please explain what led to updating the confirmed_flush in
memory but not on disk? BTW, have we ensured that the additional records
being discarded were already sent to the subscriber? If so, why has the
confirmed_flush LSN not progressed for those records?

--
With Regards,
Amit Kapila.

#59Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Amit Kapila (#58)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Amit,

Thank you for giving comments!

Sorry for the delay, I didn't have time to come back to it until this afternoon.

No issues, everyone is busy:-).

I don't think that your analysis is correct. Slots are guaranteed to be
stopped after all the normal backends have been stopped, exactly to avoid such
extraneous records.

What is happening here is that the slot's confirmed_flush_lsn is properly
updated in memory and ends up being the same as the current LSN before the
shutdown. But as it's a logical slot and those records aren't decoded, the
slot isn't marked as dirty and therefore isn't saved to disk. You don't see
that behavior when doing a manual checkpoint before (per your script comment),
as in that case the checkpoint also tries to save the slot to disk but then
finds a slot that was marked as dirty and therefore saves it.

Here, why is the behavior different for manual and non-manual checkpoints?

I have analyzed this more and concluded that there is no difference between a
manual and a shutdown checkpoint.

The difference was whether the CHECKPOINT record has been decoded or not.
The overall workflow of this test was:

1. do INSERT
(2. do CHECKPOINT)
(3. decode CHECKPOINT record)
4. receive feedback message from standby
5. do shutdown CHECKPOINT

At step 3, the walsender decoded that WAL and set candidate_xmin_lsn. The stack trace was:
standby_decode()->SnapBuildProcessRunningXacts()->LogicalIncreaseXminForSlot().

At step 4, the confirmed_flush of the slot was updated, but ReplicationSlotSave()
was executed only when slot->candidate_xmin_lsn had a valid LSN. If steps 2 and
3 are missed, the dirty flag is not set and the change stays only in memory.

Finally, the CHECKPOINT was executed at step 5. If steps 2 and 3 are missed and
the patch from Julien is not applied, the updated value will be discarded. This
is what I observed. The patch forces the logical slot to be saved at the shutdown
checkpoint, so the confirmed_lsn is saved to disk at step 5.
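
For reference, the relevant branches of LogicalConfirmReceivedLocation() look
roughly like the following (a paraphrase from memory, not the verbatim source,
so please double-check logical.c):

/* paraphrased sketch of LogicalConfirmReceivedLocation() */
if (MyReplicationSlot->candidate_xmin_lsn != InvalidXLogRecPtr ||
    MyReplicationSlot->candidate_restart_valid != InvalidXLogRecPtr)
{
    SpinLockAcquire(&MyReplicationSlot->mutex);
    MyReplicationSlot->data.confirmed_flush = lsn;
    /* ... maybe set updated_xmin / updated_restart from the candidates ... */
    SpinLockRelease(&MyReplicationSlot->mutex);

    if (updated_xmin || updated_restart)
    {
        /* only this path marks the slot dirty and writes it out */
        ReplicationSlotMarkDirty();
        ReplicationSlotSave();
    }
}
else
{
    /* no candidates: confirmed_flush is updated in memory only */
    SpinLockAcquire(&MyReplicationSlot->mutex);
    MyReplicationSlot->data.confirmed_flush = lsn;
    SpinLockRelease(&MyReplicationSlot->mutex);
}

The else branch is the path taken when steps 2 and 3 are missed, which is why
the slot is never marked dirty for the final feedback message.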

Can you please explain what led to updating the confirmed_flush in
memory but not on disk?

The code-level workflow is described above. The slot info is saved to disk only
after decoding a CHECKPOINT. I'm not sure about the initial motivation, but I
suspect we wanted to reduce the number of writes to disk.

BTW, have we ensured that the additional records
being discarded were already sent to the subscriber? If so, why has the
confirmed_flush LSN not progressed for those records?

In this case, the apply worker requests an LSN that is greater than the confirmed_lsn
via START_REPLICATION. Therefore, according to CreateDecodingContext(), the
walsender starts sending from the appropriate record, doesn't it? I don't think
anything is discarded on the subscriber.
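
For reference, the start-position handling in CreateDecodingContext() is roughly
as follows (a paraphrase, not the verbatim source):

/* paraphrased sketch of the start_lsn handling in CreateDecodingContext() */
if (start_lsn == InvalidXLogRecPtr)
{
    /* client did not specify a position: continue from confirmed_flush */
    start_lsn = slot->data.confirmed_flush;
}
else if (start_lsn < slot->data.confirmed_flush)
{
    /* never start before what the slot already confirmed */
    start_lsn = slot->data.confirmed_flush;
}

So streaming effectively resumes from the larger of the requested LSN and the
slot's confirmed_flush, which is why nothing should be lost on the subscriber
side.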

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#60Amit Kapila
amit.kapila16@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#57)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Thu, Jun 8, 2023 at 9:24 AM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

Few comments/questions
====================
1.
+check_for_parameter_settings(ClusterInfo *new_cluster)
{
...
+
+ res = executeQueryOrDie(conn, "SHOW max_replication_slots;");
+ max_replication_slots = atoi(PQgetvalue(res, 0, 0));
+
+ if (max_replication_slots == 0)
+ pg_fatal("max_replication_slots must be greater than 0");
...
}

Won't it be better to verify that the value of "max_replication_slots"
is greater than the number of logical slots we are planning to copy
from the old cluster to the new cluster? Similar to this, I wondered whether we
need to check the value of max_wal_senders. But I guess one can
simply decode from slots by using APIs, so I'm not sure about that. What
do you think?
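
A minimal sketch of the suggested check (the variable names here are mine, not
taken from the patch) could look like:

res = executeQueryOrDie(conn, "SHOW max_replication_slots;");
max_replication_slots = atoi(PQgetvalue(res, 0, 0));
PQclear(res);

/* fail if the new cluster cannot hold all slots found on the old cluster */
if (nslots_on_old_cluster > max_replication_slots)
    pg_fatal("max_replication_slots (%d) must be greater than or equal to the "
             "number of logical replication slots on the old cluster (%d)",
             max_replication_slots, nslots_on_old_cluster);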

2.
+ /*
+ * Dump logical replication slots if needed.
+ *
+ * XXX We cannot dump replication slots at the same time as the schema
+ * dump because we need to separate the timing of restoring
+ * replication slots and other objects. Replication slots, in
+ * particular, should not be restored before executing the pg_resetwal
+ * command because it will remove WALs that are required by the slots.
+ */
+ if (user_opts.include_logical_slots)

Can you explain this point a bit more with some example scenarios?
Basically, if we had sent all the WAL before the upgrade then why do
we need to worry about the timing of pg_resetwal?

3. I see that you are trying to ensure in patch 0003 that all the WAL has been
consumed for a slot except for the shutdown_checkpoint record, but do we need
to think of any interaction with restart_lsn
(MyReplicationSlot->data.restart_lsn), which is the point from which the
walsender starts reading WAL for decoding?

--
With Regards,
Amit Kapila.

#61Peter Smith
smithpb2250@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#57)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

Hi Kuroda-san, I haven't looked at this thread for a very long time, so
to re-familiarize myself with it I read all of the latest v16-0001 patch.

Here are a number of minor review comments I noted in passing:

======
Commit message

1.
For pg_dump this commit includes a new option called
"--logical-replication-slots-only".
This option can be used to dump logical replication slots. When this option is
specified, the slot_name, plugin, and two_phase parameters are extracted from
pg_replication_slots. An SQL file is then generated which executes
pg_create_logical_replication_slot() with the extracted parameters.

~

This part doesn't do the actual execution, so maybe slightly reword this.

BEFORE
An SQL file is then generated which executes
pg_create_logical_replication_slot() with the extracted parameters.

SUGGESTION
An SQL file that executes pg_create_logical_replication_slot() with
the extracted parameters is generated.

~~~

2.
For pg_upgrade, when '--include-logical-replication-slots' is
specified, it executes
pg_dump with the new "--logical-replication-slots-only" option and
restores from the
dump. Note that we cannot dump replication slots at the same time as the schema
dump because we need to separate the timing of restoring replication slots and
other objects. Replication slots, in particular, should not be restored before
executing the pg_resetwal command because it will remove WALs that are required
by the slots.

~~~

Maybe "restores from the dump" can be described more?

BEFORE
...and restores from the dump.

SUGGESTION
...and restores the slots using the
pg_create_logical_replication_slots() statements that the dump
generated (see above).

======
src/bin/pg_dump/pg_dump.c

3. help

+
+ /*
+ * The option --logical-replication-slots-only is used only by pg_upgrade
+ * and should not be called by users, which is why it is not listed.
+ */
  printf(_("  --no-comments                do not dump comments\n"));
~

/not listed./not exposed by the help./

~~~

4. getLogicalReplicationSlots

+ /* Check whether we should dump or not */
+ if (fout->remoteVersion < 160000)
+ return;

PG16 is already in beta. I think this should now be changed to 170000, right?

======
src/bin/pg_upgrade/check.c

5. check_new_cluster

+ /*
+ * Do additional works if --include-logical-replication-slots is required.
+ * These must be done before check_new_cluster_is_empty() because the
+ * slot_arr attribute of the new_cluster will be checked in the function.
+ */

SUGGESTION (minor rewording/grammar)
Do additional work if --include-logical-replication-slots was
specified. This must be done before check_new_cluster_is_empty()
because the slot_arr attribute of the new_cluster will be checked in
that function.

~~~

6. check_new_cluster_is_empty

+ /*
+ * If --include-logical-replication-slots is required, check the
+ * existence of slots.
+ */
+ if (user_opts.include_logical_slots)
+ {
+ LogicalSlotInfoArr *slot_arr = &new_cluster.dbarr.dbs[dbnum].slot_arr;
+
+ /* if nslots > 0, report just first entry and exit */
+ if (slot_arr->nslots)
+ pg_fatal("New cluster database \"%s\" is not empty: found logical
replication slot \"%s\"",
+ new_cluster.dbarr.dbs[dbnum].db_name,
+ slot_arr->slots[0].slotname);
+ }
+

6a.
There are a number of places in this function using
"new_cluster.dbarr.dbs[dbnum].XXX"

It is OK but maybe it would be tidier to up-front assign a local
variable for this?

DbInfo *pDbInfo = &new_cluster.dbarr.dbs[dbnum];

~

6b.
The above code adds an unnecessary blank line in the loop that was not
there previously.

~~~

7. check_for_parameter_settings

+/*
+ * Verify parameter settings for creating logical replication slots
+ */
+static void
+check_for_parameter_settings(ClusterInfo *new_cluster)

7a.
I felt this might have some missing words so it was meant to say:

SUGGESTION
Verify the parameter settings necessary for creating logical replication slots.

~

7b.
Maybe you can give this function a better name because there is no
hint in this generic name that it has anything to do with replication
slots.

~~~

8.
+ /* --include-logical-replication-slots can be used since PG16. */
+ if (GET_MAJOR_VERSION(new_cluster->major_version) <= 1500)
+ return;

PG16 is already in beta, so the version number (1500) and the comment
mentioning PG16 are outdated aren't they?

======
src/bin/pg_upgrade/info.c

9.
 static void print_rel_infos(RelInfoArr *rel_arr);
-
+static void print_slot_infos(LogicalSlotInfoArr *slot_arr);

The removal of the existing blank line seems not a necessary part of this patch.

~~~

10. get_logical_slot_infos_per_db

+ char query[QUERY_ALLOC];
+
+ query[0] = '\0'; /* initialize query string to empty */
+
+ snprintf(query, sizeof(query),
+ "SELECT slot_name, plugin, two_phase "
+ "FROM pg_catalog.pg_replication_slots "
+ "WHERE database = current_database() AND temporary = false "
+ "AND wal_status IN ('reserved', 'extended');");

Does the initial assignment query[0] = '\0'; achieve anything? IIUC,
the next statement is simply going to overwrite that anyway.

~~~

11. free_db_and_rel_infos

+
+ /*
+ * db_arr has an additional attribute, LogicalSlotInfoArr slot_arr,
+ * but there is no need to free it. It has a valid member only when
+ * the cluster had logical replication slots in the previous call.
+ * However, in this case, a FATAL error is thrown, and we cannot reach
+ * this point.
+ */

Maybe this comment can be reworded? For example, the meaning of "in
the previous call" is not very clear. What previous call?

======
src/bin/pg_upgrade/pg_upgrade.c

12. main

+ /*
+ * Create logical replication slots if requested.
+ *
+ * Note: This must be done after doing pg_resetwal command because the
+ * command will remove required WALs.
+ */
+ if (user_opts.include_logical_slots)
+ {
+ start_postmaster(&new_cluster, true);
+ create_logical_replication_slots();
+ stop_postmaster(false);
+ }

IMO "the command" is a bit vague. It might be better to be explicit
and say "... because pg_resetwal would remove XXXXX..."

======
src/bin/pg_upgrade/pg_upgrade.h

13.
+typedef struct
+{
+ LogicalSlotInfo *slots;
+ int nslots;
+} LogicalSlotInfoArr;
+

I assume you mimicked the RelInfoArr struct, but IMO it makes more
sense for the field 'nslots' to come before the 'slots'.

------
Kind Regards,
Peter Smith.
Fujitsu Australia

#62Amit Kapila
amit.kapila16@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#59)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Fri, Jun 30, 2023 at 7:29 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

I have analyzed this more and concluded that there is no difference between a
manual and a shutdown checkpoint.

The difference was whether the CHECKPOINT record has been decoded or not.
The overall workflow of this test was:

1. do INSERT
(2. do CHECKPOINT)
(3. decode CHECKPOINT record)
4. receive feedback message from standby
5. do shutdown CHECKPOINT

At step 3, the walsender decoded that WAL and set candidate_xmin_lsn. The stack trace was:
standby_decode()->SnapBuildProcessRunningXacts()->LogicalIncreaseXminForSlot().

At step 4, the confirmed_flush of the slot was updated, but ReplicationSlotSave()
was executed only when slot->candidate_xmin_lsn had a valid LSN. If steps 2 and
3 are missed, the dirty flag is not set and the change stays only in memory.

Finally, the CHECKPOINT was executed at step 5. If steps 2 and 3 are missed and
the patch from Julien is not applied, the updated value will be discarded. This
is what I observed. The patch forces the logical slot to be saved at the shutdown
checkpoint, so the confirmed_lsn is saved to disk at step 5.

I see your point, but there are comments in walsender.c which indicate
that we also wait for step 5 to get replicated. See [1] and the comments
atop walsender.c. If this is true, then we don't need a special check
as you have in patch 0003, or at least it doesn't seem to be required
in all cases.

[1]:
/*
* When SIGUSR2 arrives, we send any outstanding logs up to the
* shutdown checkpoint record (i.e., the latest record), wait for
* them to be replicated to the standby, and exit. ...
*/

--
With Regards,
Amit Kapila.

#63Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#62)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Mon, Jul 17, 2023 at 6:19 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Jun 30, 2023 at 7:29 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

I have analyzed this more and concluded that there is no difference between a
manual and a shutdown checkpoint.

The difference was whether the CHECKPOINT record has been decoded or not.
The overall workflow of this test was:

1. do INSERT
(2. do CHECKPOINT)
(3. decode CHECKPOINT record)
4. receive feedback message from standby
5. do shutdown CHECKPOINT

At step 3, the walsender decoded that WAL and set candidate_xmin_lsn. The stack trace was:
standby_decode()->SnapBuildProcessRunningXacts()->LogicalIncreaseXminForSlot().

At step 4, the confirmed_flush of the slot was updated, but ReplicationSlotSave()
was executed only when slot->candidate_xmin_lsn had a valid LSN. If steps 2 and
3 are missed, the dirty flag is not set and the change stays only in memory.

Finally, the CHECKPOINT was executed at step 5. If steps 2 and 3 are missed and
the patch from Julien is not applied, the updated value will be discarded. This
is what I observed. The patch forces the logical slot to be saved at the shutdown
checkpoint, so the confirmed_lsn is saved to disk at step 5.

I see your point, but there are comments in walsender.c which indicate
that we also wait for step 5 to get replicated. See [1] and the comments
atop walsender.c. If this is true, then we don't need a special check
as you have in patch 0003, or at least it doesn't seem to be required
in all cases.

I have studied this a bit more and it seems that this is true for physical
walsenders, where we set the state of the walsender to WALSNDSTATE_STOPPING
in XLogSendPhysical, then the checkpointer finishes writing the checkpoint
record, and then the postmaster sends SIGUSR2 for the walsender to exit. IIUC,
this whole logic of different stop states was introduced in
commit c6c3334364 based on the discussion in the thread [1]. As per my
understanding, logical walsenders don't seem to wait for the
shutdown checkpoint record and finish before we even LOG that
record. It seems that the behavior of logical walsenders is different
from physical walsenders, where we wait for them to send even the final
shutdown checkpoint record before they finish. If so, then we won't be
able to switch over to logical subscribers even in case of a clean
shutdown. Am I missing something?

[1]: /messages/by-id/CAHGQGwEsttg9P9LOOavoc9d6VB1zVmYgfBk=Ljsk-UL9cEf-eA@mail.gmail.com

--
With Regards,
Amit Kapila.

#64Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Amit Kapila (#63)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Amit,

I have studied this a bit more and it seems that this is true for physical
walsenders, where we set the state of the walsender to WALSNDSTATE_STOPPING
in XLogSendPhysical, then the checkpointer finishes writing the checkpoint
record, and then the postmaster sends SIGUSR2 for the walsender to exit. IIUC,
this whole logic of different stop states was introduced in
commit c6c3334364 based on the discussion in the thread [1]. As per my
understanding, logical walsenders don't seem to wait for the
shutdown checkpoint record and finish before we even LOG that
record. It seems that the behavior of logical walsenders is different
from physical walsenders, where we wait for them to send even the final
shutdown checkpoint record before they finish.

Yes, you are right. Physical walsenders wait for the checkpointer to exit, but
logical ones exit before the checkpointer does. This is because a logical walsender
may generate WAL by executing replication commands like START_REPLICATION and
CREATE_REPLICATION_SLOT, and those records could be written after the shutdown
checkpoint record. This leads to a PANIC.

If so, then we won't be
able to switch over to logical subscribers even in case of a clean
shutdown. Am I missing something?

[1] - /messages/by-id/CAHGQGwEsttg9P9LOOavoc9d6VB1zVmYgfBk%3DLjsk-UL9cEf-eA%40mail.gmail.com

Based on the above, we are considering delaying the timing of shutdown for
logical walsenders. The preliminary workflow is:

1. When a logical walsender receives the signal from the checkpointer, it consumes
all of the WAL records, changes its state to WALSNDSTATE_STOPPING, and stops doing
anything further.
2. Then the checkpointer does the shutdown checkpoint.
3. After that, the postmaster sends a signal to the walsenders, same as the current
implementation.
4. Finally, logical walsenders process the shutdown checkpoint record and update the
confirmed_lsn after the acknowledgement from the subscriber.
Note that logical walsenders don't have to send the shutdown checkpoint record
to the subscriber, but a following keepalive will help us advance the confirmed_lsn.
5. All tasks are done, and they exit.

This mechanism ensures that the confirmed_lsn of active slots is the same as the
current WAL location of the old publisher, so the 0003 patch would become simpler.
We would not have to calculate the acceptable difference anymore.
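
To make the above concrete, here is a rough sketch of how the walsender loop
could look (an illustration only, not the actual patch; got_STOPPING,
got_SIGUSR2, WalSndCaughtUp and WALSNDSTATE_STOPPING are existing walsender.c
symbols, while WalSndWaitForConfirmation() is a made-up helper for step 4):

if (got_STOPPING && WalSndCaughtUp && !pq_is_send_pending())
{
    /* step 1: all WAL generated so far has been decoded; stop doing more */
    WalSndSetState(WALSNDSTATE_STOPPING);
}

/*
 * Steps 2 and 3 happen elsewhere: the checkpointer writes the shutdown
 * checkpoint, then the postmaster sends SIGUSR2, as in the current code.
 */

if (got_SIGUSR2)
{
    /*
     * Step 4: decode the shutdown checkpoint record and wait until the
     * subscriber's feedback advances confirmed_flush past it.
     */
    XLogSendLogical();
    WalSndWaitForConfirmation();    /* hypothetical helper */

    /* step 5: all tasks are done */
    proc_exit(0);
}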

One thing we must consider is that no WAL must be generated while decoding
the shutdown checkpoint record, as that would cause a PANIC. IIUC decoding that
record leads to SnapBuildSerializationPoint(), which just serializes the snapbuild
or restores from it, so the change may be acceptable. Thoughts?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#65Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Peter Smith (#61)
4 attachment(s)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Peter,

Thank you for reviewing! PSA new version patchset.

======
Commit message

1.
For pg_dump this commit includes a new option called
"--logical-replication-slots-only".
This option can be used to dump logical replication slots. When this option is
specified, the slot_name, plugin, and two_phase parameters are extracted from
pg_replication_slots. An SQL file is then generated which executes
pg_create_logical_replication_slot() with the extracted parameters.

~

This part doesn't do the actual execution, so maybe slightly reword this.

BEFORE
An SQL file is then generated which executes
pg_create_logical_replication_slot() with the extracted parameters.

SUGGESTION
An SQL file that executes pg_create_logical_replication_slot() with
the extracted parameters is generated.

Changed.

2.
For pg_upgrade, when '--include-logical-replication-slots' is
specified, it executes
pg_dump with the new "--logical-replication-slots-only" option and
restores from the
dump. Note that we cannot dump replication slots at the same time as the schema
dump because we need to separate the timing of restoring replication slots and
other objects. Replication slots, in particular, should not be restored before
executing the pg_resetwal command because it will remove WALs that are required
by the slots.

~~~

Maybe "restores from the dump" can be described more?

BEFORE
...and restores from the dump.

SUGGESTION
...and restores the slots using the
pg_create_logical_replication_slots() statements that the dump
generated (see above).

Fixed.

src/bin/pg_dump/pg_dump.c

3. help

+
+ /*
+ * The option --logical-replication-slots-only is used only by pg_upgrade
+ * and should not be called by users, which is why it is not listed.
+ */
printf(_("  --no-comments                do not dump comments\n"));
~

/not listed./not exposed by the help./

Fixed.

4. getLogicalReplicationSlots

+ /* Check whether we should dump or not */
+ if (fout->remoteVersion < 160000)
+ return;

PG16 is already in beta. I think this should now be changed to 170000, right?

That's right, fixed.

src/bin/pg_upgrade/check.c

5. check_new_cluster

+ /*
+ * Do additional works if --include-logical-replication-slots is required.
+ * These must be done before check_new_cluster_is_empty() because the
+ * slot_arr attribute of the new_cluster will be checked in the function.
+ */

SUGGESTION (minor rewording/grammar)
Do additional work if --include-logical-replication-slots was
specified. This must be done before check_new_cluster_is_empty()
because the slot_arr attribute of the new_cluster will be checked in
that function.

Fixed.

6. check_new_cluster_is_empty

+ /*
+ * If --include-logical-replication-slots is required, check the
+ * existence of slots.
+ */
+ if (user_opts.include_logical_slots)
+ {
+ LogicalSlotInfoArr *slot_arr = &new_cluster.dbarr.dbs[dbnum].slot_arr;
+
+ /* if nslots > 0, report just first entry and exit */
+ if (slot_arr->nslots)
+ pg_fatal("New cluster database \"%s\" is not empty: found logical
replication slot \"%s\"",
+ new_cluster.dbarr.dbs[dbnum].db_name,
+ slot_arr->slots[0].slotname);
+ }
+

6a.
There are a number of places in this function using
"new_cluster.dbarr.dbs[dbnum].XXX"

It is OK but maybe it would be tidier to up-front assign a local
variable for this?

DbInfo *pDbInfo = &new_cluster.dbarr.dbs[dbnum];

Seems better, fixed.

6b.
The above code adds an unnecessary blank line in the loop that was not
there previously.

Removed.

7. check_for_parameter_settings

+/*
+ * Verify parameter settings for creating logical replication slots
+ */
+static void
+check_for_parameter_settings(ClusterInfo *new_cluster)

7a.
I felt this might have some missing words so it was meant to say:

SUGGESTION
Verify the parameter settings necessary for creating logical replication slots.

Changed.

7b.
Maybe you can give this function a better name because there is no
hint in this generic name that it has anything to do with replication
slots.

Renamed to check_for_logical_replication_slots(). What do you think?

8.
+ /* --include-logical-replication-slots can be used since PG16. */
+ if (GET_MAJOR_VERSION(new_cluster->major_version) <= 1500)
+ return;

PG16 is already in beta, so the version number (1500) and the comment
mentioning PG16 are outdated aren't they?

Right, fixed.

src/bin/pg_upgrade/info.c

9.
static void print_rel_infos(RelInfoArr *rel_arr);
-
+static void print_slot_infos(LogicalSlotInfoArr *slot_arr);

The removal of the existing blank line seems not a necessary part of this patch.

Added.

10. get_logical_slot_infos_per_db

+ char query[QUERY_ALLOC];
+
+ query[0] = '\0'; /* initialize query string to empty */
+
+ snprintf(query, sizeof(query),
+ "SELECT slot_name, plugin, two_phase "
+ "FROM pg_catalog.pg_replication_slots "
+ "WHERE database = current_database() AND temporary = false "
+ "AND wal_status IN ('reserved', 'extended');");

Does the initial assignment query[0] = '\0'; achieve anything? IIUC,
the next statement is simply going to overwrite that anyway.

This was a leftover from previous versions. Removed.

11. free_db_and_rel_infos

+
+ /*
+ * db_arr has an additional attribute, LogicalSlotInfoArr slot_arr,
+ * but there is no need to free it. It has a valid member only when
+ * the cluster had logical replication slots in the previous call.
+ * However, in this case, a FATAL error is thrown, and we cannot reach
+ * this point.
+ */

Maybe this comment can be reworded? For example, the meaning of "in
the previous call" is not very clear. What previous call?

After considering it more, I thought it should be simpler. What I wanted to say
was that slot_arr.slots does not have malloc'd memory at that point. So I added an
Assert() to confirm that and changed the comments. For that purpose pg_malloc0() is
also introduced in get_db_infos(). What do you think?
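
A minimal sketch of the intent (not the exact patch text): because the array is
zero-filled by pg_malloc0() in get_db_infos(), an entry that never had slots can
simply be asserted to be empty instead of freed:

/* in free_db_and_rel_infos(), for each database entry */
Assert(db_arr->dbs[dbnum].slot_arr.nslots == 0);
Assert(db_arr->dbs[dbnum].slot_arr.slots == NULL);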

src/bin/pg_upgrade/pg_upgrade.c

12. main

+ /*
+ * Create logical replication slots if requested.
+ *
+ * Note: This must be done after doing pg_resetwal command because the
+ * command will remove required WALs.
+ */
+ if (user_opts.include_logical_slots)
+ {
+ start_postmaster(&new_cluster, true);
+ create_logical_replication_slots();
+ stop_postmaster(false);
+ }

IMO "the command" is a bit vague. It might be better to be explicit
and say "... because pg_resetwal would remove XXXXX..."

Changed.

src/bin/pg_upgrade/pg_upgrade.h

13.
+typedef struct
+{
+ LogicalSlotInfo *slots;
+ int nslots;
+} LogicalSlotInfoArr;
+

I assume you mimicked the RelInfoArr struct, but IMO it makes more
sense for the field 'nslots' to come before the 'slots'.

Yeah, I followed that, but no strong opinion. Fixed.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:

v17-0001-pg_upgrade-Add-include-logical-replication-slots.patchapplication/octet-stream; name=v17-0001-pg_upgrade-Add-include-logical-replication-slots.patchDownload
From 83846edcc79c6f8549b9f35319ee67b52eefc43d Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Tue, 4 Apr 2023 05:49:34 +0000
Subject: [PATCH v17 1/4] pg_upgrade: Add --include-logical-replication-slots
 option

This commit introduces a new pg_upgrade option called "--include-logical-replication-slots".
This allows nodes with logical replication slots to be upgraded. The commit can
be divided into two parts: one for pg_dump and another for pg_upgrade.

For pg_dump this commit includes a new option called "--logical-replication-slots-only".
This option can be used to dump logical replication slots. When this option is
specified, the slot_name, plugin, and two_phase parameters are extracted from
pg_replication_slots. An SQL file that executes pg_create_logical_replication_slot()
with the extracted parameters is generated.

For pg_upgrade, when '--include-logical-replication-slots' is specified, it executes
pg_dump with the new "--logical-replication-slots-only" option and restores the slots
using the pg_create_logical_replication_slots() statements that the dump
generated (see above). Note that we cannot dump replication slots at the same time
as the schema dump because we need to separate the timing of restoring replication
slots and other objects. Replication slots, in  particular, should not be restored
before executing the pg_resetwal command because it will remove WALs that are
required by the slots.

The significant advantage of this commit is that it makes it easy to continue
logical replication even after upgrading the publisher node. Previously, pg_upgrade
allowed copying publications to a new node. With this new commit, adjusting the
connection string to the new publisher will cause the apply worker on the subscriber
to connect to the new publisher automatically. This enables seamless continuation
of logical replication, even after an upgrade.

Author: Hayato Kuroda
Reviewed-by: Peter Smith, Julien Rouhaud, Vignesh C, Wang Wei
---
 doc/src/sgml/ref/pgupgrade.sgml               |  11 ++
 src/bin/pg_dump/pg_backup.h                   |   1 +
 src/bin/pg_dump/pg_dump.c                     | 155 ++++++++++++++++++
 src/bin/pg_dump/pg_dump.h                     |  14 +-
 src/bin/pg_dump/pg_dump_sort.c                |  11 +-
 src/bin/pg_upgrade/Makefile                   |   3 +
 src/bin/pg_upgrade/check.c                    |  76 +++++++++
 src/bin/pg_upgrade/dump.c                     |  24 +++
 src/bin/pg_upgrade/info.c                     | 111 ++++++++++++-
 src/bin/pg_upgrade/meson.build                |   1 +
 src/bin/pg_upgrade/option.c                   |   7 +
 src/bin/pg_upgrade/pg_upgrade.c               |  61 +++++++
 src/bin/pg_upgrade/pg_upgrade.h               |  21 +++
 .../t/003_logical_replication_slots.pl        | 146 +++++++++++++++++
 src/tools/pgindent/typedefs.list              |   3 +
 15 files changed, 641 insertions(+), 4 deletions(-)
 create mode 100644 src/bin/pg_upgrade/t/003_logical_replication_slots.pl

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 7816b4c685..94e90ff506 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -240,6 +240,17 @@ PostgreSQL documentation
       </listitem>
      </varlistentry>
 
+     <varlistentry>
+      <term><option>--include-logical-replication-slots</option></term>
+      <listitem>
+       <para>
+        Upgrade logical replication slots. Only permanent replication slots
+        are included. Note that pg_upgrade does not check the installation of
+        plugins.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><option>-?</option></term>
       <term><option>--help</option></term>
diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index aba780ef4b..0a4e931f9b 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -187,6 +187,7 @@ typedef struct _dumpOptions
 	int			use_setsessauth;
 	int			enable_row_security;
 	int			load_via_partition_root;
+	int			logical_slots_only;
 
 	/* default, if no "inclusion" switches appear, is to dump everything */
 	bool		include_everything;
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 5dab1ba9ea..12d4066d3b 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -328,6 +328,9 @@ static void setupDumpWorker(Archive *AH);
 static TableInfo *getRootTableInfo(const TableInfo *tbinfo);
 static bool forcePartitionRootLoad(const TableInfo *tbinfo);
 
+static void getLogicalReplicationSlots(Archive *fout);
+static void dumpLogicalReplicationSlot(Archive *fout,
+									   const LogicalReplicationSlotInfo *slotinfo);
 
 int
 main(int argc, char **argv)
@@ -431,6 +434,7 @@ main(int argc, char **argv)
 		{"table-and-children", required_argument, NULL, 12},
 		{"exclude-table-and-children", required_argument, NULL, 13},
 		{"exclude-table-data-and-children", required_argument, NULL, 14},
+		{"logical-replication-slots-only", no_argument, NULL, 15},
 
 		{NULL, 0, NULL, 0}
 	};
@@ -657,6 +661,10 @@ main(int argc, char **argv)
 										  optarg);
 				break;
 
+			case 15:			/* dump only replication slot(s) */
+				dopt.logical_slots_only = true;
+				break;
+
 			default:
 				/* getopt_long already emitted a complaint */
 				pg_log_error_hint("Try \"%s --help\" for more information.", progname);
@@ -714,6 +722,21 @@ main(int argc, char **argv)
 	if (dopt.do_nothing && dopt.dump_inserts == 0)
 		pg_fatal("option --on-conflict-do-nothing requires option --inserts, --rows-per-insert, or --column-inserts");
 
+	if (dopt.logical_slots_only)
+	{
+		if (!dopt.binary_upgrade)
+			pg_fatal("options --logical-replication-slots-only requires option --binary-upgrade");
+
+		if (dopt.dataOnly)
+			pg_fatal("options --logical-replication-slots-only and -a/--data-only cannot be used together");
+
+		if (dopt.schemaOnly)
+			pg_fatal("options --logical-replication-slots-only and -s/--schema-only cannot be used together");
+
+		if (dopt.outputClean)
+			pg_fatal("options --logical-replication-slots-only and -c/--clean cannot be used together");
+	}
+
 	/* Identify archive format to emit */
 	archiveFormat = parseArchiveFormat(format, &archiveMode);
 
@@ -876,6 +899,16 @@ main(int argc, char **argv)
 			pg_fatal("no matching extensions were found");
 	}
 
+	/*
+	 * If dump logical-replication-slots-only was requested, dump only them
+	 * and skip everything else.
+	 */
+	if (dopt.logical_slots_only)
+	{
+		getLogicalReplicationSlots(fout);
+		goto dump;
+	}
+
 	/*
 	 * Dumping LOs is the default for dumps where an inclusion switch is not
 	 * used (an "include everything" dump).  -B can be used to exclude LOs
@@ -936,6 +969,8 @@ main(int argc, char **argv)
 	if (!dopt.no_security_labels)
 		collectSecLabels(fout);
 
+dump:
+
 	/* Lastly, create dummy objects to represent the section boundaries */
 	boundaryObjs = createBoundaryObjects();
 
@@ -1109,6 +1144,12 @@ help(const char *progname)
 			 "                               servers matching PATTERN\n"));
 	printf(_("  --inserts                    dump data as INSERT commands, rather than COPY\n"));
 	printf(_("  --load-via-partition-root    load partitions via the root table\n"));
+
+	/*
+	 * The option --logical-replication-slots-only is used only by pg_upgrade
+	 * and should not be called by users, which is why it is not exposed by the
+	 * help.
+	 */
 	printf(_("  --no-comments                do not dump comments\n"));
 	printf(_("  --no-publications            do not dump publications\n"));
 	printf(_("  --no-security-labels         do not dump security label assignments\n"));
@@ -10237,6 +10278,10 @@ dumpDumpableObject(Archive *fout, DumpableObject *dobj)
 		case DO_SUBSCRIPTION:
 			dumpSubscription(fout, (const SubscriptionInfo *) dobj);
 			break;
+		case DO_LOGICAL_REPLICATION_SLOT:
+			dumpLogicalReplicationSlot(fout,
+									   (const LogicalReplicationSlotInfo *) dobj);
+			break;
 		case DO_PRE_DATA_BOUNDARY:
 		case DO_POST_DATA_BOUNDARY:
 			/* never dumped, nothing to do */
@@ -18218,6 +18263,7 @@ addBoundaryDependencies(DumpableObject **dobjs, int numObjs,
 			case DO_PUBLICATION_REL:
 			case DO_PUBLICATION_TABLE_IN_SCHEMA:
 			case DO_SUBSCRIPTION:
+			case DO_LOGICAL_REPLICATION_SLOT:
 				/* Post-data objects: must come after the post-data boundary */
 				addObjectDependency(dobj, postDataBound->dumpId);
 				break;
@@ -18479,3 +18525,112 @@ appendReloptionsArrayAH(PQExpBuffer buffer, const char *reloptions,
 	if (!res)
 		pg_log_warning("could not parse %s array", "reloptions");
 }
+
+/*
+ * getLogicalReplicationSlots
+ *	  get information about replication slots
+ */
+static void
+getLogicalReplicationSlots(Archive *fout)
+{
+	PGresult   *res;
+	LogicalReplicationSlotInfo *slotinfo;
+	PQExpBuffer query;
+
+	int			i_slotname;
+	int			i_plugin;
+	int			i_twophase;
+	int			i,
+				ntups;
+
+	/* Check whether we should dump or not */
+	if (fout->remoteVersion < 170000)
+		return;
+
+	Assert(fout->dopt->logical_slots_only);
+
+	query = createPQExpBuffer();
+
+	resetPQExpBuffer(query);
+
+	/*
+	 * Get replication slots.
+	 *
+	 * XXX: Which information must be extracted from old node? Currently three
+	 * attributes are extracted because they are used by
+	 * pg_create_logical_replication_slot().
+	 */
+	appendPQExpBufferStr(query,
+						 "SELECT slot_name, plugin, two_phase "
+						 "FROM pg_catalog.pg_replication_slots "
+						 "WHERE database = current_database() AND temporary = false "
+						 "AND wal_status IN ('reserved', 'extended');");
+
+	res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+
+	ntups = PQntuples(res);
+
+	i_slotname = PQfnumber(res, "slot_name");
+	i_plugin = PQfnumber(res, "plugin");
+	i_twophase = PQfnumber(res, "two_phase");
+
+	slotinfo = pg_malloc(ntups * sizeof(LogicalReplicationSlotInfo));
+
+	for (i = 0; i < ntups; i++)
+	{
+		slotinfo[i].dobj.objType = DO_LOGICAL_REPLICATION_SLOT;
+
+		slotinfo[i].dobj.catId.tableoid = InvalidOid;
+		slotinfo[i].dobj.catId.oid = InvalidOid;
+		AssignDumpId(&slotinfo[i].dobj);
+
+		slotinfo[i].dobj.name = pg_strdup(PQgetvalue(res, i, i_slotname));
+
+		slotinfo[i].plugin = pg_strdup(PQgetvalue(res, i, i_plugin));
+		slotinfo[i].twophase = (strcmp(PQgetvalue(res, i, i_twophase), "t") == 0);
+
+		/*
+		 * Note: Currently we do not have any options to include/exclude slots
+		 * in dumping, so all the slots must be selected.
+		 */
+		slotinfo[i].dobj.dump = DUMP_COMPONENT_DEFINITION;
+	}
+	PQclear(res);
+
+	destroyPQExpBuffer(query);
+}
+
+/*
+ * dumpLogicalReplicationSlot
+ *	  dump creation functions for the given logical replication slots
+ */
+static void
+dumpLogicalReplicationSlot(Archive *fout,
+						   const LogicalReplicationSlotInfo *slotinfo)
+{
+	Assert(fout->dopt->logical_slots_only);
+
+	if (slotinfo->dobj.dump & DUMP_COMPONENT_DEFINITION)
+	{
+		PQExpBuffer query = createPQExpBuffer();
+
+		/*
+		 * XXX: For simplification, pg_create_logical_replication_slot() is
+		 * used. Is it sufficient?
+		 */
+		appendPQExpBuffer(query, "SELECT pg_catalog.pg_create_logical_replication_slot(");
+		appendStringLiteralAH(query, slotinfo->dobj.name, fout);
+		appendPQExpBuffer(query, ", ");
+		appendStringLiteralAH(query, slotinfo->plugin, fout);
+		appendPQExpBuffer(query, ", false, %s);",
+						  slotinfo->twophase ? "true" : "false");
+
+		ArchiveEntry(fout, slotinfo->dobj.catId, slotinfo->dobj.dumpId,
+					 ARCHIVE_OPTS(.tag = slotinfo->dobj.name,
+								  .description = "REPLICATION SLOT",
+								  .section = SECTION_POST_DATA,
+								  .createStmt = query->data));
+
+		destroyPQExpBuffer(query);
+	}
+}
diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h
index bc8f2ec36d..ed1866d9ab 100644
--- a/src/bin/pg_dump/pg_dump.h
+++ b/src/bin/pg_dump/pg_dump.h
@@ -82,7 +82,8 @@ typedef enum
 	DO_PUBLICATION,
 	DO_PUBLICATION_REL,
 	DO_PUBLICATION_TABLE_IN_SCHEMA,
-	DO_SUBSCRIPTION
+	DO_SUBSCRIPTION,
+	DO_LOGICAL_REPLICATION_SLOT
 } DumpableObjectType;
 
 /*
@@ -667,6 +668,17 @@ typedef struct _SubscriptionInfo
 	char	   *subpasswordrequired;
 } SubscriptionInfo;
 
+/*
+ * The LogicalReplicationSlotInfo struct is used to represent replication
+ * slots.
+ */
+typedef struct _LogicalReplicationSlotInfo
+{
+	DumpableObject dobj;
+	char	   *plugin;
+	bool		twophase;
+} LogicalReplicationSlotInfo;
+
 /*
  *	common utility functions
  */
diff --git a/src/bin/pg_dump/pg_dump_sort.c b/src/bin/pg_dump/pg_dump_sort.c
index 523a19c155..ae65443228 100644
--- a/src/bin/pg_dump/pg_dump_sort.c
+++ b/src/bin/pg_dump/pg_dump_sort.c
@@ -93,6 +93,7 @@ enum dbObjectTypePriorities
 	PRIO_PUBLICATION_REL,
 	PRIO_PUBLICATION_TABLE_IN_SCHEMA,
 	PRIO_SUBSCRIPTION,
+	PRIO_LOGICAL_REPLICATION_SLOT,
 	PRIO_DEFAULT_ACL,			/* done in ACL pass */
 	PRIO_EVENT_TRIGGER,			/* must be next to last! */
 	PRIO_REFRESH_MATVIEW		/* must be last! */
@@ -146,10 +147,11 @@ static const int dbObjectTypePriority[] =
 	PRIO_PUBLICATION,			/* DO_PUBLICATION */
 	PRIO_PUBLICATION_REL,		/* DO_PUBLICATION_REL */
 	PRIO_PUBLICATION_TABLE_IN_SCHEMA,	/* DO_PUBLICATION_TABLE_IN_SCHEMA */
-	PRIO_SUBSCRIPTION			/* DO_SUBSCRIPTION */
+	PRIO_SUBSCRIPTION,			/* DO_SUBSCRIPTION */
+	PRIO_LOGICAL_REPLICATION_SLOT	/* DO_LOGICAL_REPLICATION_SLOT */
 };
 
-StaticAssertDecl(lengthof(dbObjectTypePriority) == (DO_SUBSCRIPTION + 1),
+StaticAssertDecl(lengthof(dbObjectTypePriority) == (DO_LOGICAL_REPLICATION_SLOT + 1),
 				 "array length mismatch");
 
 static DumpId preDataBoundId;
@@ -1542,6 +1544,11 @@ describeDumpableObject(DumpableObject *obj, char *buf, int bufsize)
 					 "SUBSCRIPTION (ID %d OID %u)",
 					 obj->dumpId, obj->catId.oid);
 			return;
+		case DO_LOGICAL_REPLICATION_SLOT:
+			snprintf(buf, bufsize,
+					 "LOGICAL REPLICATION SLOT (ID %d NAME %s)",
+					 obj->dumpId, obj->name);
+			return;
 		case DO_PRE_DATA_BOUNDARY:
 			snprintf(buf, bufsize,
 					 "PRE-DATA BOUNDARY  (ID %d)",
diff --git a/src/bin/pg_upgrade/Makefile b/src/bin/pg_upgrade/Makefile
index 5834513add..815d1a7ca1 100644
--- a/src/bin/pg_upgrade/Makefile
+++ b/src/bin/pg_upgrade/Makefile
@@ -3,6 +3,9 @@
 PGFILEDESC = "pg_upgrade - an in-place binary upgrade utility"
 PGAPPICON = win32
 
+# required for 003_logical_replication_slots.pl
+EXTRA_INSTALL=contrib/test_decoding
+
 subdir = src/bin/pg_upgrade
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 64024e3b9e..4b54f5567c 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -30,7 +30,9 @@ static void check_for_jsonb_9_4_usage(ClusterInfo *cluster);
 static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(ClusterInfo *new_cluster);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
+static void check_for_logical_replication_slots(ClusterInfo *new_cluster);
 
+static int num_slots_on_old_cluster;
 
 /*
  * fix_path_separator
@@ -89,6 +91,10 @@ check_and_dump_old_cluster(bool live_check)
 	/* Extract a list of databases and tables from the old cluster */
 	get_db_and_rel_infos(&old_cluster);
 
+	/* Additionally, extract a list of logical replication slots if required */
+	if (user_opts.include_logical_slots)
+		num_slots_on_old_cluster = get_logical_slot_infos(&old_cluster);
+
 	init_tablespaces();
 
 	get_loadable_libraries();
@@ -189,6 +195,17 @@ check_new_cluster(void)
 {
 	get_db_and_rel_infos(&new_cluster);
 
+	/*
+	 * Do additional work if --include-logical-replication-slots was specified.
+	 * This must be done before check_new_cluster_is_empty() because the
+	 * slot_arr attribute of the new_cluster will be checked in that function.
+	 */
+	if (user_opts.include_logical_slots)
+	{
+		(void) get_logical_slot_infos(&new_cluster);
+		check_for_logical_replication_slots(&new_cluster);
+	}
+
 	check_new_cluster_is_empty();
 
 	check_loadable_libraries();
@@ -364,6 +381,22 @@ check_new_cluster_is_empty(void)
 						 rel_arr->rels[relnum].nspname,
 						 rel_arr->rels[relnum].relname);
 		}
+
+		/*
+		 * If --include-logical-replication-slots is required, check the
+		 * existence of slots.
+		 */
+		if (user_opts.include_logical_slots)
+		{
+			DbInfo *pDbInfo = &new_cluster.dbarr.dbs[dbnum];
+			LogicalSlotInfoArr *slot_arr = &pDbInfo->slot_arr;
+
+			/* if nslots > 0, report just first entry and exit */
+			if (slot_arr->nslots)
+				pg_fatal("New cluster database \"%s\" is not empty: found logical replication slot \"%s\"",
+						 pDbInfo->db_name,
+						 slot_arr->slots[0].slotname);
+		}
 	}
 }
 
@@ -1402,3 +1435,46 @@ check_for_user_defined_encoding_conversions(ClusterInfo *cluster)
 	else
 		check_ok();
 }
+
+/*
+ * Verify the parameter settings necessary for creating logical replication
+ * slots.
+ */
+static void
+check_for_logical_replication_slots(ClusterInfo *new_cluster)
+{
+	PGresult   *res;
+	PGconn	   *conn = connectToServer(new_cluster, "template1");
+	int			max_replication_slots;
+	char	   *wal_level;
+
+	/* --include-logical-replication-slots can be used since PG17. */
+	if (GET_MAJOR_VERSION(new_cluster->major_version) <= 1600)
+		return;
+
+	prep_status("Checking parameter settings for logical replication slots");
+
+	res = executeQueryOrDie(conn, "SHOW max_replication_slots;");
+	max_replication_slots = atoi(PQgetvalue(res, 0, 0));
+
+	if (max_replication_slots == 0)
+		pg_fatal("max_replication_slots must be greater than 0");
+	else if (num_slots_on_old_cluster > max_replication_slots)
+		pg_fatal("max_replication_slots must be greater than existing logical "
+				 "replication slots on old node.");
+
+	PQclear(res);
+
+	res = executeQueryOrDie(conn, "SHOW wal_level;");
+	wal_level = PQgetvalue(res, 0, 0);
+
+	if (strcmp(wal_level, "logical") != 0)
+		pg_fatal("wal_level must be \"logical\", but is set to \"%s\"",
+				 wal_level);
+
+	PQclear(res);
+
+	PQfinish(conn);
+
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/dump.c b/src/bin/pg_upgrade/dump.c
index 6c8c82dca8..e6b90864f5 100644
--- a/src/bin/pg_upgrade/dump.c
+++ b/src/bin/pg_upgrade/dump.c
@@ -59,6 +59,30 @@ generate_old_dump(void)
 						   log_opts.dumpdir,
 						   sql_file_name, escaped_connstr.data);
 
+		/*
+		 * Dump logical replication slots if needed.
+		 *
+		 * XXX We cannot dump replication slots at the same time as the schema
+		 * dump because we need to separate the timing of restoring
+		 * replication slots and other objects. Replication slots, in
+		 * particular, should not be restored before executing the pg_resetwal
+		 * command because it will remove WALs that are required by the slots.
+		 */
+		if (user_opts.include_logical_slots)
+		{
+			char		slots_file_name[MAXPGPATH];
+
+			snprintf(slots_file_name, sizeof(slots_file_name),
+					 DB_DUMP_LOGICAL_SLOTS_FILE_MASK, old_db->db_oid);
+			parallel_exec_prog(log_file_name, NULL,
+							   "\"%s/pg_dump\" %s --logical-replication-slots-only "
+							   "--quote-all-identifiers --binary-upgrade %s "
+							   "--file=\"%s/%s\" %s",
+							   new_cluster.bindir, cluster_conn_opts(&old_cluster),
+							   log_opts.verbose ? "--verbose" : "",
+							   log_opts.dumpdir,
+							   slots_file_name, escaped_connstr.data);
+		}
 		termPQExpBuffer(&escaped_connstr);
 	}
 
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index a9988abfe1..8bc0ad2e10 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -26,6 +26,7 @@ static void get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo);
 static void free_rel_infos(RelInfoArr *rel_arr);
 static void print_db_infos(DbInfoArr *db_arr);
 static void print_rel_infos(RelInfoArr *rel_arr);
+static void print_slot_infos(LogicalSlotInfoArr *slot_arr);
 
 
 /*
@@ -394,7 +395,7 @@ get_db_infos(ClusterInfo *cluster)
 	i_spclocation = PQfnumber(res, "spclocation");
 
 	ntups = PQntuples(res);
-	dbinfos = (DbInfo *) pg_malloc(sizeof(DbInfo) * ntups);
+	dbinfos = (DbInfo *) pg_malloc0(sizeof(DbInfo) * ntups);
 
 	for (tupnum = 0; tupnum < ntups; tupnum++)
 	{
@@ -600,6 +601,96 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 	dbinfo->rel_arr.nrels = num_rels;
 }
 
+/*
+ * get_logical_slot_infos_per_db()
+ *
+ * gets the LogicalSlotInfos for all the logical replication slots of the database
+ * referred to by "dbinfo".
+ */
+static void
+get_logical_slot_infos_per_db(ClusterInfo *cluster, DbInfo *dbinfo)
+{
+	PGconn	   *conn = connectToServer(cluster,
+									   dbinfo->db_name);
+	PGresult   *res;
+	LogicalSlotInfo *slotinfos = NULL;
+
+	int			num_slots;
+
+	char		query[QUERY_ALLOC];
+
+	snprintf(query, sizeof(query),
+			 "SELECT slot_name, plugin, two_phase "
+			 "FROM pg_catalog.pg_replication_slots "
+			 "WHERE database = current_database() AND temporary = false "
+			 "AND wal_status IN ('reserved', 'extended');");
+
+	res = executeQueryOrDie(conn, "%s", query);
+
+	num_slots = PQntuples(res);
+
+	if (num_slots)
+	{
+		int			slotnum;
+		int			i_slotname;
+		int			i_plugin;
+		int			i_twophase;
+
+		slotinfos = (LogicalSlotInfo *) pg_malloc(sizeof(LogicalSlotInfo) * num_slots);
+
+		i_slotname = PQfnumber(res, "slot_name");
+		i_plugin = PQfnumber(res, "plugin");
+		i_twophase = PQfnumber(res, "two_phase");
+
+		for (slotnum = 0; slotnum < num_slots; slotnum++)
+		{
+			LogicalSlotInfo *curr = &slotinfos[slotnum];
+
+			curr->slotname = pg_strdup(PQgetvalue(res, slotnum, i_slotname));
+			curr->plugin = pg_strdup(PQgetvalue(res, slotnum, i_plugin));
+			curr->two_phase = (strcmp(PQgetvalue(res, slotnum, i_twophase), "t") == 0);
+		}
+	}
+
+	PQclear(res);
+	PQfinish(conn);
+
+	dbinfo->slot_arr.slots = slotinfos;
+	dbinfo->slot_arr.nslots = num_slots;
+}
+
+/*
+ * get_logical_slot_infos()
+ *
+ * Higher level routine to generate LogicalSlotInfoArr for all databases.
+ */
+int
+get_logical_slot_infos(ClusterInfo *cluster)
+{
+	int			dbnum;
+	int			slotnum = 0;
+
+	if (cluster == &old_cluster)
+		pg_log(PG_VERBOSE, "\nsource databases:");
+	else
+		pg_log(PG_VERBOSE, "\ntarget databases:");
+
+	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
+	{
+		DbInfo	   *pDbInfo = &cluster->dbarr.dbs[dbnum];
+
+		get_logical_slot_infos_per_db(cluster, pDbInfo);
+		slotnum += pDbInfo->slot_arr.nslots;
+
+		if (log_opts.verbose)
+		{
+			pg_log(PG_VERBOSE, "Database: \"%s\"", pDbInfo->db_name);
+			print_slot_infos(&pDbInfo->slot_arr);
+		}
+	}
+
+	return slotnum;
+}
 
 static void
 free_db_and_rel_infos(DbInfoArr *db_arr)
@@ -610,6 +701,12 @@ free_db_and_rel_infos(DbInfoArr *db_arr)
 	{
 		free_rel_infos(&db_arr->dbs[dbnum].rel_arr);
 		pg_free(db_arr->dbs[dbnum].db_name);
+
+		/*
+		 * Logical replication slots must not exist on the new cluster before
+		 * doing create_logical_replication_slots().
+		 */
+		Assert(db_arr->dbs[dbnum].slot_arr.slots == NULL);
 	}
 	pg_free(db_arr->dbs);
 	db_arr->dbs = NULL;
@@ -660,3 +757,15 @@ print_rel_infos(RelInfoArr *rel_arr)
 			   rel_arr->rels[relnum].reloid,
 			   rel_arr->rels[relnum].tablespace);
 }
+
+static void
+print_slot_infos(LogicalSlotInfoArr *slot_arr)
+{
+	int			slotnum;
+
+	for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\", two_phase: %d",
+			   slot_arr->slots[slotnum].slotname,
+			   slot_arr->slots[slotnum].plugin,
+			   slot_arr->slots[slotnum].two_phase);
+}
diff --git a/src/bin/pg_upgrade/meson.build b/src/bin/pg_upgrade/meson.build
index 12a97f84e2..228f29b688 100644
--- a/src/bin/pg_upgrade/meson.build
+++ b/src/bin/pg_upgrade/meson.build
@@ -42,6 +42,7 @@ tests += {
     'tests': [
       't/001_basic.pl',
       't/002_pg_upgrade.pl',
+      't/003_logical_replication_slots.pl',
     ],
     'test_kwargs': {'priority': 40}, # pg_upgrade tests are slow
   },
diff --git a/src/bin/pg_upgrade/option.c b/src/bin/pg_upgrade/option.c
index 640361009e..df66a5ffe6 100644
--- a/src/bin/pg_upgrade/option.c
+++ b/src/bin/pg_upgrade/option.c
@@ -57,6 +57,7 @@ parseCommandLine(int argc, char *argv[])
 		{"verbose", no_argument, NULL, 'v'},
 		{"clone", no_argument, NULL, 1},
 		{"copy", no_argument, NULL, 2},
+		{"include-logical-replication-slots", no_argument, NULL, 3},
 
 		{NULL, 0, NULL, 0}
 	};
@@ -199,6 +200,10 @@ parseCommandLine(int argc, char *argv[])
 				user_opts.transfer_mode = TRANSFER_MODE_COPY;
 				break;
 
+			case 3:
+				user_opts.include_logical_slots = true;
+				break;
+
 			default:
 				fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
 						os_info.progname);
@@ -289,6 +294,8 @@ usage(void)
 	printf(_("  -V, --version                 display version information, then exit\n"));
 	printf(_("  --clone                       clone instead of copying files to new cluster\n"));
 	printf(_("  --copy                        copy files to new cluster (default)\n"));
+	printf(_("  --include-logical-replication-slots\n"
+			 "                                upgrade logical replication slots\n"));
 	printf(_("  -?, --help                    show this help, then exit\n"));
 	printf(_("\n"
 			 "Before running pg_upgrade you must:\n"
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 4562dafcff..6dd3832422 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -59,6 +59,7 @@ static void copy_xact_xlog_xid(void);
 static void set_frozenxids(bool minmxid_only);
 static void make_outputdirs(char *pgdata);
 static void setup(char *argv0, bool *live_check);
+static void create_logical_replication_slots(void);
 
 ClusterInfo old_cluster,
 			new_cluster;
@@ -188,6 +189,19 @@ main(int argc, char **argv)
 			  new_cluster.pgdata);
 	check_ok();
 
+	/*
+	 * Create logical replication slots if requested.
+	 *
+	 * Note: This must be done after doing pg_resetwal command because
+	 * pg_resetwal would remove required WALs.
+	 */
+	if (user_opts.include_logical_slots)
+	{
+		start_postmaster(&new_cluster, true);
+		create_logical_replication_slots();
+		stop_postmaster(false);
+	}
+
 	if (user_opts.do_sync)
 	{
 		prep_status("Sync data directory to disk");
@@ -860,3 +874,50 @@ set_frozenxids(bool minmxid_only)
 
 	check_ok();
 }
+
+/*
+ * create_logical_replication_slots()
+ *
+ * Similar to create_new_objects() but only restores logical replication slots.
+ */
+static void
+create_logical_replication_slots(void)
+{
+	int			dbnum;
+
+	prep_status_progress("Restoring logical replication slots in the new cluster");
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		char		slots_file_name[MAXPGPATH],
+					log_file_name[MAXPGPATH];
+		DbInfo	   *old_db = &old_cluster.dbarr.dbs[dbnum];
+
+		pg_log(PG_STATUS, "%s", old_db->db_name);
+
+		snprintf(slots_file_name, sizeof(slots_file_name),
+				 DB_DUMP_LOGICAL_SLOTS_FILE_MASK, old_db->db_oid);
+		snprintf(log_file_name, sizeof(log_file_name),
+				 DB_DUMP_LOG_FILE_MASK, old_db->db_oid);
+
+		parallel_exec_prog(log_file_name,
+						   NULL,
+						   "\"%s/psql\" %s --echo-queries --set ON_ERROR_STOP=on "
+						   "--no-psqlrc --dbname %s -f \"%s/%s\"",
+						   new_cluster.bindir,
+						   cluster_conn_opts(&new_cluster),
+						   old_db->db_name,
+						   log_opts.dumpdir,
+						   slots_file_name);
+	}
+
+	/* reap all children */
+	while (reap_child(true) == true)
+		;
+
+	end_progress_output();
+	check_ok();
+
+	/* update new_cluster info again */
+	get_logical_slot_infos(&new_cluster);
+}
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 3eea0139c7..8034067492 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -29,6 +29,7 @@
 /* contains both global db information and CREATE DATABASE commands */
 #define GLOBALS_DUMP_FILE	"pg_upgrade_dump_globals.sql"
 #define DB_DUMP_FILE_MASK	"pg_upgrade_dump_%u.custom"
+#define DB_DUMP_LOGICAL_SLOTS_FILE_MASK	"pg_upgrade_dump_%u_logical_slots.sql"
 
 /*
  * Base directories that include all the files generated internally, from the
@@ -150,6 +151,22 @@ typedef struct
 	int			nrels;
 } RelInfoArr;
 
+/*
+ * Structure to store logical replication slot information
+ */
+typedef struct
+{
+	char	   *slotname;		/* slot name */
+	char	   *plugin;			/* plugin */
+	bool		two_phase;		/* Can the slot decode 2PC? */
+} LogicalSlotInfo;
+
+typedef struct
+{
+	int			nslots;
+	LogicalSlotInfo *slots;
+} LogicalSlotInfoArr;
+
 /*
  * The following structure represents a relation mapping.
  */
@@ -176,6 +193,7 @@ typedef struct
 	char		db_tablespace[MAXPGPATH];	/* database default tablespace
 											 * path */
 	RelInfoArr	rel_arr;		/* array of all user relinfos */
+	LogicalSlotInfoArr slot_arr;	/* array of all LogicalSlotInfo */
 } DbInfo;
 
 /*
@@ -304,6 +322,8 @@ typedef struct
 	transferMode transfer_mode; /* copy files or link them? */
 	int			jobs;			/* number of processes/threads to use */
 	char	   *socketdir;		/* directory to use for Unix sockets */
+	bool		include_logical_slots;	/* true -> dump and restore logical
+										 * replication slots */
 } UserOpts;
 
 typedef struct
@@ -400,6 +420,7 @@ FileNameMap *gen_db_file_maps(DbInfo *old_db,
 							  DbInfo *new_db, int *nmaps, const char *old_pgdata,
 							  const char *new_pgdata);
 void		get_db_and_rel_infos(ClusterInfo *cluster);
+int			get_logical_slot_infos(ClusterInfo *cluster);
 
 /* option.c */
 
diff --git a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
new file mode 100644
index 0000000000..9ca266f6b2
--- /dev/null
+++ b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
@@ -0,0 +1,146 @@
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+# Tests for upgrading replication slots
+
+use strict;
+use warnings;
+
+use File::Path qw(rmtree);
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Can be changed to test the other modes
+my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
+
+# Initialize old node
+my $old_node = PostgreSQL::Test::Cluster->new('old_node');
+$old_node->init(allows_streaming => 'logical');
+$old_node->start;
+
+# Initialize new node
+my $new_node = PostgreSQL::Test::Cluster->new('new_node');
+$new_node->init(allows_streaming => 1);
+
+my $bindir = $new_node->config_data('--bindir');
+
+$old_node->stop;
+
+# Cause a failure at the start of pg_upgrade because wal_level is replica
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_node->data_dir,
+		'-D',         $new_node->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_node->host,
+		'-p',         $old_node->port,
+		'-P',         $new_node->port,
+		$mode,        '--include-logical-replication-slots',
+	],
+	'run of pg_upgrade of old node with wrong wal_level');
+ok( -d $new_node->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_node->data_dir . "/pg_upgrade_output.d");
+
+# Preparations for the subsequent test. The case max_replication_slots is set
+# to 0 is prohibited.
+$new_node->append_conf('postgresql.conf', "wal_level = 'logical'");
+$new_node->append_conf('postgresql.conf', "max_replication_slots = 0");
+
+# Cause a failure at the start of pg_upgrade because max_replication_slots is 0
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_node->data_dir,
+		'-D',         $new_node->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_node->host,
+		'-p',         $old_node->port,
+		'-P',         $new_node->port,
+		$mode,        '--include-logical-replication-slots',
+	],
+	'run of pg_upgrade of old node with wrong max_replication_slots');
+ok( -d $new_node->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_node->data_dir . "/pg_upgrade_output.d");
+
+# Preparations for the subsequent test. max_replication_slots is set to
+# non-zero value
+$new_node->append_conf('postgresql.conf', "max_replication_slots = 1");
+
+# Create a slot on old node, and generate WALs
+$old_node->start;
+$old_node->safe_psql(
+	'postgres', qq[
+	SELECT pg_create_logical_replication_slot('test_slot', 'test_decoding', false, true);
+	SELECT pg_create_logical_replication_slot('to_be_dropped', 'test_decoding', false, true);
+	CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;
+]);
+
+$old_node->stop;
+
+# Cause a failure at the start of pg_upgrade because max_replication_slots is
+# smaller than existing slots on old node
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_node->data_dir,
+		'-D',         $new_node->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_node->host,
+		'-p',         $old_node->port,
+		'-P',         $new_node->port,
+		$mode,        '--include-logical-replication-slots',
+	],
+	'run of pg_upgrade of old node with small max_replication_slots');
+ok( -d $new_node->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_node->data_dir . "/pg_upgrade_output.d");
+
+# Preparations for the subsequent test. max_replication_slots is set to
+# appropriate value
+$new_node->append_conf('postgresql.conf', "max_replication_slots = 10");
+
+# Remove an unnecessary slot and consume WALs
+$old_node->start;
+$old_node->safe_psql(
+	'postgres', qq[
+	SELECT pg_drop_replication_slot('to_be_dropped');
+	SELECT count(*) FROM pg_logical_slot_get_changes('test_slot', NULL, NULL)
+]);
+$old_node->stop;
+
+# Actual run, pg_upgrade_output.d is removed at the end
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_node->data_dir,
+		'-D',         $new_node->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_node->host,
+		'-p',         $old_node->port,
+		'-P',         $new_node->port,
+		$mode,        '--include-logical-replication-slots'
+	],
+	'run of pg_upgrade of old node');
+ok( !-d $new_node->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ removed after pg_upgrade success");
+
+$new_node->start;
+my $result = $new_node->safe_psql('postgres',
+	"SELECT slot_name, two_phase FROM pg_replication_slots");
+is($result, qq(test_slot|t), 'check the slot exists on new node');
+
+done_testing();
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index e941fb6c82..3962322d9d 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1498,7 +1498,10 @@ LogicalRepStreamAbortData
 LogicalRepTupleData
 LogicalRepTyp
 LogicalRepWorker
+LogicalReplicationSlotInfo
 LogicalRewriteMappingData
+LogicalSlotInfo
+LogicalSlotInfoArr
 LogicalTape
 LogicalTapeSet
 LsnReadQueue
-- 
2.27.0

v17-0002-Always-persist-to-disk-logical-slots-during-a-sh.patchapplication/octet-stream; name=v17-0002-Always-persist-to-disk-logical-slots-during-a-sh.patchDownload
From 3000080ab40a37913aa5d45f013377325bad04eb Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Fri, 14 Apr 2023 13:49:09 +0800
Subject: [PATCH v17 2/4] Always persist to disk logical slots during a
 shutdown checkpoint.

It's entirely possible for a logical slot to have a confirmed_flush_lsn higher
than the last value saved on disk while not being marked as dirty.  It's
currently not a problem to lose that value during a clean shutdown / restart
cycle, but a later patch adding support for pg_upgrade of publications and
logical slots will rely on that value being properly persisted to disk.

Author: Julien Rouhaud
Reviewed-by: Wang Wei
Discussion: FIXME
---
 src/backend/access/transam/xlog.c |  2 +-
 src/backend/replication/slot.c    | 26 ++++++++++++++++----------
 src/include/replication/slot.h    |  2 +-
 3 files changed, 18 insertions(+), 12 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 8b0710abe6..38dac88247 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7015,7 +7015,7 @@ static void
 CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
 {
 	CheckPointRelationMap();
-	CheckPointReplicationSlots();
+	CheckPointReplicationSlots(flags & CHECKPOINT_IS_SHUTDOWN);
 	CheckPointSnapBuild();
 	CheckPointLogicalRewriteHeap();
 	CheckPointReplicationOrigin();
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 1dc27264f6..5aed7cd190 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -109,7 +109,8 @@ static void ReplicationSlotDropPtr(ReplicationSlot *slot);
 /* internal persistency functions */
 static void RestoreSlotFromDisk(const char *name);
 static void CreateSlotOnDisk(ReplicationSlot *slot);
-static void SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel);
+static void SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel,
+						   bool is_shutdown);
 
 /*
  * Report shared-memory space needed by ReplicationSlotsShmemInit.
@@ -783,7 +784,7 @@ ReplicationSlotSave(void)
 	Assert(MyReplicationSlot != NULL);
 
 	sprintf(path, "pg_replslot/%s", NameStr(MyReplicationSlot->data.name));
-	SaveSlotToPath(MyReplicationSlot, path, ERROR);
+	SaveSlotToPath(MyReplicationSlot, path, ERROR, false);
 }
 
 /*
@@ -1565,11 +1566,10 @@ restart:
 /*
  * Flush all replication slots to disk.
  *
- * This needn't actually be part of a checkpoint, but it's a convenient
- * location.
+ * is_shutdown is true in case of a shutdown checkpoint.
  */
 void
-CheckPointReplicationSlots(void)
+CheckPointReplicationSlots(bool is_shutdown)
 {
 	int			i;
 
@@ -1594,7 +1594,7 @@ CheckPointReplicationSlots(void)
 
 		/* save the slot to disk, locking is handled in SaveSlotToPath() */
 		sprintf(path, "pg_replslot/%s", NameStr(s->data.name));
-		SaveSlotToPath(s, path, LOG);
+		SaveSlotToPath(s, path, LOG, is_shutdown);
 	}
 	LWLockRelease(ReplicationSlotAllocationLock);
 }
@@ -1700,7 +1700,7 @@ CreateSlotOnDisk(ReplicationSlot *slot)
 
 	/* Write the actual state file. */
 	slot->dirty = true;			/* signal that we really need to write */
-	SaveSlotToPath(slot, tmppath, ERROR);
+	SaveSlotToPath(slot, tmppath, ERROR, false);
 
 	/* Rename the directory into place. */
 	if (rename(tmppath, path) != 0)
@@ -1726,7 +1726,8 @@ CreateSlotOnDisk(ReplicationSlot *slot)
  * Shared functionality between saving and creating a replication slot.
  */
 static void
-SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel)
+SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel,
+			   bool is_shutdown)
 {
 	char		tmppath[MAXPGPATH];
 	char		path[MAXPGPATH];
@@ -1740,8 +1741,13 @@ SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel)
 	slot->just_dirtied = false;
 	SpinLockRelease(&slot->mutex);
 
-	/* and don't do anything if there's nothing to write */
-	if (!was_dirty)
+	/*
+	 * and don't do anything if there's nothing to write, unless this is
+	 * called for a logical slot during a shutdown checkpoint, as we want to
+	 * persist the confirmed_flush_lsn in that case, even if that's the only
+	 * modification.
+	 */
+	if (!was_dirty && (SlotIsPhysical(slot) || !is_shutdown))
 		return;
 
 	LWLockAcquire(&slot->io_in_progress_lock, LW_EXCLUSIVE);
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index a8a89dc784..7ca37c9f70 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -241,7 +241,7 @@ extern void ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslo
 extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok);
 
 extern void StartupReplicationSlots(void);
-extern void CheckPointReplicationSlots(void);
+extern void CheckPointReplicationSlots(bool is_shutdown);
 
 extern void CheckSlotRequirements(void);
 extern void CheckSlotPermissions(void);
-- 
2.27.0

v17-0003-pg_upgrade-Add-check-function-for-include-logica.patchapplication/octet-stream; name=v17-0003-pg_upgrade-Add-check-function-for-include-logica.patchDownload
From 2fbe4412992e1d6821d8429249e78dceae402452 Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Fri, 14 Apr 2023 08:59:03 +0000
Subject: [PATCH v17 3/4] pg_upgrade: Add check function for
 --include-logical-replication-slots option

Author: Hayato Kuroda
Reviewed-by: Wang Wei, Vignesh C
---
 src/bin/pg_upgrade/check.c                    | 70 +++++++++++++++++++
 .../t/003_logical_replication_slots.pl        | 21 ++++++
 2 files changed, 91 insertions(+)

diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 4b54f5567c..a8888933f6 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -9,8 +9,10 @@
 
 #include "postgres_fe.h"
 
+#include "access/xlog_internal.h"
 #include "catalog/pg_authid_d.h"
 #include "catalog/pg_collation.h"
+#include "catalog/pg_control.h"
 #include "fe_utils/string_utils.h"
 #include "mb/pg_wchar.h"
 #include "pg_upgrade.h"
@@ -31,6 +33,7 @@ static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(ClusterInfo *new_cluster);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
 static void check_for_logical_replication_slots(ClusterInfo *new_cluster);
+static void check_for_confirmed_flush_lsn(ClusterInfo *cluster);
 
 static int num_slots_on_old_cluster;
 
@@ -109,6 +112,8 @@ check_and_dump_old_cluster(bool live_check)
 	check_for_composite_data_type_usage(&old_cluster);
 	check_for_reg_data_type_usage(&old_cluster);
 	check_for_isn_and_int8_passing_mismatch(&old_cluster);
+	if (user_opts.include_logical_slots)
+		check_for_confirmed_flush_lsn(&old_cluster);
 
 	/*
 	 * PG 16 increased the size of the 'aclitem' type, which breaks the
@@ -1478,3 +1483,68 @@ check_for_logical_replication_slots(ClusterInfo *new_cluster)
 
 	check_ok();
 }
+
+/*
+ * Verify that all logical replication slots consumed all WALs, except a
+ * CHECKPOINT_SHUTDOWN record.
+ */
+static void
+check_for_confirmed_flush_lsn(ClusterInfo *cluster)
+{
+	int			i,
+				ntups,
+				i_slotname;
+	bool		is_error = false;
+	PGresult   *res;
+	DbInfo	   *active_db = &cluster->dbarr.dbs[0];
+	PGconn	   *conn = connectToServer(cluster, active_db->db_name);
+
+	Assert(user_opts.include_logical_slots);
+
+	/* --include-logical-replication-slots can be used since PG16. */
+	if (GET_MAJOR_VERSION(cluster->major_version) <= 1500)
+		return;
+
+	prep_status("Checking confirmed_flush_lsn for logical replication slots");
+
+	/*
+	 * Check that all logical replication slots have reached the current WAL
+	 * position, except for the CHECKPOINT_SHUTDOWN record. Even if all WALs
+	 * are consumed before shutting down the node, the checkpointer generates
+	 * a CHECKPOINT_SHUTDOWN record at shutdown, which cannot be consumed by
+	 * any slots. Therefore, we must allow for a difference between
+	 * pg_current_wal_insert_lsn() and confirmed_flush_lsn.
+	 */
+#define SHUTDOWN_RECORD_SIZE  (SizeOfXLogRecord + \
+							   SizeOfXLogRecordDataHeaderShort + \
+							   sizeof(CheckPoint))
+
+	res = executeQueryOrDie(conn,
+							"SELECT slot_name FROM pg_catalog.pg_replication_slots "
+							"WHERE (pg_catalog.pg_current_wal_insert_lsn() - confirmed_flush_lsn) > %d "
+							"AND temporary = false AND wal_status IN ('reserved', 'extended');",
+							(int) (SizeOfXLogLongPHD + SHUTDOWN_RECORD_SIZE));
+
+#undef SHUTDOWN_RECORD_SIZE
+
+	ntups = PQntuples(res);
+	i_slotname = PQfnumber(res, "slot_name");
+
+	for (i = 0; i < ntups; i++)
+	{
+		is_error = true;
+
+		pg_log(PG_WARNING,
+			   "\nWARNING: logical replication slot \"%s\" has not consumed WALs yet",
+			   PQgetvalue(res, i, i_slotname));
+	}
+
+	PQclear(res);
+	PQfinish(conn);
+
+	if (is_error)
+		pg_fatal("--include-logical-replication-slots requires that all "
+				 "logical replication slots consumed all the WALs");
+
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
index 9ca266f6b2..0bea0b5b17 100644
--- a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
+++ b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
@@ -112,6 +112,27 @@ rmtree($new_node->data_dir . "/pg_upgrade_output.d");
 # appropriate value
 $new_node->append_conf('postgresql.conf', "max_replication_slots = 10");
 
+# Cause a failure at the start of pg_upgrade because test_slot does not
+# finish consuming all the WALs
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_node->data_dir,
+		'-D',         $new_node->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_node->host,
+		'-p',         $old_node->port,
+		'-P',         $new_node->port,
+		$mode,        '--include-logical-replication-slots',
+	],
+	'run of pg_upgrade of old node with idle replication slots');
+ok( -d $new_node->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_node->data_dir . "/pg_upgrade_output.d");
+
 # Remove an unnecessary slot and consume WALs
 $old_node->start;
 $old_node->safe_psql(
-- 
2.27.0

v17-0004-Change-the-method-used-to-check-logical-replicat.patchapplication/octet-stream; name=v17-0004-Change-the-method-used-to-check-logical-replicat.patchDownload
From f9f631c391c2cc7eb2a6a1a41773626fee3a1022 Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Mon, 24 Apr 2023 11:03:28 +0000
Subject: [PATCH v17 4/4] Change the method used to check logical replication
 slots during the live check

When a live check is requested, there is a possibility of additional changes
occurring, which may cause the current WAL position to exceed the confirmed_flush_lsn
of the slot. As a result, we check the confirmed_flush_lsn of each logical slot
instead. This is sufficient as all the WAL records will be sent during the publisher's
shutdown.

Author: Hayato Kuroda
Reviewed-by: Wang Wei, Vignesh C
---
 src/bin/pg_upgrade/check.c                    | 64 +++++++++++++++++-
 .../t/003_logical_replication_slots.pl        | 66 ++++++++++++++++++-
 2 files changed, 127 insertions(+), 3 deletions(-)

diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index a8888933f6..42c2e3bebc 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -34,6 +34,7 @@ static void check_for_new_tablespace_dir(ClusterInfo *new_cluster);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
 static void check_for_logical_replication_slots(ClusterInfo *new_cluster);
 static void check_for_confirmed_flush_lsn(ClusterInfo *cluster);
+static void check_are_logical_slots_active(ClusterInfo *cluster);
 
 static int num_slots_on_old_cluster;
 
@@ -113,7 +114,19 @@ check_and_dump_old_cluster(bool live_check)
 	check_for_reg_data_type_usage(&old_cluster);
 	check_for_isn_and_int8_passing_mismatch(&old_cluster);
 	if (user_opts.include_logical_slots)
-		check_for_confirmed_flush_lsn(&old_cluster);
+	{
+		/*
+		 * The method used to check logical replication slots is dependent on
+		 * the value of the live_check parameter. This change was implemented
+		 * because, during a live check, it is possible for additional changes
+		 * to occur at the old node, which could cause the current WAL position
+		 * to exceed the confirmed_flush_lsn of the slot.
+		 */
+		if (live_check)
+			check_are_logical_slots_active(&old_cluster);
+		else
+			check_for_confirmed_flush_lsn(&old_cluster);
+	}
 
 	/*
 	 * PG 16 increased the size of the 'aclitem' type, which breaks the
@@ -1484,6 +1497,55 @@ check_for_logical_replication_slots(ClusterInfo *new_cluster)
 	check_ok();
 }
 
+/*
+ * Verify that all logical replication slots are active
+ */
+static void
+check_are_logical_slots_active(ClusterInfo *cluster)
+{
+	int			i,
+				ntups,
+				i_slotname;
+	bool		is_error = false;
+	PGresult   *res;
+	DbInfo	   *active_db = &cluster->dbarr.dbs[0];
+	PGconn	   *conn = connectToServer(cluster, active_db->db_name);
+
+	Assert(user_opts.include_logical_slots);
+
+	/* --include-logical-replication-slots can be used since PG16. */
+	if (GET_MAJOR_VERSION(cluster->major_version) <= 1500)
+		return;
+
+	prep_status("Checking the status of subscriptions for logical replication slots");
+
+	res = executeQueryOrDie(conn,
+							"SELECT slot_name FROM pg_catalog.pg_replication_slots "
+							"WHERE active IS FALSE "
+							"AND temporary = false AND wal_status IN ('reserved', 'extended');");
+
+	ntups = PQntuples(res);
+	i_slotname = PQfnumber(res, "slot_name");
+
+	for (i = 0; i < ntups; i++)
+	{
+		is_error = true;
+
+		pg_log(PG_WARNING,
+			   "\nWARNING: logical replication slot \"%s\" is not active",
+			   PQgetvalue(res, i, i_slotname));
+	}
+
+	PQclear(res);
+	PQfinish(conn);
+
+	if (is_error)
+		pg_fatal("--include-logical-replication-slots with --check requires that "
+				 "all logical replication slots are active");
+
+	check_ok();
+}
+
 /*
  * Verify that all logical replication slots consumed all WALs, except a
  * CHECKPOINT_SHUTDOWN record.
diff --git a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
index 0bea0b5b17..f711452737 100644
--- a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
+++ b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
@@ -19,6 +19,10 @@ my $old_node = PostgreSQL::Test::Cluster->new('old_node');
 $old_node->init(allows_streaming => 'logical');
 $old_node->start;
 
+# Initialize subscriber, which will be used only for --check
+my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
+$subscriber->init(allows_streaming => 'logical');
+
 # Initialize new node
 my $new_node = PostgreSQL::Test::Cluster->new('new_node');
 $new_node->init(allows_streaming => 1);
@@ -76,13 +80,70 @@ rmtree($new_node->data_dir . "/pg_upgrade_output.d");
 # non-zero value
 $new_node->append_conf('postgresql.conf', "max_replication_slots = 1");
 
-# Create a slot on old node, and generate WALs
+# Setup logical replication
 $old_node->start;
+$old_node->safe_psql('postgres',
+	"CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a");
+$old_node->safe_psql('postgres', "CREATE PUBLICATION pub FOR ALL TABLES");
+my $old_connstr = $old_node->connstr . ' dbname=postgres';
+
+$subscriber->start;
+$subscriber->safe_psql('postgres', "CREATE TABLE tbl (a int)");
+$subscriber->safe_psql('postgres',
+	"CREATE SUBSCRIPTION sub CONNECTION '$old_connstr' PUBLICATION pub WITH (copy_data = true)"
+);
+
+# Wait for initial table sync to finish
+$subscriber->wait_for_subscription_sync($old_node, 'sub');
+
+my $result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
+is($result, qq(10), 'check initial rows on subscriber');
+
+# Start a background session and open a transaction (not committed yet)
+my $bsession = $old_node->background_psql('postgres');
+$bsession->query_safe(
+	q{
+BEGIN;
+INSERT INTO tbl VALUES (generate_series(11, 20))
+});
+
+$result = $old_node->safe_psql('postgres',
+	"SELECT count(*) FROM pg_replication_slots WHERE pg_current_wal_insert_lsn() > confirmed_flush_lsn"
+);
+is($result, qq(1),
+	'check the current WAL position exceeds confirmed_flush_lsn');
+
+# Run pg_upgrade --check. In the command the status of each logical slots will
+# be checked and then this will be succeeded.
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_node->data_dir,
+		'-D',         $new_node->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_node->host,
+		'-p',         $old_node->port,
+		'-P',         $new_node->port,
+		$mode,        '--include-logical-replication-slots',
+		'--check'
+	],
+	'run of pg_upgrade of old node');
+ok( !-d $old_node->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ removed after pg_upgrade success");
+
+# Cleanup
+$bsession->query_safe("ABORT");
+$bsession->quit;
+$subscriber->safe_psql('postgres', "DROP SUBSCRIPTION sub");
+$subscriber->stop();
+
+# Create a slot on old node, and generate WALs
 $old_node->safe_psql(
 	'postgres', qq[
 	SELECT pg_create_logical_replication_slot('test_slot', 'test_decoding', false, true);
 	SELECT pg_create_logical_replication_slot('to_be_dropped', 'test_decoding', false, true);
-	CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;
+	INSERT INTO tbl VALUES (generate_series(11, 20));
 ]);
 
 $old_node->stop;
@@ -163,5 +224,6 @@ $new_node->start;
 my $result = $new_node->safe_psql('postgres',
 	"SELECT slot_name, two_phase FROM pg_replication_slots");
 is($result, qq(test_slot|t), 'check the slot exists on new node');
+$new_node->stop();
 
 done_testing();
-- 
2.27.0

#66Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Amit Kapila (#60)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Amit,

Thanks for reviewing! The updated patch set is available at [1]/messages/by-id/TYAPR01MB5866E9ED5B8C5AD7F7AC062FF539A@TYAPR01MB5866.jpnprd01.prod.outlook.com.

Few comments/questions
====================
1.
+check_for_parameter_settings(ClusterInfo *new_cluster)
{
...
+
+ res = executeQueryOrDie(conn, "SHOW max_replication_slots;");
+ max_replication_slots = atoi(PQgetvalue(res, 0, 0));
+
+ if (max_replication_slots == 0)
+ pg_fatal("max_replication_slots must be greater than 0");
...
}

Won't it be better to verify that the value of "max_replication_slots"
is greater than the number of logical slots we are planning to copy
from old on the new cluster? Similar to this, I thought whether we
need to check the value of max_wal_senders? But, I guess one can
simply decode from slots by using APIs, so not sure about that. What
do you think?

Agreed on verifying max_replication_slots. There are several ways to add that check,
so I chose the simplest one: store the number of slots in a global variable and
compare it with max_replication_slots.
As for max_wal_senders, I don't think it needs to be checked. As you said, there is
a possibility that a user-defined background worker uses the slot and consumes WAL.
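
Just to illustrate what is counted, here is a minimal sketch of the per-database
query (it uses the same conditions as get_logical_slot_infos_per_db() in the
attached patch; the sum over all databases is what gets compared with the new
cluster's max_replication_slots):

```
-- count the permanent logical slots of the current database that still
-- reserve WAL
SELECT count(*)
FROM pg_catalog.pg_replication_slots
WHERE database = current_database()
  AND temporary = false
  AND wal_status IN ('reserved', 'extended');
```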

2.
+ /*
+ * Dump logical replication slots if needed.
+ *
+ * XXX We cannot dump replication slots at the same time as the schema
+ * dump because we need to separate the timing of restoring
+ * replication slots and other objects. Replication slots, in
+ * particular, should not be restored before executing the pg_resetwal
+ * command because it will remove WALs that are required by the slots.
+ */
+ if (user_opts.include_logical_slots)

Can you explain this point a bit more with some example scenarios?
Basically, if we had sent all the WAL before the upgrade then why do
we need to worry about the timing of pg_resetwal?

OK, I can give an example here. Should it be described in the source code?

Assuming that there is a valid logical replication slot as follows:

```
postgres=# select slot_name, plugin, restart_lsn, wal_status, two_phase from pg_replication_slots;
slot_name | plugin | restart_lsn | wal_status | two_phase
-----------+---------------+-------------+------------+-----------
test | test_decoding | 0/15665A8 | reserved | f
(1 row)

postgres=# select * from pg_current_wal_lsn();
pg_current_wal_lsn
--------------------
0/15665E0
(1 row)
```

Now let's run pg_resetwal against the server.
The existing WAL segment file is purged and the WAL position moves to the next segment.

```
$ pg_ctl stop -D data_N1/
waiting for server to shut down.... done
server stopped
$ pg_resetwal -l 000000010000000000000002 data_N1/
Write-ahead log reset
$ pg_ctl start -D data_N1/ -l N1.log
waiting for server to start.... done
server started
```

After that, the logical slot cannot move forward anymore because the required WAL
has been removed, even though wal_status still reports "reserved".

```
postgres=# select slot_name, plugin, restart_lsn, wal_status, two_phase from pg_replication_slots;
slot_name | plugin | restart_lsn | wal_status | two_phase
-----------+---------------+-------------+------------+-----------
test | test_decoding | 0/15665A8 | reserved | f
(1 row)

postgres=# select * from pg_current_wal_lsn();
pg_current_wal_lsn
--------------------
0/2028328
(1 row)

postgres=# select * from pg_logical_slot_get_changes('test', NULL, NULL);
ERROR: requested WAL segment pg_wal/000000010000000000000001 has already been removed
```

pg_upgrade runs pg_dump and then pg_resetwal, so dumping the slots must be done
separately to avoid the above error.

3. I see that you are trying to ensure that all the WAL has been
consumed for a slot except for shutdown_checkpoint in patch 0003 but
do we need to think of any interaction with restart_lsn
(MyReplicationSlot->data.restart_lsn) which is the start point to read
WAL for decoding by walsender?

Currently I'm not sure it needs to be considered. Do you have something specific in mind?

candidate_restart_lsn for the slot is set only when an XLOG_RUNNING_XACTS record is
decoded (LogicalIncreaseRestartDecodingForSlot()), and it is promoted to restart_lsn
later. So there are only a few points at which the value is updated, and we cannot
determine an acceptable boundary.

Furthermore, I think the restart point does not affect the result of replicating
changes on the subscriber because it is always behind confirmed_flush_lsn.
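
As a quick sanity check on a running publisher, the two positions can be compared
side by side (just a sketch against the pg_replication_slots view):

```
-- for logical slots, restart_lsn stays at or behind confirmed_flush_lsn,
-- so it does not decide which changes the subscriber receives
SELECT slot_name, restart_lsn, confirmed_flush_lsn
FROM pg_catalog.pg_replication_slots
WHERE slot_type = 'logical';
```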

[1]: /messages/by-id/TYAPR01MB5866E9ED5B8C5AD7F7AC062FF539A@TYAPR01MB5866.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#67Amit Kapila
amit.kapila16@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#66)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Wed, Jul 19, 2023 at 7:33 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

2.
+ /*
+ * Dump logical replication slots if needed.
+ *
+ * XXX We cannot dump replication slots at the same time as the schema
+ * dump because we need to separate the timing of restoring
+ * replication slots and other objects. Replication slots, in
+ * particular, should not be restored before executing the pg_resetwal
+ * command because it will remove WALs that are required by the slots.
+ */
+ if (user_opts.include_logical_slots)

Can you explain this point a bit more with some example scenarios?
Basically, if we had sent all the WAL before the upgrade then why do
we need to worry about the timing of pg_resetwal?

OK, I can give an example here. Should it be described in the source code?

Assuming that there is a valid logical replication slot as follows:

```
postgres=# select slot_name, plugin, restart_lsn, wal_status, two_phase from pg_replication_slots;
slot_name | plugin | restart_lsn | wal_status | two_phase
-----------+---------------+-------------+------------+-----------
test | test_decoding | 0/15665A8 | reserved | f
(1 row)

postgres=# select * from pg_current_wal_lsn();
pg_current_wal_lsn
--------------------
0/15665E0
(1 row)
```

Now let's run pg_resetwal against the server.
The existing WAL segment file is purged and the WAL position moves to the next segment.

```
$ pg_ctl stop -D data_N1/
waiting for server to shut down.... done
server stopped
$ pg_resetwal -l 000000010000000000000002 data_N1/
Write-ahead log reset
$ pg_ctl start -D data_N1/ -l N1.log
waiting for server to start.... done
server started
```

After that, the logical slot cannot move forward anymore because the required WAL
has been removed, even though wal_status still reports "reserved".

```
postgres=# select slot_name, plugin, restart_lsn, wal_status, two_phase from pg_replication_slots;
slot_name | plugin | restart_lsn | wal_status | two_phase
-----------+---------------+-------------+------------+-----------
test | test_decoding | 0/15665A8 | reserved | f
(1 row)

postgres=# select * from pg_current_wal_lsn();
pg_current_wal_lsn
--------------------
0/2028328
(1 row)

postgres=# select * from pg_logical_slot_get_changes('test', NULL, NULL);
ERROR: requested WAL segment pg_wal/000000010000000000000001 has already been removed
```

pg_upgrade runs pg_dump and then pg_resetwal, so dumping the slots must be done
separately to avoid the above error.

Okay, so the point is that if we create the slot in the new cluster
before pg_resetwal then its restart_lsn will be set to the current LSN
position which will later be reset by pg_resetwal. So, we won't be
able to use such a slot, right?

--
With Regards,
Amit Kapila.

#68Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Hayato Kuroda (Fujitsu) (#64)
3 attachment(s)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear hackers,

Based on the above, we are considering delaying the shutdown of logical walsenders.
The preliminary workflow is:

1. When a logical walsender receives the signal from the checkpointer, it consumes
all remaining WAL records, changes its state to WALSNDSTATE_STOPPING, and stops
doing anything further.
2. Then the checkpointer performs the shutdown checkpoint.
3. After that, the postmaster sends a signal to the walsenders, same as the current
implementation.
4. Finally, logical walsenders process the shutdown checkpoint record and update
the confirmed_flush_lsn after the acknowledgement from the subscriber.
Note that logical walsenders do not have to send the shutdown checkpoint record
itself to the subscriber; the keepalive that follows is enough to advance the
confirmed_flush_lsn.
5. Once all tasks are done, they exit.

This mechanism ensures that the confirmed_flush_lsn of active slots is the same as
the current WAL location of the old publisher, so the 0003 patch becomes simpler:
we would not have to calculate the acceptable difference anymore.
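
With that guarantee, the check could boil down to a plain comparison, roughly like
the sketch below (assuming the old cluster has been restarted after a clean shutdown
and no new WAL has been generated since; the actual query in the patch may differ):

```
-- with the walsender change this should return no rows: every permanent
-- logical slot has confirmed everything up to the current WAL position
SELECT slot_name
FROM pg_catalog.pg_replication_slots
WHERE slot_type = 'logical'
  AND temporary = false
  AND confirmed_flush_lsn <> pg_catalog.pg_current_wal_insert_lsn();
```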

One thing we must consider is that no WAL must be generated while decoding the
shutdown checkpoint record; otherwise it causes a PANIC. IIUC the record leads to
SnapBuildSerializationPoint(), which just serializes the snapbuild state or restores
from it, so the change may be acceptable. Thoughts?

I've implemented the ideas from my previous proposal, PSA another patch set.
Patch 0001 introduces the state WALSNDSTATE_STOPPING to logical walsenders. The
workflow remains largely the same as described in my previous post, with the
following additions:

* A flag has been added to track whether all the WAL has been flushed. The
logical walsender can only exit after the flag is set, which ensures that all
WAL is flushed before the walsender terminates.
* Cumulative statistics are now forcibly written before changing the state.
While the previous approach reported stats at process exit, the current approach
must report them earlier because of the checkpointer's termination timing. See the
comments in CheckpointerMain() and atop pgstat_before_server_shutdown().
* At the end of the process, slots are now saved to disk.

Patch 0002 adds the --include-logical-replication-slots option to pg_upgrade;
it is unchanged from the previous patch set.

Patch 0003 adds a check function, which is now simpler.
The previous version calculated an "acceptable" difference between confirmed_flush_lsn
and the current WAL position. This was necessary because the shutdown checkpoint
record could not be sent to subscribers, creating a disparity between these values.
However, that approach had drawbacks, such as needing adjustments whenever record
sizes changed.

Now the record can be sent to subscribers, so that hack is no longer needed, at
least for slots served by logical replication. The consistency is now maintained by
the logical walsenders, so slots that are consumed only by backends (not by a
walsender) are not covered in the same way. We must still consider how those should
be handled...
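
For example, a slot consumed only through the SQL interface never goes through the
walsender shutdown path, so nothing advances its confirmed_flush_lsn at shutdown
(a hypothetical sketch with test_decoding; the slot name is made up):

```
-- create and consume a slot from a plain backend session
SELECT pg_create_logical_replication_slot('sql_only_slot', 'test_decoding');
SELECT count(*) FROM pg_logical_slot_get_changes('sql_only_slot', NULL, NULL);
-- any WAL generated after the last get_changes() call, including the
-- shutdown checkpoint record, is not reflected in confirmed_flush_lsn
```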

What do you think?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:

0001-Send-shutdown-checkpoint-record-to-subscriber.patchapplication/octet-stream; name=0001-Send-shutdown-checkpoint-record-to-subscriber.patchDownload
From 7f8f1cb96eab09b107d9022aea3b4386796a4dce Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Fri, 21 Jul 2023 05:34:21 +0000
Subject: [PATCH 1/3] Send shutdown checkpoint record to subscriber

---
 src/backend/replication/walsender.c | 30 +++++++++++++++++++++++------
 1 file changed, 24 insertions(+), 6 deletions(-)

diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index d27ef2985d..fc1363ba76 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -187,6 +187,9 @@ static bool WalSndCaughtUp = false;
 static volatile sig_atomic_t got_SIGUSR2 = false;
 static volatile sig_atomic_t got_STOPPING = false;
 
+/* Are all the WALs flushed? */
+static bool WalsAreFlushed = false;
+
 /*
  * This is set while we are streaming. When not set
  * PROCSIG_WALSND_INIT_STOPPING signal will be handled like SIGTERM. When set,
@@ -260,7 +263,6 @@ static bool TransactionIdInRecentPast(TransactionId xid, uint32 epoch);
 static void WalSndSegmentOpen(XLogReaderState *state, XLogSegNo nextSegNo,
 							  TimeLineID *tli_p);
 
-
 /* Initialize walsender process before entering the main command loop */
 void
 InitWalSender(void)
@@ -1581,7 +1583,10 @@ WalSndWaitForWal(XLogRecPtr loc)
 		 * written, because walwriter has shut down already.
 		 */
 		if (got_STOPPING)
+		{
 			XLogBackgroundFlush();
+			WalsAreFlushed = true;
+		}
 
 		/* Update our idea of the currently flushed position. */
 		if (!RecoveryInProgress())
@@ -3100,12 +3105,20 @@ XLogSendLogical(void)
 		WalSndCaughtUp = true;
 
 	/*
-	 * If we're caught up and have been requested to stop, have WalSndLoop()
-	 * terminate the connection in an orderly manner, after writing out all
-	 * the pending data.
+	 * If we're caught up, have been requested to stop and there are no pending
+	 * records to be sent, change to stopping mode.
 	 */
-	if (WalSndCaughtUp && got_STOPPING)
-		got_SIGUSR2 = true;
+	if (WalSndCaughtUp && WalsAreFlushed && !pq_is_send_pending())
+	{
+		/*
+		 * Update the stats forcibly. pgstat_shutdown_hook reports any pending
+		 * stats at the end of the process, but it would happen after the
+		 * checkpointer exits so that it would lead assertion failure. We must
+		 * checkpointer exits, which would lead to an assertion failure. We must
+		 */
+		pgstat_report_stat(true);
+		WalSndSetState(WALSNDSTATE_STOPPING);
+	}
 
 	/* Update shared memory status */
 	{
@@ -3142,6 +3155,7 @@ WalSndDone(WalSndSendDataCallback send_data)
 	replicatedPtr = XLogRecPtrIsInvalid(MyWalSnd->flush) ?
 		MyWalSnd->write : MyWalSnd->flush;
 
+
 	if (WalSndCaughtUp && sentPtr == replicatedPtr &&
 		!pq_is_send_pending())
 	{
@@ -3152,6 +3166,10 @@ WalSndDone(WalSndSendDataCallback send_data)
 		EndCommand(&qc, DestRemote, false);
 		pq_flush();
 
+		/* Mark the slot as dirty and save it to update the confirmed_flush. */
+		ReplicationSlotMarkDirty();
+		ReplicationSlotSave();
+
 		proc_exit(0);
 	}
 	if (!waiting_for_ping_response)
-- 
2.27.0

0002-pg_upgrade-Add-include-logical-replication-slots-opt.patchapplication/octet-stream; name=0002-pg_upgrade-Add-include-logical-replication-slots-opt.patchDownload
From 5287baac5bf7b78bed33ae0655fd8066bde23330 Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Tue, 4 Apr 2023 05:49:34 +0000
Subject: [PATCH 2/3] pg_upgrade: Add --include-logical-replication-slots
 option

This commit introduces a new pg_upgrade option called "--include-logical-replication-slots".
This allows nodes with logical replication slots to be upgraded. The commit can
be divided into two parts: one for pg_dump and another for pg_upgrade.

For pg_dump this commit includes a new option called "--logical-replication-slots-only".
This option can be used to dump logical replication slots. When this option is
specified, the slot_name, plugin, and two_phase parameters are extracted from
pg_replication_slots. An SQL file that executes pg_create_logical_replication_slot()
with the extracted parameters is generated.

For pg_upgrade, when '--include-logical-replication-slots' is specified, it executes
pg_dump with the new "--logical-replication-slots-only" option and restores the slots
using the pg_create_logical_replication_slots() statements that the dump
generated (see above). Note that we cannot dump replication slots at the same time
as the schema dump because we need to separate the timing of restoring replication
slots and other objects. Replication slots, in  particular, should not be restored
before executing the pg_resetwal command because it will remove WALs that are
required by the slots.

The significant advantage of this commit is that it makes it easy to continue
logical replication even after upgrading the publisher node. Previously, pg_upgrade
allowed copying publications to a new node. With this new commit, adjusting the
connection string to the new publisher will cause the apply worker on the subscriber
to connect to the new publisher automatically. This enables seamless continuation
of logical replication, even after an upgrade.

Author: Hayato Kuroda
Reviewed-by: Peter Smith, Julien Rouhaud, Vignesh C, Wang Wei
---
 doc/src/sgml/ref/pgupgrade.sgml               |  11 ++
 src/bin/pg_dump/pg_backup.h                   |   1 +
 src/bin/pg_dump/pg_dump.c                     | 155 ++++++++++++++++++
 src/bin/pg_dump/pg_dump.h                     |  14 +-
 src/bin/pg_dump/pg_dump_sort.c                |  11 +-
 src/bin/pg_upgrade/Makefile                   |   3 +
 src/bin/pg_upgrade/check.c                    |  76 +++++++++
 src/bin/pg_upgrade/dump.c                     |  24 +++
 src/bin/pg_upgrade/info.c                     | 111 ++++++++++++-
 src/bin/pg_upgrade/meson.build                |   1 +
 src/bin/pg_upgrade/option.c                   |   7 +
 src/bin/pg_upgrade/pg_upgrade.c               |  61 +++++++
 src/bin/pg_upgrade/pg_upgrade.h               |  21 +++
 .../t/003_logical_replication_slots.pl        | 146 +++++++++++++++++
 src/tools/pgindent/typedefs.list              |   3 +
 15 files changed, 641 insertions(+), 4 deletions(-)
 create mode 100644 src/bin/pg_upgrade/t/003_logical_replication_slots.pl

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 7816b4c685..94e90ff506 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -240,6 +240,17 @@ PostgreSQL documentation
       </listitem>
      </varlistentry>
 
+     <varlistentry>
+      <term><option>--include-logical-replication-slots</option></term>
+      <listitem>
+       <para>
+        Upgrade logical replication slots. Only permanent replication slots
+        are included. Note that pg_upgrade does not check the installation of
+        plugins.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><option>-?</option></term>
       <term><option>--help</option></term>
diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index aba780ef4b..0a4e931f9b 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -187,6 +187,7 @@ typedef struct _dumpOptions
 	int			use_setsessauth;
 	int			enable_row_security;
 	int			load_via_partition_root;
+	int			logical_slots_only;
 
 	/* default, if no "inclusion" switches appear, is to dump everything */
 	bool		include_everything;
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 5dab1ba9ea..12d4066d3b 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -328,6 +328,9 @@ static void setupDumpWorker(Archive *AH);
 static TableInfo *getRootTableInfo(const TableInfo *tbinfo);
 static bool forcePartitionRootLoad(const TableInfo *tbinfo);
 
+static void getLogicalReplicationSlots(Archive *fout);
+static void dumpLogicalReplicationSlot(Archive *fout,
+									   const LogicalReplicationSlotInfo *slotinfo);
 
 int
 main(int argc, char **argv)
@@ -431,6 +434,7 @@ main(int argc, char **argv)
 		{"table-and-children", required_argument, NULL, 12},
 		{"exclude-table-and-children", required_argument, NULL, 13},
 		{"exclude-table-data-and-children", required_argument, NULL, 14},
+		{"logical-replication-slots-only", no_argument, NULL, 15},
 
 		{NULL, 0, NULL, 0}
 	};
@@ -657,6 +661,10 @@ main(int argc, char **argv)
 										  optarg);
 				break;
 
+			case 15:			/* dump only replication slot(s) */
+				dopt.logical_slots_only = true;
+				break;
+
 			default:
 				/* getopt_long already emitted a complaint */
 				pg_log_error_hint("Try \"%s --help\" for more information.", progname);
@@ -714,6 +722,21 @@ main(int argc, char **argv)
 	if (dopt.do_nothing && dopt.dump_inserts == 0)
 		pg_fatal("option --on-conflict-do-nothing requires option --inserts, --rows-per-insert, or --column-inserts");
 
+	if (dopt.logical_slots_only)
+	{
+		if (!dopt.binary_upgrade)
+			pg_fatal("options --logical-replication-slots-only requires option --binary-upgrade");
+
+		if (dopt.dataOnly)
+			pg_fatal("options --logical-replication-slots-only and -a/--data-only cannot be used together");
+
+		if (dopt.schemaOnly)
+			pg_fatal("options --logical-replication-slots-only and -s/--schema-only cannot be used together");
+
+		if (dopt.outputClean)
+			pg_fatal("options --logical-replication-slots-only and -c/--clean cannot be used together");
+	}
+
 	/* Identify archive format to emit */
 	archiveFormat = parseArchiveFormat(format, &archiveMode);
 
@@ -876,6 +899,16 @@ main(int argc, char **argv)
 			pg_fatal("no matching extensions were found");
 	}
 
+	/*
+	 * If dump logical-replication-slots-only was requested, dump only them
+	 * and skip everything else.
+	 */
+	if (dopt.logical_slots_only)
+	{
+		getLogicalReplicationSlots(fout);
+		goto dump;
+	}
+
 	/*
 	 * Dumping LOs is the default for dumps where an inclusion switch is not
 	 * used (an "include everything" dump).  -B can be used to exclude LOs
@@ -936,6 +969,8 @@ main(int argc, char **argv)
 	if (!dopt.no_security_labels)
 		collectSecLabels(fout);
 
+dump:
+
 	/* Lastly, create dummy objects to represent the section boundaries */
 	boundaryObjs = createBoundaryObjects();
 
@@ -1109,6 +1144,12 @@ help(const char *progname)
 			 "                               servers matching PATTERN\n"));
 	printf(_("  --inserts                    dump data as INSERT commands, rather than COPY\n"));
 	printf(_("  --load-via-partition-root    load partitions via the root table\n"));
+
+	/*
+	 * The option --logical-replication-slots-only is used only by pg_upgrade
+	 * and should not be called by users, which is why it is not exposed by the
+	 * help.
+	 */
 	printf(_("  --no-comments                do not dump comments\n"));
 	printf(_("  --no-publications            do not dump publications\n"));
 	printf(_("  --no-security-labels         do not dump security label assignments\n"));
@@ -10237,6 +10278,10 @@ dumpDumpableObject(Archive *fout, DumpableObject *dobj)
 		case DO_SUBSCRIPTION:
 			dumpSubscription(fout, (const SubscriptionInfo *) dobj);
 			break;
+		case DO_LOGICAL_REPLICATION_SLOT:
+			dumpLogicalReplicationSlot(fout,
+									   (const LogicalReplicationSlotInfo *) dobj);
+			break;
 		case DO_PRE_DATA_BOUNDARY:
 		case DO_POST_DATA_BOUNDARY:
 			/* never dumped, nothing to do */
@@ -18218,6 +18263,7 @@ addBoundaryDependencies(DumpableObject **dobjs, int numObjs,
 			case DO_PUBLICATION_REL:
 			case DO_PUBLICATION_TABLE_IN_SCHEMA:
 			case DO_SUBSCRIPTION:
+			case DO_LOGICAL_REPLICATION_SLOT:
 				/* Post-data objects: must come after the post-data boundary */
 				addObjectDependency(dobj, postDataBound->dumpId);
 				break;
@@ -18479,3 +18525,112 @@ appendReloptionsArrayAH(PQExpBuffer buffer, const char *reloptions,
 	if (!res)
 		pg_log_warning("could not parse %s array", "reloptions");
 }
+
+/*
+ * getLogicalReplicationSlots
+ *	  get information about replication slots
+ */
+static void
+getLogicalReplicationSlots(Archive *fout)
+{
+	PGresult   *res;
+	LogicalReplicationSlotInfo *slotinfo;
+	PQExpBuffer query;
+
+	int			i_slotname;
+	int			i_plugin;
+	int			i_twophase;
+	int			i,
+				ntups;
+
+	/* Check whether we should dump or not */
+	if (fout->remoteVersion < 170000)
+		return;
+
+	Assert(fout->dopt->logical_slots_only);
+
+	query = createPQExpBuffer();
+
+	resetPQExpBuffer(query);
+
+	/*
+	 * Get replication slots.
+	 *
+	 * XXX: Which information must be extracted from old node? Currently three
+	 * attributes are extracted because they are used by
+	 * pg_create_logical_replication_slot().
+	 */
+	appendPQExpBufferStr(query,
+						 "SELECT slot_name, plugin, two_phase "
+						 "FROM pg_catalog.pg_replication_slots "
+						 "WHERE database = current_database() AND temporary = false "
+						 "AND wal_status IN ('reserved', 'extended');");
+
+	res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+
+	ntups = PQntuples(res);
+
+	i_slotname = PQfnumber(res, "slot_name");
+	i_plugin = PQfnumber(res, "plugin");
+	i_twophase = PQfnumber(res, "two_phase");
+
+	slotinfo = pg_malloc(ntups * sizeof(LogicalReplicationSlotInfo));
+
+	for (i = 0; i < ntups; i++)
+	{
+		slotinfo[i].dobj.objType = DO_LOGICAL_REPLICATION_SLOT;
+
+		slotinfo[i].dobj.catId.tableoid = InvalidOid;
+		slotinfo[i].dobj.catId.oid = InvalidOid;
+		AssignDumpId(&slotinfo[i].dobj);
+
+		slotinfo[i].dobj.name = pg_strdup(PQgetvalue(res, i, i_slotname));
+
+		slotinfo[i].plugin = pg_strdup(PQgetvalue(res, i, i_plugin));
+		slotinfo[i].twophase = (strcmp(PQgetvalue(res, i, i_twophase), "t") == 0);
+
+		/*
+		 * Note: Currently we do not have any options to include/exclude slots
+		 * in dumping, so all the slots must be selected.
+		 */
+		slotinfo[i].dobj.dump = DUMP_COMPONENT_DEFINITION;
+	}
+	PQclear(res);
+
+	destroyPQExpBuffer(query);
+}
+
+/*
+ * dumpLogicalReplicationSlot
+ *	  dump creation functions for the given logical replication slots
+ */
+static void
+dumpLogicalReplicationSlot(Archive *fout,
+						   const LogicalReplicationSlotInfo *slotinfo)
+{
+	Assert(fout->dopt->logical_slots_only);
+
+	if (slotinfo->dobj.dump & DUMP_COMPONENT_DEFINITION)
+	{
+		PQExpBuffer query = createPQExpBuffer();
+
+		/*
+		 * XXX: For simplification, pg_create_logical_replication_slot() is
+		 * used. Is it sufficient?
+		 */
+		appendPQExpBuffer(query, "SELECT pg_catalog.pg_create_logical_replication_slot(");
+		appendStringLiteralAH(query, slotinfo->dobj.name, fout);
+		appendPQExpBuffer(query, ", ");
+		appendStringLiteralAH(query, slotinfo->plugin, fout);
+		appendPQExpBuffer(query, ", false, %s);",
+						  slotinfo->twophase ? "true" : "false");
+
+		ArchiveEntry(fout, slotinfo->dobj.catId, slotinfo->dobj.dumpId,
+					 ARCHIVE_OPTS(.tag = slotinfo->dobj.name,
+								  .description = "REPLICATION SLOT",
+								  .section = SECTION_POST_DATA,
+								  .createStmt = query->data));
+
+		destroyPQExpBuffer(query);
+	}
+}
diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h
index bc8f2ec36d..ed1866d9ab 100644
--- a/src/bin/pg_dump/pg_dump.h
+++ b/src/bin/pg_dump/pg_dump.h
@@ -82,7 +82,8 @@ typedef enum
 	DO_PUBLICATION,
 	DO_PUBLICATION_REL,
 	DO_PUBLICATION_TABLE_IN_SCHEMA,
-	DO_SUBSCRIPTION
+	DO_SUBSCRIPTION,
+	DO_LOGICAL_REPLICATION_SLOT
 } DumpableObjectType;
 
 /*
@@ -667,6 +668,17 @@ typedef struct _SubscriptionInfo
 	char	   *subpasswordrequired;
 } SubscriptionInfo;
 
+/*
+ * The LogicalReplicationSlotInfo struct is used to represent replication
+ * slots.
+ */
+typedef struct _LogicalReplicationSlotInfo
+{
+	DumpableObject dobj;
+	char	   *plugin;
+	bool		twophase;
+} LogicalReplicationSlotInfo;
+
 /*
  *	common utility functions
  */
diff --git a/src/bin/pg_dump/pg_dump_sort.c b/src/bin/pg_dump/pg_dump_sort.c
index 523a19c155..ae65443228 100644
--- a/src/bin/pg_dump/pg_dump_sort.c
+++ b/src/bin/pg_dump/pg_dump_sort.c
@@ -93,6 +93,7 @@ enum dbObjectTypePriorities
 	PRIO_PUBLICATION_REL,
 	PRIO_PUBLICATION_TABLE_IN_SCHEMA,
 	PRIO_SUBSCRIPTION,
+	PRIO_LOGICAL_REPLICATION_SLOT,
 	PRIO_DEFAULT_ACL,			/* done in ACL pass */
 	PRIO_EVENT_TRIGGER,			/* must be next to last! */
 	PRIO_REFRESH_MATVIEW		/* must be last! */
@@ -146,10 +147,11 @@ static const int dbObjectTypePriority[] =
 	PRIO_PUBLICATION,			/* DO_PUBLICATION */
 	PRIO_PUBLICATION_REL,		/* DO_PUBLICATION_REL */
 	PRIO_PUBLICATION_TABLE_IN_SCHEMA,	/* DO_PUBLICATION_TABLE_IN_SCHEMA */
-	PRIO_SUBSCRIPTION			/* DO_SUBSCRIPTION */
+	PRIO_SUBSCRIPTION,			/* DO_SUBSCRIPTION */
+	PRIO_LOGICAL_REPLICATION_SLOT	/* DO_LOGICAL_REPLICATION_SLOT */
 };
 
-StaticAssertDecl(lengthof(dbObjectTypePriority) == (DO_SUBSCRIPTION + 1),
+StaticAssertDecl(lengthof(dbObjectTypePriority) == (DO_LOGICAL_REPLICATION_SLOT + 1),
 				 "array length mismatch");
 
 static DumpId preDataBoundId;
@@ -1542,6 +1544,11 @@ describeDumpableObject(DumpableObject *obj, char *buf, int bufsize)
 					 "SUBSCRIPTION (ID %d OID %u)",
 					 obj->dumpId, obj->catId.oid);
 			return;
+		case DO_LOGICAL_REPLICATION_SLOT:
+			snprintf(buf, bufsize,
+					 "LOGICAL REPLICATION SLOT (ID %d NAME %s)",
+					 obj->dumpId, obj->name);
+			return;
 		case DO_PRE_DATA_BOUNDARY:
 			snprintf(buf, bufsize,
 					 "PRE-DATA BOUNDARY  (ID %d)",
diff --git a/src/bin/pg_upgrade/Makefile b/src/bin/pg_upgrade/Makefile
index 5834513add..815d1a7ca1 100644
--- a/src/bin/pg_upgrade/Makefile
+++ b/src/bin/pg_upgrade/Makefile
@@ -3,6 +3,9 @@
 PGFILEDESC = "pg_upgrade - an in-place binary upgrade utility"
 PGAPPICON = win32
 
+# required for 003_logical_replication_slots.pl
+EXTRA_INSTALL=contrib/test_decoding
+
 subdir = src/bin/pg_upgrade
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 64024e3b9e..4b54f5567c 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -30,7 +30,9 @@ static void check_for_jsonb_9_4_usage(ClusterInfo *cluster);
 static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(ClusterInfo *new_cluster);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
+static void check_for_logical_replication_slots(ClusterInfo *new_cluster);
 
+static int num_slots_on_old_cluster;
 
 /*
  * fix_path_separator
@@ -89,6 +91,10 @@ check_and_dump_old_cluster(bool live_check)
 	/* Extract a list of databases and tables from the old cluster */
 	get_db_and_rel_infos(&old_cluster);
 
+	/* Additionally, extract a list of logical replication slots if required */
+	if (user_opts.include_logical_slots)
+		num_slots_on_old_cluster = get_logical_slot_infos(&old_cluster);
+
 	init_tablespaces();
 
 	get_loadable_libraries();
@@ -189,6 +195,17 @@ check_new_cluster(void)
 {
 	get_db_and_rel_infos(&new_cluster);
 
+	/*
+	 * Do additional work if --include-logical-replication-slots was specified.
+	 * This must be done before check_new_cluster_is_empty() because the
+	 * slot_arr attribute of the new_cluster will be checked in that function.
+	 */
+	if (user_opts.include_logical_slots)
+	{
+		(void) get_logical_slot_infos(&new_cluster);
+		check_for_logical_replication_slots(&new_cluster);
+	}
+
 	check_new_cluster_is_empty();
 
 	check_loadable_libraries();
@@ -364,6 +381,22 @@ check_new_cluster_is_empty(void)
 						 rel_arr->rels[relnum].nspname,
 						 rel_arr->rels[relnum].relname);
 		}
+
+		/*
+		 * If --include-logical-replication-slots is required, check the
+		 * existence of slots.
+		 */
+		if (user_opts.include_logical_slots)
+		{
+			DbInfo *pDbInfo = &new_cluster.dbarr.dbs[dbnum];
+			LogicalSlotInfoArr *slot_arr = &pDbInfo->slot_arr;
+
+			/* if nslots > 0, report just first entry and exit */
+			if (slot_arr->nslots)
+				pg_fatal("New cluster database \"%s\" is not empty: found logical replication slot \"%s\"",
+						 pDbInfo->db_name,
+						 slot_arr->slots[0].slotname);
+		}
 	}
 }
 
@@ -1402,3 +1435,46 @@ check_for_user_defined_encoding_conversions(ClusterInfo *cluster)
 	else
 		check_ok();
 }
+
+/*
+ * Verify the parameter settings necessary for creating logical replication
+ * slots.
+ */
+static void
+check_for_logical_replication_slots(ClusterInfo *new_cluster)
+{
+	PGresult   *res;
+	PGconn	   *conn = connectToServer(new_cluster, "template1");
+	int			max_replication_slots;
+	char	   *wal_level;
+
+	/* --include-logical-replication-slots can be used since PG17. */
+	if (GET_MAJOR_VERSION(new_cluster->major_version) <= 1600)
+		return;
+
+	prep_status("Checking parameter settings for logical replication slots");
+
+	res = executeQueryOrDie(conn, "SHOW max_replication_slots;");
+	max_replication_slots = atoi(PQgetvalue(res, 0, 0));
+
+	if (max_replication_slots == 0)
+		pg_fatal("max_replication_slots must be greater than 0");
+	else if (num_slots_on_old_cluster > max_replication_slots)
+		pg_fatal("max_replication_slots must be greater than existing logical "
+				 "replication slots on old node.");
+
+	PQclear(res);
+
+	res = executeQueryOrDie(conn, "SHOW wal_level;");
+	wal_level = PQgetvalue(res, 0, 0);
+
+	if (strcmp(wal_level, "logical") != 0)
+		pg_fatal("wal_level must be \"logical\", but is set to \"%s\"",
+				 wal_level);
+
+	PQclear(res);
+
+	PQfinish(conn);
+
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/dump.c b/src/bin/pg_upgrade/dump.c
index 6c8c82dca8..e6b90864f5 100644
--- a/src/bin/pg_upgrade/dump.c
+++ b/src/bin/pg_upgrade/dump.c
@@ -59,6 +59,30 @@ generate_old_dump(void)
 						   log_opts.dumpdir,
 						   sql_file_name, escaped_connstr.data);
 
+		/*
+		 * Dump logical replication slots if needed.
+		 *
+		 * XXX We cannot dump replication slots at the same time as the schema
+		 * dump because we need to separate the timing of restoring
+		 * replication slots and other objects. Replication slots, in
+		 * particular, should not be restored before executing the pg_resetwal
+		 * command because it will remove WALs that are required by the slots.
+		 */
+		if (user_opts.include_logical_slots)
+		{
+			char		slots_file_name[MAXPGPATH];
+
+			snprintf(slots_file_name, sizeof(slots_file_name),
+					 DB_DUMP_LOGICAL_SLOTS_FILE_MASK, old_db->db_oid);
+			parallel_exec_prog(log_file_name, NULL,
+							   "\"%s/pg_dump\" %s --logical-replication-slots-only "
+							   "--quote-all-identifiers --binary-upgrade %s "
+							   "--file=\"%s/%s\" %s",
+							   new_cluster.bindir, cluster_conn_opts(&old_cluster),
+							   log_opts.verbose ? "--verbose" : "",
+							   log_opts.dumpdir,
+							   slots_file_name, escaped_connstr.data);
+		}
 		termPQExpBuffer(&escaped_connstr);
 	}
 
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index a9988abfe1..8bc0ad2e10 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -26,6 +26,7 @@ static void get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo);
 static void free_rel_infos(RelInfoArr *rel_arr);
 static void print_db_infos(DbInfoArr *db_arr);
 static void print_rel_infos(RelInfoArr *rel_arr);
+static void print_slot_infos(LogicalSlotInfoArr *slot_arr);
 
 
 /*
@@ -394,7 +395,7 @@ get_db_infos(ClusterInfo *cluster)
 	i_spclocation = PQfnumber(res, "spclocation");
 
 	ntups = PQntuples(res);
-	dbinfos = (DbInfo *) pg_malloc(sizeof(DbInfo) * ntups);
+	dbinfos = (DbInfo *) pg_malloc0(sizeof(DbInfo) * ntups);
 
 	for (tupnum = 0; tupnum < ntups; tupnum++)
 	{
@@ -600,6 +601,96 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 	dbinfo->rel_arr.nrels = num_rels;
 }
 
+/*
+ * get_logical_slot_infos_per_db()
+ *
+ * gets the LogicalSlotInfos for all the logical replication slots of the database
+ * referred to by "dbinfo".
+ */
+static void
+get_logical_slot_infos_per_db(ClusterInfo *cluster, DbInfo *dbinfo)
+{
+	PGconn	   *conn = connectToServer(cluster,
+									   dbinfo->db_name);
+	PGresult   *res;
+	LogicalSlotInfo *slotinfos = NULL;
+
+	int			num_slots;
+
+	char		query[QUERY_ALLOC];
+
+	snprintf(query, sizeof(query),
+			 "SELECT slot_name, plugin, two_phase "
+			 "FROM pg_catalog.pg_replication_slots "
+			 "WHERE database = current_database() AND temporary = false "
+			 "AND wal_status IN ('reserved', 'extended');");
+
+	res = executeQueryOrDie(conn, "%s", query);
+
+	num_slots = PQntuples(res);
+
+	if (num_slots)
+	{
+		int			slotnum;
+		int			i_slotname;
+		int			i_plugin;
+		int			i_twophase;
+
+		slotinfos = (LogicalSlotInfo *) pg_malloc(sizeof(LogicalSlotInfo) * num_slots);
+
+		i_slotname = PQfnumber(res, "slot_name");
+		i_plugin = PQfnumber(res, "plugin");
+		i_twophase = PQfnumber(res, "two_phase");
+
+		for (slotnum = 0; slotnum < num_slots; slotnum++)
+		{
+			LogicalSlotInfo *curr = &slotinfos[slotnum];
+
+			curr->slotname = pg_strdup(PQgetvalue(res, slotnum, i_slotname));
+			curr->plugin = pg_strdup(PQgetvalue(res, slotnum, i_plugin));
+			curr->two_phase = (strcmp(PQgetvalue(res, slotnum, i_twophase), "t") == 0);
+		}
+	}
+
+	PQclear(res);
+	PQfinish(conn);
+
+	dbinfo->slot_arr.slots = slotinfos;
+	dbinfo->slot_arr.nslots = num_slots;
+}
+
+/*
+ * get_logical_slot_infos()
+ *
+ * Higher level routine to generate LogicalSlotInfoArr for all databases.
+ */
+int
+get_logical_slot_infos(ClusterInfo *cluster)
+{
+	int			dbnum;
+	int			slotnum = 0;
+
+	if (cluster == &old_cluster)
+		pg_log(PG_VERBOSE, "\nsource databases:");
+	else
+		pg_log(PG_VERBOSE, "\ntarget databases:");
+
+	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
+	{
+		DbInfo	   *pDbInfo = &cluster->dbarr.dbs[dbnum];
+
+		get_logical_slot_infos_per_db(cluster, pDbInfo);
+		slotnum += pDbInfo->slot_arr.nslots;
+
+		if (log_opts.verbose)
+		{
+			pg_log(PG_VERBOSE, "Database: \"%s\"", pDbInfo->db_name);
+			print_slot_infos(&pDbInfo->slot_arr);
+		}
+	}
+
+	return slotnum;
+}
 
 static void
 free_db_and_rel_infos(DbInfoArr *db_arr)
@@ -610,6 +701,12 @@ free_db_and_rel_infos(DbInfoArr *db_arr)
 	{
 		free_rel_infos(&db_arr->dbs[dbnum].rel_arr);
 		pg_free(db_arr->dbs[dbnum].db_name);
+
+		/*
+		 * Logical replication slots must not exist on the new cluster before
+		 * doing create_logical_replication_slots().
+		 */
+		Assert(db_arr->dbs[dbnum].slot_arr.slots == NULL);
 	}
 	pg_free(db_arr->dbs);
 	db_arr->dbs = NULL;
@@ -660,3 +757,15 @@ print_rel_infos(RelInfoArr *rel_arr)
 			   rel_arr->rels[relnum].reloid,
 			   rel_arr->rels[relnum].tablespace);
 }
+
+static void
+print_slot_infos(LogicalSlotInfoArr *slot_arr)
+{
+	int			slotnum;
+
+	for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\", two_phase: %d",
+			   slot_arr->slots[slotnum].slotname,
+			   slot_arr->slots[slotnum].plugin,
+			   slot_arr->slots[slotnum].two_phase);
+}
diff --git a/src/bin/pg_upgrade/meson.build b/src/bin/pg_upgrade/meson.build
index 12a97f84e2..228f29b688 100644
--- a/src/bin/pg_upgrade/meson.build
+++ b/src/bin/pg_upgrade/meson.build
@@ -42,6 +42,7 @@ tests += {
     'tests': [
       't/001_basic.pl',
       't/002_pg_upgrade.pl',
+      't/003_logical_replication_slots.pl',
     ],
     'test_kwargs': {'priority': 40}, # pg_upgrade tests are slow
   },
diff --git a/src/bin/pg_upgrade/option.c b/src/bin/pg_upgrade/option.c
index 640361009e..df66a5ffe6 100644
--- a/src/bin/pg_upgrade/option.c
+++ b/src/bin/pg_upgrade/option.c
@@ -57,6 +57,7 @@ parseCommandLine(int argc, char *argv[])
 		{"verbose", no_argument, NULL, 'v'},
 		{"clone", no_argument, NULL, 1},
 		{"copy", no_argument, NULL, 2},
+		{"include-logical-replication-slots", no_argument, NULL, 3},
 
 		{NULL, 0, NULL, 0}
 	};
@@ -199,6 +200,10 @@ parseCommandLine(int argc, char *argv[])
 				user_opts.transfer_mode = TRANSFER_MODE_COPY;
 				break;
 
+			case 3:
+				user_opts.include_logical_slots = true;
+				break;
+
 			default:
 				fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
 						os_info.progname);
@@ -289,6 +294,8 @@ usage(void)
 	printf(_("  -V, --version                 display version information, then exit\n"));
 	printf(_("  --clone                       clone instead of copying files to new cluster\n"));
 	printf(_("  --copy                        copy files to new cluster (default)\n"));
+	printf(_("  --include-logical-replication-slots\n"
+			 "                                upgrade logical replication slots\n"));
 	printf(_("  -?, --help                    show this help, then exit\n"));
 	printf(_("\n"
 			 "Before running pg_upgrade you must:\n"
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 4562dafcff..6dd3832422 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -59,6 +59,7 @@ static void copy_xact_xlog_xid(void);
 static void set_frozenxids(bool minmxid_only);
 static void make_outputdirs(char *pgdata);
 static void setup(char *argv0, bool *live_check);
+static void create_logical_replication_slots(void);
 
 ClusterInfo old_cluster,
 			new_cluster;
@@ -188,6 +189,19 @@ main(int argc, char **argv)
 			  new_cluster.pgdata);
 	check_ok();
 
+	/*
+	 * Create logical replication slots if requested.
+	 *
+	 * Note: This must be done after doing pg_resetwal command because
+	 * pg_resetwal would remove required WALs.
+	 */
+	if (user_opts.include_logical_slots)
+	{
+		start_postmaster(&new_cluster, true);
+		create_logical_replication_slots();
+		stop_postmaster(false);
+	}
+
 	if (user_opts.do_sync)
 	{
 		prep_status("Sync data directory to disk");
@@ -860,3 +874,50 @@ set_frozenxids(bool minmxid_only)
 
 	check_ok();
 }
+
+/*
+ * create_logical_replication_slots()
+ *
+ * Similar to create_new_objects() but only restores logical replication slots.
+ */
+static void
+create_logical_replication_slots(void)
+{
+	int			dbnum;
+
+	prep_status_progress("Restoring logical replication slots in the new cluster");
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		char		slots_file_name[MAXPGPATH],
+					log_file_name[MAXPGPATH];
+		DbInfo	   *old_db = &old_cluster.dbarr.dbs[dbnum];
+
+		pg_log(PG_STATUS, "%s", old_db->db_name);
+
+		snprintf(slots_file_name, sizeof(slots_file_name),
+				 DB_DUMP_LOGICAL_SLOTS_FILE_MASK, old_db->db_oid);
+		snprintf(log_file_name, sizeof(log_file_name),
+				 DB_DUMP_LOG_FILE_MASK, old_db->db_oid);
+
+		parallel_exec_prog(log_file_name,
+						   NULL,
+						   "\"%s/psql\" %s --echo-queries --set ON_ERROR_STOP=on "
+						   "--no-psqlrc --dbname %s -f \"%s/%s\"",
+						   new_cluster.bindir,
+						   cluster_conn_opts(&new_cluster),
+						   old_db->db_name,
+						   log_opts.dumpdir,
+						   slots_file_name);
+	}
+
+	/* reap all children */
+	while (reap_child(true) == true)
+		;
+
+	end_progress_output();
+	check_ok();
+
+	/* update new_cluster info again */
+	get_logical_slot_infos(&new_cluster);
+}
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 3eea0139c7..8034067492 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -29,6 +29,7 @@
 /* contains both global db information and CREATE DATABASE commands */
 #define GLOBALS_DUMP_FILE	"pg_upgrade_dump_globals.sql"
 #define DB_DUMP_FILE_MASK	"pg_upgrade_dump_%u.custom"
+#define DB_DUMP_LOGICAL_SLOTS_FILE_MASK	"pg_upgrade_dump_%u_logical_slots.sql"
 
 /*
  * Base directories that include all the files generated internally, from the
@@ -150,6 +151,22 @@ typedef struct
 	int			nrels;
 } RelInfoArr;
 
+/*
+ * Structure to store logical replication slot information
+ */
+typedef struct
+{
+	char	   *slotname;		/* slot name */
+	char	   *plugin;			/* plugin */
+	bool		two_phase;		/* Can the slot decode 2PC? */
+} LogicalSlotInfo;
+
+typedef struct
+{
+	int			nslots;
+	LogicalSlotInfo *slots;
+} LogicalSlotInfoArr;
+
 /*
  * The following structure represents a relation mapping.
  */
@@ -176,6 +193,7 @@ typedef struct
 	char		db_tablespace[MAXPGPATH];	/* database default tablespace
 											 * path */
 	RelInfoArr	rel_arr;		/* array of all user relinfos */
+	LogicalSlotInfoArr slot_arr;	/* array of all LogicalSlotInfo */
 } DbInfo;
 
 /*
@@ -304,6 +322,8 @@ typedef struct
 	transferMode transfer_mode; /* copy files or link them? */
 	int			jobs;			/* number of processes/threads to use */
 	char	   *socketdir;		/* directory to use for Unix sockets */
+	bool		include_logical_slots;	/* true -> dump and restore logical
+										 * replication slots */
 } UserOpts;
 
 typedef struct
@@ -400,6 +420,7 @@ FileNameMap *gen_db_file_maps(DbInfo *old_db,
 							  DbInfo *new_db, int *nmaps, const char *old_pgdata,
 							  const char *new_pgdata);
 void		get_db_and_rel_infos(ClusterInfo *cluster);
+int			get_logical_slot_infos(ClusterInfo *cluster);
 
 /* option.c */
 
diff --git a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
new file mode 100644
index 0000000000..9ca266f6b2
--- /dev/null
+++ b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
@@ -0,0 +1,146 @@
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+# Tests for upgrading replication slots
+
+use strict;
+use warnings;
+
+use File::Path qw(rmtree);
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Can be changed to test the other modes
+my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
+
+# Initialize old node
+my $old_node = PostgreSQL::Test::Cluster->new('old_node');
+$old_node->init(allows_streaming => 'logical');
+$old_node->start;
+
+# Initialize new node
+my $new_node = PostgreSQL::Test::Cluster->new('new_node');
+$new_node->init(allows_streaming => 1);
+
+my $bindir = $new_node->config_data('--bindir');
+
+$old_node->stop;
+
+# Cause a failure at the start of pg_upgrade because wal_level is replica
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_node->data_dir,
+		'-D',         $new_node->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_node->host,
+		'-p',         $old_node->port,
+		'-P',         $new_node->port,
+		$mode,        '--include-logical-replication-slots',
+	],
+	'run of pg_upgrade of old node with wrong wal_level');
+ok( -d $new_node->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_node->data_dir . "/pg_upgrade_output.d");
+
+# Preparations for the subsequent test. The case max_replication_slots is set
+# to 0 is prohibited.
+$new_node->append_conf('postgresql.conf', "wal_level = 'logical'");
+$new_node->append_conf('postgresql.conf', "max_replication_slots = 0");
+
+# Cause a failure at the start of pg_upgrade because max_replication_slots is 0
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_node->data_dir,
+		'-D',         $new_node->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_node->host,
+		'-p',         $old_node->port,
+		'-P',         $new_node->port,
+		$mode,        '--include-logical-replication-slots',
+	],
+	'run of pg_upgrade of old node with wrong max_replication_slots');
+ok( -d $new_node->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_node->data_dir . "/pg_upgrade_output.d");
+
+# Preparations for the subsequent test. max_replication_slots is set to
+# non-zero value
+$new_node->append_conf('postgresql.conf', "max_replication_slots = 1");
+
+# Create a slot on old node, and generate WALs
+$old_node->start;
+$old_node->safe_psql(
+	'postgres', qq[
+	SELECT pg_create_logical_replication_slot('test_slot', 'test_decoding', false, true);
+	SELECT pg_create_logical_replication_slot('to_be_dropped', 'test_decoding', false, true);
+	CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;
+]);
+
+$old_node->stop;
+
+# Cause a failure at the start of pg_upgrade because max_replication_slots is
+# smaller than existing slots on old node
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_node->data_dir,
+		'-D',         $new_node->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_node->host,
+		'-p',         $old_node->port,
+		'-P',         $new_node->port,
+		$mode,        '--include-logical-replication-slots',
+	],
+	'run of pg_upgrade of old node with small max_replication_slots');
+ok( -d $new_node->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_node->data_dir . "/pg_upgrade_output.d");
+
+# Preparations for the subsequent test. max_replication_slots is set to
+# appropriate value
+$new_node->append_conf('postgresql.conf', "max_replication_slots = 10");
+
+# Remove an unnecessary slot and consume WALs
+$old_node->start;
+$old_node->safe_psql(
+	'postgres', qq[
+	SELECT pg_drop_replication_slot('to_be_dropped');
+	SELECT count(*) FROM pg_logical_slot_get_changes('test_slot', NULL, NULL)
+]);
+$old_node->stop;
+
+# Actual run, pg_upgrade_output.d is removed at the end
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_node->data_dir,
+		'-D',         $new_node->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_node->host,
+		'-p',         $old_node->port,
+		'-P',         $new_node->port,
+		$mode,        '--include-logical-replication-slots'
+	],
+	'run of pg_upgrade of old node');
+ok( !-d $new_node->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ removed after pg_upgrade success");
+
+$new_node->start;
+my $result = $new_node->safe_psql('postgres',
+	"SELECT slot_name, two_phase FROM pg_replication_slots");
+is($result, qq(test_slot|t), 'check the slot exists on new node');
+
+done_testing();
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index a1cf01e38e..141268bfd2 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1498,7 +1498,10 @@ LogicalRepStreamAbortData
 LogicalRepTupleData
 LogicalRepTyp
 LogicalRepWorker
+LogicalReplicationSlotInfo
 LogicalRewriteMappingData
+LogicalSlotInfo
+LogicalSlotInfoArr
 LogicalTape
 LogicalTapeSet
 LsnReadQueue
-- 
2.27.0

0003-pg_upgrade-Add-check-function-for-include-logical-re.patch (application/octet-stream)
From d4694a87b66e5b36d73cb18ba8eb7ad8cd223626 Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Fri, 14 Apr 2023 08:59:03 +0000
Subject: [PATCH 3/3] pg_upgrade: Add check function for
 --include-logical-replication-slots option

XXX: Actually, this commit disallows supporting slots that are created by user
backends. In the checking function we ensure that all the active slots have a
confirmed_flush_lsn equal to the current WAL position, and for such slots the
two would not be the same. For slots used by logical replication, the logical
walsenders guarantee this at shutdown. Slots advanced by individual backends,
however, cannot be handled by walsenders, so their confirmed_flush_lsn stays
behind the shutdown checkpoint record.

Author: Hayato Kuroda
Reviewed-by: Wang Wei, Vignesh C
---
 src/bin/pg_upgrade/check.c                    |  57 ++++++++
 .../t/003_logical_replication_slots.pl        | 134 +++++++++++-------
 2 files changed, 138 insertions(+), 53 deletions(-)

diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 4b54f5567c..e9911cb302 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -31,6 +31,7 @@ static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(ClusterInfo *new_cluster);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
 static void check_for_logical_replication_slots(ClusterInfo *new_cluster);
+static void check_for_confirmed_flush_lsn(ClusterInfo *cluster);
 
 static int num_slots_on_old_cluster;
 
@@ -109,6 +110,8 @@ check_and_dump_old_cluster(bool live_check)
 	check_for_composite_data_type_usage(&old_cluster);
 	check_for_reg_data_type_usage(&old_cluster);
 	check_for_isn_and_int8_passing_mismatch(&old_cluster);
+	if (user_opts.include_logical_slots)
+		check_for_confirmed_flush_lsn(&old_cluster);
 
 	/*
 	 * PG 16 increased the size of the 'aclitem' type, which breaks the
@@ -1478,3 +1481,57 @@ check_for_logical_replication_slots(ClusterInfo *new_cluster)
 
 	check_ok();
 }
+
+/*
+ * Verify that all logical replication slots consumed all WALs, except a
+ * CHECKPOINT_SHUTDOWN record.
+ */
+static void
+check_for_confirmed_flush_lsn(ClusterInfo *cluster)
+{
+	int			i,
+				ntups,
+				i_slotname;
+	bool		is_error = false;
+	PGresult   *res;
+	DbInfo	   *active_db = &cluster->dbarr.dbs[0];
+	PGconn	   *conn = connectToServer(cluster, active_db->db_name);
+
+	Assert(user_opts.include_logical_slots);
+
+	/* --include-logical-replication-slots can be used since PG16. */
+	if (GET_MAJOR_VERSION(cluster->major_version) <= 1500)
+		return;
+
+	prep_status("Checking confirmed_flush_lsn for logical replication slots");
+
+	/*
+	 * Check that all logical replication slots have reached the current WAL
+	 * position.
+	 */
+	res = executeQueryOrDie(conn,
+							"SELECT slot_name FROM pg_catalog.pg_replication_slots "
+							"WHERE (pg_catalog.pg_current_wal_insert_lsn() - confirmed_flush_lsn) <> 0 "
+							"AND temporary = false AND wal_status IN ('reserved', 'extended');");
+
+	ntups = PQntuples(res);
+	i_slotname = PQfnumber(res, "slot_name");
+
+	for (i = 0; i < ntups; i++)
+	{
+		is_error = true;
+
+		pg_log(PG_WARNING,
+			   "\nWARNING: logical replication slot \"%s\" has not consumed WALs yet",
+			   PQgetvalue(res, i, i_slotname));
+	}
+
+	PQclear(res);
+	PQfinish(conn);
+
+	if (is_error)
+		pg_fatal("--include-logical-replication-slots requires that all "
+				 "logical replication slots consumed all the WALs");
+
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
index 9ca266f6b2..87049d7116 100644
--- a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
+++ b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
@@ -15,132 +15,160 @@ use Test::More;
 my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
 
 # Initialize old node
-my $old_node = PostgreSQL::Test::Cluster->new('old_node');
-$old_node->init(allows_streaming => 'logical');
-$old_node->start;
+my $old_publisher = PostgreSQL::Test::Cluster->new('old_publisher');
+$old_publisher->init(allows_streaming => 'logical');
 
 # Initialize new node
-my $new_node = PostgreSQL::Test::Cluster->new('new_node');
-$new_node->init(allows_streaming => 1);
+my $new_publisher = PostgreSQL::Test::Cluster->new('new_publisher');
+$new_publisher->init(allows_streaming => 1);
 
-my $bindir = $new_node->config_data('--bindir');
+# Initialize subscriber node
+my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
+$subscriber->init(allows_streaming => 'logical');
 
-$old_node->stop;
+my $bindir = $new_publisher->config_data('--bindir');
 
 # Cause a failure at the start of pg_upgrade because wal_level is replica
 command_fails(
 	[
 		'pg_upgrade', '--no-sync',
-		'-d',         $old_node->data_dir,
-		'-D',         $new_node->data_dir,
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
 		'-b',         $bindir,
 		'-B',         $bindir,
-		'-s',         $new_node->host,
-		'-p',         $old_node->port,
-		'-P',         $new_node->port,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
 		$mode,        '--include-logical-replication-slots',
 	],
 	'run of pg_upgrade of old node with wrong wal_level');
-ok( -d $new_node->data_dir . "/pg_upgrade_output.d",
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
 	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
 
 # Clean up
-rmtree($new_node->data_dir . "/pg_upgrade_output.d");
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
 
 # Preparations for the subsequent test. The case max_replication_slots is set
 # to 0 is prohibited.
-$new_node->append_conf('postgresql.conf', "wal_level = 'logical'");
-$new_node->append_conf('postgresql.conf', "max_replication_slots = 0");
+$new_publisher->append_conf('postgresql.conf', "wal_level = 'logical'");
+$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 0");
 
 # Cause a failure at the start of pg_upgrade because max_replication_slots is 0
 command_fails(
 	[
 		'pg_upgrade', '--no-sync',
-		'-d',         $old_node->data_dir,
-		'-D',         $new_node->data_dir,
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
 		'-b',         $bindir,
 		'-B',         $bindir,
-		'-s',         $new_node->host,
-		'-p',         $old_node->port,
-		'-P',         $new_node->port,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
 		$mode,        '--include-logical-replication-slots',
 	],
 	'run of pg_upgrade of old node with wrong max_replication_slots');
-ok( -d $new_node->data_dir . "/pg_upgrade_output.d",
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
 	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
 
 # Clean up
-rmtree($new_node->data_dir . "/pg_upgrade_output.d");
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
 
 # Preparations for the subsequent test. max_replication_slots is set to
 # non-zero value
-$new_node->append_conf('postgresql.conf', "max_replication_slots = 1");
+$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 1");
 
-# Create a slot on old node, and generate WALs
-$old_node->start;
-$old_node->safe_psql(
+# Create a slot on old node
+$old_publisher->start;
+$old_publisher->safe_psql(
 	'postgres', qq[
-	SELECT pg_create_logical_replication_slot('test_slot', 'test_decoding', false, true);
-	SELECT pg_create_logical_replication_slot('to_be_dropped', 'test_decoding', false, true);
+	SELECT pg_create_logical_replication_slot('test_slot1', 'test_decoding', false, true);
+	SELECT pg_create_logical_replication_slot('test_slot2', 'test_decoding', false, true);
 	CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;
 ]);
 
-$old_node->stop;
+$old_publisher->stop;
 
 # Cause a failure at the start of pg_upgrade because max_replication_slots is
 # smaller than existing slots on old node
 command_fails(
 	[
 		'pg_upgrade', '--no-sync',
-		'-d',         $old_node->data_dir,
-		'-D',         $new_node->data_dir,
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
 		'-b',         $bindir,
 		'-B',         $bindir,
-		'-s',         $new_node->host,
-		'-p',         $old_node->port,
-		'-P',         $new_node->port,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
 		$mode,        '--include-logical-replication-slots',
 	],
 	'run of pg_upgrade of old node with small max_replication_slots');
-ok( -d $new_node->data_dir . "/pg_upgrade_output.d",
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
 	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
 
 # Clean up
-rmtree($new_node->data_dir . "/pg_upgrade_output.d");
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
 
 # Preparations for the subsequent test. max_replication_slots is set to
 # appropriate value
-$new_node->append_conf('postgresql.conf', "max_replication_slots = 10");
+$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 10");
 
-# Remove an unnecessary slot and consume WALs
-$old_node->start;
-$old_node->safe_psql(
+$old_publisher->start;
+$old_publisher->safe_psql(
 	'postgres', qq[
-	SELECT pg_drop_replication_slot('to_be_dropped');
-	SELECT count(*) FROM pg_logical_slot_get_changes('test_slot', NULL, NULL)
+	SELECT pg_drop_replication_slot('test_slot1');
+	SELECT pg_drop_replication_slot('test_slot2');
 ]);
-$old_node->stop;
+
+# Setup logical replication
+my $old_connstr = $old_publisher->connstr . ' dbname=postgres';
+$old_publisher->safe_psql('postgres', "CREATE PUBLICATION pub FOR ALL TABLES");
+$subscriber->start;
+$subscriber->safe_psql(
+	'postgres', qq[
+	CREATE TABLE tbl (a int);
+	CREATE SUBSCRIPTION sub CONNECTION '$old_connstr' PUBLICATION pub WITH (two_phase = 'true')
+]);
+
+# Wait for initial table sync to finish
+$subscriber->wait_for_subscription_sync($old_publisher, 'sub');
+
+$old_publisher->stop;
 
 # Actual run, pg_upgrade_output.d is removed at the end
 command_ok(
 	[
 		'pg_upgrade', '--no-sync',
-		'-d',         $old_node->data_dir,
-		'-D',         $new_node->data_dir,
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
 		'-b',         $bindir,
 		'-B',         $bindir,
-		'-s',         $new_node->host,
-		'-p',         $old_node->port,
-		'-P',         $new_node->port,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
 		$mode,        '--include-logical-replication-slots'
 	],
 	'run of pg_upgrade of old node');
-ok( !-d $new_node->data_dir . "/pg_upgrade_output.d",
+ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
 	"pg_upgrade_output.d/ removed after pg_upgrade success");
 
-$new_node->start;
-my $result = $new_node->safe_psql('postgres',
+$new_publisher->start;
+my $result = $new_publisher->safe_psql('postgres',
 	"SELECT slot_name, two_phase FROM pg_replication_slots");
-is($result, qq(test_slot|t), 'check the slot exists on new node');
+is($result, qq(sub|t), 'check the slot exists on new node');
+
+# Change the connection
+$subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION sub DISABLE");
+my $new_connstr = $new_publisher->connstr . ' dbname=postgres';
+$subscriber->safe_psql('postgres',
+	"ALTER SUBSCRIPTION sub CONNECTION '$new_connstr'");
+$subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION sub ENABLE");
+
+# Check whether changes on the new publisher get replicated to the subscriber
+$new_publisher->safe_psql('postgres',
+	"INSERT INTO tbl VALUES (generate_series(11, 20))");
+$new_publisher->wait_for_catchup('sub');
+$result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
+is($result, qq(20), 'check changes are shipped to subscriber');
 
 done_testing();
-- 
2.27.0

#69vignesh C
vignesh21@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#68)
4 attachment(s)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Fri, 21 Jul 2023 at 13:00, Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

Dear hackers,

Based on the above, we are considering delaying the timing of shutdown for
logical walsenders. The preliminary workflow is:

1. When a logical walsender receives the signal from the checkpointer, it
consumes all remaining WAL records, changes its state to WALSNDSTATE_STOPPING
(the same state that pg_stat_replication reports as "stopping"; see the query
sketch after this list), and stops doing anything further.
2. Then the checkpointer does the shutdown checkpoint.
3. After that the postmaster sends a signal to walsenders, same as the current
implementation.
4. Finally, logical walsenders process the shutdown checkpoint record and
update the confirmed_lsn after the acknowledgement from the subscriber.
Note that logical walsenders do not have to send the shutdown checkpoint record
itself to the subscriber, but the following keepalive will help us advance the
confirmed_lsn.
5. Once all tasks are done, they exit.
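
As an aside, the walsender state can be watched through the existing
pg_stat_replication view. This is only an observation aid for the workflow
above, not something the patches add, and the window in which the "stopping"
state is visible during shutdown is short:

-- Run on the publisher from an already-connected session.
-- A walsender in WALSNDSTATE_STOPPING is reported with state = 'stopping'.
SELECT application_name, state, sent_lsn, flush_lsn, replay_lsn
FROM pg_stat_replication;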

This mechanism ensures that the confirmed_lsn of active slots is the same as
the current WAL location of the old publisher, so the 0003 patch becomes
simpler: we would not have to calculate the acceptable difference anymore.

One thing we must consider is that no WAL must be generated while decoding the
shutdown checkpoint record; doing so would cause a PANIC. IIUC the record leads
to SnapBuildSerializationPoint(), which just serializes the snapshot builder or
restores it from disk, so the change may be acceptable. Thoughts?

I've implemented the ideas from my previous proposal, PSA another patch set.
Patch 0001 introduces the state WALSNDSTATE_STOPPING to logical walsenders. The
workflow remains largely the same as described in my previous post, with the
following additions:

* A flag has been added to track whether all the WAL has been flushed. The
logical walsender can only exit after the flag is set. This ensures that all
WAL is flushed before the walsender terminates.
* Cumulative statistics are now forcibly written before changing the state.
Whereas the previous approach reported stats upon process exit, the current one
must report them earlier because of the checkpointer's termination timing. See
the comments in CheckpointerMain() and atop pgstat_before_server_shutdown().
* At the end of the process, slots are now saved to disk.

Patch 0002 adds the --include-logical-replication-slots option to pg_upgrade,
unchanged from the previous set.

Patch 0003 adds a check function, which becomes simpler.
The previous version calculated an "acceptable" difference between the
confirmed_lsn and the current WAL position. This was necessary because shutdown
records could not be sent to subscribers, creating a disparity between these
values. However, this approach had drawbacks, such as needing adjustments
whenever record sizes changed.

Now the record can be sent to subscribers, so that hack is not needed anymore,
at least in the context of logical replication. The consistency is now
maintained by the logical walsenders; slots created directly by backends,
however, cannot be handled this way. We must consider what should be...
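
For reference, whether slots have caught up can be eyeballed from psql on the
old publisher using only existing views and functions. This is just an
illustration, not part of the patch set; pg_upgrade performs its own check
internally at upgrade time:

-- Every logical slot that will be migrated should have caught up to the
-- current WAL insert position before the old publisher is shut down.
SELECT slot_name,
       confirmed_flush_lsn,
       pg_current_wal_insert_lsn() AS current_lsn,
       pg_current_wal_insert_lsn() - confirmed_flush_lsn AS remaining_bytes
FROM pg_catalog.pg_replication_slots
WHERE slot_type = 'logical' AND temporary = false;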

What do you think?

Here is a patch which checks that there are no WAL records other than the
CHECKPOINT_SHUTDOWN record left to be consumed, based on the discussion in [1].
Patches 0001 and 0002 are the same as the ones posted by Kuroda-san. Patch
0003 exposes pg_get_wal_records_content to get the WAL records, along with the
WAL record type, between a start and end LSN. The pg_walinspect contrib module
already exposes a function for this requirement; I have moved this
functionality so that it is exposed from the backend. Patch 0004 slightly
changes the check function so that it verifies there are no records other than
CHECKPOINT_SHUTDOWN left to be consumed. The attached patches contain these
changes; an illustration query follows below.
Thoughts?
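
To show what that check looks at, here is a rough query that could be run on
the old publisher once the 0003 patch (which exposes pg_get_wal_records_content
from the backend) is applied. The slot name is hypothetical; everything else
uses either that new function or facilities that already exist in core:

-- List what is still unconsumed by a given slot, grouped by record type.
-- Anything other than a single CHECKPOINT_SHUTDOWN record would make the
-- check added in 0004 fail.
SELECT w.record_type, count(*)
FROM pg_catalog.pg_replication_slots s,
     pg_catalog.pg_get_wal_records_content(s.confirmed_flush_lsn,
                          pg_catalog.pg_current_wal_insert_lsn()) AS w
WHERE s.slot_name = 'sub'   -- hypothetical slot name
GROUP BY w.record_type;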

[1]: /messages/by-id/CAA4eK1Kem-J5NM7GJCgyKP84pEN6RsG6JWo=6pSn1E+iexL1Fw@mail.gmail.com

Regards,
Vignesh

Attachments:

0004-pg_upgrade-Add-check-function-for-include-logical-re.patch (text/x-patch; charset=US-ASCII)
From 49360788fea01756aa9578da3dad34786bd1281a Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Fri, 14 Apr 2023 08:59:03 +0000
Subject: [PATCH 4/4] pg_upgrade: Add check function for
 --include-logical-replication-slots option

XXX: Actually, this commit disallows supporting slots that are created by user
backends. In the checking function we ensure that all the active slots have a
confirmed_flush_lsn equal to the current WAL position, and for such slots the
two would not be the same. For slots used by logical replication, the logical
walsenders guarantee this at shutdown. Slots advanced by individual backends,
however, cannot be handled by walsenders, so their confirmed_flush_lsn stays
behind the shutdown checkpoint record.

Author: Hayato Kuroda
Reviewed-by: Wang Wei, Vignesh C
---
 src/bin/pg_upgrade/check.c                    |  59 ++++++++
 .../t/003_logical_replication_slots.pl        | 134 +++++++++++-------
 2 files changed, 140 insertions(+), 53 deletions(-)

diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 4b54f5567c..264895721a 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -31,6 +31,7 @@ static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(ClusterInfo *new_cluster);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
 static void check_for_logical_replication_slots(ClusterInfo *new_cluster);
+static void check_for_confirmed_flush_lsn(ClusterInfo *cluster);
 
 static int num_slots_on_old_cluster;
 
@@ -109,6 +110,8 @@ check_and_dump_old_cluster(bool live_check)
 	check_for_composite_data_type_usage(&old_cluster);
 	check_for_reg_data_type_usage(&old_cluster);
 	check_for_isn_and_int8_passing_mismatch(&old_cluster);
+	if (user_opts.include_logical_slots)
+		check_for_confirmed_flush_lsn(&old_cluster);
 
 	/*
 	 * PG 16 increased the size of the 'aclitem' type, which breaks the
@@ -1478,3 +1481,59 @@ check_for_logical_replication_slots(ClusterInfo *new_cluster)
 
 	check_ok();
 }
+
+/*
+ * Verify that all logical replication slots consumed all WALs, except a
+ * CHECKPOINT_SHUTDOWN record.
+ */
+static void
+check_for_confirmed_flush_lsn(ClusterInfo *cluster)
+{
+	int			i,
+				ntups,
+				i_slotname;
+	bool		is_error = false;
+	PGresult   *res;
+	DbInfo	   *active_db = &cluster->dbarr.dbs[0];
+	PGconn	   *conn = connectToServer(cluster, active_db->db_name);
+
+	Assert(user_opts.include_logical_slots);
+
+	/* --include-logical-replication-slots can be used since PG16. */
+	if (GET_MAJOR_VERSION(cluster->major_version) <= 1500)
+		return;
+
+	prep_status("Checking confirmed_flush_lsn for logical replication slots");
+
+	/*
+	 * Check that all logical replication slots have reached the current WAL
+	 * position.
+	 */
+	res = executeQueryOrDie(conn,
+							"SELECT slot_name FROM pg_catalog.pg_replication_slots "
+							"WHERE (SELECT count(record_type) "
+							"		FROM pg_catalog.pg_get_wal_records_content(confirmed_flush_lsn, pg_catalog.pg_current_wal_insert_lsn()) "
+							"		WHERE record_type != 'CHECKPOINT_SHUTDOWN') <> 0 "
+							"AND temporary = false AND wal_status IN ('reserved', 'extended');");
+
+	ntups = PQntuples(res);
+	i_slotname = PQfnumber(res, "slot_name");
+
+	for (i = 0; i < ntups; i++)
+	{
+		is_error = true;
+
+		pg_log(PG_WARNING,
+			   "\nWARNING: logical replication slot \"%s\" has not consumed WALs yet",
+			   PQgetvalue(res, i, i_slotname));
+	}
+
+	PQclear(res);
+	PQfinish(conn);
+
+	if (is_error)
+		pg_fatal("--include-logical-replication-slots requires that all "
+				 "logical replication slots consumed all the WALs");
+
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
index 9ca266f6b2..87049d7116 100644
--- a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
+++ b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
@@ -15,132 +15,160 @@ use Test::More;
 my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
 
 # Initialize old node
-my $old_node = PostgreSQL::Test::Cluster->new('old_node');
-$old_node->init(allows_streaming => 'logical');
-$old_node->start;
+my $old_publisher = PostgreSQL::Test::Cluster->new('old_publisher');
+$old_publisher->init(allows_streaming => 'logical');
 
 # Initialize new node
-my $new_node = PostgreSQL::Test::Cluster->new('new_node');
-$new_node->init(allows_streaming => 1);
+my $new_publisher = PostgreSQL::Test::Cluster->new('new_publisher');
+$new_publisher->init(allows_streaming => 1);
 
-my $bindir = $new_node->config_data('--bindir');
+# Initialize subscriber node
+my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
+$subscriber->init(allows_streaming => 'logical');
 
-$old_node->stop;
+my $bindir = $new_publisher->config_data('--bindir');
 
 # Cause a failure at the start of pg_upgrade because wal_level is replica
 command_fails(
 	[
 		'pg_upgrade', '--no-sync',
-		'-d',         $old_node->data_dir,
-		'-D',         $new_node->data_dir,
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
 		'-b',         $bindir,
 		'-B',         $bindir,
-		'-s',         $new_node->host,
-		'-p',         $old_node->port,
-		'-P',         $new_node->port,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
 		$mode,        '--include-logical-replication-slots',
 	],
 	'run of pg_upgrade of old node with wrong wal_level');
-ok( -d $new_node->data_dir . "/pg_upgrade_output.d",
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
 	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
 
 # Clean up
-rmtree($new_node->data_dir . "/pg_upgrade_output.d");
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
 
 # Preparations for the subsequent test. The case max_replication_slots is set
 # to 0 is prohibited.
-$new_node->append_conf('postgresql.conf', "wal_level = 'logical'");
-$new_node->append_conf('postgresql.conf', "max_replication_slots = 0");
+$new_publisher->append_conf('postgresql.conf', "wal_level = 'logical'");
+$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 0");
 
 # Cause a failure at the start of pg_upgrade because max_replication_slots is 0
 command_fails(
 	[
 		'pg_upgrade', '--no-sync',
-		'-d',         $old_node->data_dir,
-		'-D',         $new_node->data_dir,
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
 		'-b',         $bindir,
 		'-B',         $bindir,
-		'-s',         $new_node->host,
-		'-p',         $old_node->port,
-		'-P',         $new_node->port,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
 		$mode,        '--include-logical-replication-slots',
 	],
 	'run of pg_upgrade of old node with wrong max_replication_slots');
-ok( -d $new_node->data_dir . "/pg_upgrade_output.d",
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
 	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
 
 # Clean up
-rmtree($new_node->data_dir . "/pg_upgrade_output.d");
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
 
 # Preparations for the subsequent test. max_replication_slots is set to
 # non-zero value
-$new_node->append_conf('postgresql.conf', "max_replication_slots = 1");
+$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 1");
 
-# Create a slot on old node, and generate WALs
-$old_node->start;
-$old_node->safe_psql(
+# Create a slot on old node
+$old_publisher->start;
+$old_publisher->safe_psql(
 	'postgres', qq[
-	SELECT pg_create_logical_replication_slot('test_slot', 'test_decoding', false, true);
-	SELECT pg_create_logical_replication_slot('to_be_dropped', 'test_decoding', false, true);
+	SELECT pg_create_logical_replication_slot('test_slot1', 'test_decoding', false, true);
+	SELECT pg_create_logical_replication_slot('test_slot2', 'test_decoding', false, true);
 	CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;
 ]);
 
-$old_node->stop;
+$old_publisher->stop;
 
 # Cause a failure at the start of pg_upgrade because max_replication_slots is
 # smaller than existing slots on old node
 command_fails(
 	[
 		'pg_upgrade', '--no-sync',
-		'-d',         $old_node->data_dir,
-		'-D',         $new_node->data_dir,
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
 		'-b',         $bindir,
 		'-B',         $bindir,
-		'-s',         $new_node->host,
-		'-p',         $old_node->port,
-		'-P',         $new_node->port,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
 		$mode,        '--include-logical-replication-slots',
 	],
 	'run of pg_upgrade of old node with small max_replication_slots');
-ok( -d $new_node->data_dir . "/pg_upgrade_output.d",
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
 	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
 
 # Clean up
-rmtree($new_node->data_dir . "/pg_upgrade_output.d");
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
 
 # Preparations for the subsequent test. max_replication_slots is set to
 # appropriate value
-$new_node->append_conf('postgresql.conf', "max_replication_slots = 10");
+$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 10");
 
-# Remove an unnecessary slot and consume WALs
-$old_node->start;
-$old_node->safe_psql(
+$old_publisher->start;
+$old_publisher->safe_psql(
 	'postgres', qq[
-	SELECT pg_drop_replication_slot('to_be_dropped');
-	SELECT count(*) FROM pg_logical_slot_get_changes('test_slot', NULL, NULL)
+	SELECT pg_drop_replication_slot('test_slot1');
+	SELECT pg_drop_replication_slot('test_slot2');
 ]);
-$old_node->stop;
+
+# Setup logical replication
+my $old_connstr = $old_publisher->connstr . ' dbname=postgres';
+$old_publisher->safe_psql('postgres', "CREATE PUBLICATION pub FOR ALL TABLES");
+$subscriber->start;
+$subscriber->safe_psql(
+	'postgres', qq[
+	CREATE TABLE tbl (a int);
+	CREATE SUBSCRIPTION sub CONNECTION '$old_connstr' PUBLICATION pub WITH (two_phase = 'true')
+]);
+
+# Wait for initial table sync to finish
+$subscriber->wait_for_subscription_sync($old_publisher, 'sub');
+
+$old_publisher->stop;
 
 # Actual run, pg_upgrade_output.d is removed at the end
 command_ok(
 	[
 		'pg_upgrade', '--no-sync',
-		'-d',         $old_node->data_dir,
-		'-D',         $new_node->data_dir,
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
 		'-b',         $bindir,
 		'-B',         $bindir,
-		'-s',         $new_node->host,
-		'-p',         $old_node->port,
-		'-P',         $new_node->port,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
 		$mode,        '--include-logical-replication-slots'
 	],
 	'run of pg_upgrade of old node');
-ok( !-d $new_node->data_dir . "/pg_upgrade_output.d",
+ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
 	"pg_upgrade_output.d/ removed after pg_upgrade success");
 
-$new_node->start;
-my $result = $new_node->safe_psql('postgres',
+$new_publisher->start;
+my $result = $new_publisher->safe_psql('postgres',
 	"SELECT slot_name, two_phase FROM pg_replication_slots");
-is($result, qq(test_slot|t), 'check the slot exists on new node');
+is($result, qq(sub|t), 'check the slot exists on new node');
+
+# Change the connection
+$subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION sub DISABLE");
+my $new_connstr = $new_publisher->connstr . ' dbname=postgres';
+$subscriber->safe_psql('postgres',
+	"ALTER SUBSCRIPTION sub CONNECTION '$new_connstr'");
+$subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION sub ENABLE");
+
+# Check whether changes on the new publisher get replicated to the subscriber
+$new_publisher->safe_psql('postgres',
+	"INSERT INTO tbl VALUES (generate_series(11, 20))");
+$new_publisher->wait_for_catchup('sub');
+$result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
+is($result, qq(20), 'check changes are shipped to subscriber');
 
 done_testing();
-- 
2.34.1

0002-pg_upgrade-Add-include-logical-replication-slots-opt.patchtext/x-patch; charset=US-ASCII; name=0002-pg_upgrade-Add-include-logical-replication-slots-opt.patchDownload
From 8cb9043ece78ba5235b11b7277b608a37044c559 Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Tue, 4 Apr 2023 05:49:34 +0000
Subject: [PATCH 2/3] pg_upgrade: Add --include-logical-replication-slots
 option

This commit introduces a new pg_upgrade option called "--include-logical-replication-slots".
This allows nodes with logical replication slots to be upgraded. The commit can
be divided into two parts: one for pg_dump and another for pg_upgrade.

For pg_dump this commit includes a new option called "--logical-replication-slots-only".
This option can be used to dump logical replication slots. When this option is
specified, the slot_name, plugin, and two_phase parameters are extracted from
pg_replication_slots. An SQL file that executes pg_create_logical_replication_slot()
with the extracted parameters is generated.
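
For illustration only, the generated file would contain one such call per slot,
roughly of the following form (the slot name 'sub' and the plugin 'pgoutput'
below are merely placeholders for this example):

    SELECT pg_catalog.pg_create_logical_replication_slot('sub', 'pgoutput', false, true);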

For pg_upgrade, when '--include-logical-replication-slots' is specified, it executes
pg_dump with the new "--logical-replication-slots-only" option and restores the slots
using the pg_create_logical_replication_slot() statements that the dump
generated (see above). Note that we cannot dump replication slots at the same time
as the schema dump because we need to separate the timing of restoring replication
slots and other objects. Replication slots, in particular, should not be restored
before executing the pg_resetwal command because it will remove WALs that are
required by the slots.

The significant advantage of this commit is that it makes it easy to continue
logical replication even after upgrading the publisher node. Previously, pg_upgrade
allowed copying publications to a new node. With this new commit, adjusting the
connection string to the new publisher will cause the apply worker on the subscriber
to connect to the new publisher automatically. This enables seamless continuation
of logical replication, even after an upgrade.

Author: Hayato Kuroda
Reviewed-by: Peter Smith, Julien Rouhaud, Vignesh C, Wang Wei
---
 doc/src/sgml/ref/pgupgrade.sgml               |  11 ++
 src/bin/pg_dump/pg_backup.h                   |   1 +
 src/bin/pg_dump/pg_dump.c                     | 155 ++++++++++++++++++
 src/bin/pg_dump/pg_dump.h                     |  14 +-
 src/bin/pg_dump/pg_dump_sort.c                |  11 +-
 src/bin/pg_upgrade/Makefile                   |   3 +
 src/bin/pg_upgrade/check.c                    |  76 +++++++++
 src/bin/pg_upgrade/dump.c                     |  24 +++
 src/bin/pg_upgrade/info.c                     | 111 ++++++++++++-
 src/bin/pg_upgrade/meson.build                |   1 +
 src/bin/pg_upgrade/option.c                   |   7 +
 src/bin/pg_upgrade/pg_upgrade.c               |  61 +++++++
 src/bin/pg_upgrade/pg_upgrade.h               |  21 +++
 .../t/003_logical_replication_slots.pl        | 146 +++++++++++++++++
 src/tools/pgindent/typedefs.list              |   3 +
 15 files changed, 641 insertions(+), 4 deletions(-)
 create mode 100644 src/bin/pg_upgrade/t/003_logical_replication_slots.pl

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 7816b4c685..94e90ff506 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -240,6 +240,17 @@ PostgreSQL documentation
       </listitem>
      </varlistentry>
 
+     <varlistentry>
+      <term><option>--include-logical-replication-slots</option></term>
+      <listitem>
+       <para>
+        Upgrade logical replication slots. Only permanent replication slots
+        are included. Note that <application>pg_upgrade</application> does not
+        verify that the required output plugins are installed on the new cluster.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><option>-?</option></term>
       <term><option>--help</option></term>
diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index aba780ef4b..0a4e931f9b 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -187,6 +187,7 @@ typedef struct _dumpOptions
 	int			use_setsessauth;
 	int			enable_row_security;
 	int			load_via_partition_root;
+	int			logical_slots_only;
 
 	/* default, if no "inclusion" switches appear, is to dump everything */
 	bool		include_everything;
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 5dab1ba9ea..12d4066d3b 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -328,6 +328,9 @@ static void setupDumpWorker(Archive *AH);
 static TableInfo *getRootTableInfo(const TableInfo *tbinfo);
 static bool forcePartitionRootLoad(const TableInfo *tbinfo);
 
+static void getLogicalReplicationSlots(Archive *fout);
+static void dumpLogicalReplicationSlot(Archive *fout,
+									   const LogicalReplicationSlotInfo *slotinfo);
 
 int
 main(int argc, char **argv)
@@ -431,6 +434,7 @@ main(int argc, char **argv)
 		{"table-and-children", required_argument, NULL, 12},
 		{"exclude-table-and-children", required_argument, NULL, 13},
 		{"exclude-table-data-and-children", required_argument, NULL, 14},
+		{"logical-replication-slots-only", no_argument, NULL, 15},
 
 		{NULL, 0, NULL, 0}
 	};
@@ -657,6 +661,10 @@ main(int argc, char **argv)
 										  optarg);
 				break;
 
+			case 15:			/* dump only replication slot(s) */
+				dopt.logical_slots_only = true;
+				break;
+
 			default:
 				/* getopt_long already emitted a complaint */
 				pg_log_error_hint("Try \"%s --help\" for more information.", progname);
@@ -714,6 +722,21 @@ main(int argc, char **argv)
 	if (dopt.do_nothing && dopt.dump_inserts == 0)
 		pg_fatal("option --on-conflict-do-nothing requires option --inserts, --rows-per-insert, or --column-inserts");
 
+	if (dopt.logical_slots_only)
+	{
+		if (!dopt.binary_upgrade)
+			pg_fatal("options --logical-replication-slots-only requires option --binary-upgrade");
+
+		if (dopt.dataOnly)
+			pg_fatal("options --logical-replication-slots-only and -a/--data-only cannot be used together");
+
+		if (dopt.schemaOnly)
+			pg_fatal("options --logical-replication-slots-only and -s/--schema-only cannot be used together");
+
+		if (dopt.outputClean)
+			pg_fatal("options --logical-replication-slots-only and -c/--clean cannot be used together");
+	}
+
 	/* Identify archive format to emit */
 	archiveFormat = parseArchiveFormat(format, &archiveMode);
 
@@ -876,6 +899,16 @@ main(int argc, char **argv)
 			pg_fatal("no matching extensions were found");
 	}
 
+	/*
+	 * If dumping logical replication slots only was requested, dump only the
+	 * slots and skip everything else.
+	 */
+	if (dopt.logical_slots_only)
+	{
+		getLogicalReplicationSlots(fout);
+		goto dump;
+	}
+
 	/*
 	 * Dumping LOs is the default for dumps where an inclusion switch is not
 	 * used (an "include everything" dump).  -B can be used to exclude LOs
@@ -936,6 +969,8 @@ main(int argc, char **argv)
 	if (!dopt.no_security_labels)
 		collectSecLabels(fout);
 
+dump:
+
 	/* Lastly, create dummy objects to represent the section boundaries */
 	boundaryObjs = createBoundaryObjects();
 
@@ -1109,6 +1144,12 @@ help(const char *progname)
 			 "                               servers matching PATTERN\n"));
 	printf(_("  --inserts                    dump data as INSERT commands, rather than COPY\n"));
 	printf(_("  --load-via-partition-root    load partitions via the root table\n"));
+
+	/*
+	 * The option --logical-replication-slots-only is used only by pg_upgrade
+	 * and should not be used directly by users, which is why it is not listed
+	 * in the help output.
+	 */
 	printf(_("  --no-comments                do not dump comments\n"));
 	printf(_("  --no-publications            do not dump publications\n"));
 	printf(_("  --no-security-labels         do not dump security label assignments\n"));
@@ -10237,6 +10278,10 @@ dumpDumpableObject(Archive *fout, DumpableObject *dobj)
 		case DO_SUBSCRIPTION:
 			dumpSubscription(fout, (const SubscriptionInfo *) dobj);
 			break;
+		case DO_LOGICAL_REPLICATION_SLOT:
+			dumpLogicalReplicationSlot(fout,
+									   (const LogicalReplicationSlotInfo *) dobj);
+			break;
 		case DO_PRE_DATA_BOUNDARY:
 		case DO_POST_DATA_BOUNDARY:
 			/* never dumped, nothing to do */
@@ -18218,6 +18263,7 @@ addBoundaryDependencies(DumpableObject **dobjs, int numObjs,
 			case DO_PUBLICATION_REL:
 			case DO_PUBLICATION_TABLE_IN_SCHEMA:
 			case DO_SUBSCRIPTION:
+			case DO_LOGICAL_REPLICATION_SLOT:
 				/* Post-data objects: must come after the post-data boundary */
 				addObjectDependency(dobj, postDataBound->dumpId);
 				break;
@@ -18479,3 +18525,112 @@ appendReloptionsArrayAH(PQExpBuffer buffer, const char *reloptions,
 	if (!res)
 		pg_log_warning("could not parse %s array", "reloptions");
 }
+
+/*
+ * getLogicalReplicationSlots
+ *	  get information about replication slots
+ */
+static void
+getLogicalReplicationSlots(Archive *fout)
+{
+	PGresult   *res;
+	LogicalReplicationSlotInfo *slotinfo;
+	PQExpBuffer query;
+
+	int			i_slotname;
+	int			i_plugin;
+	int			i_twophase;
+	int			i,
+				ntups;
+
+	/* Check whether we should dump or not */
+	if (fout->remoteVersion < 170000)
+		return;
+
+	Assert(fout->dopt->logical_slots_only);
+
+	query = createPQExpBuffer();
+
+	resetPQExpBuffer(query);
+
+	/*
+	 * Get replication slots.
+	 *
+	 * XXX: Which information must be extracted from old node? Currently three
+	 * attributes are extracted because they are used by
+	 * pg_create_logical_replication_slot().
+	 */
+	appendPQExpBufferStr(query,
+						 "SELECT slot_name, plugin, two_phase "
+						 "FROM pg_catalog.pg_replication_slots "
+						 "WHERE database = current_database() AND temporary = false "
+						 "AND wal_status IN ('reserved', 'extended');");
+
+	res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+
+	ntups = PQntuples(res);
+
+	i_slotname = PQfnumber(res, "slot_name");
+	i_plugin = PQfnumber(res, "plugin");
+	i_twophase = PQfnumber(res, "two_phase");
+
+	slotinfo = pg_malloc(ntups * sizeof(LogicalReplicationSlotInfo));
+
+	for (i = 0; i < ntups; i++)
+	{
+		slotinfo[i].dobj.objType = DO_LOGICAL_REPLICATION_SLOT;
+
+		slotinfo[i].dobj.catId.tableoid = InvalidOid;
+		slotinfo[i].dobj.catId.oid = InvalidOid;
+		AssignDumpId(&slotinfo[i].dobj);
+
+		slotinfo[i].dobj.name = pg_strdup(PQgetvalue(res, i, i_slotname));
+
+		slotinfo[i].plugin = pg_strdup(PQgetvalue(res, i, i_plugin));
+		slotinfo[i].twophase = (strcmp(PQgetvalue(res, i, i_twophase), "t") == 0);
+
+		/*
+		 * Note: Currently we do not have any options to include/exclude slots
+		 * in dumping, so all the slots must be selected.
+		 */
+		slotinfo[i].dobj.dump = DUMP_COMPONENT_DEFINITION;
+	}
+	PQclear(res);
+
+	destroyPQExpBuffer(query);
+}
+
+/*
+ * dumpLogicalReplicationSlot
+ *	  dump the creation function call for the given logical replication slot
+ */
+static void
+dumpLogicalReplicationSlot(Archive *fout,
+						   const LogicalReplicationSlotInfo *slotinfo)
+{
+	Assert(fout->dopt->logical_slots_only);
+
+	if (slotinfo->dobj.dump & DUMP_COMPONENT_DEFINITION)
+	{
+		PQExpBuffer query = createPQExpBuffer();
+
+		/*
+		 * XXX: For simplification, pg_create_logical_replication_slot() is
+		 * used. Is it sufficient?
+		 */
+		appendPQExpBuffer(query, "SELECT pg_catalog.pg_create_logical_replication_slot(");
+		appendStringLiteralAH(query, slotinfo->dobj.name, fout);
+		appendPQExpBuffer(query, ", ");
+		appendStringLiteralAH(query, slotinfo->plugin, fout);
+		appendPQExpBuffer(query, ", false, %s);",
+						  slotinfo->twophase ? "true" : "false");
+
+		ArchiveEntry(fout, slotinfo->dobj.catId, slotinfo->dobj.dumpId,
+					 ARCHIVE_OPTS(.tag = slotinfo->dobj.name,
+								  .description = "REPLICATION SLOT",
+								  .section = SECTION_POST_DATA,
+								  .createStmt = query->data));
+
+		destroyPQExpBuffer(query);
+	}
+}
diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h
index bc8f2ec36d..ed1866d9ab 100644
--- a/src/bin/pg_dump/pg_dump.h
+++ b/src/bin/pg_dump/pg_dump.h
@@ -82,7 +82,8 @@ typedef enum
 	DO_PUBLICATION,
 	DO_PUBLICATION_REL,
 	DO_PUBLICATION_TABLE_IN_SCHEMA,
-	DO_SUBSCRIPTION
+	DO_SUBSCRIPTION,
+	DO_LOGICAL_REPLICATION_SLOT
 } DumpableObjectType;
 
 /*
@@ -667,6 +668,17 @@ typedef struct _SubscriptionInfo
 	char	   *subpasswordrequired;
 } SubscriptionInfo;
 
+/*
+ * The LogicalReplicationSlotInfo struct is used to represent replication
+ * slots.
+ */
+typedef struct _LogicalReplicationSlotInfo
+{
+	DumpableObject dobj;
+	char	   *plugin;
+	bool		twophase;
+} LogicalReplicationSlotInfo;
+
 /*
  *	common utility functions
  */
diff --git a/src/bin/pg_dump/pg_dump_sort.c b/src/bin/pg_dump/pg_dump_sort.c
index 523a19c155..ae65443228 100644
--- a/src/bin/pg_dump/pg_dump_sort.c
+++ b/src/bin/pg_dump/pg_dump_sort.c
@@ -93,6 +93,7 @@ enum dbObjectTypePriorities
 	PRIO_PUBLICATION_REL,
 	PRIO_PUBLICATION_TABLE_IN_SCHEMA,
 	PRIO_SUBSCRIPTION,
+	PRIO_LOGICAL_REPLICATION_SLOT,
 	PRIO_DEFAULT_ACL,			/* done in ACL pass */
 	PRIO_EVENT_TRIGGER,			/* must be next to last! */
 	PRIO_REFRESH_MATVIEW		/* must be last! */
@@ -146,10 +147,11 @@ static const int dbObjectTypePriority[] =
 	PRIO_PUBLICATION,			/* DO_PUBLICATION */
 	PRIO_PUBLICATION_REL,		/* DO_PUBLICATION_REL */
 	PRIO_PUBLICATION_TABLE_IN_SCHEMA,	/* DO_PUBLICATION_TABLE_IN_SCHEMA */
-	PRIO_SUBSCRIPTION			/* DO_SUBSCRIPTION */
+	PRIO_SUBSCRIPTION,			/* DO_SUBSCRIPTION */
+	PRIO_LOGICAL_REPLICATION_SLOT	/* DO_LOGICAL_REPLICATION_SLOT */
 };
 
-StaticAssertDecl(lengthof(dbObjectTypePriority) == (DO_SUBSCRIPTION + 1),
+StaticAssertDecl(lengthof(dbObjectTypePriority) == (DO_LOGICAL_REPLICATION_SLOT + 1),
 				 "array length mismatch");
 
 static DumpId preDataBoundId;
@@ -1542,6 +1544,11 @@ describeDumpableObject(DumpableObject *obj, char *buf, int bufsize)
 					 "SUBSCRIPTION (ID %d OID %u)",
 					 obj->dumpId, obj->catId.oid);
 			return;
+		case DO_LOGICAL_REPLICATION_SLOT:
+			snprintf(buf, bufsize,
+					 "LOGICAL REPLICATION SLOT (ID %d NAME %s)",
+					 obj->dumpId, obj->name);
+			return;
 		case DO_PRE_DATA_BOUNDARY:
 			snprintf(buf, bufsize,
 					 "PRE-DATA BOUNDARY  (ID %d)",
diff --git a/src/bin/pg_upgrade/Makefile b/src/bin/pg_upgrade/Makefile
index 5834513add..815d1a7ca1 100644
--- a/src/bin/pg_upgrade/Makefile
+++ b/src/bin/pg_upgrade/Makefile
@@ -3,6 +3,9 @@
 PGFILEDESC = "pg_upgrade - an in-place binary upgrade utility"
 PGAPPICON = win32
 
+# required for 003_logical_replication_slots.pl
+EXTRA_INSTALL=contrib/test_decoding
+
 subdir = src/bin/pg_upgrade
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 64024e3b9e..4b54f5567c 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -30,7 +30,9 @@ static void check_for_jsonb_9_4_usage(ClusterInfo *cluster);
 static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(ClusterInfo *new_cluster);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
+static void check_for_logical_replication_slots(ClusterInfo *new_cluster);
 
+static int num_slots_on_old_cluster;
 
 /*
  * fix_path_separator
@@ -89,6 +91,10 @@ check_and_dump_old_cluster(bool live_check)
 	/* Extract a list of databases and tables from the old cluster */
 	get_db_and_rel_infos(&old_cluster);
 
+	/* Additionally, extract a list of logical replication slots if required */
+	if (user_opts.include_logical_slots)
+		num_slots_on_old_cluster = get_logical_slot_infos(&old_cluster);
+
 	init_tablespaces();
 
 	get_loadable_libraries();
@@ -189,6 +195,17 @@ check_new_cluster(void)
 {
 	get_db_and_rel_infos(&new_cluster);
 
+	/*
+	 * Do additional work if --include-logical-replication-slots was specified.
+	 * This must be done before check_new_cluster_is_empty() because the
+	 * slot_arr attribute of the new_cluster will be checked in that function.
+	 */
+	if (user_opts.include_logical_slots)
+	{
+		(void) get_logical_slot_infos(&new_cluster);
+		check_for_logical_replication_slots(&new_cluster);
+	}
+
 	check_new_cluster_is_empty();
 
 	check_loadable_libraries();
@@ -364,6 +381,22 @@ check_new_cluster_is_empty(void)
 						 rel_arr->rels[relnum].nspname,
 						 rel_arr->rels[relnum].relname);
 		}
+
+		/*
+		 * If --include-logical-replication-slots is specified, check the
+		 * existence of slots.
+		 */
+		if (user_opts.include_logical_slots)
+		{
+			DbInfo *pDbInfo = &new_cluster.dbarr.dbs[dbnum];
+			LogicalSlotInfoArr *slot_arr = &pDbInfo->slot_arr;
+
+			/* if nslots > 0, report just first entry and exit */
+			if (slot_arr->nslots)
+				pg_fatal("New cluster database \"%s\" is not empty: found logical replication slot \"%s\"",
+						 pDbInfo->db_name,
+						 slot_arr->slots[0].slotname);
+		}
 	}
 }
 
@@ -1402,3 +1435,46 @@ check_for_user_defined_encoding_conversions(ClusterInfo *cluster)
 	else
 		check_ok();
 }
+
+/*
+ * Verify the parameter settings necessary for creating logical replication
+ * slots.
+ */
+static void
+check_for_logical_replication_slots(ClusterInfo *new_cluster)
+{
+	PGresult   *res;
+	PGconn	   *conn = connectToServer(new_cluster, "template1");
+	int			max_replication_slots;
+	char	   *wal_level;
+
+	/* --include-logical-replication-slots can be used since PG17. */
+	if (GET_MAJOR_VERSION(new_cluster->major_version) <= 1600)
+		return;
+
+	prep_status("Checking parameter settings for logical replication slots");
+
+	res = executeQueryOrDie(conn, "SHOW max_replication_slots;");
+	max_replication_slots = atoi(PQgetvalue(res, 0, 0));
+
+	if (max_replication_slots == 0)
+		pg_fatal("max_replication_slots must be greater than 0");
+	else if (num_slots_on_old_cluster > max_replication_slots)
+		pg_fatal("max_replication_slots must be greater than existing logical "
+				 "replication slots on old node.");
+
+	PQclear(res);
+
+	res = executeQueryOrDie(conn, "SHOW wal_level;");
+	wal_level = PQgetvalue(res, 0, 0);
+
+	if (strcmp(wal_level, "logical") != 0)
+		pg_fatal("wal_level must be \"logical\", but is set to \"%s\"",
+				 wal_level);
+
+	PQclear(res);
+
+	PQfinish(conn);
+
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/dump.c b/src/bin/pg_upgrade/dump.c
index 6c8c82dca8..e6b90864f5 100644
--- a/src/bin/pg_upgrade/dump.c
+++ b/src/bin/pg_upgrade/dump.c
@@ -59,6 +59,30 @@ generate_old_dump(void)
 						   log_opts.dumpdir,
 						   sql_file_name, escaped_connstr.data);
 
+		/*
+		 * Dump logical replication slots if needed.
+		 *
+		 * XXX We cannot dump replication slots at the same time as the schema
+		 * dump because we need to separate the timing of restoring
+		 * replication slots and other objects. Replication slots, in
+		 * particular, should not be restored before executing the pg_resetwal
+		 * command because it will remove WALs that are required by the slots.
+		 */
+		if (user_opts.include_logical_slots)
+		{
+			char		slots_file_name[MAXPGPATH];
+
+			snprintf(slots_file_name, sizeof(slots_file_name),
+					 DB_DUMP_LOGICAL_SLOTS_FILE_MASK, old_db->db_oid);
+			parallel_exec_prog(log_file_name, NULL,
+							   "\"%s/pg_dump\" %s --logical-replication-slots-only "
+							   "--quote-all-identifiers --binary-upgrade %s "
+							   "--file=\"%s/%s\" %s",
+							   new_cluster.bindir, cluster_conn_opts(&old_cluster),
+							   log_opts.verbose ? "--verbose" : "",
+							   log_opts.dumpdir,
+							   slots_file_name, escaped_connstr.data);
+		}
 		termPQExpBuffer(&escaped_connstr);
 	}
 
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index a9988abfe1..8bc0ad2e10 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -26,6 +26,7 @@ static void get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo);
 static void free_rel_infos(RelInfoArr *rel_arr);
 static void print_db_infos(DbInfoArr *db_arr);
 static void print_rel_infos(RelInfoArr *rel_arr);
+static void print_slot_infos(LogicalSlotInfoArr *slot_arr);
 
 
 /*
@@ -394,7 +395,7 @@ get_db_infos(ClusterInfo *cluster)
 	i_spclocation = PQfnumber(res, "spclocation");
 
 	ntups = PQntuples(res);
-	dbinfos = (DbInfo *) pg_malloc(sizeof(DbInfo) * ntups);
+	dbinfos = (DbInfo *) pg_malloc0(sizeof(DbInfo) * ntups);
 
 	for (tupnum = 0; tupnum < ntups; tupnum++)
 	{
@@ -600,6 +601,96 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 	dbinfo->rel_arr.nrels = num_rels;
 }
 
+/*
+ * get_logical_slot_infos_per_db()
+ *
+ * gets the LogicalSlotInfos for all the logical replication slots of the database
+ * referred to by "dbinfo".
+ */
+static void
+get_logical_slot_infos_per_db(ClusterInfo *cluster, DbInfo *dbinfo)
+{
+	PGconn	   *conn = connectToServer(cluster,
+									   dbinfo->db_name);
+	PGresult   *res;
+	LogicalSlotInfo *slotinfos = NULL;
+
+	int			num_slots;
+
+	char		query[QUERY_ALLOC];
+
+	snprintf(query, sizeof(query),
+			 "SELECT slot_name, plugin, two_phase "
+			 "FROM pg_catalog.pg_replication_slots "
+			 "WHERE database = current_database() AND temporary = false "
+			 "AND wal_status IN ('reserved', 'extended');");
+
+	res = executeQueryOrDie(conn, "%s", query);
+
+	num_slots = PQntuples(res);
+
+	if (num_slots)
+	{
+		int			slotnum;
+		int			i_slotname;
+		int			i_plugin;
+		int			i_twophase;
+
+		slotinfos = (LogicalSlotInfo *) pg_malloc(sizeof(LogicalSlotInfo) * num_slots);
+
+		i_slotname = PQfnumber(res, "slot_name");
+		i_plugin = PQfnumber(res, "plugin");
+		i_twophase = PQfnumber(res, "two_phase");
+
+		for (slotnum = 0; slotnum < num_slots; slotnum++)
+		{
+			LogicalSlotInfo *curr = &slotinfos[slotnum];
+
+			curr->slotname = pg_strdup(PQgetvalue(res, slotnum, i_slotname));
+			curr->plugin = pg_strdup(PQgetvalue(res, slotnum, i_plugin));
+			curr->two_phase = (strcmp(PQgetvalue(res, slotnum, i_twophase), "t") == 0);
+		}
+	}
+
+	PQclear(res);
+	PQfinish(conn);
+
+	dbinfo->slot_arr.slots = slotinfos;
+	dbinfo->slot_arr.nslots = num_slots;
+}
+
+/*
+ * get_logical_slot_infos()
+ *
+ * Higher level routine to generate LogicalSlotInfoArr for all databases.
+ */
+int
+get_logical_slot_infos(ClusterInfo *cluster)
+{
+	int			dbnum;
+	int			slotnum = 0;
+
+	if (cluster == &old_cluster)
+		pg_log(PG_VERBOSE, "\nsource databases:");
+	else
+		pg_log(PG_VERBOSE, "\ntarget databases:");
+
+	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
+	{
+		DbInfo	   *pDbInfo = &cluster->dbarr.dbs[dbnum];
+
+		get_logical_slot_infos_per_db(cluster, pDbInfo);
+		slotnum += pDbInfo->slot_arr.nslots;
+
+		if (log_opts.verbose)
+		{
+			pg_log(PG_VERBOSE, "Database: \"%s\"", pDbInfo->db_name);
+			print_slot_infos(&pDbInfo->slot_arr);
+		}
+	}
+
+	return slotnum;
+}
 
 static void
 free_db_and_rel_infos(DbInfoArr *db_arr)
@@ -610,6 +701,12 @@ free_db_and_rel_infos(DbInfoArr *db_arr)
 	{
 		free_rel_infos(&db_arr->dbs[dbnum].rel_arr);
 		pg_free(db_arr->dbs[dbnum].db_name);
+
+		/*
+		 * Logical replication slots must not exist on the new cluster before
+		 * doing create_logical_replication_slots().
+		 */
+		Assert(db_arr->dbs[dbnum].slot_arr.slots == NULL);
 	}
 	pg_free(db_arr->dbs);
 	db_arr->dbs = NULL;
@@ -660,3 +757,15 @@ print_rel_infos(RelInfoArr *rel_arr)
 			   rel_arr->rels[relnum].reloid,
 			   rel_arr->rels[relnum].tablespace);
 }
+
+static void
+print_slot_infos(LogicalSlotInfoArr *slot_arr)
+{
+	int			slotnum;
+
+	for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\", two_phase: %d",
+			   slot_arr->slots[slotnum].slotname,
+			   slot_arr->slots[slotnum].plugin,
+			   slot_arr->slots[slotnum].two_phase);
+}
diff --git a/src/bin/pg_upgrade/meson.build b/src/bin/pg_upgrade/meson.build
index 12a97f84e2..228f29b688 100644
--- a/src/bin/pg_upgrade/meson.build
+++ b/src/bin/pg_upgrade/meson.build
@@ -42,6 +42,7 @@ tests += {
     'tests': [
       't/001_basic.pl',
       't/002_pg_upgrade.pl',
+      't/003_logical_replication_slots.pl',
     ],
     'test_kwargs': {'priority': 40}, # pg_upgrade tests are slow
   },
diff --git a/src/bin/pg_upgrade/option.c b/src/bin/pg_upgrade/option.c
index 640361009e..df66a5ffe6 100644
--- a/src/bin/pg_upgrade/option.c
+++ b/src/bin/pg_upgrade/option.c
@@ -57,6 +57,7 @@ parseCommandLine(int argc, char *argv[])
 		{"verbose", no_argument, NULL, 'v'},
 		{"clone", no_argument, NULL, 1},
 		{"copy", no_argument, NULL, 2},
+		{"include-logical-replication-slots", no_argument, NULL, 3},
 
 		{NULL, 0, NULL, 0}
 	};
@@ -199,6 +200,10 @@ parseCommandLine(int argc, char *argv[])
 				user_opts.transfer_mode = TRANSFER_MODE_COPY;
 				break;
 
+			case 3:
+				user_opts.include_logical_slots = true;
+				break;
+
 			default:
 				fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
 						os_info.progname);
@@ -289,6 +294,8 @@ usage(void)
 	printf(_("  -V, --version                 display version information, then exit\n"));
 	printf(_("  --clone                       clone instead of copying files to new cluster\n"));
 	printf(_("  --copy                        copy files to new cluster (default)\n"));
+	printf(_("  --include-logical-replication-slots\n"
+			 "                                upgrade logical replication slots\n"));
 	printf(_("  -?, --help                    show this help, then exit\n"));
 	printf(_("\n"
 			 "Before running pg_upgrade you must:\n"
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 4562dafcff..6dd3832422 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -59,6 +59,7 @@ static void copy_xact_xlog_xid(void);
 static void set_frozenxids(bool minmxid_only);
 static void make_outputdirs(char *pgdata);
 static void setup(char *argv0, bool *live_check);
+static void create_logical_replication_slots(void);
 
 ClusterInfo old_cluster,
 			new_cluster;
@@ -188,6 +189,19 @@ main(int argc, char **argv)
 			  new_cluster.pgdata);
 	check_ok();
 
+	/*
+	 * Create logical replication slots if requested.
+	 *
+	 * Note: This must be done after running the pg_resetwal command because
+	 * pg_resetwal would remove WALs that are required by the slots.
+	 */
+	if (user_opts.include_logical_slots)
+	{
+		start_postmaster(&new_cluster, true);
+		create_logical_replication_slots();
+		stop_postmaster(false);
+	}
+
 	if (user_opts.do_sync)
 	{
 		prep_status("Sync data directory to disk");
@@ -860,3 +874,50 @@ set_frozenxids(bool minmxid_only)
 
 	check_ok();
 }
+
+/*
+ * create_logical_replication_slots()
+ *
+ * Similar to create_new_objects() but only restores logical replication slots.
+ */
+static void
+create_logical_replication_slots(void)
+{
+	int			dbnum;
+
+	prep_status_progress("Restoring logical replication slots in the new cluster");
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		char		slots_file_name[MAXPGPATH],
+					log_file_name[MAXPGPATH];
+		DbInfo	   *old_db = &old_cluster.dbarr.dbs[dbnum];
+
+		pg_log(PG_STATUS, "%s", old_db->db_name);
+
+		snprintf(slots_file_name, sizeof(slots_file_name),
+				 DB_DUMP_LOGICAL_SLOTS_FILE_MASK, old_db->db_oid);
+		snprintf(log_file_name, sizeof(log_file_name),
+				 DB_DUMP_LOG_FILE_MASK, old_db->db_oid);
+
+		parallel_exec_prog(log_file_name,
+						   NULL,
+						   "\"%s/psql\" %s --echo-queries --set ON_ERROR_STOP=on "
+						   "--no-psqlrc --dbname %s -f \"%s/%s\"",
+						   new_cluster.bindir,
+						   cluster_conn_opts(&new_cluster),
+						   old_db->db_name,
+						   log_opts.dumpdir,
+						   slots_file_name);
+	}
+
+	/* reap all children */
+	while (reap_child(true) == true)
+		;
+
+	end_progress_output();
+	check_ok();
+
+	/* update new_cluster info again */
+	get_logical_slot_infos(&new_cluster);
+}
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 3eea0139c7..8034067492 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -29,6 +29,7 @@
 /* contains both global db information and CREATE DATABASE commands */
 #define GLOBALS_DUMP_FILE	"pg_upgrade_dump_globals.sql"
 #define DB_DUMP_FILE_MASK	"pg_upgrade_dump_%u.custom"
+#define DB_DUMP_LOGICAL_SLOTS_FILE_MASK	"pg_upgrade_dump_%u_logical_slots.sql"
 
 /*
  * Base directories that include all the files generated internally, from the
@@ -150,6 +151,22 @@ typedef struct
 	int			nrels;
 } RelInfoArr;
 
+/*
+ * Structure to store logical replication slot information
+ */
+typedef struct
+{
+	char	   *slotname;		/* slot name */
+	char	   *plugin;			/* plugin */
+	bool		two_phase;		/* Can the slot decode 2PC? */
+} LogicalSlotInfo;
+
+typedef struct
+{
+	int			nslots;
+	LogicalSlotInfo *slots;
+} LogicalSlotInfoArr;
+
 /*
  * The following structure represents a relation mapping.
  */
@@ -176,6 +193,7 @@ typedef struct
 	char		db_tablespace[MAXPGPATH];	/* database default tablespace
 											 * path */
 	RelInfoArr	rel_arr;		/* array of all user relinfos */
+	LogicalSlotInfoArr slot_arr;	/* array of all LogicalSlotInfo */
 } DbInfo;
 
 /*
@@ -304,6 +322,8 @@ typedef struct
 	transferMode transfer_mode; /* copy files or link them? */
 	int			jobs;			/* number of processes/threads to use */
 	char	   *socketdir;		/* directory to use for Unix sockets */
+	bool		include_logical_slots;	/* true -> dump and restore logical
+										 * replication slots */
 } UserOpts;
 
 typedef struct
@@ -400,6 +420,7 @@ FileNameMap *gen_db_file_maps(DbInfo *old_db,
 							  DbInfo *new_db, int *nmaps, const char *old_pgdata,
 							  const char *new_pgdata);
 void		get_db_and_rel_infos(ClusterInfo *cluster);
+int			get_logical_slot_infos(ClusterInfo *cluster);
 
 /* option.c */
 
diff --git a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
new file mode 100644
index 0000000000..9ca266f6b2
--- /dev/null
+++ b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
@@ -0,0 +1,146 @@
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+# Tests for upgrading replication slots
+
+use strict;
+use warnings;
+
+use File::Path qw(rmtree);
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Can be changed to test the other modes
+my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
+
+# Initialize old node
+my $old_node = PostgreSQL::Test::Cluster->new('old_node');
+$old_node->init(allows_streaming => 'logical');
+$old_node->start;
+
+# Initialize new node
+my $new_node = PostgreSQL::Test::Cluster->new('new_node');
+$new_node->init(allows_streaming => 1);
+
+my $bindir = $new_node->config_data('--bindir');
+
+$old_node->stop;
+
+# Cause a failure at the start of pg_upgrade because wal_level is replica
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_node->data_dir,
+		'-D',         $new_node->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_node->host,
+		'-p',         $old_node->port,
+		'-P',         $new_node->port,
+		$mode,        '--include-logical-replication-slots',
+	],
+	'run of pg_upgrade of old node with wrong wal_level');
+ok( -d $new_node->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_node->data_dir . "/pg_upgrade_output.d");
+
+# Preparations for the subsequent test. Setting max_replication_slots to 0 is
+# prohibited, so pg_upgrade must fail.
+$new_node->append_conf('postgresql.conf', "wal_level = 'logical'");
+$new_node->append_conf('postgresql.conf', "max_replication_slots = 0");
+
+# Cause a failure at the start of pg_upgrade because max_replication_slots is 0
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_node->data_dir,
+		'-D',         $new_node->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_node->host,
+		'-p',         $old_node->port,
+		'-P',         $new_node->port,
+		$mode,        '--include-logical-replication-slots',
+	],
+	'run of pg_upgrade of old node with wrong max_replication_slots');
+ok( -d $new_node->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_node->data_dir . "/pg_upgrade_output.d");
+
+# Preparations for the subsequent test. max_replication_slots is set to
+# a non-zero value
+$new_node->append_conf('postgresql.conf', "max_replication_slots = 1");
+
+# Create a slot on old node, and generate WALs
+$old_node->start;
+$old_node->safe_psql(
+	'postgres', qq[
+	SELECT pg_create_logical_replication_slot('test_slot', 'test_decoding', false, true);
+	SELECT pg_create_logical_replication_slot('to_be_dropped', 'test_decoding', false, true);
+	CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;
+]);
+
+$old_node->stop;
+
+# Cause a failure at the start of pg_upgrade because max_replication_slots is
+# smaller than existing slots on old node
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_node->data_dir,
+		'-D',         $new_node->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_node->host,
+		'-p',         $old_node->port,
+		'-P',         $new_node->port,
+		$mode,        '--include-logical-replication-slots',
+	],
+	'run of pg_upgrade of old node with small max_replication_slots');
+ok( -d $new_node->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_node->data_dir . "/pg_upgrade_output.d");
+
+# Preparations for the subsequent test. max_replication_slots is set to
+# appropriate value
+$new_node->append_conf('postgresql.conf', "max_replication_slots = 10");
+
+# Remove an unnecessary slot and consume WALs
+$old_node->start;
+$old_node->safe_psql(
+	'postgres', qq[
+	SELECT pg_drop_replication_slot('to_be_dropped');
+	SELECT count(*) FROM pg_logical_slot_get_changes('test_slot', NULL, NULL)
+]);
+$old_node->stop;
+
+# Actual run, pg_upgrade_output.d is removed at the end
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_node->data_dir,
+		'-D',         $new_node->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_node->host,
+		'-p',         $old_node->port,
+		'-P',         $new_node->port,
+		$mode,        '--include-logical-replication-slots'
+	],
+	'run of pg_upgrade of old node');
+ok( !-d $new_node->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ removed after pg_upgrade success");
+
+$new_node->start;
+my $result = $new_node->safe_psql('postgres',
+	"SELECT slot_name, two_phase FROM pg_replication_slots");
+is($result, qq(test_slot|t), 'check the slot exists on new node');
+
+done_testing();
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 05814136c6..5241b7bf8e 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1497,7 +1497,10 @@ LogicalRepStreamAbortData
 LogicalRepTupleData
 LogicalRepTyp
 LogicalRepWorker
+LogicalReplicationSlotInfo
 LogicalRewriteMappingData
+LogicalSlotInfo
+LogicalSlotInfoArr
 LogicalTape
 LogicalTapeSet
 LsnReadQueue
-- 
2.34.1

0001-Send-shutdown-checkpoint-record-to-subscriber.patchtext/x-patch; charset=US-ASCII; name=0001-Send-shutdown-checkpoint-record-to-subscriber.patchDownload
From ab8839138521af35f6b00f530a39d10ef8ead555 Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Fri, 21 Jul 2023 05:34:21 +0000
Subject: [PATCH 1/3] Send shutdown checkpoint record to subscriber

---
 src/backend/replication/walsender.c | 30 +++++++++++++++++++++++------
 1 file changed, 24 insertions(+), 6 deletions(-)

diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index d27ef2985d..fc1363ba76 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -187,6 +187,9 @@ static bool WalSndCaughtUp = false;
 static volatile sig_atomic_t got_SIGUSR2 = false;
 static volatile sig_atomic_t got_STOPPING = false;
 
+/* Are all the WALs flushed? */
+static bool WalsAreFlushed = false;
+
 /*
  * This is set while we are streaming. When not set
  * PROCSIG_WALSND_INIT_STOPPING signal will be handled like SIGTERM. When set,
@@ -260,7 +263,6 @@ static bool TransactionIdInRecentPast(TransactionId xid, uint32 epoch);
 static void WalSndSegmentOpen(XLogReaderState *state, XLogSegNo nextSegNo,
 							  TimeLineID *tli_p);
 
-
 /* Initialize walsender process before entering the main command loop */
 void
 InitWalSender(void)
@@ -1581,7 +1583,10 @@ WalSndWaitForWal(XLogRecPtr loc)
 		 * written, because walwriter has shut down already.
 		 */
 		if (got_STOPPING)
+		{
 			XLogBackgroundFlush();
+			WalsAreFlushed = true;
+		}
 
 		/* Update our idea of the currently flushed position. */
 		if (!RecoveryInProgress())
@@ -3100,12 +3105,20 @@ XLogSendLogical(void)
 		WalSndCaughtUp = true;
 
 	/*
-	 * If we're caught up and have been requested to stop, have WalSndLoop()
-	 * terminate the connection in an orderly manner, after writing out all
-	 * the pending data.
+	 * If we're caught up, have been requested to stop and there are no pending
+	 * records to be sent, change to stopping mode.
 	 */
-	if (WalSndCaughtUp && got_STOPPING)
-		got_SIGUSR2 = true;
+	if (WalSndCaughtUp && WalsAreFlushed && !pq_is_send_pending())
+	{
+		/*
+		 * Update the stats forcibly. pgstat_shutdown_hook reports any pending
+		 * stats at the end of the process, but that would happen after the
+		 * checkpointer exits, which could lead to an assertion failure. We must
+		 * ensure all the stats are reported before changing the state.
+		 */
+		pgstat_report_stat(true);
+		WalSndSetState(WALSNDSTATE_STOPPING);
+	}
 
 	/* Update shared memory status */
 	{
@@ -3142,6 +3155,7 @@ WalSndDone(WalSndSendDataCallback send_data)
 	replicatedPtr = XLogRecPtrIsInvalid(MyWalSnd->flush) ?
 		MyWalSnd->write : MyWalSnd->flush;
 
+
 	if (WalSndCaughtUp && sentPtr == replicatedPtr &&
 		!pq_is_send_pending())
 	{
@@ -3152,6 +3166,10 @@ WalSndDone(WalSndSendDataCallback send_data)
 		EndCommand(&qc, DestRemote, false);
 		pq_flush();
 
+		/* Mark the slot as dirty and save it to update the confirmed_flush. */
+		ReplicationSlotMarkDirty();
+		ReplicationSlotSave();
+
 		proc_exit(0);
 	}
 	if (!waiting_for_ping_response)
-- 
2.34.1

0003-Move-pg_get_wal_records_info-functionality-from-pg_w.patchtext/x-patch; charset=US-ASCII; name=0003-Move-pg_get_wal_records_info-functionality-from-pg_w.patchDownload
From df324a756b175ad9014d8dfe439fea1fa8137029 Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Fri, 28 Jul 2023 16:50:52 +0530
Subject: [PATCH 3/3] Move pg_get_wal_records_info functionality from
 pg_walinspect to backend.

Upgrading the publisher requires pg_get_wal_records_info to check that there
are no WAL records other than the CHECKPOINT_SHUTDOWN record left to be
consumed. Hence, the pg_get_wal_records_info functionality is moved to the
backend as pg_get_wal_records_content so that it can be called from
pg_upgrade.
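
As a rough sketch only (not the exact check implemented by pg_upgrade), such a
verification could be expressed with the new function along these lines, using
the confirmed_flush_lsn recorded for each logical slot:

    SELECT count(*) = 0
      FROM pg_replication_slots s,
           LATERAL pg_get_wal_records_content(s.confirmed_flush_lsn,
                                              pg_current_wal_insert_lsn()) w
     WHERE s.slot_type = 'logical'
       AND w.record_type <> 'CHECKPOINT_SHUTDOWN';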
---
 contrib/pg_walinspect/Makefile                |   2 +-
 contrib/pg_walinspect/meson.build             |   1 +
 .../pg_walinspect/pg_walinspect--1.1--1.2.sql |  32 ++++
 contrib/pg_walinspect/pg_walinspect.c         | 148 +-----------------
 contrib/pg_walinspect/pg_walinspect.control   |   2 +-
 doc/src/sgml/func.sgml                        |  52 ++++++
 src/backend/access/transam/xlog.c             |  27 ++++
 src/backend/access/transam/xlogfuncs.c        | 125 +++++++++++++++
 src/backend/access/transam/xlogutils.c        | 117 ++++++++++++++
 src/backend/catalog/system_functions.sql      |   4 +
 src/include/access/xlog.h                     |   2 +
 src/include/access/xlogutils.h                |   4 +
 src/include/catalog/pg_proc.dat               |   9 ++
 src/test/modules/brin/t/02_wal_consistency.pl |   2 +-
 .../modules/test_custom_rmgrs/t/001_basic.pl  |   2 +-
 src/test/regress/expected/misc_functions.out  |  79 ++++++++++
 src/test/regress/sql/misc_functions.sql       |  57 +++++++
 17 files changed, 515 insertions(+), 150 deletions(-)
 create mode 100644 contrib/pg_walinspect/pg_walinspect--1.1--1.2.sql

diff --git a/contrib/pg_walinspect/Makefile b/contrib/pg_walinspect/Makefile
index 22090f7716..5cc7d81b42 100644
--- a/contrib/pg_walinspect/Makefile
+++ b/contrib/pg_walinspect/Makefile
@@ -7,7 +7,7 @@ OBJS = \
 PGFILEDESC = "pg_walinspect - functions to inspect contents of PostgreSQL Write-Ahead Log"
 
 EXTENSION = pg_walinspect
-DATA = pg_walinspect--1.0.sql pg_walinspect--1.0--1.1.sql
+DATA = pg_walinspect--1.0.sql pg_walinspect--1.0--1.1.sql pg_walinspect--1.1--1.2.sql
 
 REGRESS = pg_walinspect oldextversions
 
diff --git a/contrib/pg_walinspect/meson.build b/contrib/pg_walinspect/meson.build
index 80059f6119..8f7a99a493 100644
--- a/contrib/pg_walinspect/meson.build
+++ b/contrib/pg_walinspect/meson.build
@@ -20,6 +20,7 @@ install_data(
   'pg_walinspect.control',
   'pg_walinspect--1.0.sql',
   'pg_walinspect--1.0--1.1.sql',
+  'pg_walinspect--1.1--1.2.sql',
   kwargs: contrib_data_args,
 )
 
diff --git a/contrib/pg_walinspect/pg_walinspect--1.1--1.2.sql b/contrib/pg_walinspect/pg_walinspect--1.1--1.2.sql
new file mode 100644
index 0000000000..41ec538623
--- /dev/null
+++ b/contrib/pg_walinspect/pg_walinspect--1.1--1.2.sql
@@ -0,0 +1,32 @@
+/* contrib/pg_walinspect/pg_walinspect--1.1--1.2.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "ALTER EXTENSION pg_walinspect UPDATE TO 1.2" to load this file. \quit
+
+/* This functionality is now in the backend; callers should use the backend version */
+ALTER EXTENSION pg_walinspect DROP FUNCTION pg_get_wal_records_info;
+DROP FUNCTION pg_get_wal_records_info(pg_lsn, pg_lsn);
+
+--
+-- pg_get_wal_records_info()
+--
+CREATE FUNCTION pg_get_wal_records_info(IN start_lsn pg_lsn,
+    IN end_lsn pg_lsn,
+    OUT start_lsn pg_lsn,
+    OUT end_lsn pg_lsn,
+    OUT prev_lsn pg_lsn,
+    OUT xid xid,
+    OUT resource_manager text,
+    OUT record_type text,
+    OUT record_length int4,
+    OUT main_data_length int4,
+    OUT fpi_length int4,
+    OUT description text,
+    OUT block_ref text
+)
+RETURNS SETOF record
+AS 'pg_get_wal_records_content'
+LANGUAGE INTERNAL STRICT PARALLEL SAFE;
+
+REVOKE EXECUTE ON FUNCTION pg_get_wal_records_info(pg_lsn, pg_lsn) FROM PUBLIC;
+GRANT EXECUTE ON FUNCTION pg_get_wal_records_info(pg_lsn, pg_lsn) TO pg_read_server_files;
diff --git a/contrib/pg_walinspect/pg_walinspect.c b/contrib/pg_walinspect/pg_walinspect.c
index 796a74f322..6e572d3436 100644
--- a/contrib/pg_walinspect/pg_walinspect.c
+++ b/contrib/pg_walinspect/pg_walinspect.c
@@ -38,10 +38,6 @@ PG_FUNCTION_INFO_V1(pg_get_wal_records_info_till_end_of_wal);
 PG_FUNCTION_INFO_V1(pg_get_wal_stats);
 PG_FUNCTION_INFO_V1(pg_get_wal_stats_till_end_of_wal);
 
-static void ValidateInputLSNs(XLogRecPtr start_lsn, XLogRecPtr *end_lsn);
-static XLogRecPtr GetCurrentLSN(void);
-static XLogReaderState *InitXLogReaderState(XLogRecPtr lsn);
-static XLogRecord *ReadNextXLogRecord(XLogReaderState *xlogreader);
 static void GetWALRecordInfo(XLogReaderState *record, Datum *values,
 							 bool *nulls, uint32 ncols);
 static void GetWALRecordsInfo(FunctionCallInfo fcinfo,
@@ -62,120 +58,6 @@ static void GetWalStats(FunctionCallInfo fcinfo,
 static void GetWALBlockInfo(FunctionCallInfo fcinfo, XLogReaderState *record,
 							bool show_data);
 
-/*
- * Return the LSN up to which the server has WAL.
- */
-static XLogRecPtr
-GetCurrentLSN(void)
-{
-	XLogRecPtr	curr_lsn;
-
-	/*
-	 * We determine the current LSN of the server similar to how page_read
-	 * callback read_local_xlog_page_no_wait does.
-	 */
-	if (!RecoveryInProgress())
-		curr_lsn = GetFlushRecPtr(NULL);
-	else
-		curr_lsn = GetXLogReplayRecPtr(NULL);
-
-	Assert(!XLogRecPtrIsInvalid(curr_lsn));
-
-	return curr_lsn;
-}
-
-/*
- * Initialize WAL reader and identify first valid LSN.
- */
-static XLogReaderState *
-InitXLogReaderState(XLogRecPtr lsn)
-{
-	XLogReaderState *xlogreader;
-	ReadLocalXLogPageNoWaitPrivate *private_data;
-	XLogRecPtr	first_valid_record;
-
-	/*
-	 * Reading WAL below the first page of the first segments isn't allowed.
-	 * This is a bootstrap WAL page and the page_read callback fails to read
-	 * it.
-	 */
-	if (lsn < XLOG_BLCKSZ)
-		ereport(ERROR,
-				(errmsg("could not read WAL at LSN %X/%X",
-						LSN_FORMAT_ARGS(lsn))));
-
-	private_data = (ReadLocalXLogPageNoWaitPrivate *)
-		palloc0(sizeof(ReadLocalXLogPageNoWaitPrivate));
-
-	xlogreader = XLogReaderAllocate(wal_segment_size, NULL,
-									XL_ROUTINE(.page_read = &read_local_xlog_page_no_wait,
-											   .segment_open = &wal_segment_open,
-											   .segment_close = &wal_segment_close),
-									private_data);
-
-	if (xlogreader == NULL)
-		ereport(ERROR,
-				(errcode(ERRCODE_OUT_OF_MEMORY),
-				 errmsg("out of memory"),
-				 errdetail("Failed while allocating a WAL reading processor.")));
-
-	/* first find a valid recptr to start from */
-	first_valid_record = XLogFindNextRecord(xlogreader, lsn);
-
-	if (XLogRecPtrIsInvalid(first_valid_record))
-		ereport(ERROR,
-				(errmsg("could not find a valid record after %X/%X",
-						LSN_FORMAT_ARGS(lsn))));
-
-	return xlogreader;
-}
-
-/*
- * Read next WAL record.
- *
- * By design, to be less intrusive in a running system, no slot is allocated
- * to reserve the WAL we're about to read. Therefore this function can
- * encounter read errors for historical WAL.
- *
- * We guard against ordinary errors trying to read WAL that hasn't been
- * written yet by limiting end_lsn to the flushed WAL, but that can also
- * encounter errors if the flush pointer falls in the middle of a record. In
- * that case we'll return NULL.
- */
-static XLogRecord *
-ReadNextXLogRecord(XLogReaderState *xlogreader)
-{
-	XLogRecord *record;
-	char	   *errormsg;
-
-	record = XLogReadRecord(xlogreader, &errormsg);
-
-	if (record == NULL)
-	{
-		ReadLocalXLogPageNoWaitPrivate *private_data;
-
-		/* return NULL, if end of WAL is reached */
-		private_data = (ReadLocalXLogPageNoWaitPrivate *)
-			xlogreader->private_data;
-
-		if (private_data->end_of_wal)
-			return NULL;
-
-		if (errormsg)
-			ereport(ERROR,
-					(errcode_for_file_access(),
-					 errmsg("could not read WAL at %X/%X: %s",
-							LSN_FORMAT_ARGS(xlogreader->EndRecPtr), errormsg)));
-		else
-			ereport(ERROR,
-					(errcode_for_file_access(),
-					 errmsg("could not read WAL at %X/%X",
-							LSN_FORMAT_ARGS(xlogreader->EndRecPtr))));
-	}
-
-	return record;
-}
-
 /*
  * Output values that make up a row describing caller's WAL record.
  *
@@ -502,33 +384,6 @@ pg_get_wal_record_info(PG_FUNCTION_ARGS)
 #undef PG_GET_WAL_RECORD_INFO_COLS
 }
 
-/*
- * Validate start and end LSNs coming from the function inputs.
- *
- * If end_lsn is found to be higher than the current LSN reported by the
- * cluster, use the current LSN as the upper bound.
- */
-static void
-ValidateInputLSNs(XLogRecPtr start_lsn, XLogRecPtr *end_lsn)
-{
-	XLogRecPtr	curr_lsn = GetCurrentLSN();
-
-	if (start_lsn > curr_lsn)
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-				 errmsg("WAL start LSN must be less than current LSN"),
-				 errdetail("Current WAL LSN on the database system is at %X/%X.",
-						   LSN_FORMAT_ARGS(curr_lsn))));
-
-	if (start_lsn > *end_lsn)
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-				 errmsg("WAL start LSN must be less than end LSN")));
-
-	if (*end_lsn > curr_lsn)
-		*end_lsn = curr_lsn;
-}
-
 /*
  * Get info of all WAL records between start LSN and end LSN.
  */
@@ -582,7 +437,8 @@ GetWALRecordsInfo(FunctionCallInfo fcinfo, XLogRecPtr start_lsn,
 }
 
 /*
- * Get info of all WAL records between start LSN and end LSN.
+ * The functionality below has been moved to the backend as of version 1.2,
+ * but these entry points are kept around for compatibility.
  */
 Datum
 pg_get_wal_records_info(PG_FUNCTION_ARGS)
diff --git a/contrib/pg_walinspect/pg_walinspect.control b/contrib/pg_walinspect/pg_walinspect.control
index efa3cb2cfe..5f574b865b 100644
--- a/contrib/pg_walinspect/pg_walinspect.control
+++ b/contrib/pg_walinspect/pg_walinspect.control
@@ -1,5 +1,5 @@
 # pg_walinspect extension
 comment = 'functions to inspect contents of PostgreSQL Write-Ahead Log'
-default_version = '1.1'
+default_version = '1.2'
 module_pathname = '$libdir/pg_walinspect'
 relocatable = true
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index b94827674c..a8866feaa3 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -26670,6 +26670,37 @@ LOG:  Grand total: 1651920 bytes in 201 blocks; 622360 free (88 chunks); 1029560
         get the replication lag.
        </para></entry>
       </row>
+
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_get_wal_records_content</primary>
+        </indexterm>
+        <function>pg_get_wal_records_content</function> ( <parameter>lsn1</parameter> <type>pg_lsn</type>, <parameter>lsn2</parameter> <type>pg_lsn</type> )
+        <returnvalue>record</returnvalue>
+        ( <parameter>start_lsn</parameter> <type>pg_lsn</type>,
+        <parameter>end_lsn</parameter> <type>pg_lsn</type>,
+        <parameter>prev_lsn</parameter> <type>pg_lsn</type>,
+        <parameter>xid</parameter> <type>xid</type>,
+        <parameter>resource_manager</parameter> <type>text</type>,
+        <parameter>record_type</parameter> <type>text</type>,
+        <parameter>record_length</parameter> <type>integer</type>,
+        <parameter>main_data_length</parameter> <type>integer</type>,
+        <parameter>fpi_length</parameter> <type>integer</type>,
+        <parameter>description</parameter> <type>text</type>,
+        <parameter>block_ref</parameter> <type>text</type>
+         )
+       </para>
+       <para>
+        Gets information about all the valid WAL records between
+        <replaceable>start_lsn</replaceable> and <replaceable>end_lsn</replaceable>
+        and returns one row per WAL record. This can be used together with
+        <structname>pg_stat_replication</structname> or some of the functions
+        shown in <xref linkend="functions-admin-backup-table"/> to inspect the
+        WAL records that make up the replication lag. The function raises an
+        error if <replaceable>start_lsn</replaceable> is not available.
+       </para></entry>
+      </row>
      </tbody>
     </tgroup>
    </table>
@@ -26728,6 +26759,27 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
 </programlisting>
    </para>
 
+   <para>
+    <function>pg_get_wal_records_content</function> gets information about all the valid WAL records between
+     <replaceable>start_lsn</replaceable> and <replaceable>end_lsn</replaceable>
+     and returns one row per WAL record.  For example:
+<screen>
+postgres=# SELECT * FROM pg_get_wal_records_content('0/1E913618', '0/1E913740') LIMIT 1;
+-[ RECORD 1 ]----+--------------------------------------------------------------
+start_lsn        | 0/1E913618
+end_lsn          | 0/1E913650
+prev_lsn         | 0/1E9135A0
+xid              | 0
+resource_manager | Standby
+record_type      | RUNNING_XACTS
+record_length    | 50
+main_data_length | 24
+fpi_length       | 0
+description      | nextXid 33775 latestCompletedXid 33774 oldestRunningXid 33775
+block_ref        |
+</screen>
+   </para>
+
   </sect2>
 
   <sect2 id="functions-recovery-control">
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 60c0b7ec3a..6115df57fd 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8988,3 +8988,30 @@ SetWalWriterSleeping(bool sleeping)
 	XLogCtl->WalWriterSleeping = sleeping;
 	SpinLockRelease(&XLogCtl->info_lck);
 }
+
+/*
+ * Validate start and end LSNs coming from the function inputs.
+ *
+ * If end_lsn is found to be higher than the current LSN reported by the
+ * cluster, use the current LSN as the upper bound.
+ */
+void
+ValidateInputLSNs(XLogRecPtr start_lsn, XLogRecPtr *end_lsn)
+{
+	XLogRecPtr	curr_lsn = GetCurrentLSN();
+
+	if (start_lsn > curr_lsn)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("WAL start LSN must be less than current LSN"),
+				 errdetail("Current WAL LSN on the database system is at %X/%X.",
+						   LSN_FORMAT_ARGS(curr_lsn))));
+
+	if (start_lsn > *end_lsn)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("WAL start LSN must be less than end LSN")));
+
+	if (*end_lsn > curr_lsn)
+		*end_lsn = curr_lsn;
+}
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 5044ff0643..bb66eb26b6 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -754,3 +754,128 @@ pg_promote(PG_FUNCTION_ARGS)
 						   wait_seconds)));
 	PG_RETURN_BOOL(false);
 }
+
+/*
+ * Get content of all WAL records between start LSN and end LSN.
+ */
+Datum
+pg_get_wal_records_content(PG_FUNCTION_ARGS)
+{
+#define PG_GET_WAL_RECORDS_INFO_COLS 11
+	FuncCallContext *funcctx;
+	XLogReaderState *xlogreader;
+
+	XLogRecPtr	start_lsn = PG_GETARG_LSN(0);
+	XLogRecPtr	end_lsn = PG_GETARG_LSN(1);
+
+	if (SRF_IS_FIRSTCALL())
+	{
+		MemoryContext oldcontext;
+		TupleDesc	tupdesc;
+
+		ValidateInputLSNs(start_lsn, &end_lsn);
+
+		/* create a function context for cross-call persistence */
+		funcctx = SRF_FIRSTCALL_INIT();
+
+		/*
+		 * Switch to memory context appropriate for multiple function calls
+		 */
+		oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+
+		/* build tupdesc for result tuples */
+		tupdesc = CreateTemplateTupleDesc(11);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 1, "start_lsn",
+						   PG_LSNOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 2, "end_lsn",
+						   PG_LSNOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 3, "prev_lsn",
+						   PG_LSNOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 4, "xid",
+						   XIDOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 5, "resource_manager",
+						   TEXTOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 6, "record_type",
+						   TEXTOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 7, "record_length",
+						   INT4OID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 8, "main_data_length",
+						   INT4OID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 9, "fpi_length",
+						   INT4OID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 10, "description",
+						   TEXTOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 11, "block_ref",
+						   TEXTOID, -1, 0);
+
+		funcctx->tuple_desc = BlessTupleDesc(tupdesc);
+
+		if (start_lsn < end_lsn)
+			funcctx->user_fctx = InitXLogReaderState(start_lsn);
+		else
+			funcctx->user_fctx = NULL;
+
+		MemoryContextSwitchTo(oldcontext);
+	}
+
+	funcctx = SRF_PERCALL_SETUP();
+	xlogreader = (XLogReaderState *) funcctx->user_fctx;
+
+	while (xlogreader && ReadNextXLogRecord(xlogreader) &&
+		   xlogreader->EndRecPtr <= end_lsn)
+	{
+		Datum		values[11] = {0};
+		bool		nulls[11] = {0};
+		HeapTuple	tuple;
+		Datum		result;
+		XLogReaderState *record = xlogreader;
+		const char *record_type;
+		StringInfoData rec_desc;
+		StringInfoData rec_blk_ref;
+		int			i = 0;
+		uint32		fpi_len = 0;
+		RmgrData	desc;
+
+		desc = GetRmgr(XLogRecGetRmid(record));
+		record_type = desc.rm_identify(XLogRecGetInfo(record));
+
+		if (record_type == NULL)
+			record_type = psprintf("UNKNOWN (%x)", XLogRecGetInfo(record) & ~XLR_INFO_MASK);
+
+		initStringInfo(&rec_desc);
+		desc.rm_desc(&rec_desc, record);
+
+		if (XLogRecHasAnyBlockRefs(record))
+		{
+			initStringInfo(&rec_blk_ref);
+			XLogRecGetBlockRefInfo(record, false, true, &rec_blk_ref, &fpi_len);
+		}
+
+		values[i++] = LSNGetDatum(record->ReadRecPtr);
+		values[i++] = LSNGetDatum(record->EndRecPtr);
+		values[i++] = LSNGetDatum(XLogRecGetPrev(record));
+		values[i++] = TransactionIdGetDatum(XLogRecGetXid(record));
+		values[i++] = CStringGetTextDatum(desc.rm_name);
+		values[i++] = CStringGetTextDatum(record_type);
+		values[i++] = UInt32GetDatum(XLogRecGetTotalLen(record));
+		values[i++] = UInt32GetDatum(XLogRecGetDataLen(record));
+		values[i++] = UInt32GetDatum(fpi_len);
+
+		if (rec_desc.len > 0)
+			values[i++] = CStringGetTextDatum(rec_desc.data);
+		else
+			nulls[i++] = true;
+
+		if (XLogRecHasAnyBlockRefs(record))
+			values[i++] = CStringGetTextDatum(rec_blk_ref.data);
+		else
+			nulls[i++] = true;
+
+		tuple = heap_form_tuple(funcctx->tuple_desc, values, nulls);
+		result = HeapTupleGetDatum(tuple);
+		SRF_RETURN_NEXT(funcctx, result);
+	}
+
+	SRF_RETURN_DONE(funcctx);
+#undef PG_GET_WAL_RECORDS_INFO_COLS
+}
diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c
index e174a2a891..90eab3e2f3 100644
--- a/src/backend/access/transam/xlogutils.c
+++ b/src/backend/access/transam/xlogutils.c
@@ -1048,3 +1048,120 @@ WALReadRaiseError(WALReadError *errinfo)
 						errinfo->wre_req)));
 	}
 }
+
+/*
+ * Return the LSN up to which the server has WAL.
+ */
+XLogRecPtr
+GetCurrentLSN(void)
+{
+	XLogRecPtr	curr_lsn;
+
+	/*
+	 * We determine the current LSN of the server similar to how page_read
+	 * callback read_local_xlog_page_no_wait does.
+	 */
+	if (!RecoveryInProgress())
+		curr_lsn = GetFlushRecPtr(NULL);
+	else
+		curr_lsn = GetXLogReplayRecPtr(NULL);
+
+	Assert(!XLogRecPtrIsInvalid(curr_lsn));
+
+	return curr_lsn;
+}
+
+/*
+ * Initialize WAL reader and identify first valid LSN.
+ */
+XLogReaderState *
+InitXLogReaderState(XLogRecPtr lsn)
+{
+	XLogReaderState *xlogreader;
+	ReadLocalXLogPageNoWaitPrivate *private_data;
+	XLogRecPtr	first_valid_record;
+
+	/*
+	 * Reading WAL below the first page of the first segments isn't allowed.
+	 * This is a bootstrap WAL page and the page_read callback fails to read
+	 * it.
+	 */
+	if (lsn < XLOG_BLCKSZ)
+		ereport(ERROR,
+				(errmsg("could not read WAL at LSN %X/%X",
+						LSN_FORMAT_ARGS(lsn))));
+
+	private_data = (ReadLocalXLogPageNoWaitPrivate *)
+		palloc0(sizeof(ReadLocalXLogPageNoWaitPrivate));
+
+	xlogreader = XLogReaderAllocate(wal_segment_size, NULL,
+									XL_ROUTINE(.page_read = &read_local_xlog_page_no_wait,
+											   .segment_open = &wal_segment_open,
+											   .segment_close = &wal_segment_close),
+									private_data);
+
+	if (xlogreader == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("out of memory"),
+				 errdetail("Failed while allocating a WAL reading processor.")));
+
+	/* first find a valid recptr to start from */
+	first_valid_record = XLogFindNextRecord(xlogreader, lsn);
+
+	if (XLogRecPtrIsInvalid(first_valid_record))
+	{
+		ereport(ERROR,
+				(errmsg("could not find a valid record after %X/%X",
+						LSN_FORMAT_ARGS(lsn))));
+	}
+
+	return xlogreader;
+}
+
+
+/*
+ * Read next WAL record.
+ *
+ * By design, to be less intrusive in a running system, no slot is allocated
+ * to reserve the WAL we're about to read. Therefore this function can
+ * encounter read errors for historical WAL.
+ *
+ * We guard against ordinary errors trying to read WAL that hasn't been
+ * written yet by limiting end_lsn to the flushed WAL, but that can also
+ * encounter errors if the flush pointer falls in the middle of a record. In
+ * that case we'll return NULL.
+ */
+XLogRecord *
+ReadNextXLogRecord(XLogReaderState *xlogreader)
+{
+	XLogRecord *record;
+	char	   *errormsg;
+
+	record = XLogReadRecord(xlogreader, &errormsg);
+
+	if (record == NULL)
+	{
+		ReadLocalXLogPageNoWaitPrivate *private_data;
+
+		/* return NULL, if end of WAL is reached */
+		private_data = (ReadLocalXLogPageNoWaitPrivate *)
+			xlogreader->private_data;
+
+		if (private_data->end_of_wal)
+			return NULL;
+
+		if (errormsg)
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not read WAL at %X/%X: %s",
+							LSN_FORMAT_ARGS(xlogreader->EndRecPtr), errormsg)));
+		else
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not read WAL at %X/%X",
+							LSN_FORMAT_ARGS(xlogreader->EndRecPtr))));
+	}
+
+	return record;
+}
diff --git a/src/backend/catalog/system_functions.sql b/src/backend/catalog/system_functions.sql
index 07c0d89c4f..1fed0def9b 100644
--- a/src/backend/catalog/system_functions.sql
+++ b/src/backend/catalog/system_functions.sql
@@ -616,6 +616,8 @@ REVOKE EXECUTE ON FUNCTION pg_backup_stop(boolean) FROM public;
 
 REVOKE EXECUTE ON FUNCTION pg_create_restore_point(text) FROM public;
 
+REVOKE EXECUTE ON FUNCTION pg_get_wal_records_content(pg_lsn, pg_lsn) FROM public;
+
 REVOKE EXECUTE ON FUNCTION pg_switch_wal() FROM public;
 
 REVOKE EXECUTE ON FUNCTION pg_log_standby_snapshot() FROM public;
@@ -726,6 +728,8 @@ REVOKE EXECUTE ON FUNCTION pg_ls_replslotdir(text) FROM PUBLIC;
 -- We also set up some things as accessible to standard roles.
 --
 
+GRANT EXECUTE ON FUNCTION pg_get_wal_records_content(pg_lsn, pg_lsn) TO pg_read_server_files;
+
 GRANT EXECUTE ON FUNCTION pg_ls_logdir() TO pg_monitor;
 
 GRANT EXECUTE ON FUNCTION pg_ls_waldir() TO pg_monitor;
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 48ca852381..4ec1fc9d82 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -259,6 +259,8 @@ extern void SetInstallXLogFileSegmentActive(void);
 extern bool IsInstallXLogFileSegmentActive(void);
 extern void XLogShutdownWalRcv(void);
 
+extern void ValidateInputLSNs(XLogRecPtr start_lsn, XLogRecPtr *end_lsn);
+
 /*
  * Routines to start, stop, and get status of a base backup.
  */
diff --git a/src/include/access/xlogutils.h b/src/include/access/xlogutils.h
index 5b77b11f50..77e2974ab3 100644
--- a/src/include/access/xlogutils.h
+++ b/src/include/access/xlogutils.h
@@ -115,4 +115,8 @@ extern void XLogReadDetermineTimeline(XLogReaderState *state,
 
 extern void WALReadRaiseError(WALReadError *errinfo);
 
+extern XLogRecPtr GetCurrentLSN(void);
+extern XLogReaderState *InitXLogReaderState(XLogRecPtr lsn);
+extern XLogRecord *ReadNextXLogRecord(XLogReaderState *xlogreader);
+
 #endif
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 6996073989..d45696fca0 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -6489,6 +6489,15 @@
   proargnames => '{rm_id, rm_name, rm_builtin}',
   prosrc => 'pg_get_wal_resource_managers' },
 
+{ oid => '3813',
+  descr => 'Info of WAL contents between start LSN and end LSN',
+  proname => 'pg_get_wal_records_content', prorows => '10', proretset => 't',
+  provolatile => 's', prorettype => 'record', proargtypes => 'pg_lsn pg_lsn',
+  proallargtypes => '{pg_lsn,pg_lsn,pg_lsn,pg_lsn,pg_lsn,xid,text,text,int4,int4,int4,text,text}',
+  proargmodes => '{i,i,o,o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{start_lsn,end_lsn,start_lsn,end_lsn,prev_lsn,xid,resource_manager,record_type,record_length,main_data_length,fpi_length,description,block_ref}',
+  prosrc => 'pg_get_wal_records_content' },
+
 { oid => '2621', descr => 'reload configuration files',
   proname => 'pg_reload_conf', provolatile => 'v', prorettype => 'bool',
   proargtypes => '', prosrc => 'pg_reload_conf' },
diff --git a/src/test/modules/brin/t/02_wal_consistency.pl b/src/test/modules/brin/t/02_wal_consistency.pl
index 8b2b244feb..288bbd7a60 100644
--- a/src/test/modules/brin/t/02_wal_consistency.pl
+++ b/src/test/modules/brin/t/02_wal_consistency.pl
@@ -64,7 +64,7 @@ my $end_lsn = $whiskey->lsn('flush');
 
 my ($ret, $out, $err) = $whiskey->psql(
 	'postgres', qq{
-	select count(*) from pg_get_wal_records_info('$start_lsn', '$end_lsn')
+	select count(*) from pg_get_wal_records_content('$start_lsn', '$end_lsn')
 	where resource_manager = 'BRIN' AND
 	record_type ILIKE '%revmap%'
 	});
diff --git a/src/test/modules/test_custom_rmgrs/t/001_basic.pl b/src/test/modules/test_custom_rmgrs/t/001_basic.pl
index 50655d3788..4000de0560 100644
--- a/src/test/modules/test_custom_rmgrs/t/001_basic.pl
+++ b/src/test/modules/test_custom_rmgrs/t/001_basic.pl
@@ -54,7 +54,7 @@ my $expected =
   qq($record_end_lsn|test_custom_rmgrs|TEST_CUSTOM_RMGRS_MESSAGE|0|payload (10 bytes): payload123);
 my $result = $node->safe_psql(
 	'postgres',
-	qq[SELECT end_lsn, resource_manager, record_type, fpi_length, description FROM pg_get_wal_records_info('$start_lsn', '$end_lsn')
+	qq[SELECT end_lsn, resource_manager, record_type, fpi_length, description FROM pg_get_wal_records_content('$start_lsn', '$end_lsn')
 		WHERE resource_manager = 'test_custom_rmgrs';]);
 is($result, $expected,
 	'custom WAL resource manager has successfully written a WAL record');
diff --git a/src/test/regress/expected/misc_functions.out b/src/test/regress/expected/misc_functions.out
index c669948370..e5e10cddf9 100644
--- a/src/test/regress/expected/misc_functions.out
+++ b/src/test/regress/expected/misc_functions.out
@@ -642,3 +642,82 @@ SELECT segment_number > 0 AS ok_segment_number, timeline_id
  t                 |  4294967295
 (1 row)
 
+-- pg_get_wal_records_content
+CREATE TABLE sample_tbl(col1 int, col2 int);
+SELECT pg_current_wal_lsn() AS wal_lsn1 \gset
+INSERT INTO sample_tbl SELECT * FROM generate_series(1, 2);
+SELECT pg_current_wal_lsn() AS wal_lsn2 \gset
+-- Mask DETAIL messages as these could refer to current LSN positions.
+\set VERBOSITY terse
+-- Invalid start LSN.
+SELECT * FROM pg_get_wal_records_content('0/0', :'wal_lsn1');
+ERROR:  could not read WAL at LSN 0/0
+-- Start LSN > End LSN.
+SELECT * FROM pg_get_wal_records_content(:'wal_lsn2', :'wal_lsn1');
+ERROR:  WAL start LSN must be less than end LSN
+-- Success with end LSNs.
+SELECT COUNT(*) >= 1 AS ok FROM pg_get_wal_records_content(:'wal_lsn1', 'FFFFFFFF/FFFFFFFF');
+ ok 
+----
+ t
+(1 row)
+
+-- Failures with start LSNs.
+SELECT * FROM pg_get_wal_records_content('FFFFFFFF/FFFFFFFE', 'FFFFFFFF/FFFFFFFF');
+ERROR:  WAL start LSN must be less than current LSN
+SELECT COUNT(*) >= 1 AS ok FROM pg_get_wal_records_content(:'wal_lsn1', :'wal_lsn2');
+ ok 
+----
+ t
+(1 row)
+
+-- Test for filtering out WAL records of a particular table
+SELECT oid AS sample_tbl_oid FROM pg_class WHERE relname = 'sample_tbl' \gset
+SELECT COUNT(*) >= 1 AS ok FROM pg_get_wal_records_content(:'wal_lsn1', :'wal_lsn2')
+			WHERE block_ref LIKE concat('%', :'sample_tbl_oid', '%') AND resource_manager = 'Heap';
+ ok 
+----
+ t
+(1 row)
+
+-- Test for filtering out WAL records based on resource_manager and
+-- record_type
+SELECT COUNT(*) >= 1 AS ok FROM pg_get_wal_records_content(:'wal_lsn1', :'wal_lsn2')
+			WHERE resource_manager = 'Heap' AND record_type = 'INSERT';
+ ok 
+----
+ t
+(1 row)
+
+\set VERBOSITY default
+-- Tests for permissions
+CREATE ROLE regress_pg_get_wal;
+SELECT has_function_privilege('regress_pg_get_wal',
+  'pg_get_wal_records_content(pg_lsn, pg_lsn) ', 'EXECUTE'); -- no
+ has_function_privilege 
+------------------------
+ f
+(1 row)
+
+-- Functions accessible by users with role pg_read_server_files.
+GRANT pg_read_server_files TO regress_pg_get_wal;
+SELECT has_function_privilege('regress_pg_get_wal',
+  'pg_get_wal_records_content(pg_lsn, pg_lsn) ', 'EXECUTE'); -- yes
+ has_function_privilege 
+------------------------
+ t
+(1 row)
+
+-- Superuser can grant execute to other users.
+GRANT EXECUTE ON FUNCTION pg_get_wal_records_content(pg_lsn, pg_lsn)
+  TO regress_pg_get_wal;
+SELECT has_function_privilege('regress_pg_get_wal',
+  'pg_get_wal_records_content(pg_lsn, pg_lsn) ', 'EXECUTE'); -- yes
+ has_function_privilege 
+------------------------
+ t
+(1 row)
+
+REVOKE EXECUTE ON FUNCTION pg_get_wal_records_content(pg_lsn, pg_lsn)
+  FROM regress_pg_get_wal;
+DROP ROLE regress_pg_get_wal;
diff --git a/src/test/regress/sql/misc_functions.sql b/src/test/regress/sql/misc_functions.sql
index b57f01f3e9..40cee85dbc 100644
--- a/src/test/regress/sql/misc_functions.sql
+++ b/src/test/regress/sql/misc_functions.sql
@@ -237,3 +237,60 @@ SELECT segment_number > 0 AS ok_segment_number, timeline_id
   FROM pg_split_walfile_name('000000010000000100000000');
 SELECT segment_number > 0 AS ok_segment_number, timeline_id
   FROM pg_split_walfile_name('ffffffFF00000001000000af');
+
+-- pg_get_wal_records_content
+CREATE TABLE sample_tbl(col1 int, col2 int);
+SELECT pg_current_wal_lsn() AS wal_lsn1 \gset
+INSERT INTO sample_tbl SELECT * FROM generate_series(1, 2);
+SELECT pg_current_wal_lsn() AS wal_lsn2 \gset
+
+-- Mask DETAIL messages as these could refer to current LSN positions.
+\set VERBOSITY terse
+
+-- Invalid start LSN.
+SELECT * FROM pg_get_wal_records_content('0/0', :'wal_lsn1');
+-- Start LSN > End LSN.
+SELECT * FROM pg_get_wal_records_content(:'wal_lsn2', :'wal_lsn1');
+-- Success with end LSNs.
+SELECT COUNT(*) >= 1 AS ok FROM pg_get_wal_records_content(:'wal_lsn1', 'FFFFFFFF/FFFFFFFF');
+-- Failures with start LSNs.
+SELECT * FROM pg_get_wal_records_content('FFFFFFFF/FFFFFFFE', 'FFFFFFFF/FFFFFFFF');
+
+SELECT COUNT(*) >= 1 AS ok FROM pg_get_wal_records_content(:'wal_lsn1', :'wal_lsn2');
+
+-- Test for filtering out WAL records of a particular table
+SELECT oid AS sample_tbl_oid FROM pg_class WHERE relname = 'sample_tbl' \gset
+
+SELECT COUNT(*) >= 1 AS ok FROM pg_get_wal_records_content(:'wal_lsn1', :'wal_lsn2')
+			WHERE block_ref LIKE concat('%', :'sample_tbl_oid', '%') AND resource_manager = 'Heap';
+
+-- Test for filtering out WAL records based on resource_manager and
+-- record_type
+
+SELECT COUNT(*) >= 1 AS ok FROM pg_get_wal_records_content(:'wal_lsn1', :'wal_lsn2')
+			WHERE resource_manager = 'Heap' AND record_type = 'INSERT';
+
+\set VERBOSITY default
+
+-- Tests for permissions
+CREATE ROLE regress_pg_get_wal;
+SELECT has_function_privilege('regress_pg_get_wal',
+  'pg_get_wal_records_content(pg_lsn, pg_lsn) ', 'EXECUTE'); -- no
+
+-- Functions accessible by users with role pg_read_server_files.
+GRANT pg_read_server_files TO regress_pg_get_wal;
+
+SELECT has_function_privilege('regress_pg_get_wal',
+  'pg_get_wal_records_content(pg_lsn, pg_lsn) ', 'EXECUTE'); -- yes
+
+-- Superuser can grant execute to other users.
+GRANT EXECUTE ON FUNCTION pg_get_wal_records_content(pg_lsn, pg_lsn)
+  TO regress_pg_get_wal;
+
+SELECT has_function_privilege('regress_pg_get_wal',
+  'pg_get_wal_records_content(pg_lsn, pg_lsn) ', 'EXECUTE'); -- yes
+
+REVOKE EXECUTE ON FUNCTION pg_get_wal_records_content(pg_lsn, pg_lsn)
+  FROM regress_pg_get_wal;
+
+DROP ROLE regress_pg_get_wal;
-- 
2.34.1

#70Amit Kapila
amit.kapila16@gmail.com
In reply to: vignesh C (#69)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Fri, Jul 28, 2023 at 5:48 PM vignesh C <vignesh21@gmail.com> wrote:

Here is a patch which checks that there are no WAL records other than
CHECKPOINT_SHUTDOWN WAL record to be consumed based on the discussion
from [1].

Few comments:
=============
1. Do we really need 0001 patch after the latest change proposed by
Vignesh in the 0004 patch?

2.
+ if (dopt.logical_slots_only)
+ {
+ if (!dopt.binary_upgrade)
+ pg_fatal("options --logical-replication-slots-only requires option
--binary-upgrade");
+
+ if (dopt.dataOnly)
+ pg_fatal("options --logical-replication-slots-only and
-a/--data-only cannot be used together");
+
+ if (dopt.schemaOnly)
+ pg_fatal("options --logical-replication-slots-only and
-s/--schema-only cannot be used together");

Can you please explain why the patch imposes these restrictions? I
guess the binary_upgrade is because you want this option to be used
for the upgrade. Do we want to avoid giving any other option with
logical_slots, if so, are the above checks sufficient and why?

3.
+ /*
+ * Get replication slots.
+ *
+ * XXX: Which information must be extracted from old node? Currently three
+ * attributes are extracted because they are used by
+ * pg_create_logical_replication_slot().
+ */
+ appendPQExpBufferStr(query,
+ "SELECT slot_name, plugin, two_phase "
+ "FROM pg_catalog.pg_replication_slots "
+ "WHERE database = current_database() AND temporary = false "
+ "AND wal_status IN ('reserved', 'extended');");

Why are we ignoring slots whose wal status is WALAVAIL_REMOVED or
WALAVAIL_UNRESERVED? Slots whose wal status is WALAVAIL_REMOVED have been
invalidated at some point; such slots can't be used for decoding, but they
will still be dropped along with the subscription or when a user drops them
manually. So, if we don't copy such slots during the upgrade, there could be
a problem in dropping the corresponding subscription. If we don't want to
copy over such slots, we need to provide instructions on what users should
do in such cases (a sketch follows). OTOH, if we do want to copy them over,
we need a way to invalidate such slots after the copy. Either way, this
needs more analysis.
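
For example, the instructions could look roughly like the following sketch on
the subscriber side (the subscription name "sub" is just a placeholder); this
only uses existing commands, so treat it as one possible option rather than a
final recommendation:

    -- On the subscriber, for a subscription whose slot was lost on the publisher:
    ALTER SUBSCRIPTION sub DISABLE;
    ALTER SUBSCRIPTION sub SET (slot_name = NONE);
    -- The subscription can now be dropped (or re-pointed to a freshly created slot):
    DROP SUBSCRIPTION sub;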

4.
+ /*
+ * Check that all logical replication slots have reached the current WAL
+ * position.
+ */
+ res = executeQueryOrDie(conn,
+ "SELECT slot_name FROM pg_catalog.pg_replication_slots "
+ "WHERE (SELECT count(record_type) "
+ " FROM pg_catalog.pg_get_wal_records_content(confirmed_flush_lsn,
pg_catalog.pg_current_wal_insert_lsn()) "
+ " WHERE record_type != 'CHECKPOINT_SHUTDOWN') <> 0 "
+ "AND temporary = false AND wal_status IN ('reserved', 'extended');");

I think this can unnecessarily lead to reading a lot of WAL data if the
confirmed_flush_lsn for a slot is far behind. Can we think of improving this
by passing the number of records to read, which in this case should be 1?
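
To illustrate the idea, here is a rough sketch of a narrower check, assuming a
single-record lookup such as pg_walinspect's pg_get_wal_record_info() (or an
equivalent exposed from the backend) can be used for this purpose; this is
only a sketch, not a proposed implementation:

    SELECT s.slot_name
      FROM pg_catalog.pg_replication_slots s,
           LATERAL pg_get_wal_record_info(s.confirmed_flush_lsn) r
     WHERE s.temporary = false
       AND s.wal_status IN ('reserved', 'extended')
       AND r.record_type <> 'CHECKPOINT_SHUTDOWN';

This would read at most one record per slot instead of scanning everything up
to the current insert position.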

--
With Regards,
Amit Kapila.

#71Jonathan S. Katz
jkatz@postgresql.org
In reply to: Amit Kapila (#70)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On 8/1/23 5:39 AM, Amit Kapila wrote:

On Fri, Jul 28, 2023 at 5:48 PM vignesh C <vignesh21@gmail.com> wrote:

Here is a patch which checks that there are no WAL records other than
CHECKPOINT_SHUTDOWN WAL record to be consumed based on the discussion
from [1].

Few comments:
=============

2.
+ if (dopt.logical_slots_only)
+ {
+ if (!dopt.binary_upgrade)
+ pg_fatal("options --logical-replication-slots-only requires option
--binary-upgrade");
+
+ if (dopt.dataOnly)
+ pg_fatal("options --logical-replication-slots-only and
-a/--data-only cannot be used together");
+
+ if (dopt.schemaOnly)
+ pg_fatal("options --logical-replication-slots-only and
-s/--schema-only cannot be used together");

Can you please explain why the patch imposes these restrictions? I
guess the binary_upgrade is because you want this option to be used
for the upgrade. Do we want to avoid giving any other option with
logical_slots, if so, are the above checks sufficient and why?

Can I take this a step further on the user interface and ask why the
flag would be "--include-logical-replication-slots" vs. being enabled by
default?

Are there reasons why we wouldn't enable this feature by default on
pg_upgrade, and instead (if need be) have a flag that would be
"--exclude-logical-replication-slots"? Right now, not having the ability
to run pg_upgrade with logical replication slots enabled on the
publisher is a very big pain point for users, so I would strongly
recommend against adding friction unless there is a very large challenge
with such an implementation.

Thanks,

Jonathan

#72Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Jonathan S. Katz (#71)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Jonathan,

Thank you for reading the thread!

Can I take this a step further on the user interface and ask why the
flag would be "--include-logical-replication-slots" vs. being enabled by
default?

Are there reasons why we wouldn't enable this feature by default on
pg_upgrade, and instead (if need be) have a flag that would be
"--exclude-logical-replication-slots"? Right now, not having the ability
to run pg_upgrade with logical replication slots enabled on the
publisher is a very big pain point for users, so I would strongly
recommend against adding friction unless there is a very large challenge
with such an implementation.

The main reason was that there have been no major complaints so far. This decision
followed the related discussion for upgrading the subscriber [1]/messages/by-id/CAA4eK1KD-hZ3syruxJA6fK-JtSBzL6etkwToPuTmVkrCvT6ASw@mail.gmail.com. As mentioned
there, the current style might offer more flexibility. Of course, we could change it
if more opinions come in here.
(I believe this feature is useful for everyone, but changing the default may
affect others...)

As for the implementation, I have not checked deeply, but I see no major challenge.
We cannot change the style of the pg_dump option due to the pg_resetwal ordering issue [2]/messages/by-id/TYAPR01MB58668C61A3C6EE82AE436C07F539A@TYAPR01MB5866.jpnprd01.prod.outlook.com,
but that option is not visible to users. I will investigate further when we decide to do it.

What do you think?

[1]: /messages/by-id/CAA4eK1KD-hZ3syruxJA6fK-JtSBzL6etkwToPuTmVkrCvT6ASw@mail.gmail.com
[2]: /messages/by-id/TYAPR01MB58668C61A3C6EE82AE436C07F539A@TYAPR01MB5866.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#73Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: vignesh C (#69)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Vignesh,

Thank you for making the PoC!

Here is a patch which checks that there are no WAL records other than
CHECKPOINT_SHUTDOWN WAL record to be consumed based on the discussion
from [1].

Basically I agree with your approach. Thanks!

Patch 0001 and 0002 is same as the patch posted by Kuroda-san, Patch
0003 exposes pg_get_wal_records_content to get the WAL records along
with the WAL record type between start and end lsn. pg_walinspect
contrib module already exposes a function for this requirement, I have
moved this functionality to be exposed from the backend. Patch 0004
has slight change in check function to check that there are no other
records other than CHECKPOINT_SHUTDOWN to be consumed. The attached
patch has the changes for the same.
Thoughts?

[1] -
/messages/by-id/CAA4eK1Kem-J5NM7GJCgyKP84pEN6
RsG6JWo%3D6pSn1E%2BiexL1Fw%40mail.gmail.com

Few comments:

* Per the comment from Amit [1]/messages/by-id/CAA4eK1LWKkoyy-p-SAT0JTWa=6kXiMd=a6ZcArY9eU4a3g4TZg@mail.gmail.com, I used pg_get_wal_record_info() instead of pg_get_wal_records_info().
This function extracts the next available WAL record, which avoids a huge scan if
the confirmed_flush is far behind.
* According to cfbot and my analysis, the 0001 patch cannot pass the test on macOS,
so I revived Julien's patch [2]/messages/by-id/20230414061248.vdsxz2febjo3re6h@jrouhaud as 0002 for now. AFAICS the 0001 patch is not so welcomed anyway.

Next patch will be available soon.

[1]: /messages/by-id/CAA4eK1LWKkoyy-p-SAT0JTWa=6kXiMd=a6ZcArY9eU4a3g4TZg@mail.gmail.com
[2]: /messages/by-id/20230414061248.vdsxz2febjo3re6h@jrouhaud

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#74Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Amit Kapila (#70)
4 attachment(s)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Amit,

Thank you for giving comments! PSA new version patchset.

1. Do we really need 0001 patch after the latest change proposed by
Vignesh in the 0004 patch?

I removed the 0001 patch and revived the old patch which serializes slots at shutdown.
This is because the problem that slots are not serialized to disk still remains [1]/messages/by-id/20230414061248.vdsxz2febjo3re6h@jrouhaud,
so confirmed_flush can still fall behind even if we implement the approach.

2.
+ if (dopt.logical_slots_only)
+ {
+ if (!dopt.binary_upgrade)
+ pg_fatal("options --logical-replication-slots-only requires option
--binary-upgrade");
+
+ if (dopt.dataOnly)
+ pg_fatal("options --logical-replication-slots-only and
-a/--data-only cannot be used together");
+
+ if (dopt.schemaOnly)
+ pg_fatal("options --logical-replication-slots-only and
-s/--schema-only cannot be used together");

Can you please explain why the patch imposes these restrictions? I
guess the binary_upgrade is because you want this option to be used
for the upgrade. Do we want to avoid giving any other option with
logical_slots, if so, are the above checks sufficient and why?

Regarding --binary-upgrade, the motivation is the same as you expected. I hid
the --logical-replication-slots-only option from users, so it should not be
used for anything other than the upgrade. Additionally, this option is not shown
in the help output or the documentation.

As for the -{data|schema}-only options, I removed the restrictions.
Initially I excluded them because the combination could be confusing - as discussed at [2]/messages/by-id/CAA4eK1KD-hZ3syruxJA6fK-JtSBzL6etkwToPuTmVkrCvT6ASw@mail.gmail.com, slots
must be dumped after pg_resetwal is done, and at that time all the other definitions
have already been dumped. To avoid duplicated definitions, we must ensure that only slots are
written to the output file. I thought this requirement contradicted the descriptions of
these options ("dump only A, not B").
But after further consideration, I concluded this might not be needed because the option is not
exposed to users - no one would be confused by using them together.
(The restriction for -c is also removed for the same reason.)

3.
+ /*
+ * Get replication slots.
+ *
+ * XXX: Which information must be extracted from old node? Currently three
+ * attributes are extracted because they are used by
+ * pg_create_logical_replication_slot().
+ */
+ appendPQExpBufferStr(query,
+ "SELECT slot_name, plugin, two_phase "
+ "FROM pg_catalog.pg_replication_slots "
+ "WHERE database = current_database() AND temporary = false "
+ "AND wal_status IN ('reserved', 'extended');");

Why are we ignoring slots whose wal status is WALAVAIL_REMOVED or
WALAVAIL_UNRESERVED? Slots whose wal status is WALAVAIL_REMOVED have been
invalidated at some point; such slots can't be used for decoding, but they
will still be dropped along with the subscription or when a user drops them
manually. So, if we don't copy such slots during the upgrade, there could be
a problem in dropping the corresponding subscription. If we don't want to
copy over such slots, we need to provide instructions on what users should
do in such cases. OTOH, if we do want to copy them over, we need a way to
invalidate such slots after the copy. Either way, this needs more analysis.

I considered this again. At least WALAVAIL_UNRESERVED should be supported because
such a slot is still usable; its status can return to reserved or extended.

As for WALAVAIL_REMOVED, I don't think it should be supported, so I added a description
to the documentation.

This feature re-creates slots that have the same name/plugin as the old ones; it does not
replicate their state. So if we copy such slots as-is, they become usable again. If a
subscriber refers to the slot and then connects again at that point, changes made while
the slot was 'WALAVAIL_REMOVED' may be lost.
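
For context, what the 0001 patch generates for each slot is just a re-creation call,
roughly like the following (the slot name and plugin below are placeholders); no
restart_lsn or confirmed_flush state is carried over:

    SELECT pg_catalog.pg_create_logical_replication_slot('sub_slot', 'pgoutput', false, false);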

Based on the above, such slots would have to be copied as WALAVAIL_REMOVED, but as you
said, we do not have a way to control that. The status is calculated from restart_lsn,
but there is no function to modify it directly.

One approach is adding an SQL function which sets restart_lsn to an arbitrary value
(or 0/0, invalid), but that seems dangerous.

4.
+ /*
+ * Check that all logical replication slots have reached the current WAL
+ * position.
+ */
+ res = executeQueryOrDie(conn,
+ "SELECT slot_name FROM pg_catalog.pg_replication_slots "
+ "WHERE (SELECT count(record_type) "
+ " FROM pg_catalog.pg_get_wal_records_content(confirmed_flush_lsn,
pg_catalog.pg_current_wal_insert_lsn()) "
+ " WHERE record_type != 'CHECKPOINT_SHUTDOWN') <> 0 "
+ "AND temporary = false AND wal_status IN ('reserved', 'extended');");

I think this can unnecessarily lead to reading a lot of WAL data if the
confirmed_flush_lsn for a slot is far behind. Can we think of improving this
by passing the number of records to read, which in this case should be 1?

I checked, and pg_get_wal_record_info() seemed usable for the purpose. I tried to
move the functionality to core.

But this function raises an ERROR when there is no valid record after the specified
LSN. This means that pg_upgrade fails if the logical slots have caught up to the current
WAL location. IIUC the DBA must do the following steps (see the sketch after this list):

1. shut down the old publisher
2. disable the subscription once <- this is mandatory; otherwise the walsender may
send records during the upgrade and confirmed_lsn may point to the SHUTDOWN_CHECKPOINT
3. run pg_upgrade <- pg_get_wal_record_content() may raise an ERROR if step 2 was skipped
4. change the connection string of the subscription
5. enable the subscription again
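
To make steps 2, 4, and 5 concrete, the subscriber-side commands would look roughly
like this (the subscription name and connection string are placeholders):

    ALTER SUBSCRIPTION sub DISABLE;
    -- ... run pg_upgrade on the old publisher ...
    ALTER SUBSCRIPTION sub CONNECTION 'host=new_publisher port=5432 dbname=postgres';
    ALTER SUBSCRIPTION sub ENABLE;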

If we think this is not robust, we must instead implement a similar function which does not raise an ERROR.
What do you think?

[1]: /messages/by-id/20230414061248.vdsxz2febjo3re6h@jrouhaud
[2]: /messages/by-id/CAA4eK1KD-hZ3syruxJA6fK-JtSBzL6etkwToPuTmVkrCvT6ASw@mail.gmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:

v18-0001-pg_upgrade-Add-include-logical-replication-slots.patchapplication/octet-stream; name=v18-0001-pg_upgrade-Add-include-logical-replication-slots.patchDownload
From 2b522ec87876c141ae94915a398609ea9374f638 Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Tue, 4 Apr 2023 05:49:34 +0000
Subject: [PATCH v18 1/4] pg_upgrade: Add --include-logical-replication-slots
 option

This commit introduces a new pg_upgrade option called "--include-logical-replication-slots".
This allows nodes with logical replication slots to be upgraded. The commit can
be divided into two parts: one for pg_dump and another for pg_upgrade.

For pg_dump this commit includes a new option called "--logical-replication-slots-only".
This option can be used to dump logical replication slots. When this option is
specified, the slot_name, plugin, and two_phase parameters are extracted from
pg_replication_slots. An SQL file that executes pg_create_logical_replication_slot()
with the extracted parameters is generated.

For pg_upgrade, when '--include-logical-replication-slots' is specified, it executes
pg_dump with the new "--logical-replication-slots-only" option and restores the slots
using the pg_create_logical_replication_slots() statements that the dump
generated (see above). Note that we cannot dump replication slots at the same time
as the schema dump because we need to separate the timing of restoring replication
slots and other objects. Replication slots, in  particular, should not be restored
before executing the pg_resetwal command because it will remove WALs that are
required by the slots.

The significant advantage of this commit is that it makes it easy to continue
logical replication even after upgrading the publisher node. Previously, pg_upgrade
allowed copying publications to a new node. With this new commit, adjusting the
connection string to the new publisher will cause the apply worker on the subscriber
to connect to the new publisher automatically. This enables seamless continuation
of logical replication, even after an upgrade.

Author: Hayato Kuroda
Reviewed-by: Peter Smith, Julien Rouhaud, Vignesh C, Wang Wei
---
 doc/src/sgml/ref/pgupgrade.sgml               |  39 +++++
 src/bin/pg_dump/pg_backup.h                   |   1 +
 src/bin/pg_dump/pg_dump.c                     | 143 +++++++++++++++++
 src/bin/pg_dump/pg_dump.h                     |  14 +-
 src/bin/pg_dump/pg_dump_sort.c                |  11 +-
 src/bin/pg_upgrade/Makefile                   |   3 +
 src/bin/pg_upgrade/check.c                    |  77 +++++++++
 src/bin/pg_upgrade/dump.c                     |  24 +++
 src/bin/pg_upgrade/info.c                     | 111 ++++++++++++-
 src/bin/pg_upgrade/meson.build                |   1 +
 src/bin/pg_upgrade/option.c                   |   7 +
 src/bin/pg_upgrade/pg_upgrade.c               |  61 ++++++++
 src/bin/pg_upgrade/pg_upgrade.h               |  21 +++
 .../t/003_logical_replication_slots.pl        | 146 ++++++++++++++++++
 src/tools/pgindent/typedefs.list              |   3 +
 15 files changed, 658 insertions(+), 4 deletions(-)
 create mode 100644 src/bin/pg_upgrade/t/003_logical_replication_slots.pl

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 7816b4c685..9e62d6cb1d 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -240,6 +240,17 @@ PostgreSQL documentation
       </listitem>
      </varlistentry>
 
+     <varlistentry>
+      <term><option>--include-logical-replication-slots</option></term>
+      <listitem>
+       <para>
+        Upgrade logical replication slots. Only permanent and usable
+        replication slots are included. Note that pg_upgrade does not check the
+        installation of plugins.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><option>-?</option></term>
       <term><option>--help</option></term>
@@ -402,6 +413,34 @@ NET STOP postgresql-&majorversion;
     </para>
    </step>
 
+   <step>
+    <title>Prepare for publisher upgrades</title>
+
+    <para>
+     If you are upgrading a publisher node, it is recommended to use the
+     <option>--include-logical-replication-slots</option> option. This helps
+     avoid the need for manually defining the same replication slot on the new
+     publisher.
+    </para>
+
+    <para>
+     Before you start upgrading the publisher node, ensure that the
+     subscription is temporarily disabled. After the upgrade is complete,
+     execute the
+     <link linkend="sql-altersubscription"><command>ALTER SUBSCRIPTION ... DISABLE</command></link>
+     command to update the connection string, and then re-enable the
+     subscription.
+    </para>
+
+    <para>
+     Note that only the replication slots with a wal_status different from
+     <literal>lost</literal> will be replicated to the new node. If there are
+     any subscriptions referring to a <literal>lost</literal> slot, they must
+     either be dropped or the slot_name should be changed to <literal>NONE</literal>
+     prior to upgrading.
+    </para>
+   </step>
+
    <step>
     <title>Run <application>pg_upgrade</application></title>
 
diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index aba780ef4b..0a4e931f9b 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -187,6 +187,7 @@ typedef struct _dumpOptions
 	int			use_setsessauth;
 	int			enable_row_security;
 	int			load_via_partition_root;
+	int			logical_slots_only;
 
 	/* default, if no "inclusion" switches appear, is to dump everything */
 	bool		include_everything;
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 5dab1ba9ea..c6d999b2cd 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -328,6 +328,9 @@ static void setupDumpWorker(Archive *AH);
 static TableInfo *getRootTableInfo(const TableInfo *tbinfo);
 static bool forcePartitionRootLoad(const TableInfo *tbinfo);
 
+static void getLogicalReplicationSlots(Archive *fout);
+static void dumpLogicalReplicationSlot(Archive *fout,
+									   const LogicalReplicationSlotInfo *slotinfo);
 
 int
 main(int argc, char **argv)
@@ -431,6 +434,7 @@ main(int argc, char **argv)
 		{"table-and-children", required_argument, NULL, 12},
 		{"exclude-table-and-children", required_argument, NULL, 13},
 		{"exclude-table-data-and-children", required_argument, NULL, 14},
+		{"logical-replication-slots-only", no_argument, NULL, 15},
 
 		{NULL, 0, NULL, 0}
 	};
@@ -657,6 +661,10 @@ main(int argc, char **argv)
 										  optarg);
 				break;
 
+			case 15:			/* dump only replication slot(s) */
+				dopt.logical_slots_only = true;
+				break;
+
 			default:
 				/* getopt_long already emitted a complaint */
 				pg_log_error_hint("Try \"%s --help\" for more information.", progname);
@@ -714,6 +722,9 @@ main(int argc, char **argv)
 	if (dopt.do_nothing && dopt.dump_inserts == 0)
 		pg_fatal("option --on-conflict-do-nothing requires option --inserts, --rows-per-insert, or --column-inserts");
 
+	if (dopt.logical_slots_only && !dopt.binary_upgrade)
+		pg_fatal("options --logical-replication-slots-only requires option --binary-upgrade");
+
 	/* Identify archive format to emit */
 	archiveFormat = parseArchiveFormat(format, &archiveMode);
 
@@ -876,6 +887,16 @@ main(int argc, char **argv)
 			pg_fatal("no matching extensions were found");
 	}
 
+	/*
+	 * If dump logical-replication-slots-only was requested, dump only them
+	 * and skip everything else.
+	 */
+	if (dopt.logical_slots_only)
+	{
+		getLogicalReplicationSlots(fout);
+		goto dump;
+	}
+
 	/*
 	 * Dumping LOs is the default for dumps where an inclusion switch is not
 	 * used (an "include everything" dump).  -B can be used to exclude LOs
@@ -936,6 +957,8 @@ main(int argc, char **argv)
 	if (!dopt.no_security_labels)
 		collectSecLabels(fout);
 
+dump:
+
 	/* Lastly, create dummy objects to represent the section boundaries */
 	boundaryObjs = createBoundaryObjects();
 
@@ -1109,6 +1132,12 @@ help(const char *progname)
 			 "                               servers matching PATTERN\n"));
 	printf(_("  --inserts                    dump data as INSERT commands, rather than COPY\n"));
 	printf(_("  --load-via-partition-root    load partitions via the root table\n"));
+
+	/*
+	 * The option --logical-replication-slots-only is used only by pg_upgrade
+	 * and should not be called by users, which is why it is not exposed by
+	 * the help.
+	 */
 	printf(_("  --no-comments                do not dump comments\n"));
 	printf(_("  --no-publications            do not dump publications\n"));
 	printf(_("  --no-security-labels         do not dump security label assignments\n"));
@@ -10237,6 +10266,10 @@ dumpDumpableObject(Archive *fout, DumpableObject *dobj)
 		case DO_SUBSCRIPTION:
 			dumpSubscription(fout, (const SubscriptionInfo *) dobj);
 			break;
+		case DO_LOGICAL_REPLICATION_SLOT:
+			dumpLogicalReplicationSlot(fout,
+									   (const LogicalReplicationSlotInfo *) dobj);
+			break;
 		case DO_PRE_DATA_BOUNDARY:
 		case DO_POST_DATA_BOUNDARY:
 			/* never dumped, nothing to do */
@@ -18218,6 +18251,7 @@ addBoundaryDependencies(DumpableObject **dobjs, int numObjs,
 			case DO_PUBLICATION_REL:
 			case DO_PUBLICATION_TABLE_IN_SCHEMA:
 			case DO_SUBSCRIPTION:
+			case DO_LOGICAL_REPLICATION_SLOT:
 				/* Post-data objects: must come after the post-data boundary */
 				addObjectDependency(dobj, postDataBound->dumpId);
 				break;
@@ -18479,3 +18513,112 @@ appendReloptionsArrayAH(PQExpBuffer buffer, const char *reloptions,
 	if (!res)
 		pg_log_warning("could not parse %s array", "reloptions");
 }
+
+/*
+ * getLogicalReplicationSlots
+ *	  get information about replication slots
+ */
+static void
+getLogicalReplicationSlots(Archive *fout)
+{
+	PGresult   *res;
+	LogicalReplicationSlotInfo *slotinfo;
+	PQExpBuffer query;
+
+	int			i_slotname;
+	int			i_plugin;
+	int			i_twophase;
+	int			i,
+				ntups;
+
+	/* Check whether we should dump or not */
+	if (fout->remoteVersion < 170000)
+		return;
+
+	Assert(fout->dopt->logical_slots_only);
+
+	query = createPQExpBuffer();
+
+	resetPQExpBuffer(query);
+
+	/*
+	 * Get replication slots.
+	 *
+	 * XXX: Which information must be extracted from old node? Currently three
+	 * attributes are extracted because they are used by
+	 * pg_create_logical_replication_slot().
+	 */
+	appendPQExpBufferStr(query,
+						 "SELECT slot_name, plugin, two_phase "
+						 "FROM pg_catalog.pg_replication_slots "
+						 "WHERE database = current_database() AND temporary = false "
+						 "AND wal_status IN ('reserved', 'extended');");
+
+	res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+
+	ntups = PQntuples(res);
+
+	i_slotname = PQfnumber(res, "slot_name");
+	i_plugin = PQfnumber(res, "plugin");
+	i_twophase = PQfnumber(res, "two_phase");
+
+	slotinfo = pg_malloc(ntups * sizeof(LogicalReplicationSlotInfo));
+
+	for (i = 0; i < ntups; i++)
+	{
+		slotinfo[i].dobj.objType = DO_LOGICAL_REPLICATION_SLOT;
+
+		slotinfo[i].dobj.catId.tableoid = InvalidOid;
+		slotinfo[i].dobj.catId.oid = InvalidOid;
+		AssignDumpId(&slotinfo[i].dobj);
+
+		slotinfo[i].dobj.name = pg_strdup(PQgetvalue(res, i, i_slotname));
+
+		slotinfo[i].plugin = pg_strdup(PQgetvalue(res, i, i_plugin));
+		slotinfo[i].twophase = (strcmp(PQgetvalue(res, i, i_twophase), "t") == 0);
+
+		/*
+		 * Note: Currently we do not have any options to include/exclude slots
+		 * in dumping, so all the slots must be selected.
+		 */
+		slotinfo[i].dobj.dump = DUMP_COMPONENT_DEFINITION;
+	}
+	PQclear(res);
+
+	destroyPQExpBuffer(query);
+}
+
+/*
+ * dumpLogicalReplicationSlot
+ *	  dump creation functions for the given logical replication slots
+ */
+static void
+dumpLogicalReplicationSlot(Archive *fout,
+						   const LogicalReplicationSlotInfo *slotinfo)
+{
+	Assert(fout->dopt->logical_slots_only);
+
+	if (slotinfo->dobj.dump & DUMP_COMPONENT_DEFINITION)
+	{
+		PQExpBuffer query = createPQExpBuffer();
+
+		/*
+		 * XXX: For simplification, pg_create_logical_replication_slot() is
+		 * used. Is it sufficient?
+		 */
+		appendPQExpBuffer(query, "SELECT pg_catalog.pg_create_logical_replication_slot(");
+		appendStringLiteralAH(query, slotinfo->dobj.name, fout);
+		appendPQExpBuffer(query, ", ");
+		appendStringLiteralAH(query, slotinfo->plugin, fout);
+		appendPQExpBuffer(query, ", false, %s);",
+						  slotinfo->twophase ? "true" : "false");
+
+		ArchiveEntry(fout, slotinfo->dobj.catId, slotinfo->dobj.dumpId,
+					 ARCHIVE_OPTS(.tag = slotinfo->dobj.name,
+								  .description = "REPLICATION SLOT",
+								  .section = SECTION_POST_DATA,
+								  .createStmt = query->data));
+
+		destroyPQExpBuffer(query);
+	}
+}
diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h
index bc8f2ec36d..ed1866d9ab 100644
--- a/src/bin/pg_dump/pg_dump.h
+++ b/src/bin/pg_dump/pg_dump.h
@@ -82,7 +82,8 @@ typedef enum
 	DO_PUBLICATION,
 	DO_PUBLICATION_REL,
 	DO_PUBLICATION_TABLE_IN_SCHEMA,
-	DO_SUBSCRIPTION
+	DO_SUBSCRIPTION,
+	DO_LOGICAL_REPLICATION_SLOT
 } DumpableObjectType;
 
 /*
@@ -667,6 +668,17 @@ typedef struct _SubscriptionInfo
 	char	   *subpasswordrequired;
 } SubscriptionInfo;
 
+/*
+ * The LogicalReplicationSlotInfo struct is used to represent replication
+ * slots.
+ */
+typedef struct _LogicalReplicationSlotInfo
+{
+	DumpableObject dobj;
+	char	   *plugin;
+	bool		twophase;
+} LogicalReplicationSlotInfo;
+
 /*
  *	common utility functions
  */
diff --git a/src/bin/pg_dump/pg_dump_sort.c b/src/bin/pg_dump/pg_dump_sort.c
index 523a19c155..ae65443228 100644
--- a/src/bin/pg_dump/pg_dump_sort.c
+++ b/src/bin/pg_dump/pg_dump_sort.c
@@ -93,6 +93,7 @@ enum dbObjectTypePriorities
 	PRIO_PUBLICATION_REL,
 	PRIO_PUBLICATION_TABLE_IN_SCHEMA,
 	PRIO_SUBSCRIPTION,
+	PRIO_LOGICAL_REPLICATION_SLOT,
 	PRIO_DEFAULT_ACL,			/* done in ACL pass */
 	PRIO_EVENT_TRIGGER,			/* must be next to last! */
 	PRIO_REFRESH_MATVIEW		/* must be last! */
@@ -146,10 +147,11 @@ static const int dbObjectTypePriority[] =
 	PRIO_PUBLICATION,			/* DO_PUBLICATION */
 	PRIO_PUBLICATION_REL,		/* DO_PUBLICATION_REL */
 	PRIO_PUBLICATION_TABLE_IN_SCHEMA,	/* DO_PUBLICATION_TABLE_IN_SCHEMA */
-	PRIO_SUBSCRIPTION			/* DO_SUBSCRIPTION */
+	PRIO_SUBSCRIPTION,			/* DO_SUBSCRIPTION */
+	PRIO_LOGICAL_REPLICATION_SLOT	/* DO_LOGICAL_REPLICATION_SLOT */
 };
 
-StaticAssertDecl(lengthof(dbObjectTypePriority) == (DO_SUBSCRIPTION + 1),
+StaticAssertDecl(lengthof(dbObjectTypePriority) == (DO_LOGICAL_REPLICATION_SLOT + 1),
 				 "array length mismatch");
 
 static DumpId preDataBoundId;
@@ -1542,6 +1544,11 @@ describeDumpableObject(DumpableObject *obj, char *buf, int bufsize)
 					 "SUBSCRIPTION (ID %d OID %u)",
 					 obj->dumpId, obj->catId.oid);
 			return;
+		case DO_LOGICAL_REPLICATION_SLOT:
+			snprintf(buf, bufsize,
+					 "LOGICAL REPLICATION SLOT (ID %d NAME %s)",
+					 obj->dumpId, obj->name);
+			return;
 		case DO_PRE_DATA_BOUNDARY:
 			snprintf(buf, bufsize,
 					 "PRE-DATA BOUNDARY  (ID %d)",
diff --git a/src/bin/pg_upgrade/Makefile b/src/bin/pg_upgrade/Makefile
index 5834513add..815d1a7ca1 100644
--- a/src/bin/pg_upgrade/Makefile
+++ b/src/bin/pg_upgrade/Makefile
@@ -3,6 +3,9 @@
 PGFILEDESC = "pg_upgrade - an in-place binary upgrade utility"
 PGAPPICON = win32
 
+# required for 003_logical_replication_slots.pl
+EXTRA_INSTALL=contrib/test_decoding
+
 subdir = src/bin/pg_upgrade
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 64024e3b9e..e7a42620f3 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -30,7 +30,9 @@ static void check_for_jsonb_9_4_usage(ClusterInfo *cluster);
 static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(ClusterInfo *new_cluster);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
+static void check_for_logical_replication_slots(ClusterInfo *new_cluster);
 
+static int	num_slots_on_old_cluster;
 
 /*
  * fix_path_separator
@@ -89,6 +91,10 @@ check_and_dump_old_cluster(bool live_check)
 	/* Extract a list of databases and tables from the old cluster */
 	get_db_and_rel_infos(&old_cluster);
 
+	/* Additionally, extract a list of logical replication slots if required */
+	if (user_opts.include_logical_slots)
+		num_slots_on_old_cluster = get_logical_slot_infos(&old_cluster);
+
 	init_tablespaces();
 
 	get_loadable_libraries();
@@ -189,6 +195,18 @@ check_new_cluster(void)
 {
 	get_db_and_rel_infos(&new_cluster);
 
+	/*
+	 * Do additional work if --include-logical-replication-slots was
+	 * specified. This must be done before check_new_cluster_is_empty()
+	 * because the slot_arr attribute of the new_cluster will be checked in
+	 * that function.
+	 */
+	if (user_opts.include_logical_slots)
+	{
+		(void) get_logical_slot_infos(&new_cluster);
+		check_for_logical_replication_slots(&new_cluster);
+	}
+
 	check_new_cluster_is_empty();
 
 	check_loadable_libraries();
@@ -364,6 +382,22 @@ check_new_cluster_is_empty(void)
 						 rel_arr->rels[relnum].nspname,
 						 rel_arr->rels[relnum].relname);
 		}
+
+		/*
+		 * If --include-logical-replication-slots is required, check the
+		 * existence of slots.
+		 */
+		if (user_opts.include_logical_slots)
+		{
+			DbInfo	   *pDbInfo = &new_cluster.dbarr.dbs[dbnum];
+			LogicalSlotInfoArr *slot_arr = &pDbInfo->slot_arr;
+
+			/* if nslots > 0, report just first entry and exit */
+			if (slot_arr->nslots)
+				pg_fatal("New cluster database \"%s\" is not empty: found logical replication slot \"%s\"",
+						 pDbInfo->db_name,
+						 slot_arr->slots[0].slotname);
+		}
 	}
 }
 
@@ -1402,3 +1436,46 @@ check_for_user_defined_encoding_conversions(ClusterInfo *cluster)
 	else
 		check_ok();
 }
+
+/*
+ * Verify the parameter settings necessary for creating logical replication
+ * slots.
+ */
+static void
+check_for_logical_replication_slots(ClusterInfo *new_cluster)
+{
+	PGresult   *res;
+	PGconn	   *conn = connectToServer(new_cluster, "template1");
+	int			max_replication_slots;
+	char	   *wal_level;
+
+	/* --include-logical-replication-slots can be used since PG17. */
+	if (GET_MAJOR_VERSION(new_cluster->major_version) <= 1600)
+		return;
+
+	prep_status("Checking parameter settings for logical replication slots");
+
+	res = executeQueryOrDie(conn, "SHOW max_replication_slots;");
+	max_replication_slots = atoi(PQgetvalue(res, 0, 0));
+
+	if (max_replication_slots == 0)
+		pg_fatal("max_replication_slots must be greater than 0");
+	else if (num_slots_on_old_cluster > max_replication_slots)
+		pg_fatal("max_replication_slots must be greater than existing logical "
+				 "replication slots on old node.");
+
+	PQclear(res);
+
+	res = executeQueryOrDie(conn, "SHOW wal_level;");
+	wal_level = PQgetvalue(res, 0, 0);
+
+	if (strcmp(wal_level, "logical") != 0)
+		pg_fatal("wal_level must be \"logical\", but is set to \"%s\"",
+				 wal_level);
+
+	PQclear(res);
+
+	PQfinish(conn);
+
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/dump.c b/src/bin/pg_upgrade/dump.c
index 6c8c82dca8..e6b90864f5 100644
--- a/src/bin/pg_upgrade/dump.c
+++ b/src/bin/pg_upgrade/dump.c
@@ -59,6 +59,30 @@ generate_old_dump(void)
 						   log_opts.dumpdir,
 						   sql_file_name, escaped_connstr.data);
 
+		/*
+		 * Dump logical replication slots if needed.
+		 *
+		 * XXX We cannot dump replication slots at the same time as the schema
+		 * dump because we need to separate the timing of restoring
+		 * replication slots and other objects. Replication slots, in
+		 * particular, should not be restored before executing the pg_resetwal
+		 * command because it will remove WALs that are required by the slots.
+		 */
+		if (user_opts.include_logical_slots)
+		{
+			char		slots_file_name[MAXPGPATH];
+
+			snprintf(slots_file_name, sizeof(slots_file_name),
+					 DB_DUMP_LOGICAL_SLOTS_FILE_MASK, old_db->db_oid);
+			parallel_exec_prog(log_file_name, NULL,
+							   "\"%s/pg_dump\" %s --logical-replication-slots-only "
+							   "--quote-all-identifiers --binary-upgrade %s "
+							   "--file=\"%s/%s\" %s",
+							   new_cluster.bindir, cluster_conn_opts(&old_cluster),
+							   log_opts.verbose ? "--verbose" : "",
+							   log_opts.dumpdir,
+							   slots_file_name, escaped_connstr.data);
+		}
 		termPQExpBuffer(&escaped_connstr);
 	}
 
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index a9988abfe1..8bc0ad2e10 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -26,6 +26,7 @@ static void get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo);
 static void free_rel_infos(RelInfoArr *rel_arr);
 static void print_db_infos(DbInfoArr *db_arr);
 static void print_rel_infos(RelInfoArr *rel_arr);
+static void print_slot_infos(LogicalSlotInfoArr *slot_arr);
 
 
 /*
@@ -394,7 +395,7 @@ get_db_infos(ClusterInfo *cluster)
 	i_spclocation = PQfnumber(res, "spclocation");
 
 	ntups = PQntuples(res);
-	dbinfos = (DbInfo *) pg_malloc(sizeof(DbInfo) * ntups);
+	dbinfos = (DbInfo *) pg_malloc0(sizeof(DbInfo) * ntups);
 
 	for (tupnum = 0; tupnum < ntups; tupnum++)
 	{
@@ -600,6 +601,96 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 	dbinfo->rel_arr.nrels = num_rels;
 }
 
+/*
+ * get_logical_slot_infos_per_db()
+ *
+ * gets the LogicalSlotInfos for all the logical replication slots of the database
+ * referred to by "dbinfo".
+ */
+static void
+get_logical_slot_infos_per_db(ClusterInfo *cluster, DbInfo *dbinfo)
+{
+	PGconn	   *conn = connectToServer(cluster,
+									   dbinfo->db_name);
+	PGresult   *res;
+	LogicalSlotInfo *slotinfos = NULL;
+
+	int			num_slots;
+
+	char		query[QUERY_ALLOC];
+
+	snprintf(query, sizeof(query),
+			 "SELECT slot_name, plugin, two_phase "
+			 "FROM pg_catalog.pg_replication_slots "
+			 "WHERE database = current_database() AND temporary = false "
+			 "AND wal_status IN ('reserved', 'extended');");
+
+	res = executeQueryOrDie(conn, "%s", query);
+
+	num_slots = PQntuples(res);
+
+	if (num_slots)
+	{
+		int			slotnum;
+		int			i_slotname;
+		int			i_plugin;
+		int			i_twophase;
+
+		slotinfos = (LogicalSlotInfo *) pg_malloc(sizeof(LogicalSlotInfo) * num_slots);
+
+		i_slotname = PQfnumber(res, "slot_name");
+		i_plugin = PQfnumber(res, "plugin");
+		i_twophase = PQfnumber(res, "two_phase");
+
+		for (slotnum = 0; slotnum < num_slots; slotnum++)
+		{
+			LogicalSlotInfo *curr = &slotinfos[slotnum];
+
+			curr->slotname = pg_strdup(PQgetvalue(res, slotnum, i_slotname));
+			curr->plugin = pg_strdup(PQgetvalue(res, slotnum, i_plugin));
+			curr->two_phase = (strcmp(PQgetvalue(res, slotnum, i_twophase), "t") == 0);
+		}
+	}
+
+	PQclear(res);
+	PQfinish(conn);
+
+	dbinfo->slot_arr.slots = slotinfos;
+	dbinfo->slot_arr.nslots = num_slots;
+}
+
+/*
+ * get_logical_slot_infos()
+ *
+ * Higher level routine to generate LogicalSlotInfoArr for all databases.
+ */
+int
+get_logical_slot_infos(ClusterInfo *cluster)
+{
+	int			dbnum;
+	int			slotnum = 0;
+
+	if (cluster == &old_cluster)
+		pg_log(PG_VERBOSE, "\nsource databases:");
+	else
+		pg_log(PG_VERBOSE, "\ntarget databases:");
+
+	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
+	{
+		DbInfo	   *pDbInfo = &cluster->dbarr.dbs[dbnum];
+
+		get_logical_slot_infos_per_db(cluster, pDbInfo);
+		slotnum += pDbInfo->slot_arr.nslots;
+
+		if (log_opts.verbose)
+		{
+			pg_log(PG_VERBOSE, "Database: \"%s\"", pDbInfo->db_name);
+			print_slot_infos(&pDbInfo->slot_arr);
+		}
+	}
+
+	return slotnum;
+}
 
 static void
 free_db_and_rel_infos(DbInfoArr *db_arr)
@@ -610,6 +701,12 @@ free_db_and_rel_infos(DbInfoArr *db_arr)
 	{
 		free_rel_infos(&db_arr->dbs[dbnum].rel_arr);
 		pg_free(db_arr->dbs[dbnum].db_name);
+
+		/*
+		 * Logical replication slots must not exist on the new cluster before
+		 * doing create_logical_replication_slots().
+		 */
+		Assert(db_arr->dbs[dbnum].slot_arr.slots == NULL);
 	}
 	pg_free(db_arr->dbs);
 	db_arr->dbs = NULL;
@@ -660,3 +757,15 @@ print_rel_infos(RelInfoArr *rel_arr)
 			   rel_arr->rels[relnum].reloid,
 			   rel_arr->rels[relnum].tablespace);
 }
+
+static void
+print_slot_infos(LogicalSlotInfoArr *slot_arr)
+{
+	int			slotnum;
+
+	for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\", two_phase: %d",
+			   slot_arr->slots[slotnum].slotname,
+			   slot_arr->slots[slotnum].plugin,
+			   slot_arr->slots[slotnum].two_phase);
+}
diff --git a/src/bin/pg_upgrade/meson.build b/src/bin/pg_upgrade/meson.build
index 12a97f84e2..228f29b688 100644
--- a/src/bin/pg_upgrade/meson.build
+++ b/src/bin/pg_upgrade/meson.build
@@ -42,6 +42,7 @@ tests += {
     'tests': [
       't/001_basic.pl',
       't/002_pg_upgrade.pl',
+      't/003_logical_replication_slots.pl',
     ],
     'test_kwargs': {'priority': 40}, # pg_upgrade tests are slow
   },
diff --git a/src/bin/pg_upgrade/option.c b/src/bin/pg_upgrade/option.c
index 640361009e..df66a5ffe6 100644
--- a/src/bin/pg_upgrade/option.c
+++ b/src/bin/pg_upgrade/option.c
@@ -57,6 +57,7 @@ parseCommandLine(int argc, char *argv[])
 		{"verbose", no_argument, NULL, 'v'},
 		{"clone", no_argument, NULL, 1},
 		{"copy", no_argument, NULL, 2},
+		{"include-logical-replication-slots", no_argument, NULL, 3},
 
 		{NULL, 0, NULL, 0}
 	};
@@ -199,6 +200,10 @@ parseCommandLine(int argc, char *argv[])
 				user_opts.transfer_mode = TRANSFER_MODE_COPY;
 				break;
 
+			case 3:
+				user_opts.include_logical_slots = true;
+				break;
+
 			default:
 				fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
 						os_info.progname);
@@ -289,6 +294,8 @@ usage(void)
 	printf(_("  -V, --version                 display version information, then exit\n"));
 	printf(_("  --clone                       clone instead of copying files to new cluster\n"));
 	printf(_("  --copy                        copy files to new cluster (default)\n"));
+	printf(_("  --include-logical-replication-slots\n"
+			 "                                upgrade logical replication slots\n"));
 	printf(_("  -?, --help                    show this help, then exit\n"));
 	printf(_("\n"
 			 "Before running pg_upgrade you must:\n"
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 4562dafcff..6dd3832422 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -59,6 +59,7 @@ static void copy_xact_xlog_xid(void);
 static void set_frozenxids(bool minmxid_only);
 static void make_outputdirs(char *pgdata);
 static void setup(char *argv0, bool *live_check);
+static void create_logical_replication_slots(void);
 
 ClusterInfo old_cluster,
 			new_cluster;
@@ -188,6 +189,19 @@ main(int argc, char **argv)
 			  new_cluster.pgdata);
 	check_ok();
 
+	/*
+	 * Create logical replication slots if requested.
+	 *
+	 * Note: This must be done after doing pg_resetwal command because
+	 * pg_resetwal would remove required WALs.
+	 */
+	if (user_opts.include_logical_slots)
+	{
+		start_postmaster(&new_cluster, true);
+		create_logical_replication_slots();
+		stop_postmaster(false);
+	}
+
 	if (user_opts.do_sync)
 	{
 		prep_status("Sync data directory to disk");
@@ -860,3 +874,50 @@ set_frozenxids(bool minmxid_only)
 
 	check_ok();
 }
+
+/*
+ * create_logical_replication_slots()
+ *
+ * Similar to create_new_objects() but only restores logical replication slots.
+ */
+static void
+create_logical_replication_slots(void)
+{
+	int			dbnum;
+
+	prep_status_progress("Restoring logical replication slots in the new cluster");
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		char		slots_file_name[MAXPGPATH],
+					log_file_name[MAXPGPATH];
+		DbInfo	   *old_db = &old_cluster.dbarr.dbs[dbnum];
+
+		pg_log(PG_STATUS, "%s", old_db->db_name);
+
+		snprintf(slots_file_name, sizeof(slots_file_name),
+				 DB_DUMP_LOGICAL_SLOTS_FILE_MASK, old_db->db_oid);
+		snprintf(log_file_name, sizeof(log_file_name),
+				 DB_DUMP_LOG_FILE_MASK, old_db->db_oid);
+
+		parallel_exec_prog(log_file_name,
+						   NULL,
+						   "\"%s/psql\" %s --echo-queries --set ON_ERROR_STOP=on "
+						   "--no-psqlrc --dbname %s -f \"%s/%s\"",
+						   new_cluster.bindir,
+						   cluster_conn_opts(&new_cluster),
+						   old_db->db_name,
+						   log_opts.dumpdir,
+						   slots_file_name);
+	}
+
+	/* reap all children */
+	while (reap_child(true) == true)
+		;
+
+	end_progress_output();
+	check_ok();
+
+	/* update new_cluster info again */
+	get_logical_slot_infos(&new_cluster);
+}
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 3eea0139c7..8034067492 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -29,6 +29,7 @@
 /* contains both global db information and CREATE DATABASE commands */
 #define GLOBALS_DUMP_FILE	"pg_upgrade_dump_globals.sql"
 #define DB_DUMP_FILE_MASK	"pg_upgrade_dump_%u.custom"
+#define DB_DUMP_LOGICAL_SLOTS_FILE_MASK	"pg_upgrade_dump_%u_logical_slots.sql"
 
 /*
  * Base directories that include all the files generated internally, from the
@@ -150,6 +151,22 @@ typedef struct
 	int			nrels;
 } RelInfoArr;
 
+/*
+ * Structure to store logical replication slot information
+ */
+typedef struct
+{
+	char	   *slotname;		/* slot name */
+	char	   *plugin;			/* plugin */
+	bool		two_phase;		/* Can the slot decode 2PC? */
+} LogicalSlotInfo;
+
+typedef struct
+{
+	int			nslots;
+	LogicalSlotInfo *slots;
+} LogicalSlotInfoArr;
+
 /*
  * The following structure represents a relation mapping.
  */
@@ -176,6 +193,7 @@ typedef struct
 	char		db_tablespace[MAXPGPATH];	/* database default tablespace
 											 * path */
 	RelInfoArr	rel_arr;		/* array of all user relinfos */
+	LogicalSlotInfoArr slot_arr;	/* array of all LogicalSlotInfo */
 } DbInfo;
 
 /*
@@ -304,6 +322,8 @@ typedef struct
 	transferMode transfer_mode; /* copy files or link them? */
 	int			jobs;			/* number of processes/threads to use */
 	char	   *socketdir;		/* directory to use for Unix sockets */
+	bool		include_logical_slots;	/* true -> dump and restore logical
+										 * replication slots */
 } UserOpts;
 
 typedef struct
@@ -400,6 +420,7 @@ FileNameMap *gen_db_file_maps(DbInfo *old_db,
 							  DbInfo *new_db, int *nmaps, const char *old_pgdata,
 							  const char *new_pgdata);
 void		get_db_and_rel_infos(ClusterInfo *cluster);
+int			get_logical_slot_infos(ClusterInfo *cluster);
 
 /* option.c */
 
diff --git a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
new file mode 100644
index 0000000000..9ca266f6b2
--- /dev/null
+++ b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
@@ -0,0 +1,146 @@
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+# Tests for upgrading replication slots
+
+use strict;
+use warnings;
+
+use File::Path qw(rmtree);
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Can be changed to test the other modes
+my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
+
+# Initialize old node
+my $old_node = PostgreSQL::Test::Cluster->new('old_node');
+$old_node->init(allows_streaming => 'logical');
+$old_node->start;
+
+# Initialize new node
+my $new_node = PostgreSQL::Test::Cluster->new('new_node');
+$new_node->init(allows_streaming => 1);
+
+my $bindir = $new_node->config_data('--bindir');
+
+$old_node->stop;
+
+# Cause a failure at the start of pg_upgrade because wal_level is replica
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_node->data_dir,
+		'-D',         $new_node->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_node->host,
+		'-p',         $old_node->port,
+		'-P',         $new_node->port,
+		$mode,        '--include-logical-replication-slots',
+	],
+	'run of pg_upgrade of old node with wrong wal_level');
+ok( -d $new_node->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_node->data_dir . "/pg_upgrade_output.d");
+
+# Preparations for the subsequent test. The case max_replication_slots is set
+# to 0 is prohibited.
+$new_node->append_conf('postgresql.conf', "wal_level = 'logical'");
+$new_node->append_conf('postgresql.conf', "max_replication_slots = 0");
+
+# Cause a failure at the start of pg_upgrade because max_replication_slots is 0
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_node->data_dir,
+		'-D',         $new_node->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_node->host,
+		'-p',         $old_node->port,
+		'-P',         $new_node->port,
+		$mode,        '--include-logical-replication-slots',
+	],
+	'run of pg_upgrade of old node with wrong max_replication_slots');
+ok( -d $new_node->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_node->data_dir . "/pg_upgrade_output.d");
+
+# Preparations for the subsequent test. max_replication_slots is set to
+# non-zero value
+$new_node->append_conf('postgresql.conf', "max_replication_slots = 1");
+
+# Create a slot on old node, and generate WALs
+$old_node->start;
+$old_node->safe_psql(
+	'postgres', qq[
+	SELECT pg_create_logical_replication_slot('test_slot', 'test_decoding', false, true);
+	SELECT pg_create_logical_replication_slot('to_be_dropped', 'test_decoding', false, true);
+	CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;
+]);
+
+$old_node->stop;
+
+# Cause a failure at the start of pg_upgrade because max_replication_slots is
+# smaller than existing slots on old node
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_node->data_dir,
+		'-D',         $new_node->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_node->host,
+		'-p',         $old_node->port,
+		'-P',         $new_node->port,
+		$mode,        '--include-logical-replication-slots',
+	],
+	'run of pg_upgrade of old node with small max_replication_slots');
+ok( -d $new_node->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_node->data_dir . "/pg_upgrade_output.d");
+
+# Preparations for the subsequent test. max_replication_slots is set to
+# appropriate value
+$new_node->append_conf('postgresql.conf', "max_replication_slots = 10");
+
+# Remove an unnecessary slot and consume WALs
+$old_node->start;
+$old_node->safe_psql(
+	'postgres', qq[
+	SELECT pg_drop_replication_slot('to_be_dropped');
+	SELECT count(*) FROM pg_logical_slot_get_changes('test_slot', NULL, NULL)
+]);
+$old_node->stop;
+
+# Actual run, pg_upgrade_output.d is removed at the end
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_node->data_dir,
+		'-D',         $new_node->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_node->host,
+		'-p',         $old_node->port,
+		'-P',         $new_node->port,
+		$mode,        '--include-logical-replication-slots'
+	],
+	'run of pg_upgrade of old node');
+ok( !-d $new_node->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ removed after pg_upgrade success");
+
+$new_node->start;
+my $result = $new_node->safe_psql('postgres',
+	"SELECT slot_name, two_phase FROM pg_replication_slots");
+is($result, qq(test_slot|t), 'check the slot exists on new node');
+
+done_testing();
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 66823bc2a7..5a34e351ef 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1500,7 +1500,10 @@ LogicalRepStreamAbortData
 LogicalRepTupleData
 LogicalRepTyp
 LogicalRepWorker
+LogicalReplicationSlotInfo
 LogicalRewriteMappingData
+LogicalSlotInfo
+LogicalSlotInfoArr
 LogicalTape
 LogicalTapeSet
 LsnReadQueue
-- 
2.27.0

v18-0002-Always-persist-to-disk-logical-slots-during-a-sh.patchapplication/octet-stream; name=v18-0002-Always-persist-to-disk-logical-slots-during-a-sh.patchDownload
From e83458e9238639da95c32d3902ccd5926ff97413 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Fri, 14 Apr 2023 13:49:09 +0800
Subject: [PATCH v18 2/4] Always persist to disk logical slots during a
 shutdown checkpoint.

It's entirely possible for a logical slot to have a confirmed_flush_lsn higher
than the last value saved on disk while not being marked as dirty.  It's
currently not a problem to lose that value during a clean shutdown / restart
cycle, but a later patch adding support for pg_upgrade of publications and
logical slots will rely on that value being properly persisted to disk.

Author: Julien Rouhaud
Reviewed-by: Wang Wei
Discussion: FIXME
---
 src/backend/access/transam/xlog.c |  2 +-
 src/backend/replication/slot.c    | 26 ++++++++++++++++----------
 src/include/replication/slot.h    |  2 +-
 3 files changed, 18 insertions(+), 12 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 60c0b7ec3a..6dced61cf4 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7026,7 +7026,7 @@ static void
 CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
 {
 	CheckPointRelationMap();
-	CheckPointReplicationSlots();
+	CheckPointReplicationSlots(flags & CHECKPOINT_IS_SHUTDOWN);
 	CheckPointSnapBuild();
 	CheckPointLogicalRewriteHeap();
 	CheckPointReplicationOrigin();
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 1dc27264f6..5aed7cd190 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -109,7 +109,8 @@ static void ReplicationSlotDropPtr(ReplicationSlot *slot);
 /* internal persistency functions */
 static void RestoreSlotFromDisk(const char *name);
 static void CreateSlotOnDisk(ReplicationSlot *slot);
-static void SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel);
+static void SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel,
+						   bool is_shutdown);
 
 /*
  * Report shared-memory space needed by ReplicationSlotsShmemInit.
@@ -783,7 +784,7 @@ ReplicationSlotSave(void)
 	Assert(MyReplicationSlot != NULL);
 
 	sprintf(path, "pg_replslot/%s", NameStr(MyReplicationSlot->data.name));
-	SaveSlotToPath(MyReplicationSlot, path, ERROR);
+	SaveSlotToPath(MyReplicationSlot, path, ERROR, false);
 }
 
 /*
@@ -1565,11 +1566,10 @@ restart:
 /*
  * Flush all replication slots to disk.
  *
- * This needn't actually be part of a checkpoint, but it's a convenient
- * location.
+ * is_shutdown is true in case of a shutdown checkpoint.
  */
 void
-CheckPointReplicationSlots(void)
+CheckPointReplicationSlots(bool is_shutdown)
 {
 	int			i;
 
@@ -1594,7 +1594,7 @@ CheckPointReplicationSlots(void)
 
 		/* save the slot to disk, locking is handled in SaveSlotToPath() */
 		sprintf(path, "pg_replslot/%s", NameStr(s->data.name));
-		SaveSlotToPath(s, path, LOG);
+		SaveSlotToPath(s, path, LOG, is_shutdown);
 	}
 	LWLockRelease(ReplicationSlotAllocationLock);
 }
@@ -1700,7 +1700,7 @@ CreateSlotOnDisk(ReplicationSlot *slot)
 
 	/* Write the actual state file. */
 	slot->dirty = true;			/* signal that we really need to write */
-	SaveSlotToPath(slot, tmppath, ERROR);
+	SaveSlotToPath(slot, tmppath, ERROR, false);
 
 	/* Rename the directory into place. */
 	if (rename(tmppath, path) != 0)
@@ -1726,7 +1726,8 @@ CreateSlotOnDisk(ReplicationSlot *slot)
  * Shared functionality between saving and creating a replication slot.
  */
 static void
-SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel)
+SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel,
+			   bool is_shutdown)
 {
 	char		tmppath[MAXPGPATH];
 	char		path[MAXPGPATH];
@@ -1740,8 +1741,13 @@ SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel)
 	slot->just_dirtied = false;
 	SpinLockRelease(&slot->mutex);
 
-	/* and don't do anything if there's nothing to write */
-	if (!was_dirty)
+	/*
+	 * and don't do anything if there's nothing to write, unless this is
+	 * called for a logical slot during a shutdown checkpoint, as we want to
+	 * persist the confirmed_flush_lsn in that case, even if that's the only
+	 * modification.
+	 */
+	if (!was_dirty && (SlotIsPhysical(slot) || !is_shutdown))
 		return;
 
 	LWLockAcquire(&slot->io_in_progress_lock, LW_EXCLUSIVE);
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index a8a89dc784..7ca37c9f70 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -241,7 +241,7 @@ extern void ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslo
 extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok);
 
 extern void StartupReplicationSlots(void);
-extern void CheckPointReplicationSlots(void);
+extern void CheckPointReplicationSlots(bool is_shutdown);
 
 extern void CheckSlotRequirements(void);
 extern void CheckSlotPermissions(void);
-- 
2.27.0

v18-0003-Move-pg_get_wal_record_info-functionality-from-p.patchapplication/octet-stream; name=v18-0003-Move-pg_get_wal_record_info-functionality-from-p.patchDownload
From e1cbde65108376c61377f12089bdc3990b53330e Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Tue, 1 Aug 2023 14:04:04 +0000
Subject: [PATCH v18 3/4] Move pg_get_wal_record_info functionality from
 pg_walinspect to core

The upgrade of publications requires pg_get_wal_record_info to check that there are
no WAL records other than the CHECKPOINT_SHUTDOWN record left to be consumed. Hence,
the pg_get_wal_record_info functionality is moved to core as pg_get_wal_record_content
so that it can be called from pg_upgrade.
---
 contrib/pg_walinspect/Makefile                |   2 +-
 contrib/pg_walinspect/meson.build             |   1 +
 .../pg_walinspect/pg_walinspect--1.1--1.2.sql |  31 +++
 contrib/pg_walinspect/pg_walinspect.c         | 180 +-----------------
 contrib/pg_walinspect/pg_walinspect.control   |   2 +-
 src/backend/access/transam/xlogfuncs.c        |  50 +++++
 src/backend/access/transam/xlogutils.c        | 173 +++++++++++++++++
 src/backend/catalog/system_functions.sql      |   4 +
 src/include/access/xlogutils.h                |   6 +
 src/include/catalog/pg_proc.dat               |   9 +
 src/test/regress/expected/misc_functions.out  |  18 ++
 src/test/regress/sql/misc_functions.sql       |  16 ++
 12 files changed, 313 insertions(+), 179 deletions(-)
 create mode 100644 contrib/pg_walinspect/pg_walinspect--1.1--1.2.sql

diff --git a/contrib/pg_walinspect/Makefile b/contrib/pg_walinspect/Makefile
index 22090f7716..5cc7d81b42 100644
--- a/contrib/pg_walinspect/Makefile
+++ b/contrib/pg_walinspect/Makefile
@@ -7,7 +7,7 @@ OBJS = \
 PGFILEDESC = "pg_walinspect - functions to inspect contents of PostgreSQL Write-Ahead Log"
 
 EXTENSION = pg_walinspect
-DATA = pg_walinspect--1.0.sql pg_walinspect--1.0--1.1.sql
+DATA = pg_walinspect--1.0.sql pg_walinspect--1.0--1.1.sql pg_walinspect--1.1--1.2.sql
 
 REGRESS = pg_walinspect oldextversions
 
diff --git a/contrib/pg_walinspect/meson.build b/contrib/pg_walinspect/meson.build
index 80059f6119..8f7a99a493 100644
--- a/contrib/pg_walinspect/meson.build
+++ b/contrib/pg_walinspect/meson.build
@@ -20,6 +20,7 @@ install_data(
   'pg_walinspect.control',
   'pg_walinspect--1.0.sql',
   'pg_walinspect--1.0--1.1.sql',
+  'pg_walinspect--1.1--1.2.sql',
   kwargs: contrib_data_args,
 )
 
diff --git a/contrib/pg_walinspect/pg_walinspect--1.1--1.2.sql b/contrib/pg_walinspect/pg_walinspect--1.1--1.2.sql
new file mode 100644
index 0000000000..48e2da3034
--- /dev/null
+++ b/contrib/pg_walinspect/pg_walinspect--1.1--1.2.sql
@@ -0,0 +1,31 @@
+/* contrib/pg_walinspect/pg_walinspect--1.1--1.2.sql */
+
+-- complain if script is sourced in psql, rather than via ALTER EXTENSION
+\echo Use "ALTER EXTENSION pg_walinspect UPDATE TO '1.2'" to load this file. \quit
+
+-- The function is now in the backend and callers should update to use it.
+
+ALTER EXTENSION pg_walinspect DROP FUNCTION pg_get_wal_record_info;
+DROP FUNCTION pg_get_wal_record_info(pg_lsn);
+
+--
+-- pg_get_wal_record_info()
+--
+CREATE FUNCTION pg_get_wal_record_info(IN in_lsn pg_lsn,
+    OUT start_lsn pg_lsn,
+    OUT end_lsn pg_lsn,
+    OUT prev_lsn pg_lsn,
+    OUT xid xid,
+    OUT resource_manager text,
+    OUT record_type text,
+    OUT record_length int4,
+    OUT main_data_length int4,
+    OUT fpi_length int4,
+    OUT description text,
+    OUT block_ref text
+)
+AS 'pg_get_wal_record_content'
+LANGUAGE INTERNAL STRICT PARALLEL SAFE;
+
+REVOKE EXECUTE ON FUNCTION pg_get_wal_record_info(pg_lsn) FROM PUBLIC;
+GRANT EXECUTE ON FUNCTION pg_get_wal_record_info(pg_lsn) TO pg_read_server_files;
diff --git a/contrib/pg_walinspect/pg_walinspect.c b/contrib/pg_walinspect/pg_walinspect.c
index 796a74f322..8cba14e789 100644
--- a/contrib/pg_walinspect/pg_walinspect.c
+++ b/contrib/pg_walinspect/pg_walinspect.c
@@ -39,11 +39,6 @@ PG_FUNCTION_INFO_V1(pg_get_wal_stats);
 PG_FUNCTION_INFO_V1(pg_get_wal_stats_till_end_of_wal);
 
 static void ValidateInputLSNs(XLogRecPtr start_lsn, XLogRecPtr *end_lsn);
-static XLogRecPtr GetCurrentLSN(void);
-static XLogReaderState *InitXLogReaderState(XLogRecPtr lsn);
-static XLogRecord *ReadNextXLogRecord(XLogReaderState *xlogreader);
-static void GetWALRecordInfo(XLogReaderState *record, Datum *values,
-							 bool *nulls, uint32 ncols);
 static void GetWALRecordsInfo(FunctionCallInfo fcinfo,
 							  XLogRecPtr start_lsn,
 							  XLogRecPtr end_lsn);
@@ -62,178 +57,6 @@ static void GetWalStats(FunctionCallInfo fcinfo,
 static void GetWALBlockInfo(FunctionCallInfo fcinfo, XLogReaderState *record,
 							bool show_data);
 
-/*
- * Return the LSN up to which the server has WAL.
- */
-static XLogRecPtr
-GetCurrentLSN(void)
-{
-	XLogRecPtr	curr_lsn;
-
-	/*
-	 * We determine the current LSN of the server similar to how page_read
-	 * callback read_local_xlog_page_no_wait does.
-	 */
-	if (!RecoveryInProgress())
-		curr_lsn = GetFlushRecPtr(NULL);
-	else
-		curr_lsn = GetXLogReplayRecPtr(NULL);
-
-	Assert(!XLogRecPtrIsInvalid(curr_lsn));
-
-	return curr_lsn;
-}
-
-/*
- * Initialize WAL reader and identify first valid LSN.
- */
-static XLogReaderState *
-InitXLogReaderState(XLogRecPtr lsn)
-{
-	XLogReaderState *xlogreader;
-	ReadLocalXLogPageNoWaitPrivate *private_data;
-	XLogRecPtr	first_valid_record;
-
-	/*
-	 * Reading WAL below the first page of the first segments isn't allowed.
-	 * This is a bootstrap WAL page and the page_read callback fails to read
-	 * it.
-	 */
-	if (lsn < XLOG_BLCKSZ)
-		ereport(ERROR,
-				(errmsg("could not read WAL at LSN %X/%X",
-						LSN_FORMAT_ARGS(lsn))));
-
-	private_data = (ReadLocalXLogPageNoWaitPrivate *)
-		palloc0(sizeof(ReadLocalXLogPageNoWaitPrivate));
-
-	xlogreader = XLogReaderAllocate(wal_segment_size, NULL,
-									XL_ROUTINE(.page_read = &read_local_xlog_page_no_wait,
-											   .segment_open = &wal_segment_open,
-											   .segment_close = &wal_segment_close),
-									private_data);
-
-	if (xlogreader == NULL)
-		ereport(ERROR,
-				(errcode(ERRCODE_OUT_OF_MEMORY),
-				 errmsg("out of memory"),
-				 errdetail("Failed while allocating a WAL reading processor.")));
-
-	/* first find a valid recptr to start from */
-	first_valid_record = XLogFindNextRecord(xlogreader, lsn);
-
-	if (XLogRecPtrIsInvalid(first_valid_record))
-		ereport(ERROR,
-				(errmsg("could not find a valid record after %X/%X",
-						LSN_FORMAT_ARGS(lsn))));
-
-	return xlogreader;
-}
-
-/*
- * Read next WAL record.
- *
- * By design, to be less intrusive in a running system, no slot is allocated
- * to reserve the WAL we're about to read. Therefore this function can
- * encounter read errors for historical WAL.
- *
- * We guard against ordinary errors trying to read WAL that hasn't been
- * written yet by limiting end_lsn to the flushed WAL, but that can also
- * encounter errors if the flush pointer falls in the middle of a record. In
- * that case we'll return NULL.
- */
-static XLogRecord *
-ReadNextXLogRecord(XLogReaderState *xlogreader)
-{
-	XLogRecord *record;
-	char	   *errormsg;
-
-	record = XLogReadRecord(xlogreader, &errormsg);
-
-	if (record == NULL)
-	{
-		ReadLocalXLogPageNoWaitPrivate *private_data;
-
-		/* return NULL, if end of WAL is reached */
-		private_data = (ReadLocalXLogPageNoWaitPrivate *)
-			xlogreader->private_data;
-
-		if (private_data->end_of_wal)
-			return NULL;
-
-		if (errormsg)
-			ereport(ERROR,
-					(errcode_for_file_access(),
-					 errmsg("could not read WAL at %X/%X: %s",
-							LSN_FORMAT_ARGS(xlogreader->EndRecPtr), errormsg)));
-		else
-			ereport(ERROR,
-					(errcode_for_file_access(),
-					 errmsg("could not read WAL at %X/%X",
-							LSN_FORMAT_ARGS(xlogreader->EndRecPtr))));
-	}
-
-	return record;
-}
-
-/*
- * Output values that make up a row describing caller's WAL record.
- *
- * This function leaks memory.  Caller may need to use its own custom memory
- * context.
- *
- * Keep this in sync with GetWALBlockInfo.
- */
-static void
-GetWALRecordInfo(XLogReaderState *record, Datum *values,
-				 bool *nulls, uint32 ncols)
-{
-	const char *record_type;
-	RmgrData	desc;
-	uint32		fpi_len = 0;
-	StringInfoData rec_desc;
-	StringInfoData rec_blk_ref;
-	int			i = 0;
-
-	desc = GetRmgr(XLogRecGetRmid(record));
-	record_type = desc.rm_identify(XLogRecGetInfo(record));
-
-	if (record_type == NULL)
-		record_type = psprintf("UNKNOWN (%x)", XLogRecGetInfo(record) & ~XLR_INFO_MASK);
-
-	initStringInfo(&rec_desc);
-	desc.rm_desc(&rec_desc, record);
-
-	if (XLogRecHasAnyBlockRefs(record))
-	{
-		initStringInfo(&rec_blk_ref);
-		XLogRecGetBlockRefInfo(record, false, true, &rec_blk_ref, &fpi_len);
-	}
-
-	values[i++] = LSNGetDatum(record->ReadRecPtr);
-	values[i++] = LSNGetDatum(record->EndRecPtr);
-	values[i++] = LSNGetDatum(XLogRecGetPrev(record));
-	values[i++] = TransactionIdGetDatum(XLogRecGetXid(record));
-	values[i++] = CStringGetTextDatum(desc.rm_name);
-	values[i++] = CStringGetTextDatum(record_type);
-	values[i++] = UInt32GetDatum(XLogRecGetTotalLen(record));
-	values[i++] = UInt32GetDatum(XLogRecGetDataLen(record));
-	values[i++] = UInt32GetDatum(fpi_len);
-
-	if (rec_desc.len > 0)
-		values[i++] = CStringGetTextDatum(rec_desc.data);
-	else
-		nulls[i++] = true;
-
-	if (XLogRecHasAnyBlockRefs(record))
-		values[i++] = CStringGetTextDatum(rec_blk_ref.data);
-	else
-		nulls[i++] = true;
-
-	Assert(i == ncols);
-}
-
-
 /*
  * Output one or more rows in rsinfo tuple store, each describing a single
  * block reference from caller's WAL record. (Should only be called with
@@ -454,6 +277,9 @@ pg_get_wal_block_info(PG_FUNCTION_ARGS)
 
 /*
  * Get WAL record info.
+ *
+ * Note that this function has been removed from the extension in version 1.2,
+ * but it is kept around for compatibility.
  */
 Datum
 pg_get_wal_record_info(PG_FUNCTION_ARGS)
diff --git a/contrib/pg_walinspect/pg_walinspect.control b/contrib/pg_walinspect/pg_walinspect.control
index efa3cb2cfe..5f574b865b 100644
--- a/contrib/pg_walinspect/pg_walinspect.control
+++ b/contrib/pg_walinspect/pg_walinspect.control
@@ -1,5 +1,5 @@
 # pg_walinspect extension
 comment = 'functions to inspect contents of PostgreSQL Write-Ahead Log'
-default_version = '1.1'
+default_version = '1.2'
 module_pathname = '$libdir/pg_walinspect'
 relocatable = true
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 5044ff0643..71ee1069f6 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -754,3 +754,53 @@ pg_promote(PG_FUNCTION_ARGS)
 						   wait_seconds)));
 	PG_RETURN_BOOL(false);
 }
+
+/*
+ * Get WAL record content.
+ */
+Datum
+pg_get_wal_record_content(PG_FUNCTION_ARGS)
+{
+#define PG_GET_WAL_RECORD_CONTENT_COLS 11
+	Datum		result;
+	Datum		values[PG_GET_WAL_RECORD_CONTENT_COLS] = {0};
+	bool		nulls[PG_GET_WAL_RECORD_CONTENT_COLS] = {0};
+	XLogRecPtr	lsn;
+	XLogRecPtr	curr_lsn;
+	XLogReaderState *xlogreader;
+	TupleDesc	tupdesc;
+	HeapTuple	tuple;
+
+	lsn = PG_GETARG_LSN(0);
+	curr_lsn = GetCurrentLSN();
+
+	if (lsn > curr_lsn)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("WAL input LSN must be less than current LSN"),
+				 errdetail("Current WAL LSN on the database system is at %X/%X.",
+						   LSN_FORMAT_ARGS(curr_lsn))));
+
+	/* Build a tuple descriptor for our result type. */
+	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+		elog(ERROR, "return type must be a row type");
+
+	xlogreader = InitXLogReaderState(lsn);
+
+	if (!ReadNextXLogRecord(xlogreader))
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("could not read WAL at %X/%X",
+						LSN_FORMAT_ARGS(xlogreader->EndRecPtr))));
+
+	GetWALRecordInfo(xlogreader, values, nulls, PG_GET_WAL_RECORD_CONTENT_COLS);
+
+	pfree(xlogreader->private_data);
+	XLogReaderFree(xlogreader);
+
+	tuple = heap_form_tuple(tupdesc, values, nulls);
+	result = HeapTupleGetDatum(tuple);
+
+	PG_RETURN_DATUM(result);
+#undef PG_GET_WAL_RECORD_CONTENT_COLS
+}
diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c
index e174a2a891..f637028b8e 100644
--- a/src/backend/access/transam/xlogutils.c
+++ b/src/backend/access/transam/xlogutils.c
@@ -28,8 +28,10 @@
 #include "pgstat.h"
 #include "storage/fd.h"
 #include "storage/smgr.h"
+#include "utils/builtins.h"
 #include "utils/guc.h"
 #include "utils/hsearch.h"
+#include "utils/pg_lsn.h"
 #include "utils/rel.h"
 
 
@@ -1048,3 +1050,174 @@ WALReadRaiseError(WALReadError *errinfo)
 						errinfo->wre_req)));
 	}
 }
+
+/*
+ * Return the LSN up to which the server has WAL.
+ */
+XLogRecPtr
+GetCurrentLSN(void)
+{
+	XLogRecPtr	curr_lsn;
+
+	/*
+	 * We determine the current LSN of the server similar to how page_read
+	 * callback read_local_xlog_page_no_wait does.
+	 */
+	if (!RecoveryInProgress())
+		curr_lsn = GetFlushRecPtr(NULL);
+	else
+		curr_lsn = GetXLogReplayRecPtr(NULL);
+
+	Assert(!XLogRecPtrIsInvalid(curr_lsn));
+
+	return curr_lsn;
+}
+
+/*
+ * Initialize WAL reader and identify first valid LSN.
+ */
+XLogReaderState *
+InitXLogReaderState(XLogRecPtr lsn)
+{
+	XLogReaderState *xlogreader;
+	ReadLocalXLogPageNoWaitPrivate *private_data;
+	XLogRecPtr	first_valid_record;
+
+	/*
+	 * Reading WAL below the first page of the first segments isn't allowed.
+	 * This is a bootstrap WAL page and the page_read callback fails to read
+	 * it.
+	 */
+	if (lsn < XLOG_BLCKSZ)
+		ereport(ERROR,
+				(errmsg("could not read WAL at LSN %X/%X",
+						LSN_FORMAT_ARGS(lsn))));
+
+	private_data = (ReadLocalXLogPageNoWaitPrivate *)
+		palloc0(sizeof(ReadLocalXLogPageNoWaitPrivate));
+
+	xlogreader = XLogReaderAllocate(wal_segment_size, NULL,
+									XL_ROUTINE(.page_read = &read_local_xlog_page_no_wait,
+											   .segment_open = &wal_segment_open,
+											   .segment_close = &wal_segment_close),
+									private_data);
+
+	if (xlogreader == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("out of memory"),
+				 errdetail("Failed while allocating a WAL reading processor.")));
+
+	/* first find a valid recptr to start from */
+	first_valid_record = XLogFindNextRecord(xlogreader, lsn);
+
+	if (XLogRecPtrIsInvalid(first_valid_record))
+		ereport(ERROR,
+				(errmsg("could not find a valid record after %X/%X",
+						LSN_FORMAT_ARGS(lsn))));
+
+	return xlogreader;
+}
+
+/*
+ * Read next WAL record.
+ *
+ * By design, to be less intrusive in a running system, no slot is allocated
+ * to reserve the WAL we're about to read. Therefore this function can
+ * encounter read errors for historical WAL.
+ *
+ * We guard against ordinary errors trying to read WAL that hasn't been
+ * written yet by limiting end_lsn to the flushed WAL, but that can also
+ * encounter errors if the flush pointer falls in the middle of a record. In
+ * that case we'll return NULL.
+ */
+XLogRecord *
+ReadNextXLogRecord(XLogReaderState *xlogreader)
+{
+	XLogRecord *record;
+	char	   *errormsg;
+
+	record = XLogReadRecord(xlogreader, &errormsg);
+
+	if (record == NULL)
+	{
+		ReadLocalXLogPageNoWaitPrivate *private_data;
+
+		/* return NULL, if end of WAL is reached */
+		private_data = (ReadLocalXLogPageNoWaitPrivate *)
+			xlogreader->private_data;
+
+		if (private_data->end_of_wal)
+			return NULL;
+
+		if (errormsg)
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not read WAL at %X/%X: %s",
+							LSN_FORMAT_ARGS(xlogreader->EndRecPtr), errormsg)));
+		else
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not read WAL at %X/%X",
+							LSN_FORMAT_ARGS(xlogreader->EndRecPtr))));
+	}
+
+	return record;
+}
+
+/*
+ * Output values that make up a row describing caller's WAL record.
+ *
+ * This function leaks memory.  Caller may need to use its own custom memory
+ * context.
+ *
+ * Keep this in sync with GetWALBlockInfo.
+ */
+void
+GetWALRecordInfo(XLogReaderState *record, Datum *values,
+				 bool *nulls, uint32 ncols)
+{
+	const char *record_type;
+	RmgrData	desc;
+	uint32		fpi_len = 0;
+	StringInfoData rec_desc;
+	StringInfoData rec_blk_ref;
+	int			i = 0;
+
+	desc = GetRmgr(XLogRecGetRmid(record));
+	record_type = desc.rm_identify(XLogRecGetInfo(record));
+
+	if (record_type == NULL)
+		record_type = psprintf("UNKNOWN (%x)", XLogRecGetInfo(record) & ~XLR_INFO_MASK);
+
+	initStringInfo(&rec_desc);
+	desc.rm_desc(&rec_desc, record);
+
+	if (XLogRecHasAnyBlockRefs(record))
+	{
+		initStringInfo(&rec_blk_ref);
+		XLogRecGetBlockRefInfo(record, false, true, &rec_blk_ref, &fpi_len);
+	}
+
+	values[i++] = LSNGetDatum(record->ReadRecPtr);
+	values[i++] = LSNGetDatum(record->EndRecPtr);
+	values[i++] = LSNGetDatum(XLogRecGetPrev(record));
+	values[i++] = TransactionIdGetDatum(XLogRecGetXid(record));
+	values[i++] = CStringGetTextDatum(desc.rm_name);
+	values[i++] = CStringGetTextDatum(record_type);
+	values[i++] = UInt32GetDatum(XLogRecGetTotalLen(record));
+	values[i++] = UInt32GetDatum(XLogRecGetDataLen(record));
+	values[i++] = UInt32GetDatum(fpi_len);
+
+	if (rec_desc.len > 0)
+		values[i++] = CStringGetTextDatum(rec_desc.data);
+	else
+		nulls[i++] = true;
+
+	if (XLogRecHasAnyBlockRefs(record))
+		values[i++] = CStringGetTextDatum(rec_blk_ref.data);
+	else
+		nulls[i++] = true;
+
+	Assert(i == ncols);
+}
diff --git a/src/backend/catalog/system_functions.sql b/src/backend/catalog/system_functions.sql
index 07c0d89c4f..0186d594fb 100644
--- a/src/backend/catalog/system_functions.sql
+++ b/src/backend/catalog/system_functions.sql
@@ -616,6 +616,8 @@ REVOKE EXECUTE ON FUNCTION pg_backup_stop(boolean) FROM public;
 
 REVOKE EXECUTE ON FUNCTION pg_create_restore_point(text) FROM public;
 
+REVOKE EXECUTE ON FUNCTION pg_get_wal_record_content(pg_lsn) FROM public;
+
 REVOKE EXECUTE ON FUNCTION pg_switch_wal() FROM public;
 
 REVOKE EXECUTE ON FUNCTION pg_log_standby_snapshot() FROM public;
@@ -726,6 +728,8 @@ REVOKE EXECUTE ON FUNCTION pg_ls_replslotdir(text) FROM PUBLIC;
 -- We also set up some things as accessible to standard roles.
 --
 
+GRANT EXECUTE ON FUNCTION pg_get_wal_record_content(pg_lsn) TO pg_read_server_files;
+
 GRANT EXECUTE ON FUNCTION pg_ls_logdir() TO pg_monitor;
 
 GRANT EXECUTE ON FUNCTION pg_ls_waldir() TO pg_monitor;
diff --git a/src/include/access/xlogutils.h b/src/include/access/xlogutils.h
index 5b77b11f50..95ad84ac6f 100644
--- a/src/include/access/xlogutils.h
+++ b/src/include/access/xlogutils.h
@@ -115,4 +115,10 @@ extern void XLogReadDetermineTimeline(XLogReaderState *state,
 
 extern void WALReadRaiseError(WALReadError *errinfo);
 
+extern XLogRecPtr GetCurrentLSN(void);
+extern XLogReaderState *InitXLogReaderState(XLogRecPtr lsn);
+extern XLogRecord *ReadNextXLogRecord(XLogReaderState *xlogreader);
+extern void GetWALRecordInfo(XLogReaderState *record, Datum *values,
+							 bool *nulls, uint32 ncols);
+
 #endif
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 6996073989..ffe146ba84 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -6489,6 +6489,15 @@
   proargnames => '{rm_id, rm_name, rm_builtin}',
   prosrc => 'pg_get_wal_resource_managers' },
 
+
+{ oid => '8045', descr => 'Info of the WAL content',
+  proname => 'pg_get_wal_record_content', prorows => '1', proretset => 't',
+  provolatile => 's', prorettype => 'record', proargtypes => 'pg_lsn',
+  proallargtypes => '{pg_lsn,pg_lsn,pg_lsn,pg_lsn,xid,text,text,int4,int4,int4,text,text}',
+  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{in_lsn,start_lsn,end_lsn,prev_lsn,xid,resource_manager,record_type,record_length,main_data_length,fpi_length,description,block_ref}',
+  prosrc => 'pg_get_wal_record_content' },
+
 { oid => '2621', descr => 'reload configuration files',
   proname => 'pg_reload_conf', provolatile => 'v', prorettype => 'bool',
   proargtypes => '', prosrc => 'pg_reload_conf' },
diff --git a/src/test/regress/expected/misc_functions.out b/src/test/regress/expected/misc_functions.out
index c669948370..bf2d1a5f9b 100644
--- a/src/test/regress/expected/misc_functions.out
+++ b/src/test/regress/expected/misc_functions.out
@@ -642,3 +642,21 @@ SELECT segment_number > 0 AS ok_segment_number, timeline_id
  t                 |  4294967295
 (1 row)
 
+-- pg_get_wal_record_content
+CREATE TABLE sample_tbl(col1 int, col2 int);
+SELECT pg_current_wal_lsn() AS wal_lsn \gset
+INSERT INTO sample_tbl SELECT * FROM generate_series(1, 2);
+-- Mask DETAIL messages as these could refer to current LSN positions.
+\set VERBOSITY terse
+-- Invalid input LSN.
+SELECT * FROM pg_get_wal_record_content('0/0');
+ERROR:  could not read WAL at LSN 0/0
+-- LSNs with the highest value possible.
+SELECT * FROM pg_get_wal_record_content('FFFFFFFF/FFFFFFFF');
+ERROR:  WAL input LSN must be less than current LSN
+SELECT COUNT(*) >= 1 AS ok FROM pg_get_wal_record_content(:'wal_lsn');
+ ok 
+----
+ t
+(1 row)
+
diff --git a/src/test/regress/sql/misc_functions.sql b/src/test/regress/sql/misc_functions.sql
index b57f01f3e9..4ff19b4927 100644
--- a/src/test/regress/sql/misc_functions.sql
+++ b/src/test/regress/sql/misc_functions.sql
@@ -237,3 +237,19 @@ SELECT segment_number > 0 AS ok_segment_number, timeline_id
   FROM pg_split_walfile_name('000000010000000100000000');
 SELECT segment_number > 0 AS ok_segment_number, timeline_id
   FROM pg_split_walfile_name('ffffffFF00000001000000af');
+
+-- pg_get_wal_record_content
+CREATE TABLE sample_tbl(col1 int, col2 int);
+SELECT pg_current_wal_lsn() AS wal_lsn \gset
+INSERT INTO sample_tbl SELECT * FROM generate_series(1, 2);
+
+-- Mask DETAIL messages as these could refer to current LSN positions.
+\set VERBOSITY terse
+
+-- Invalid input LSN.
+SELECT * FROM pg_get_wal_record_content('0/0');
+
+-- LSNs with the highest value possible.
+SELECT * FROM pg_get_wal_record_content('FFFFFFFF/FFFFFFFF');
+
+SELECT COUNT(*) >= 1 AS ok FROM pg_get_wal_record_content(:'wal_lsn');
-- 
2.27.0

v18-0004-pg_upgrade-Add-check-function-for-include-logica.patchapplication/octet-stream; name=v18-0004-pg_upgrade-Add-check-function-for-include-logica.patchDownload
From 0d6ed85edf1fbfbf9dca62a70d0a06fe6f4ccc98 Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Fri, 14 Apr 2023 08:59:03 +0000
Subject: [PATCH v18 4/4] pg_upgrade: Add check function for
 --include-logical-replication-slots option

XXX: Actually, this commit disallows supporting slots which are created by user
backends. In the checking function we ensure that all the active slots have a
confirmed_flush_lsn equal to the current WAL position, and for such slots the two
would not be the same. For slots which are used by logical replication, logical
walsenders guarantee that at shutdown. Individual slots, however, cannot be handled
by walsenders, so their confirmed_flush_lsn stays behind the shutdown checkpoint record.

Author: Hayato Kuroda
Reviewed-by: Wang Wei, Vignesh C
---
 src/bin/pg_upgrade/check.c                    |  59 ++++++++
 .../t/003_logical_replication_slots.pl        | 134 +++++++++++-------
 2 files changed, 140 insertions(+), 53 deletions(-)

diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index e7a42620f3..b5eb1f8ccf 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -31,6 +31,7 @@ static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(ClusterInfo *new_cluster);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
 static void check_for_logical_replication_slots(ClusterInfo *new_cluster);
+static void check_for_confirmed_flush_lsn(ClusterInfo *cluster);
 
 static int	num_slots_on_old_cluster;
 
@@ -109,6 +110,8 @@ check_and_dump_old_cluster(bool live_check)
 	check_for_composite_data_type_usage(&old_cluster);
 	check_for_reg_data_type_usage(&old_cluster);
 	check_for_isn_and_int8_passing_mismatch(&old_cluster);
+	if (user_opts.include_logical_slots)
+		check_for_confirmed_flush_lsn(&old_cluster);
 
 	/*
 	 * PG 16 increased the size of the 'aclitem' type, which breaks the
@@ -1479,3 +1482,59 @@ check_for_logical_replication_slots(ClusterInfo *new_cluster)
 
 	check_ok();
 }
+
+/*
+ * Verify that all logical replication slots consumed all WALs, except a
+ * CHECKPOINT_SHUTDOWN record.
+ */
+static void
+check_for_confirmed_flush_lsn(ClusterInfo *cluster)
+{
+	int			i,
+				ntups,
+				i_slotname;
+	bool		is_error = false;
+	PGresult   *res;
+	DbInfo	   *active_db = &cluster->dbarr.dbs[0];
+	PGconn	   *conn = connectToServer(cluster, active_db->db_name);
+
+	Assert(user_opts.include_logical_slots);
+
+	/* --include-logical-replication-slots can be used since PG17. */
+	if (GET_MAJOR_VERSION(cluster->major_version) <= 1600)
+		return;
+
+	prep_status("Checking confirmed_flush_lsn for logical replication slots");
+
+	/*
+	 * Check that all logical replication slots have reached the current WAL
+	 * position.
+	 */
+	res = executeQueryOrDie(conn,
+							"SELECT slot_name FROM pg_catalog.pg_replication_slots "
+							"WHERE (SELECT count(record_type) "
+							"		FROM pg_catalog.pg_get_wal_record_content(confirmed_flush_lsn) "
+							"		WHERE record_type != 'CHECKPOINT_SHUTDOWN') <> 0 "
+							"AND temporary = false AND wal_status IN ('reserved', 'extended', 'unreserved');");
+
+	ntups = PQntuples(res);
+	i_slotname = PQfnumber(res, "slot_name");
+
+	for (i = 0; i < ntups; i++)
+	{
+		is_error = true;
+
+		pg_log(PG_WARNING,
+			   "\nWARNING: logical replication slot \"%s\" has not consumed WALs yet",
+			   PQgetvalue(res, i, i_slotname));
+	}
+
+	PQclear(res);
+	PQfinish(conn);
+
+	if (is_error)
+		pg_fatal("--include-logical-replication-slots requires that all "
+				 "logical replication slots consumed all the WALs");
+
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
index 9ca266f6b2..a6a20f61e0 100644
--- a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
+++ b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
@@ -15,132 +15,160 @@ use Test::More;
 my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
 
 # Initialize old node
-my $old_node = PostgreSQL::Test::Cluster->new('old_node');
-$old_node->init(allows_streaming => 'logical');
-$old_node->start;
+my $old_publisher = PostgreSQL::Test::Cluster->new('old_publisher');
+$old_publisher->init(allows_streaming => 'logical');
 
 # Initialize new node
-my $new_node = PostgreSQL::Test::Cluster->new('new_node');
-$new_node->init(allows_streaming => 1);
+my $new_publisher = PostgreSQL::Test::Cluster->new('new_publisher');
+$new_publisher->init(allows_streaming => 1);
 
-my $bindir = $new_node->config_data('--bindir');
+# Initialize subscriber node
+my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
+$subscriber->init(allows_streaming => 'logical');
 
-$old_node->stop;
+my $bindir = $new_publisher->config_data('--bindir');
 
 # Cause a failure at the start of pg_upgrade because wal_level is replica
 command_fails(
 	[
 		'pg_upgrade', '--no-sync',
-		'-d',         $old_node->data_dir,
-		'-D',         $new_node->data_dir,
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
 		'-b',         $bindir,
 		'-B',         $bindir,
-		'-s',         $new_node->host,
-		'-p',         $old_node->port,
-		'-P',         $new_node->port,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
 		$mode,        '--include-logical-replication-slots',
 	],
 	'run of pg_upgrade of old node with wrong wal_level');
-ok( -d $new_node->data_dir . "/pg_upgrade_output.d",
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
 	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
 
 # Clean up
-rmtree($new_node->data_dir . "/pg_upgrade_output.d");
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
 
 # Preparations for the subsequent test. The case max_replication_slots is set
 # to 0 is prohibited.
-$new_node->append_conf('postgresql.conf', "wal_level = 'logical'");
-$new_node->append_conf('postgresql.conf', "max_replication_slots = 0");
+$new_publisher->append_conf('postgresql.conf', "wal_level = 'logical'");
+$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 0");
 
 # Cause a failure at the start of pg_upgrade because max_replication_slots is 0
 command_fails(
 	[
 		'pg_upgrade', '--no-sync',
-		'-d',         $old_node->data_dir,
-		'-D',         $new_node->data_dir,
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
 		'-b',         $bindir,
 		'-B',         $bindir,
-		'-s',         $new_node->host,
-		'-p',         $old_node->port,
-		'-P',         $new_node->port,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
 		$mode,        '--include-logical-replication-slots',
 	],
 	'run of pg_upgrade of old node with wrong max_replication_slots');
-ok( -d $new_node->data_dir . "/pg_upgrade_output.d",
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
 	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
 
 # Clean up
-rmtree($new_node->data_dir . "/pg_upgrade_output.d");
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
 
 # Preparations for the subsequent test. max_replication_slots is set to
 # non-zero value
-$new_node->append_conf('postgresql.conf', "max_replication_slots = 1");
+$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 1");
 
-# Create a slot on old node, and generate WALs
-$old_node->start;
-$old_node->safe_psql(
+# Create a slot on old node
+$old_publisher->start;
+$old_publisher->safe_psql(
 	'postgres', qq[
-	SELECT pg_create_logical_replication_slot('test_slot', 'test_decoding', false, true);
-	SELECT pg_create_logical_replication_slot('to_be_dropped', 'test_decoding', false, true);
+	SELECT pg_create_logical_replication_slot('test_slot1', 'test_decoding', false, true);
+	SELECT pg_create_logical_replication_slot('test_slot2', 'test_decoding', false, true);
 	CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;
 ]);
 
-$old_node->stop;
+$old_publisher->stop;
 
 # Cause a failure at the start of pg_upgrade because max_replication_slots is
 # smaller than existing slots on old node
 command_fails(
 	[
 		'pg_upgrade', '--no-sync',
-		'-d',         $old_node->data_dir,
-		'-D',         $new_node->data_dir,
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
 		'-b',         $bindir,
 		'-B',         $bindir,
-		'-s',         $new_node->host,
-		'-p',         $old_node->port,
-		'-P',         $new_node->port,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
 		$mode,        '--include-logical-replication-slots',
 	],
 	'run of pg_upgrade of old node with small max_replication_slots');
-ok( -d $new_node->data_dir . "/pg_upgrade_output.d",
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
 	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
 
 # Clean up
-rmtree($new_node->data_dir . "/pg_upgrade_output.d");
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
 
 # Preparations for the subsequent test. max_replication_slots is set to
 # appropriate value
-$new_node->append_conf('postgresql.conf', "max_replication_slots = 10");
+$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 10");
 
-# Remove an unnecessary slot and consume WALs
-$old_node->start;
-$old_node->safe_psql(
+$old_publisher->start;
+$old_publisher->safe_psql(
 	'postgres', qq[
-	SELECT pg_drop_replication_slot('to_be_dropped');
-	SELECT count(*) FROM pg_logical_slot_get_changes('test_slot', NULL, NULL)
+	SELECT pg_drop_replication_slot('test_slot1');
+	SELECT pg_drop_replication_slot('test_slot2');
 ]);
-$old_node->stop;
+
+# Setup logical replication
+my $old_connstr = $old_publisher->connstr . ' dbname=postgres';
+$old_publisher->safe_psql('postgres', "CREATE PUBLICATION pub FOR ALL TABLES");
+$subscriber->start;
+$subscriber->safe_psql(
+	'postgres', qq[
+	CREATE TABLE tbl (a int);
+	CREATE SUBSCRIPTION sub CONNECTION '$old_connstr' PUBLICATION pub WITH (two_phase = 'true')
+]);
+
+# Wait for initial table sync to finish
+$subscriber->wait_for_subscription_sync($old_publisher, 'sub');
+
+$subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION sub DISABLE");
+$old_publisher->stop;
 
 # Actual run, pg_upgrade_output.d is removed at the end
 command_ok(
 	[
 		'pg_upgrade', '--no-sync',
-		'-d',         $old_node->data_dir,
-		'-D',         $new_node->data_dir,
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
 		'-b',         $bindir,
 		'-B',         $bindir,
-		'-s',         $new_node->host,
-		'-p',         $old_node->port,
-		'-P',         $new_node->port,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
 		$mode,        '--include-logical-replication-slots'
 	],
 	'run of pg_upgrade of old node');
-ok( !-d $new_node->data_dir . "/pg_upgrade_output.d",
+ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
 	"pg_upgrade_output.d/ removed after pg_upgrade success");
 
-$new_node->start;
-my $result = $new_node->safe_psql('postgres',
+$new_publisher->start;
+my $result = $new_publisher->safe_psql('postgres',
 	"SELECT slot_name, two_phase FROM pg_replication_slots");
-is($result, qq(test_slot|t), 'check the slot exists on new node');
+is($result, qq(sub|t), 'check the slot exists on new node');
+
+# Change the connection
+my $new_connstr = $new_publisher->connstr . ' dbname=postgres';
+$subscriber->safe_psql('postgres',
+	"ALTER SUBSCRIPTION sub CONNECTION '$new_connstr'");
+$subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION sub ENABLE");
+
+# Check whether changes on the new publisher get replicated to the subscriber
+$new_publisher->safe_psql('postgres',
+	"INSERT INTO tbl VALUES (generate_series(11, 20))");
+$new_publisher->wait_for_catchup('sub');
+$result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
+is($result, qq(20), 'check changes are shipped to subscriber');
 
 done_testing();
-- 
2.27.0

#75Amit Kapila
amit.kapila16@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#74)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Wed, Aug 2, 2023 at 1:43 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

Thank you for giving comments! PSA new version patchset.

3.
+ /*
+ * Get replication slots.
+ *
+ * XXX: Which information must be extracted from old node? Currently three
+ * attributes are extracted because they are used by
+ * pg_create_logical_replication_slot().
+ */
+ appendPQExpBufferStr(query,
+ "SELECT slot_name, plugin, two_phase "
+ "FROM pg_catalog.pg_replication_slots "
+ "WHERE database = current_database() AND temporary = false "
+ "AND wal_status IN ('reserved', 'extended');");

Why are we ignoring the slots that have wal status as WALAVAIL_REMOVED
or WALAVAIL_UNRESERVED? I think the slots whose wal status is
WALAVAIL_REMOVED have been invalidated at some point. I think such slots
can't be used for decoding, but they will be dropped along with the
subscription or when a user does it manually.
So, if we don't copy such slots after the upgrade then there could be
a problem in dropping the corresponding subscription. If we don't want
to copy over such slots then we need to provide instructions on what
users should do in such cases. OTOH, if we want to copy over such
slots then we need to find a way to invalidate such slots after copy.
Either way, this needs more analysis.

I considered this again. At least WALAVAIL_UNRESERVED should be supported because
the slot is still usable; its status can return to reserved or extended.

As for WALAVAIL_REMOVED, I don't think such slots should be copied, so I added a
description to the document.

This feature re-creates slots which have the same name/plugin as the old ones; it
does not replicate their state. So if we copy such slots as-is, they become usable
again. If a subscriber refers to the slot and then connects again at that time,
changes from the period in which the slot was 'WALAVAIL_REMOVED' may be lost.

Based on the above, such slots would have to be copied as WALAVAIL_REMOVED, but as
you said, we do not have a way to control that. The status is calculated from
restart_lsn, but there is no function to modify it directly.

One approach is adding an SQL function which sets restart_lsn to an arbitrary value
(or 0/0, invalid), but it seems dangerous.
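
For reference, a quick way to see which existing slots the dump query quoted above
would skip (i.e. slots whose WAL is no longer reserved or extended) is a sketch like
the following, run on the old publisher. The filter simply mirrors the quoted query;
the exact slot names and database are whatever exists on the old node:

```
-- Slots that the quoted dump query would NOT copy
SELECT slot_name, wal_status
FROM pg_catalog.pg_replication_slots
WHERE database = current_database()
  AND temporary = false
  AND wal_status NOT IN ('reserved', 'extended');
```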

I see your point related to the WALAVAIL_REMOVED status of the slot, but
did you test the scenario I explained in my comment? Basically, I
want to know whether it can impact the user in some way. So, please
check whether the corresponding subscriptions can still be dropped.
You can test it both before and after the upgrade.

--
With Regards,
Amit Kapila.

#76Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Amit Kapila (#75)
1 attachment(s)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Amit,

I see your point related to WALAVAIL_REMOVED status of the slot but
did you test the scenario I have explained in my comment? Basically, I
want to know whether it can impact the user in some way. So, please
check whether the corresponding subscriptions will be allowed to drop.
You can test it both before and after the upgrade.

Yeah, this is a real issue. I have tested it and confirmed the expected behavior.
Even if the status of the slot is 'lost', the slot may still be needed to drop the
subscription properly.

* before upgrading, the subscription which refers to the lost slot can be dropped
* after upgrading, the subscription cannot be dropped as-is;
  users must run ALTER SUBSCRIPTION sub SET (slot_name = NONE); first

The following are the steps I took:

## Setup

1. constructed a logical replication system
2. disabled the subscriber once
3. consumed many WALs so that the status of the slot became 'lost'

```
publisher=# SELECT slot_name, wal_status FROM pg_replication_slots ;
slot_name | wal_status
-----------+------------
sub | lost
(1 row)
```

# testcase a - try to drop sub. before upgrading

a-1. enabled the subscriber again.
At that time the following messages were shown in the subscriber log:
```
ERROR: could not start WAL streaming: ERROR: can no longer get changes from replication slot "sub"
DETAIL: This slot has been invalidated because it exceeded the maximum reserved size.
```

a-2. did DROP SUBSCRIPTION ...
a-3. succeeded.

```
subscriber=# DROP SUBSCRIPTION sub;
NOTICE: dropped replication slot "sub" on publisher
DROP SUBSCRIPTION
```

# testcase b - try to drop sub. after upgrading

b-1. did the pg_upgrade command
b-2. enabled the subscriber. From that point an apply worker connected to the new node...
b-3. did DROP SUBSCRIPTION ...
b-4. it failed with the message:

```
subscriber=# DROP SUBSCRIPTION sub;
ERROR: could not drop replication slot "sub" on publisher: ERROR: replication slot "sub" does not exist
```

The workaround was to disassociate the slot, as described in the documentation.

```
subscriber =# ALTER SUBSCRIPTION sub DISABLE;
ALTER SUBSCRIPTION
subscriber =# ALTER SUBSCRIPTION sub SET (slot_name = NONE);
ALTER SUBSCRIPTION
subscriber =# DROP SUBSCRIPTION sub;
DROP SUBSCRIPTION
```

PSA the script for emulating the above tests.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:

test_0803.shapplication/octet-stream; name=test_0803.shDownload
#77Amit Kapila
amit.kapila16@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#74)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Wed, Aug 2, 2023 at 1:43 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

Thank you for giving comments! PSA new version patchset.

1. Do we really need 0001 patch after the latest change proposed by
Vignesh in the 0004 patch?

I removed the 0001 patch and revived the old patch which serializes slots at shutdown.
This is because the problem that slots are not serialized to disk still remains [1],
and confirmed_flush falls behind even if we implement that approach.

So, IIUC, you are talking about a patch with the below commit message.
[PATCH v18 2/4] Always persist to disk logical slots during a
shutdown checkpoint.

It's entirely possible for a logical slot to have a confirmed_flush_lsn higher
than the last value saved on disk while not being marked as dirty. It's
currently not a problem to lose that value during a clean shutdown / restart
cycle, but a later patch adding support for pg_upgrade of publications and
logical slots will rely on that value being properly persisted to disk.

As per this commit message, this patch should be numbered as 1 but you
have placed it as 2 after the main upgrade patch?

2.
+ if (dopt.logical_slots_only)
+ {
+ if (!dopt.binary_upgrade)
+ pg_fatal("options --logical-replication-slots-only requires option
--binary-upgrade");
+
+ if (dopt.dataOnly)
+ pg_fatal("options --logical-replication-slots-only and
-a/--data-only cannot be used together");
+
+ if (dopt.schemaOnly)
+ pg_fatal("options --logical-replication-slots-only and
-s/--schema-only cannot be used together");

Can you please explain why the patch imposes these restrictions? I
guess the binary_upgrade one is because you want this option to be used
only for the upgrade. Do we want to avoid allowing any other option with
logical_slots? If so, are the above checks sufficient, and why?

Regarding --binary-upgrade, the motivation is the same as you expected. I hid
the --logical-replication-slots-only option from users, so it should not be
used outside of an upgrade. Additionally, this option is not shown in the help
output or the documentation.

As for the -{data|schema}-only options, I removed the restrictions.
I initially excluded them because they could be confusing - as discussed at [2],
slots must be dumped after pg_resetwal is done, and at that time all the definitions
have already been dumped. To avoid duplicated definitions, we must ensure that only
slots are written to the output file. I thought this requirement contradicted the
descriptions of these options (dump only A, not B).
But after considering it more, I think this restriction might not be needed because
the option is not exposed to users - no one would be confused by using both.
(The restriction for -c is also removed for the same reason.)
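
For context, the file produced by the slots-only dump is just a series of slot
re-creation calls, one per slot. A single line of that output looks roughly like
the following (the exact formatting is an assumption here, and 'sub'/'pgoutput'
stand in for whatever slot name and plugin exist on the old node):

```
SELECT pg_catalog.pg_create_logical_replication_slot('sub', 'pgoutput', false, true);
```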

I see inconsistent behavior here with the patch. If I use "pg_dump.exe
--schema-only --logical-replication-slots-only --binary-upgrade
postgres" then I get only a dump of slots without any schema. When I
use "pg_dump.exe --data-only --logical-replication-slots-only
--binary-upgrade postgres" then neither table data nor slots. When I
use "pg_dump.exe --create --logical-replication-slots-only
--binary-upgrade postgres" then it returns the error "pg_dump: error:
role with OID 10 does not exist".

Now, I tried using --binary-upgrade with some other option like
"pg_dump.exe --create --binary-upgrade postgres" and then I got a dump
with all required objects with support for binary-upgrade.

I think your thought here is that this new option won't be usable
directly with pg_dump, but we should study whether we should allow supporting
other options with --binary-upgrade for in-place upgrade utilities
other than pg_upgrade.

4.
+ /*
+ * Check that all logical replication slots have reached the current WAL
+ * position.
+ */
+ res = executeQueryOrDie(conn,
+ "SELECT slot_name FROM pg_catalog.pg_replication_slots "
+ "WHERE (SELECT count(record_type) "
+ " FROM pg_catalog.pg_get_wal_records_content(confirmed_flush_lsn,
pg_catalog.pg_current_wal_insert_lsn()) "
+ " WHERE record_type != 'CHECKPOINT_SHUTDOWN') <> 0 "
+ "AND temporary = false AND wal_status IN ('reserved', 'extended');");

I think this can unnecessarily lead to reading a lot of WAL data if
the confirmed_flush_lsn for a slot is too much behind. Can we think of
improving this by passing the number of records to read which in this
case should be 1?

I checked and pg_wal_record_info() seemed to be used for the purpose. I tried to
move the functionality to core.

But I don't see how it addresses my concern about reading too many
records. If the confirmed_flush_lsn is too much behind, it will also
try to read all the remaining WAL for such slots.
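
For illustration, a check bounded to a single record per slot might look roughly like
this, assuming pg_walinspect's pg_get_wal_record_info() and its record_type column (a
sketch of the idea being discussed, not the code in the posted patch):

```
SELECT s.slot_name, r.record_type
FROM pg_catalog.pg_replication_slots s,
     LATERAL pg_get_wal_record_info(s.confirmed_flush_lsn) r
WHERE s.slot_type = 'logical' AND s.temporary = false
  AND r.record_type <> 'CHECKPOINT_SHUTDOWN';
-- Only the one record at (or just after) confirmed_flush_lsn is read per slot.
```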

But this function raises an ERROR when there is no valid record after the specified
LSN. This means that pg_upgrade fails if logical slots have caught up to the current
WAL location. IIUC the DBA must do the following steps:

1. shutdown old publisher
2. disable the subscription once <- this is mandatory, otherwise the walsender may
send records during the upgrade and confirmed_lsn may point to the SHUTDOWN_CHECKPOINT
3. do pg_upgrade <- pg_get_wal_record_content() may raise an ERROR if 2. was skipped

But we have already seen that we write the shutdown_checkpoint record only
after the logical walsenders are shut down. So, how is the above possible?

--
With Regards,
Amit Kapila.

#78Amit Kapila
amit.kapila16@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#74)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Wed, Aug 2, 2023 at 1:43 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

3.
+ /*
+ * Get replication slots.
+ *
+ * XXX: Which information must be extracted from old node? Currently three
+ * attributes are extracted because they are used by
+ * pg_create_logical_replication_slot().
+ */
+ appendPQExpBufferStr(query,
+ "SELECT slot_name, plugin, two_phase "
+ "FROM pg_catalog.pg_replication_slots "
+ "WHERE database = current_database() AND temporary = false "
+ "AND wal_status IN ('reserved', 'extended');");

Why are we ignoring the slots that have wal status as WALAVAIL_REMOVED
or WALAVAIL_UNRESERVED? I think the slots where wal status is
WALAVAIL_REMOVED, the corresponding slots are invalidated at some
point. I think such slots can't be used for decoding but these will be
dropped along with the subscription or when a user does it manually.
So, if we don't copy such slots after the upgrade then there could be
a problem in dropping the corresponding subscription. If we don't want
to copy over such slots then we need to provide instructions on what
users should do in such cases. OTOH, if we want to copy over such
slots then we need to find a way to invalidate such slots after copy.
Either way, this needs more analysis.

I considered this again. At least WALAVAIL_UNRESERVED should be supported because
the slot is still usable; it can return to reserved or extended.

As for WALAVAIL_REMOVED, I don't think it should be supported, so I added a description
to the document.

This feature re-creates slots which have the same name/plugin as the old ones; it does
not replicate their state. So if we copy such slots as-is, they become usable again. If
subscribers refer to the slot and then connect again at that point, the changes that were
missed while the slot was 'WALAVAIL_REMOVED' may be lost.

Based on the above, such slots would have to be copied as WALAVAIL_REMOVED, but as you
said, we do not have a way to control that. The status is calculated from restart_lsn,
but there is no function to modify it directly.

One approach would be to add an SQL function which sets restart_lsn to an arbitrary value
(or 0/0, invalid), but that seems dangerous.

So, we have three options here (a) As you have done in the patch,
document this limitation and request user to perform some manual steps
to drop the subscription; (b) don't allow upgrade to proceed if there
are invalid slots in the old cluster; (c) provide a new function like
pg_copy_logical_replication_slot_contents() where we copy the required
contents like invalid status(ReplicationSlotInvalidationCause), etc.

Personally, I would prefer (b) because it will minimize the steps
required to perform by the user after the upgrade and looks cleaner
solution.

Thoughts?

--
With Regards,
Amit Kapila.

#79Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Amit Kapila (#78)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Amit,

So, we have three options here (a) As you have done in the patch,
document this limitation and request user to perform some manual steps
to drop the subscription; (b) don't allow upgrade to proceed if there
are invalid slots in the old cluster; (c) provide a new function like
pg_copy_logical_replication_slot_contents() where we copy the required
contents like invalid status(ReplicationSlotInvalidationCause), etc.

Personally, I would prefer (b) because it will minimize the steps
required to perform by the user after the upgrade and looks cleaner
solution.

Thoughts?

Thanks for the suggestion. I agree that (b) is better because it does not expose users
to data loss. I implemented it locally and it worked well, so I'm planning to adopt
the idea in the next version, if there are no objections.
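
For reference, the kind of check that (b) implies can be as simple as failing the
upgrade when any logical slot already reports lost WAL; a sketch (not the exact query
added to pg_upgrade) could be:

```
SELECT slot_name
FROM pg_catalog.pg_replication_slots
WHERE slot_type = 'logical' AND temporary = false
  AND wal_status = 'lost';   -- 'lost' corresponds to WALAVAIL_REMOVED
-- If this returns any rows, pg_upgrade would error out and ask the user to
-- drop or otherwise deal with those slots first.
```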

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#80Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#74)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Wed, Aug 2, 2023 at 5:13 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

4.
+ /*
+ * Check that all logical replication slots have reached the current WAL
+ * position.
+ */
+ res = executeQueryOrDie(conn,
+ "SELECT slot_name FROM pg_catalog.pg_replication_slots "
+ "WHERE (SELECT count(record_type) "
+ " FROM pg_catalog.pg_get_wal_records_content(confirmed_flush_lsn,
pg_catalog.pg_current_wal_insert_lsn()) "
+ " WHERE record_type != 'CHECKPOINT_SHUTDOWN') <> 0 "
+ "AND temporary = false AND wal_status IN ('reserved', 'extended');");

I think this can unnecessarily lead to reading a lot of WAL data if
the confirmed_flush_lsn for a slot is too much behind. Can we think of
improving this by passing the number of records to read which in this
case should be 1?

I checked and pg_wal_record_info() seemed to be used for the purpose. I tried to
move the functionality to core.

IIUC the above query checks if the WAL record written at the slot's
confirmed_flush_lsn is a CHECKPOINT_SHUTDOWN, but there is no check if
this WAL record is the latest record. Therefore, I think it's quite
possible that slot's confirmed_flush_lsn points to previous
CHECKPOINT_SHUTDOWN, for example, in cases where the subscription was
disabled after the publisher shut down and then some changes are made
on the publisher. We might want to add that check too but it would not
work. Because some WAL records could be written (e.g., by autovacuums)
during pg_upgrade before checking the slot's confirmed_flush_lsn.
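
For illustration, the kind of "is it also the latest record" check being discussed
might look like the sketch below (assuming pg_walinspect); as noted above, it can be
defeated by WAL written during pg_upgrade itself, which is exactly the problem:

```
SELECT s.slot_name
FROM pg_catalog.pg_replication_slots s,
     LATERAL pg_get_wal_record_info(s.confirmed_flush_lsn) r
WHERE s.slot_type = 'logical' AND s.temporary = false
  AND (r.record_type <> 'CHECKPOINT_SHUTDOWN'
       -- naive "nothing written after it" test; breaks if any WAL was
       -- generated since the shutdown checkpoint
       OR r.end_lsn <> pg_catalog.pg_current_wal_insert_lsn());
```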

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#81Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#80)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Sun, Aug 6, 2023 at 6:02 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Aug 2, 2023 at 5:13 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

4.
+ /*
+ * Check that all logical replication slots have reached the current WAL
+ * position.
+ */
+ res = executeQueryOrDie(conn,
+ "SELECT slot_name FROM pg_catalog.pg_replication_slots "
+ "WHERE (SELECT count(record_type) "
+ " FROM pg_catalog.pg_get_wal_records_content(confirmed_flush_lsn,
pg_catalog.pg_current_wal_insert_lsn()) "
+ " WHERE record_type != 'CHECKPOINT_SHUTDOWN') <> 0 "
+ "AND temporary = false AND wal_status IN ('reserved', 'extended');");

I think this can unnecessarily lead to reading a lot of WAL data if
the confirmed_flush_lsn for a slot is too much behind. Can we think of
improving this by passing the number of records to read which in this
case should be 1?

I checked and pg_wal_record_info() seemed to be used for the purpose. I tried to
move the functionality to core.

IIUC the above query checks if the WAL record written at the slot's
confirmed_flush_lsn is a CHECKPOINT_SHUTDOWN, but there is no check if
this WAL record is the latest record.

Yeah, I also think there should be some way to ensure this. How about
passing the number of records to read to this API? Actually, that will
address my other concern as well where the current API can lead to
reading an unbounded number of records if the confirmed_flush_lsn
location is far behind the CHECKPOINT_SHUTDOWN. Do you have any better
ideas to address it?

Therefore, I think it's quite
possible that slot's confirmed_flush_lsn points to previous
CHECKPOINT_SHUTDOWN, for example, in cases where the subscription was
disabled after the publisher shut down and then some changes are made
on the publisher. We might want to add that check too but it would not
work. Because some WAL records could be written (e.g., by autovacuums)
during pg_upgrade before checking the slot's confirmed_flush_lsn.

I think autovacuum is not enabled during the upgrade. See comment "Use
-b to disable autovacuum." in start_postmaster(). However, I am not
sure if there can't be any additional WAL from checkpointer or
bgwriter. Checkpointer has a code that ensures that if there is no
important WAL activity then it would be skipped. Similarly, bgwriter
also doesn't LOG xl_running_xacts unless there is an important
activity. I feel if there is a chance of any WAL activity during the
upgrade, we need to either change the check to ensure such WAL records
are expected or document the same in some way.

--
With Regards,
Amit Kapila.

#82Amit Kapila
amit.kapila16@gmail.com
In reply to: Jonathan S. Katz (#71)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Wed, Aug 2, 2023 at 7:46 AM Jonathan S. Katz <jkatz@postgresql.org> wrote:

Can I take this a step further on the user interface and ask why the
flag would be "--include-logical-replication-slots" vs. being enabled by
default?

Are there reasons why we wouldn't enable this feature by default on
pg_upgrade, and instead (if need be) have a flag that would be
"--exclude-logical-replication-slots"? Right now, not having the ability
to run pg_upgrade with logical replication slots enabled on the
publisher is a very big pain point for users, so I would strongly
recommend against adding friction unless there is a very large challenge
with such an implementation.

Thanks for acknowledging the need/importance of this feature. I also
don't see a need to have such a flag for pg_upgrade. The only reason
why one might want to exclude slots is that they are not up to date
w.r.t WAL being consumed. For example, one has not consumed all the
WAL from manually created slots or say some subscription has been
disabled before shutdown. I guess in those cases we should give an
error to the user and ask to remove such slots before the upgrade
because anyway, those won't be usable after the upgrade.
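
For a manually created slot, "remove such slots before the upgrade" amounts to
something like the following on the old cluster (the slot name is hypothetical):

```
SELECT pg_drop_replication_slot('my_manual_slot');
```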

Having said that, I think we need a flag for pg_dump to dump the slots.

--
With Regards,
Amit Kapila.

#83Julien Rouhaud
rjuju123@gmail.com
In reply to: Amit Kapila (#81)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Mon, Aug 07, 2023 at 09:24:02AM +0530, Amit Kapila wrote:

I think autovacuum is not enabled during the upgrade. See comment "Use
-b to disable autovacuum." in start_postmaster(). However, I am not
sure if there can't be any additional WAL from checkpointer or
bgwriter. Checkpointer has a code that ensures that if there is no
important WAL activity then it would be skipped. Similarly, bgwriter
also doesn't LOG xl_running_xacts unless there is an important
activity. I feel if there is a chance of any WAL activity during the
upgrade, we need to either change the check to ensure such WAL records
are expected or document the same in some way.

Unless I'm missing something I don't see what prevents something to connect
using the replication protocol and issue any query or even create new
replication slots?

Note also that as complained a few years ago nothing prevents a bgworker from
spawning up during pg_upgrade and possibly corrupt the upgraded cluster if
multixid are assigned. If publications are preserved wouldn't it mean that
such bgworkers could also lead to data loss?

#84Amit Kapila
amit.kapila16@gmail.com
In reply to: Julien Rouhaud (#83)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Mon, Aug 7, 2023 at 11:29 AM Julien Rouhaud <rjuju123@gmail.com> wrote:

On Mon, Aug 07, 2023 at 09:24:02AM +0530, Amit Kapila wrote:

I think autovacuum is not enabled during the upgrade. See comment "Use
-b to disable autovacuum." in start_postmaster(). However, I am not
sure if there can't be any additional WAL from checkpointer or
bgwriter. Checkpointer has a code that ensures that if there is no
important WAL activity then it would be skipped. Similarly, bgwriter
also doesn't LOG xl_running_xacts unless there is an important
activity. I feel if there is a chance of any WAL activity during the
upgrade, we need to either change the check to ensure such WAL records
are expected or document the same in some way.

Unless I'm missing something I don't see what prevents something to connect
using the replication protocol and issue any query or even create new
replication slots?

I think the point is that if we have any slots where we have not
consumed the pending WAL (other than the expected like
SHUTDOWN_CHECKPOINT) or if there are invalid slots then the upgrade
won't proceed and we will request user to remove such slots or ensure
that WAL is consumed by slots. So, I think in the case you mentioned,
the upgrade won't succeed.

Note also that as complained a few years ago nothing prevents a bgworker from
spawning up during pg_upgrade and possibly corrupt the upgraded cluster if
multixid are assigned. If publications are preserved wouldn't it mean that
such bgworkers could also lead to data loss?

Is it because such workers would write some WAL which slots may not
process? If so, I think it is equally dangerous as other problems that
can arise due to such a worker. Do you think of any special handling
here?

--
With Regards,
Amit Kapila.

#85Julien Rouhaud
rjuju123@gmail.com
In reply to: Amit Kapila (#84)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Mon, Aug 07, 2023 at 12:42:33PM +0530, Amit Kapila wrote:

On Mon, Aug 7, 2023 at 11:29 AM Julien Rouhaud <rjuju123@gmail.com> wrote:

Unless I'm missing something I don't see what prevents something to connect
using the replication protocol and issue any query or even create new
replication slots?

I think the point is that if we have any slots where we have not
consumed the pending WAL (other than the expected like
SHUTDOWN_CHECKPOINT) or if there are invalid slots then the upgrade
won't proceed and we will request user to remove such slots or ensure
that WAL is consumed by slots. So, I think in the case you mentioned,
the upgrade won't succeed.

What if new slots are added while the old instance is started in the middle of
pg_upgrade, *after* the various checks are done?

Note also that as complained a few years ago nothing prevents a bgworker from
spawning up during pg_upgrade and possibly corrupt the upgraded cluster if
multixid are assigned. If publications are preserved wouldn't it mean that
such bgworkers could also lead to data loss?

Is it because such workers would write some WAL which slots may not
process? If so, I think it is equally dangerous as other problems that
can arise due to such a worker. Do you think of any special handling
here?

Yes, and there were already multiple reports of multixact corruption due to
bgworker activity during pg_upgrade (see
/messages/by-id/20210121152357.s6eflhqyh4g5e6dv@dalibo.com
for instance). I think we should once and for all fix this whole class of
problem one way or another.

#86Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#81)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Mon, Aug 7, 2023 at 12:54 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Sun, Aug 6, 2023 at 6:02 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Aug 2, 2023 at 5:13 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

4.
+ /*
+ * Check that all logical replication slots have reached the current WAL
+ * position.
+ */
+ res = executeQueryOrDie(conn,
+ "SELECT slot_name FROM pg_catalog.pg_replication_slots "
+ "WHERE (SELECT count(record_type) "
+ " FROM pg_catalog.pg_get_wal_records_content(confirmed_flush_lsn,
pg_catalog.pg_current_wal_insert_lsn()) "
+ " WHERE record_type != 'CHECKPOINT_SHUTDOWN') <> 0 "
+ "AND temporary = false AND wal_status IN ('reserved', 'extended');");

I think this can unnecessarily lead to reading a lot of WAL data if
the confirmed_flush_lsn for a slot is too much behind. Can we think of
improving this by passing the number of records to read which in this
case should be 1?

I checked and pg_wal_record_info() seemed to be used for the purpose. I tried to
move the functionality to core.

IIUC the above query checks if the WAL record written at the slot's
confirmed_flush_lsn is a CHECKPOINT_SHUTDOWN, but there is no check if
this WAL record is the latest record.

Yeah, I also think there should be some way to ensure this. How about
passing the number of records to read to this API? Actually, that will
address my other concern as well where the current API can lead to
reading an unbounded number of records if the confirmed_flush_lsn
location is far behind the CHECKPOINT_SHUTDOWN. Do you have any better
ideas to address it?

It makes sense to me to limit the number of WAL records to read. But
as I mentioned below, if there is a chance of any WAL activity during
the upgrade, I'm not sure what limit to set.

Therefore, I think it's quite
possible that slot's confirmed_flush_lsn points to previous
CHECKPOINT_SHUTDOWN, for example, in cases where the subscription was
disabled after the publisher shut down and then some changes are made
on the publisher. We might want to add that check too but it would not
work. Because some WAL records could be written (e.g., by autovacuums)
during pg_upgrade before checking the slot's confirmed_flush_lsn.

I think autovacuum is not enabled during the upgrade. See comment "Use
-b to disable autovacuum." in start_postmaster().

Right, thanks.

However, I am not
sure if there can't be any additional WAL from checkpointer or
bgwriter. Checkpointer has a code that ensures that if there is no
important WAL activity then it would be skipped. Similarly, bgwriter
also doesn't LOG xl_running_xacts unless there is an important
activity.

WAL records for hint bit updates could be generated even in upgrading mode?

I feel if there is a chance of any WAL activity during the
upgrade, we need to either change the check to ensure such WAL records
are expected or document the same in some way.

Yes, but how does it work with the above idea of limiting the number
of WAL records to read? If XLOG_FPI_FOR_HINT can still be generated in
the upgrade mode, we cannot predict how many such records are
generated after the latest CHECKPOINT_SHUTDOWN.

I'm not really sure we should always perform the slot's
confirmed_flush_lsn check by default in the first place. With this
check, the upgrade won't be able to proceed if there is any logical
slot that is not used by logical replication (or something streaming
the changes using walsender), right? For example, if a user uses a
program that periodically consumes the changes from the logical slot,
the slot would not be able to pass the check even if the user executed
pg_logical_slot_get_changes() just before shutdown. The backend
process who consumes the changes is always terminated before the
shutdown checkpoint. On the other hand, I think there are cases where
the user can ensure that no meaningful WAL records are generated after
the last pg_logical_slot_get_changes(). I'm concerned that this check
might make upgrading such cases cumbersome unnecessarily.
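
For illustration, the scenario described above is a consumer doing something like this
shortly before shutdown, rather than streaming via a walsender (the slot name is
hypothetical):

```
-- Consume whatever is pending from the slot; after this the application is
-- "caught up", but the shutdown checkpoint written later is of course never
-- consumed, so a strict confirmed_flush_lsn check would still flag the slot.
SELECT lsn, xid, data
FROM pg_logical_slot_get_changes('my_batch_slot', NULL, NULL);
```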

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#87Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#86)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Mon, Aug 7, 2023 at 2:02 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Aug 7, 2023 at 12:54 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Sun, Aug 6, 2023 at 6:02 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

IIUC the above query checks if the WAL record written at the slot's
confirmed_flush_lsn is a CHECKPOINT_SHUTDOWN, but there is no check if
this WAL record is the latest record.

Yeah, I also think there should be some way to ensure this. How about
passing the number of records to read to this API? Actually, that will
address my other concern as well where the current API can lead to
reading an unbounded number of records if the confirmed_flush_lsn
location is far behind the CHECKPOINT_SHUTDOWN. Do you have any better
ideas to address it?

It makes sense to me to limit the number of WAL records to read. But
as I mentioned below, if there is a chance of any WAL activity during
the upgrade, I'm not sure what limit to set.

In that case, we won't be able to pass the number of records. We need
to check based on the type of records.

However, I am not
sure if there can't be any additional WAL from checkpointer or
bgwriter. Checkpointer has a code that ensures that if there is no
important WAL activity then it would be skipped. Similarly, bgwriter
also doesn't LOG xl_running_xacts unless there is an important
activity.

WAL records for hint bit updates could be generated even in upgrading mode?

Do you mean these records can be generated during reading catalog tables?

I feel if there is a chance of any WAL activity during the
upgrade, we need to either change the check to ensure such WAL records
are expected or document the same in some way.

Yes, but how does it work with the above idea of limiting the number
of WAL records to read? If XLOG_FPI_FOR_HINT can still be generated in
the upgrade mode, we cannot predict how many such records are
generated after the latest CHECKPOINT_SHUTDOWN.

Right, as said earlier, in that case, we need to rely on the type of records.

I'm not really sure we should always perform the slot's
confirmed_flush_lsn check by default in the first place. With this
check, the upgrade won't be able to proceed if there is any logical
slot that is not used by logical replication (or something streaming
the changes using walsender), right? For example, if a user uses a
program that periodically consumes the changes from the logical slot,
the slot would not be able to pass the check even if the user executed
pg_logical_slot_get_changes() just before shutdown. The backend
process who consumes the changes is always terminated before the
shutdown checkpoint. On the other hand, I think there are cases where
the user can ensure that no meaningful WAL records are generated after
the last pg_logical_slot_get_changes(). I'm concerned that this check
might make upgrading such cases cumbersome unnecessarily.

You are right and I have mentioned the same case today in my response
to Jonathan but do you have better ideas to deal with such slots than
to give an ERROR?

--
With Regards,
Amit Kapila.

#88Amit Kapila
amit.kapila16@gmail.com
In reply to: Julien Rouhaud (#85)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Mon, Aug 7, 2023 at 1:06 PM Julien Rouhaud <rjuju123@gmail.com> wrote:

On Mon, Aug 07, 2023 at 12:42:33PM +0530, Amit Kapila wrote:

On Mon, Aug 7, 2023 at 11:29 AM Julien Rouhaud <rjuju123@gmail.com> wrote:

Unless I'm missing something I don't see what prevents something to connect
using the replication protocol and issue any query or even create new
replication slots?

I think the point is that if we have any slots where we have not
consumed the pending WAL (other than the expected like
SHUTDOWN_CHECKPOINT) or if there are invalid slots then the upgrade
won't proceed and we will request user to remove such slots or ensure
that WAL is consumed by slots. So, I think in the case you mentioned,
the upgrade won't succeed.

What if new slots are added while the old instance is started in the middle of
pg_upgrade, *after* the various checks are done?

They won't be copied but I think that won't be any different than
other objects like tables. Anyway, I have another idea which is to not
allow creating slots during binary upgrade unless one specifically
requests it by having an API like binary_upgrade_allow_slot_create()
similar to existing APIs binary_upgrade_*.

Note also that as complained a few years ago nothing prevents a bgworker from
spawning up during pg_upgrade and possibly corrupt the upgraded cluster if
multixid are assigned. If publications are preserved wouldn't it mean that
such bgworkers could also lead to data loss?

Is it because such workers would write some WAL which slots may not
process? If so, I think it is equally dangerous as other problems that
can arise due to such a worker. Do you think of any special handling
here?

Yes, and there were already multiple reports of multixact corruption due to
bgworker activity during pg_upgrade (see
/messages/by-id/20210121152357.s6eflhqyh4g5e6dv@dalibo.com
for instance). I think we should once and for all fix this whole class of
problem one way or another.

I don't object to doing something like we discussed in the thread you
linked but don't see the link with this work. Surely, the extra
WAL/XIDs generated during the upgrade will cause data inconsistency
which is no different after this patch.

--
With Regards,
Amit Kapila.

#89Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Amit Kapila (#88)
2 attachment(s)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Amit, Julien,

Unless I'm missing something I don't see what prevents something to connect
using the replication protocol and issue any query or even create new
replication slots?

I think the point is that if we have any slots where we have not
consumed the pending WAL (other than the expected like
SHUTDOWN_CHECKPOINT) or if there are invalid slots then the upgrade
won't proceed and we will request user to remove such slots or ensure
that WAL is consumed by slots. So, I think in the case you mentioned,
the upgrade won't succeed.

What if new slots are added while the old instance is started in the middle of
pg_upgrade, *after* the various checks are done?

They won't be copied but I think that won't be any different than
other objects like tables. Anyway, I have another idea which is to not
allow creating slots during binary upgrade unless one specifically
requests it by having an API like binary_upgrade_allow_slot_create()
similar to existing APIs binary_upgrade_*.

I checked that part and confirmed that objects created after the dump
were not copied to the new node. PSA scripts to emulate my test.

# tested steps

-1. applied v18 patch set
0. modified source to create objects during upgrade and install:

```
@@ -188,6 +188,9 @@ check_and_dump_old_cluster(bool live_check)
if (!user_opts.check)
generate_old_dump();

+       printf("XXX: start to sleep\n");
+       sleep(35);
+
```

1. prepared a node which had a replication slot
2. did pg_upgrade; the process sleeps for 35 seconds during this
3. connected to the in-upgrading node by the command:

```
psql "host=`pwd` user=postgres port=50432 replication=database"
```

4. created a table and a replication slot. Note that in binary-upgrade mode it was very
hard to create tables manually. In my case, table "bar" and slot "test" were created.
5. waited until the upgrade completed and the new node booted.
6. confirmed that the created table and slot were not found on the new node.

```
new_publisher=# \d
Did not find any relations.

new_publisher=# SELECT slot_name FROM pg_replication_slots WHERE slot_name = 'test';
slot_name
-----------
(0 rows)
```

You can execute test_01.sh first, and then execute test_02.sh while the first terminal is stuck.

Note that such creations can theoretically occur, but they are very rare.
Because of the following lines in start_postmaster(), TCP/IP connections are refused and
only the superuser can connect to the server.

```
#if !defined(WIN32)
/* prevent TCP/IP connections, restrict socket access */
strcat(socket_string,
" -c listen_addresses='' -c unix_socket_permissions=0700");

/* Have a sockdir? Tell the postmaster. */
if (cluster->sockdir)
snprintf(socket_string + strlen(socket_string),
sizeof(socket_string) - strlen(socket_string),
" -c %s='%s'",
(GET_MAJOR_VERSION(cluster->major_version) <= 902) ?
"unix_socket_directory" : "unix_socket_directories",
cluster->sockdir);
#endif
```

Moreover, the socket directory is set to the caller's current directory, and the port number
also differs from the setting written in postgresql.conf.
I think there is little chance that replication slots are accidentally created
during the upgrade.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:

test_01.shapplication/octet-stream; name=test_01.shDownload
test_02.shapplication/octet-stream; name=test_02.shDownload
#90Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#87)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Mon, Aug 7, 2023 at 6:02 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Aug 7, 2023 at 2:02 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Aug 7, 2023 at 12:54 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Sun, Aug 6, 2023 at 6:02 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

IIUC the above query checks if the WAL record written at the slot's
confirmed_flush_lsn is a CHECKPOINT_SHUTDOWN, but there is no check if
this WAL record is the latest record.

Yeah, I also think there should be some way to ensure this. How about
passing the number of records to read to this API? Actually, that will
address my other concern as well where the current API can lead to
reading an unbounded number of records if the confirmed_flush_lsn
location is far behind the CHECKPOINT_SHUTDOWN. Do you have any better
ideas to address it?

It makes sense to me to limit the number of WAL records to read. But
as I mentioned below, if there is a chance of any WAL activity during
the upgrade, I'm not sure what limit to set.

In that case, we won't be able to pass the number of records. We need
to check based on the type of records.

However, I am not
sure if there can't be any additional WAL from checkpointer or
bgwriter. Checkpointer has a code that ensures that if there is no
important WAL activity then it would be skipped. Similarly, bgwriter
also doesn't LOG xl_running_xacts unless there is an important
activity.

WAL records for hint bit updates could be generated even in upgrading mode?

Do you mean these records can be generated during reading catalog tables?

Yes.

I feel if there is a chance of any WAL activity during the
upgrade, we need to either change the check to ensure such WAL records
are expected or document the same in some way.

Yes, but how does it work with the above idea of limiting the number
of WAL records to read? If XLOG_FPI_FOR_HINT can still be generated in
the upgrade mode, we cannot predict how many such records are
generated after the latest CHECKPOINT_SHUTDOWN.

Right, as said earlier, in that case, we need to rely on the type of records.

Another idea would be that before starting the old cluster we check if
the slot's confirmed_flush_lsn in the slot state file matches the
latest checkpoint LSN got by pg_controlfile. We need another tool to
dump the slot state file, though.
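
For illustration, the comparison described here is essentially the following, shown as
an on-line query against pg_control_checkpoint() (the actual idea is to read the slot
state files and the control file while the cluster is stopped, so this is only an
analogue, not the proposed implementation):

```
SELECT s.slot_name
FROM pg_catalog.pg_replication_slots s
WHERE s.slot_type = 'logical' AND s.temporary = false
  AND s.confirmed_flush_lsn <> (SELECT checkpoint_lsn
                                FROM pg_catalog.pg_control_checkpoint());
-- Any row returned means the slot did not stop exactly at the latest
-- (shutdown) checkpoint recorded in the control file.
```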

I'm not really sure we should always perform the slot's
confirmed_flush_lsn check by default in the first place. With this
check, the upgrade won't be able to proceed if there is any logical
slot that is not used by logical replication (or something streaming
the changes using walsender), right? For example, if a user uses a
program that periodically consumes the changes from the logical slot,
the slot would not be able to pass the check even if the user executed
pg_logical_slot_get_changes() just before shutdown. The backend
process who consumes the changes is always terminated before the
shutdown checkpoint. On the other hand, I think there are cases where
the user can ensure that no meaningful WAL records are generated after
the last pg_logical_slot_get_changes(). I'm concerned that this check
might make upgrading such cases cumbersome unnecessarily.

You are right and I have mentioned the same case today in my response
to Jonathan but do you have better ideas to deal with such slots than
to give an ERROR?

It makes sense to me to give an ERROR for such slots but does it also
make sense to make the check optional?

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#91Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#90)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Wed, Aug 9, 2023 at 8:01 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Aug 7, 2023 at 6:02 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Aug 7, 2023 at 2:02 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

WAL records for hint bit updates could be generated even in upgrading mode?

Do you mean these records can be generated during reading catalog tables?

Yes.

BTW, Kuroda-San has verified and found that three types of records
(including XLOG_FPI_FOR_HINT) can be generated by the system during
the upgrade. See email [1].

I feel if there is a chance of any WAL activity during the
upgrade, we need to either change the check to ensure such WAL records
are expected or document the same in some way.

Yes, but how does it work with the above idea of limiting the number
of WAL records to read? If XLOG_FPI_FOR_HINT can still be generated in
the upgrade mode, we cannot predict how many such records are
generated after the latest CHECKPOINT_SHUTDOWN.

Right, as said earlier, in that case, we need to rely on the type of records.

Another idea would be that before starting the old cluster we check if
the slot's confirmed_flush_lsn in the slot state file matches the
latest checkpoint LSN got by pg_controlfile. We need another tool to
dump the slot state file, though.

I feel it would be a good idea to provide such a tool for users to
avoid getting errors during upgrade, but I think the upgrade code still
needs to ensure that there are no WAL records between
confirmed_flush_lsn and the SHUTDOWN_CHECKPOINT other than the expected ones. Or, do you
want to say that we don't do any verification check during the upgrade
and let data loss happen if the user didn't ensure that by
running such a tool?

I'm not really sure we should always perform the slot's
confirmed_flush_lsn check by default in the first place. With this
check, the upgrade won't be able to proceed if there is any logical
slot that is not used by logical replication (or something streaming
the changes using walsender), right? For example, if a user uses a
program that periodically consumes the changes from the logical slot,
the slot would not be able to pass the check even if the user executed
pg_logical_slot_get_changes() just before shutdown. The backend
process who consumes the changes is always terminated before the
shutdown checkpoint. On the other hand, I think there are cases where
the user can ensure that no meaningful WAL records are generated after
the last pg_logical_slot_get_changes(). I'm concerned that this check
might make upgrading such cases cumbersome unnecessarily.

You are right and I have mentioned the same case today in my response
to Jonathan but do you have better ideas to deal with such slots than
to give an ERROR?

It makes sense to me to give an ERROR for such slots but does it also
make sense to make the check optional?

We can do that if we think so. We have two ways to make this check
optional (a) have a switch like --include-logical-replication-slots as
the proposed patch has which means by default we won't try to upgrade
slots; (b) have a switch like --exclude-logical-replication-slots as
Jonathan proposed which means we will exclude slots only if specified
by user. Now, one thing to note is that we don't seem to have any
include/exclude switch in the upgrade which I think indicates users by
default prefer to upgrade everything. Now, even if we decide not to
give any switch initially but do it only if there is a user demand for
it then also users will have a way to proceed with an upgrade which is
by dropping such slots. Do you have any preference?

[1]: /messages/by-id/TYAPR01MB58660273EACEFC5BF256B133F50DA@TYAPR01MB5866.jpnprd01.prod.outlook.com

--
With Regards,
Amit Kapila.

#92Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#91)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Wed, Aug 9, 2023 at 1:15 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Aug 9, 2023 at 8:01 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Aug 7, 2023 at 6:02 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Aug 7, 2023 at 2:02 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

WAL records for hint bit updates could be generated even in upgrading mode?

Do you mean these records can be generated during reading catalog tables?

Yes.

BTW, Kuroda-San has verified and found that three types of records
(including XLOG_FPI_FOR_HINT) can be generated by the system during
the upgrade. See email [1].

I feel if there is a chance of any WAL activity during the
upgrade, we need to either change the check to ensure such WAL records
are expected or document the same in some way.

Yes, but how does it work with the above idea of limiting the number
of WAL records to read? If XLOG_FPI_FOR_HINT can still be generated in
the upgrade mode, we cannot predict how many such records are
generated after the latest CHECKPOINT_SHUTDOWN.

Right, as said earlier, in that case, we need to rely on the type of records.

Another idea would be that before starting the old cluster we check if
the slot's confirmed_flush_lsn in the slot state file matches the
latest checkpoint LSN got by pg_controlfile. We need another tool to
dump the slot state file, though.

I feel it would be a good idea to provide such a tool for users to
avoid getting errors during upgrade but I think the upgrade code still
needs to ensure that there are no WAL records between
confirm_flush_lsn and SHUTDOWN_CHECKPOINT than required. Or, do you
want to say that we don't do any verification check during the upgrade
and let the data loss happens if the user didn't ensure that by
running such a tool?

I meant that if we can check the slot state file while the old cluster
is stopped, we can ensure there are no WAL records between the slot's
confirmed_flush_lsn (in the state file) and the latest checkpoint (in
the control file).

I'm not really sure we should always perform the slot's
confirmed_flush_lsn check by default in the first place. With this
check, the upgrade won't be able to proceed if there is any logical
slot that is not used by logical replication (or something streaming
the changes using walsender), right? For example, if a user uses a
program that periodically consumes the changes from the logical slot,
the slot would not be able to pass the check even if the user executed
pg_logical_slot_get_changes() just before shutdown. The backend
process who consumes the changes is always terminated before the
shutdown checkpoint. On the other hand, I think there are cases where
the user can ensure that no meaningful WAL records are generated after
the last pg_logical_slot_get_changes(). I'm concerned that this check
might make upgrading such cases cumbersome unnecessarily.

You are right and I have mentioned the same case today in my response
to Jonathan but do you have better ideas to deal with such slots than
to give an ERROR?

It makes sense to me to give an ERROR for such slots but does it also
make sense to make the check optional?

We can do that if we think so. We have two ways to make this check
optional (a) have a switch like --include-logical-replication-slots as
the proposed patch has which means by default we won't try to upgrade
slots; (b) have a switch like --exclude-logical-replication-slots as
Jonathan proposed which means we will exclude slots only if specified
by user. Now, one thing to note is that we don't seem to have any
include/exclude switch in the upgrade which I think indicates users by
default prefer to upgrade everything. Now, even if we decide not to
give any switch initially but do it only if there is a user demand for
it then also users will have a way to proceed with an upgrade which is
by dropping such slots. Do you have any preference?

TBH I'm not sure if there is a use case where the user wants to
exclude replication slots during the upgrade. Including replication
slots by default seems to be better to me, at least for now. I
initially thought asking users to drop replication slots that
possibly have not consumed all WAL records would not be a good idea,
but since we already do such things in check.c I now think it would
not be a problem. I guess it would be great if we could check the WAL
records between slots' confirmed_flush_lsn and the latest LSN, and if
there are no meaningful WAL records there we can upgrade the
replication slots.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#93Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#92)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Thu, Aug 10, 2023 at 6:46 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Aug 9, 2023 at 1:15 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Aug 9, 2023 at 8:01 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I feel it would be a good idea to provide such a tool for users to
avoid getting errors during upgrade but I think the upgrade code still
needs to ensure that there are no WAL records between
confirm_flush_lsn and SHUTDOWN_CHECKPOINT than required. Or, do you
want to say that we don't do any verification check during the upgrade
and let the data loss happens if the user didn't ensure that by
running such a tool?

I meant that if we can check the slot state file while the old cluster
stops, we can ensure there are no WAL records between slot's
confirmed_flush_lsn (in the state file) and the latest checkpoint (in
the control file).

Are you suggesting doing this before we start the old cluster or after
we stop the old cluster? I was thinking about the pros and cons of
doing this check when the server is 'on' (along with other upgrade
checks something like the patch is doing now) versus when the server
is 'off'. I think the advantage of doing it when the server is 'off'
(after check_and_dump_old_cluster()) is that it will be ensured that
there is no extra WAL that could be generated during the upgrade and
has not been verified against confirmed_flush_lsn location. But OTOH,
to retrieve slot information when the server is 'off', we need a
separate utility or probably a functionality for the same in
pg_upgrade and also some WAL reading stuff which sounds to me like a
larger change that may not be warranted here. I think anyway the extra
WAL (if any got generated during the upgrade) won't be required after
the upgrade so not convinced to make such a check while the server is
'off'. Are there reasons which make it better to do this while the old
cluster is 'off'?

We can do that if we think so. We have two ways to make this check
optional (a) have a switch like --include-logical-replication-slots as
the proposed patch has which means by default we won't try to upgrade
slots; (b) have a switch like --exclude-logical-replication-slots as
Jonathan proposed which means we will exclude slots only if specified
by user. Now, one thing to note is that we don't seem to have any
include/exclude switch in the upgrade which I think indicates users by
default prefer to upgrade everything. Now, even if we decide not to
give any switch initially but do it only if there is a user demand for
it then also users will have a way to proceed with an upgrade which is
by dropping such slots. Do you have any preference?

TBH I'm not sure if there is a use case where the user wants to
exclude replication slots during the upgrade. Including replication
slots by default seems to be better to me, at least for now. I
initially thought asking for users to drop replication slots that
possibly have not consumed all WAL records would not be a good idea,
but since we already do such things in check.c I now think it would
not be a problem. I guess it would be great if we can check WAL
records between slots' confirmed_flush_lsn and the latest LSN, and if
there are no meaningful WAL records there we can upgrade the
replication slots.

Agreed.

--
With Regards,
Amit Kapila.

#94Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#88)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Mon, Aug 7, 2023 at 3:46 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Aug 7, 2023 at 1:06 PM Julien Rouhaud <rjuju123@gmail.com> wrote:

On Mon, Aug 07, 2023 at 12:42:33PM +0530, Amit Kapila wrote:

On Mon, Aug 7, 2023 at 11:29 AM Julien Rouhaud <rjuju123@gmail.com> wrote:

Unless I'm missing something I don't see what prevents something to connect
using the replication protocol and issue any query or even create new
replication slots?

I think the point is that if we have any slots where we have not
consumed the pending WAL (other than the expected like
SHUTDOWN_CHECKPOINT) or if there are invalid slots then the upgrade
won't proceed and we will request user to remove such slots or ensure
that WAL is consumed by slots. So, I think in the case you mentioned,
the upgrade won't succeed.

What if new slots are added while the old instance is started in the middle of
pg_upgrade, *after* the various checks are done?

They won't be copied but I think that won't be any different than
other objects like tables. Anyway, I have another idea which is to not
allow creating slots during binary upgrade unless one specifically
requests it by having an API like binary_upgrade_allow_slot_create()
similar to existing APIs binary_upgrade_*.

Sawada-San, Julien, and others, do you have any thoughts on the above point?

--
With Regards,
Amit Kapila.

#95Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#94)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Thu, Aug 10, 2023 at 2:27 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Aug 7, 2023 at 3:46 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Aug 7, 2023 at 1:06 PM Julien Rouhaud <rjuju123@gmail.com> wrote:

On Mon, Aug 07, 2023 at 12:42:33PM +0530, Amit Kapila wrote:

On Mon, Aug 7, 2023 at 11:29 AM Julien Rouhaud <rjuju123@gmail.com> wrote:

Unless I'm missing something I don't see what prevents something to connect
using the replication protocol and issue any query or even create new
replication slots?

I think the point is that if we have any slots where we have not
consumed the pending WAL (other than the expected like
SHUTDOWN_CHECKPOINT) or if there are invalid slots then the upgrade
won't proceed and we will request user to remove such slots or ensure
that WAL is consumed by slots. So, I think in the case you mentioned,
the upgrade won't succeed.

What if new slots are added while the old instance is started in the middle of
pg_upgrade, *after* the various checks are done?

They won't be copied but I think that won't be any different than
other objects like tables. Anyway, I have another idea which is to not
allow creating slots during binary upgrade unless one specifically
requests it by having an API like binary_upgrade_allow_slot_create()
similar to existing APIs binary_upgrade_*.

Sawada-San, Julien, and others, do you have any thoughts on the above point?

IIUC, while the old cluster is running in the middle of pg_upgrade, it
doesn't accept TCP connections. I'm not sure we need to worry about
the case where someone on the same server attempts to create
replication slots during the upgrade. The same is true for other
objects, as Amit mentioned.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#96Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#93)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Thu, Aug 10, 2023 at 12:52 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Aug 10, 2023 at 6:46 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Aug 9, 2023 at 1:15 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Aug 9, 2023 at 8:01 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I feel it would be a good idea to provide such a tool for users to
avoid getting errors during upgrade but I think the upgrade code still
needs to ensure that there are no WAL records between
confirm_flush_lsn and SHUTDOWN_CHECKPOINT than required. Or, do you
want to say that we don't do any verification check during the upgrade
and let the data loss happens if the user didn't ensure that by
running such a tool?

I meant that if we can check the slot state file while the old cluster
stops, we can ensure there are no WAL records between slot's
confirmed_flush_lsn (in the state file) and the latest checkpoint (in
the control file).

Are you suggesting doing this before we start the old cluster or after
we stop the old cluster? I was thinking about the pros and cons of
doing this check when the server is 'on' (along with other upgrade
checks something like the patch is doing now) versus when the server
is 'off'. I think the advantage of doing it when the server is 'off'
(after check_and_dump_old_cluster()) is that it will be ensured that
there is no extra WAL that could be generated during the upgrade and
has not been verified against confirmed_flush_lsn location. But OTOH,
to retrieve slot information when the server is 'off', we need a
separate utility or probably a functionality for the same in
pg_upgrade and also some WAL reading stuff which sounds to me like a
larger change that may not be warranted here. I think anyway the extra
WAL (if any got generated during the upgrade) won't be required after
the upgrade so not convinced to make such a check while the server is
'off'. Are there reasons which make it better to do this while the old
cluster is 'off'?

What I imagined is that we do this check before
check_and_dump_old_cluster() while the server is 'off'. Reading the
slot state file would be simple and I guess we would not need a tool
or CLI program for that. We need to expose ReplicationSlotOnDisk,
though. After reading the control file and the slots' state files we
check if the slot's confirmed_flush_lsn matches the latest checkpoint LSN
in the control file (BTW maybe we can get slot name and plugin name
here instead of using pg_dump?). Extra WAL records could be generated
only after this check, so we wouldn't need to worry about that for
slots for logical replication. As for non-logical replication slots,
we would need some WAL reading stuff, but I'm not sure we need it for
the first commit. Or another idea would be to allow users to mark
replication slots "upgradable" so that pg_upgrade skips the
confirmed_flush_lsn check.

BTW this check would not be able to support live-check but I think
it's not a problem as this check with a running server will never be
able to pass.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#97Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Hayato Kuroda (Fujitsu) (#74)
3 attachment(s)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear hackers,

Based on recent discussions, I updated the patch set. I did not reply to each post
individually because there are so many, but thank you for the many suggestions!

The following shows what I changed.

1.
This feature is now enabled by default. Instead, an "--exclude-logical-replication-slots"
option was added. (Per suggestions like [1])

2.
pg_upgrade now raises an ERROR when some slots are 'WALAVAIL_REMOVED'. (Per discussion [2])

3.
Slots which are 'WALAVAIL_UNRESERVED' are dumped and restored. (Per consideration [3])

4.
The combination of --logical-replication-slots-only and other --only options is
prohibited again. (Per suggestion [4]) Currently --data-only and --schema-only
cannot be used together, so I followed the same style. Additionally, it's not
easy for users to predict the behavior when specifying many --only options.

5.
Fixed some bugs related to combinations of options. E.g., v18 did not allow
"--create" to be used, but now it can be used at the same time. This was because role
information was not fetched from the node while doing the slot dump.

6.
The ordering of patches was changed. The patch "Always persist to disk..."
became 0001. (Per suggestion [4])

7.
Functions for checking were changed (per [5]). Currently, the WAL between the
confirmed_lsn and the current location is scanned and verified. The requirements
are a little hacky:

* The first record after the confirmed_lsn must be SHUTDOWN_CHECKPOINT
* Other records till current position must be either RUNNING_XACT,
CHECKPOINT_ONLINE or XLOG_FPI_FOR_HINT.

In the checking function (validate_wal_record_types_after), WAL records are read
one by one and their types are verified. v18 required changing the version number
for pg_walinspect; that is not needed anymore.
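
A rough SQL analogue of that record-type validation (the patch implements it in C; this
sketch assumes pg_walinspect, ignores the ordering requirement on the first record, and
the record type names may differ slightly from the shorthand used above):

```
SELECT s.slot_name
FROM pg_catalog.pg_replication_slots s,
     LATERAL pg_get_wal_records_info(s.confirmed_flush_lsn,
                                     pg_catalog.pg_current_wal_insert_lsn()) r
WHERE s.slot_type = 'logical' AND s.temporary = false
GROUP BY s.slot_name
HAVING bool_or(r.record_type NOT IN ('CHECKPOINT_SHUTDOWN', 'CHECKPOINT_ONLINE',
                                     'RUNNING_XACTS', 'FPI_FOR_HINT'));
-- Any slot returned has an unexpected record between its confirmed_lsn and the
-- current position and would make the upgrade fail.
```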

[1]: /messages/by-id/ad83b9f2-ced3-c51c-342a-cc281ff562fc@postgresql.org
[2]: /messages/by-id/CAA4eK1+8btsYhNQvw6QJ4iTw1wFhkFXXABT=ED1eHFvtekRanQ@mail.gmail.com
[3]: /messages/by-id/TYAPR01MB5866FD3F7992A46D0457F0E6F50BA@TYAPR01MB5866.jpnprd01.prod.outlook.com
[4]: /messages/by-id/CAA4eK1+CD82Kssy+iqpETPKYUh9AmNORF+3iGfNXgxKxqL3T6g@mail.gmail.com
[5]: /messages/by-id/CAD21AoC4D4wYTcLM8T-rAv=pO5kS6ffcVD1e7h4eFERT4+fwQQ@mail.gmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:

v19-0001-Always-persist-to-disk-logical-slots-during-a-sh.patchapplication/octet-stream; name=v19-0001-Always-persist-to-disk-logical-slots-during-a-sh.patchDownload
From 80213431a8f6c4a545a8de9cd95c2867d8ae78d9 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Fri, 14 Apr 2023 13:49:09 +0800
Subject: [PATCH v19 1/3] Always persist to disk logical slots during a
 shutdown checkpoint.

It's entirely possible for a logical slot to have a confirmed_flush_lsn higher
than the last value saved on disk while not being marked as dirty.  It's
currently not a problem to lose that value during a clean shutdown / restart
cycle, but a later patch adding support for pg_upgrade of publications and
logical slots will rely on that value being properly persisted to disk.

Author: Julien Rouhaud
Reviewed-by: Wang Wei
Discussion: FIXME
---
 src/backend/access/transam/xlog.c |  2 +-
 src/backend/replication/slot.c    | 26 ++++++++++++++++----------
 src/include/replication/slot.h    |  2 +-
 3 files changed, 18 insertions(+), 12 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 60c0b7ec3a..6dced61cf4 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7026,7 +7026,7 @@ static void
 CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
 {
 	CheckPointRelationMap();
-	CheckPointReplicationSlots();
+	CheckPointReplicationSlots(flags & CHECKPOINT_IS_SHUTDOWN);
 	CheckPointSnapBuild();
 	CheckPointLogicalRewriteHeap();
 	CheckPointReplicationOrigin();
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 1dc27264f6..5aed7cd190 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -109,7 +109,8 @@ static void ReplicationSlotDropPtr(ReplicationSlot *slot);
 /* internal persistency functions */
 static void RestoreSlotFromDisk(const char *name);
 static void CreateSlotOnDisk(ReplicationSlot *slot);
-static void SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel);
+static void SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel,
+						   bool is_shutdown);
 
 /*
  * Report shared-memory space needed by ReplicationSlotsShmemInit.
@@ -783,7 +784,7 @@ ReplicationSlotSave(void)
 	Assert(MyReplicationSlot != NULL);
 
 	sprintf(path, "pg_replslot/%s", NameStr(MyReplicationSlot->data.name));
-	SaveSlotToPath(MyReplicationSlot, path, ERROR);
+	SaveSlotToPath(MyReplicationSlot, path, ERROR, false);
 }
 
 /*
@@ -1565,11 +1566,10 @@ restart:
 /*
  * Flush all replication slots to disk.
  *
- * This needn't actually be part of a checkpoint, but it's a convenient
- * location.
+ * is_shutdown is true in case of a shutdown checkpoint.
  */
 void
-CheckPointReplicationSlots(void)
+CheckPointReplicationSlots(bool is_shutdown)
 {
 	int			i;
 
@@ -1594,7 +1594,7 @@ CheckPointReplicationSlots(void)
 
 		/* save the slot to disk, locking is handled in SaveSlotToPath() */
 		sprintf(path, "pg_replslot/%s", NameStr(s->data.name));
-		SaveSlotToPath(s, path, LOG);
+		SaveSlotToPath(s, path, LOG, is_shutdown);
 	}
 	LWLockRelease(ReplicationSlotAllocationLock);
 }
@@ -1700,7 +1700,7 @@ CreateSlotOnDisk(ReplicationSlot *slot)
 
 	/* Write the actual state file. */
 	slot->dirty = true;			/* signal that we really need to write */
-	SaveSlotToPath(slot, tmppath, ERROR);
+	SaveSlotToPath(slot, tmppath, ERROR, false);
 
 	/* Rename the directory into place. */
 	if (rename(tmppath, path) != 0)
@@ -1726,7 +1726,8 @@ CreateSlotOnDisk(ReplicationSlot *slot)
  * Shared functionality between saving and creating a replication slot.
  */
 static void
-SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel)
+SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel,
+			   bool is_shutdown)
 {
 	char		tmppath[MAXPGPATH];
 	char		path[MAXPGPATH];
@@ -1740,8 +1741,13 @@ SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel)
 	slot->just_dirtied = false;
 	SpinLockRelease(&slot->mutex);
 
-	/* and don't do anything if there's nothing to write */
-	if (!was_dirty)
+	/*
+	 * and don't do anything if there's nothing to write, unless this is
+	 * called for a logical slot during a shutdown checkpoint, as we want to
+	 * persist the confirmed_flush_lsn in that case, even if that's the only
+	 * modification.
+	 */
+	if (!was_dirty && (SlotIsPhysical(slot) || !is_shutdown))
 		return;
 
 	LWLockAcquire(&slot->io_in_progress_lock, LW_EXCLUSIVE);
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index a8a89dc784..7ca37c9f70 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -241,7 +241,7 @@ extern void ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslo
 extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok);
 
 extern void StartupReplicationSlots(void);
-extern void CheckPointReplicationSlots(void);
+extern void CheckPointReplicationSlots(bool is_shutdown);
 
 extern void CheckSlotRequirements(void);
 extern void CheckSlotPermissions(void);
-- 
2.27.0

v19-0002-pg_upgrade-Allow-to-replicate-logical-replicatio.patchapplication/octet-stream; name=v19-0002-pg_upgrade-Allow-to-replicate-logical-replicatio.patchDownload
From dca74c4db545a2fd5b3145203759fdea4e14b90f Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Tue, 4 Apr 2023 05:49:34 +0000
Subject: [PATCH v19 2/3] pg_upgrade: Allow to replicate logical replication
 slots to new node

This commit allows nodes with logical replication slots to be upgraded. The
commit can be divided into two parts: one for pg_dump and another for
pg_upgrade.

For pg_dump this commit includes a new option called "--logical-replication-slots-only".
This option can be used to dump logical replication slots. When this option is
specified, the slot_name, plugin, and two_phase parameters are extracted from
pg_replication_slots. An SQL file that executes pg_create_logical_replication_slot()
with the extracted parameters is generated.

For pg_upgrade, it executes pg_dump with the new "--logical-replication-slots-only"
option and restores the slots using the pg_create_logical_replication_slot()
statements that the dump generated (see above). Note that we cannot dump
replication slots at the same time as the schema dump because we need to
separate the timing of restoring replication slots and other objects.
Replication slots, in particular, should not be restored before executing the
pg_resetwal command because it will remove WALs that are required by the slots.

The option "--exclude-logical-replication-slots" is also added to pg_upgrade.
When it is specified, all the replication slots on the old node are ignored,
which is the same behavior as in previous versions.

The significant advantage of this commit is that it makes it easy to continue
logical replication even after upgrading the publisher node. Previously,
pg_upgrade allowed copying publications to a new node. With this new commit,
adjusting the connection string to the new publisher will cause the apply
worker on the subscriber to connect to the new publisher automatically. This
enables seamless continuation of logical replication, even after an upgrade.

Author: Hayato Kuroda
Reviewed-by: Peter Smith, Julien Rouhaud, Vignesh C, Wang Wei
---
 doc/src/sgml/ref/pgupgrade.sgml               |  42 ++++
 src/bin/pg_dump/pg_backup.h                   |   1 +
 src/bin/pg_dump/pg_dump.c                     | 225 +++++++++++++++---
 src/bin/pg_dump/pg_dump.h                     |  14 +-
 src/bin/pg_dump/pg_dump_sort.c                |  11 +-
 src/bin/pg_upgrade/Makefile                   |   3 +
 src/bin/pg_upgrade/check.c                    |  77 ++++++
 src/bin/pg_upgrade/dump.c                     |  24 ++
 src/bin/pg_upgrade/info.c                     | 111 ++++++++-
 src/bin/pg_upgrade/meson.build                |   1 +
 src/bin/pg_upgrade/option.c                   |   7 +
 src/bin/pg_upgrade/pg_upgrade.c               |  61 +++++
 src/bin/pg_upgrade/pg_upgrade.h               |  22 ++
 .../t/003_logical_replication_slots.pl        | 152 ++++++++++++
 src/tools/pgindent/typedefs.list              |   3 +
 15 files changed, 713 insertions(+), 41 deletions(-)
 create mode 100644 src/bin/pg_upgrade/t/003_logical_replication_slots.pl

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 7816b4c685..df33abc07d 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -240,6 +240,16 @@ PostgreSQL documentation
       </listitem>
      </varlistentry>
 
+     <varlistentry>
+      <term><option>--exclude-logical-replication-slots</option></term>
+      <listitem>
+       <para>
+        Do not upgrade logical replication slots. By default <application>pg_upgrade</application>
+        tries to migrate them to the new node.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><option>-?</option></term>
       <term><option>--help</option></term>
@@ -402,6 +412,38 @@ NET STOP postgresql-&majorversion;
     </para>
    </step>
 
+   <step>
+    <title>Prepare for publisher upgrades</title>
+
+    <para>
+     By default <application>pg_upgrade</application> tries to dump and restore
+     logical replication slots. This helps avoid the need for manually defining
+     the same replication slots on the new publisher. If you want to define them
+     by hand or not copy them at all, <option>--exclude-logical-replication-slots</option>
+     must be specified.
+    </para>
+
+    <para>
+     Before you start upgrading the publisher node, ensure that the
+     subscription is temporarily disabled. After the upgrade is complete,
+     execute the
+     <link linkend="sql-altersubscription"><command>ALTER SUBSCRIPTION ... CONNECTION</command></link>
+     command to update the connection string, and then re-enable the
+     subscription.
+    </para>
+
+    <para>
+     Upgrading slots has some prerequisites. First, none of the slots may be in
+     the <literal>lost</literal> state, and all of them must have consumed the
+     WAL generated on the old node. Furthermore, the new node must have a
+     <link linkend="guc-max-replication-slots"><varname>max_replication_slots</varname></link>
+     setting of at least the number of slots on the old node, and
+     <link linkend="guc-wal-level"><varname>wal_level</varname></link> must be
+     <literal>logical</literal>. <application>pg_upgrade</application> will
+     report an error if these prerequisites are not met.
+    </para>
+   </step>
+
    <step>
     <title>Run <application>pg_upgrade</application></title>
 
diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index aba780ef4b..0a4e931f9b 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -187,6 +187,7 @@ typedef struct _dumpOptions
 	int			use_setsessauth;
 	int			enable_row_security;
 	int			load_via_partition_root;
+	int			logical_slots_only;
 
 	/* default, if no "inclusion" switches appear, is to dump everything */
 	bool		include_everything;
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 5dab1ba9ea..1a9af67139 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -328,6 +328,9 @@ static void setupDumpWorker(Archive *AH);
 static TableInfo *getRootTableInfo(const TableInfo *tbinfo);
 static bool forcePartitionRootLoad(const TableInfo *tbinfo);
 
+static void getLogicalReplicationSlots(Archive *fout);
+static void dumpLogicalReplicationSlot(Archive *fout,
+									   const LogicalReplicationSlotInfo *slotinfo);
 
 int
 main(int argc, char **argv)
@@ -431,6 +434,7 @@ main(int argc, char **argv)
 		{"table-and-children", required_argument, NULL, 12},
 		{"exclude-table-and-children", required_argument, NULL, 13},
 		{"exclude-table-data-and-children", required_argument, NULL, 14},
+		{"logical-replication-slots-only", no_argument, NULL, 15},
 
 		{NULL, 0, NULL, 0}
 	};
@@ -657,6 +661,10 @@ main(int argc, char **argv)
 										  optarg);
 				break;
 
+			case 15:			/* dump only replication slot(s) */
+				dopt.logical_slots_only = true;
+				break;
+
 			default:
 				/* getopt_long already emitted a complaint */
 				pg_log_error_hint("Try \"%s --help\" for more information.", progname);
@@ -714,6 +722,18 @@ main(int argc, char **argv)
 	if (dopt.do_nothing && dopt.dump_inserts == 0)
 		pg_fatal("option --on-conflict-do-nothing requires option --inserts, --rows-per-insert, or --column-inserts");
 
+	if (dopt.logical_slots_only)
+	{
+		if (!dopt.binary_upgrade)
+			pg_fatal("option --logical-replication-slots-only requires option --binary-upgrade");
+
+		if (dopt.dataOnly)
+			pg_fatal("options --logical-replication-slots-only and -a/--data-only cannot be used together");
+
+		if (dopt.schemaOnly)
+			pg_fatal("options --logical-replication-slots-only and -s/--schema-only cannot be used together");
+	}
+
 	/* Identify archive format to emit */
 	archiveFormat = parseArchiveFormat(format, &archiveMode);
 
@@ -876,6 +896,13 @@ main(int argc, char **argv)
 			pg_fatal("no matching extensions were found");
 	}
 
+	/*
+	 * If --logical-replication-slots-only was requested, dump only the slots
+	 * and skip everything else.
+	 */
+	if (dopt.logical_slots_only)
+		getLogicalReplicationSlots(fout);
+
 	/*
 	 * Dumping LOs is the default for dumps where an inclusion switch is not
 	 * used (an "include everything" dump).  -B can be used to exclude LOs
@@ -893,48 +920,52 @@ main(int argc, char **argv)
 	 */
 	collectRoleNames(fout);
 
-	/*
-	 * Now scan the database and create DumpableObject structs for all the
-	 * objects we intend to dump.
-	 */
-	tblinfo = getSchemaData(fout, &numTables);
-
-	if (!dopt.schemaOnly)
+	if (!dopt.logical_slots_only)
 	{
-		getTableData(&dopt, tblinfo, numTables, 0);
-		buildMatViewRefreshDependencies(fout);
-		if (dopt.dataOnly)
-			getTableDataFKConstraints();
-	}
+		/*
+		 * Now scan the database and create DumpableObject structs for all the
+		 * objects we intend to dump.
+		 */
+		tblinfo = getSchemaData(fout, &numTables);
 
-	if (dopt.schemaOnly && dopt.sequence_data)
-		getTableData(&dopt, tblinfo, numTables, RELKIND_SEQUENCE);
+		if (!dopt.schemaOnly)
+		{
+			getTableData(&dopt, tblinfo, numTables, 0);
+			buildMatViewRefreshDependencies(fout);
+			if (dopt.dataOnly)
+				getTableDataFKConstraints();
+		}
 
-	/*
-	 * In binary-upgrade mode, we do not have to worry about the actual LO
-	 * data or the associated metadata that resides in the pg_largeobject and
-	 * pg_largeobject_metadata tables, respectively.
-	 *
-	 * However, we do need to collect LO information as there may be comments
-	 * or other information on LOs that we do need to dump out.
-	 */
-	if (dopt.outputLOs || dopt.binary_upgrade)
-		getLOs(fout);
+		if (dopt.schemaOnly && dopt.sequence_data)
+			getTableData(&dopt, tblinfo, numTables, RELKIND_SEQUENCE);
 
-	/*
-	 * Collect dependency data to assist in ordering the objects.
-	 */
-	getDependencies(fout);
+		/*
+		 * In binary-upgrade mode, we do not have to worry about the actual LO
+		 * data or the associated metadata that resides in the pg_largeobject and
+		 * pg_largeobject_metadata tables, respectively.
+		 *
+		 * However, we do need to collect LO information as there may be comments
+		 * or other information on LOs that we do need to dump out.
+		 */
+		if (dopt.outputLOs || dopt.binary_upgrade)
+			getLOs(fout);
 
-	/*
-	 * Collect ACLs, comments, and security labels, if wanted.
-	 */
-	if (!dopt.aclsSkip)
-		getAdditionalACLs(fout);
-	if (!dopt.no_comments)
-		collectComments(fout);
-	if (!dopt.no_security_labels)
-		collectSecLabels(fout);
+		/*
+		 * Collect dependency data to assist in ordering the objects.
+		 */
+		getDependencies(fout);
+
+		/*
+		 * Collect ACLs, comments, and security labels, if wanted.
+		 */
+		if (!dopt.aclsSkip)
+			getAdditionalACLs(fout);
+		if (!dopt.no_comments)
+			collectComments(fout);
+		if (!dopt.no_security_labels)
+			collectSecLabels(fout);
+
+	}
 
 	/* Lastly, create dummy objects to represent the section boundaries */
 	boundaryObjs = createBoundaryObjects();
@@ -1109,6 +1140,12 @@ help(const char *progname)
 			 "                               servers matching PATTERN\n"));
 	printf(_("  --inserts                    dump data as INSERT commands, rather than COPY\n"));
 	printf(_("  --load-via-partition-root    load partitions via the root table\n"));
+
+	/*
+	 * The option --logical-replication-slots-only is used only by pg_upgrade
+	 * and should not be used by end users, which is why it is not listed in
+	 * the help output.
+	 */
 	printf(_("  --no-comments                do not dump comments\n"));
 	printf(_("  --no-publications            do not dump publications\n"));
 	printf(_("  --no-security-labels         do not dump security label assignments\n"));
@@ -10237,6 +10274,10 @@ dumpDumpableObject(Archive *fout, DumpableObject *dobj)
 		case DO_SUBSCRIPTION:
 			dumpSubscription(fout, (const SubscriptionInfo *) dobj);
 			break;
+		case DO_LOGICAL_REPLICATION_SLOT:
+			dumpLogicalReplicationSlot(fout,
+									   (const LogicalReplicationSlotInfo *) dobj);
+			break;
 		case DO_PRE_DATA_BOUNDARY:
 		case DO_POST_DATA_BOUNDARY:
 			/* never dumped, nothing to do */
@@ -18218,6 +18259,7 @@ addBoundaryDependencies(DumpableObject **dobjs, int numObjs,
 			case DO_PUBLICATION_REL:
 			case DO_PUBLICATION_TABLE_IN_SCHEMA:
 			case DO_SUBSCRIPTION:
+			case DO_LOGICAL_REPLICATION_SLOT:
 				/* Post-data objects: must come after the post-data boundary */
 				addObjectDependency(dobj, postDataBound->dumpId);
 				break;
@@ -18479,3 +18521,112 @@ appendReloptionsArrayAH(PQExpBuffer buffer, const char *reloptions,
 	if (!res)
 		pg_log_warning("could not parse %s array", "reloptions");
 }
+
+/*
+ * getLogicalReplicationSlots
+ *	  get information about replication slots
+ */
+static void
+getLogicalReplicationSlots(Archive *fout)
+{
+	PGresult   *res;
+	LogicalReplicationSlotInfo *slotinfo;
+	PQExpBuffer query;
+
+	int			i_slotname;
+	int			i_plugin;
+	int			i_twophase;
+	int			i,
+				ntups;
+
+	/* Check whether we should dump or not */
+	if (fout->remoteVersion < 170000)
+		return;
+
+	Assert(fout->dopt->logical_slots_only);
+
+	query = createPQExpBuffer();
+
+	resetPQExpBuffer(query);
+
+	/*
+	 * Get replication slots.
+	 *
+	 * XXX: Which information must be extracted from old node? Currently three
+	 * attributes are extracted because they are used by
+	 * pg_create_logical_replication_slot().
+	 */
+	appendPQExpBufferStr(query,
+						 "SELECT slot_name, plugin, two_phase "
+						 "FROM pg_catalog.pg_replication_slots "
+						 "WHERE database = current_database() AND temporary = false "
+						 "AND wal_status IN ('reserved', 'extended');");
+
+	res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+
+	ntups = PQntuples(res);
+
+	i_slotname = PQfnumber(res, "slot_name");
+	i_plugin = PQfnumber(res, "plugin");
+	i_twophase = PQfnumber(res, "two_phase");
+
+	slotinfo = pg_malloc(ntups * sizeof(LogicalReplicationSlotInfo));
+
+	for (i = 0; i < ntups; i++)
+	{
+		slotinfo[i].dobj.objType = DO_LOGICAL_REPLICATION_SLOT;
+
+		slotinfo[i].dobj.catId.tableoid = InvalidOid;
+		slotinfo[i].dobj.catId.oid = InvalidOid;
+		AssignDumpId(&slotinfo[i].dobj);
+
+		slotinfo[i].dobj.name = pg_strdup(PQgetvalue(res, i, i_slotname));
+
+		slotinfo[i].plugin = pg_strdup(PQgetvalue(res, i, i_plugin));
+		slotinfo[i].twophase = (strcmp(PQgetvalue(res, i, i_twophase), "t") == 0);
+
+		/*
+		 * Note: Currently we do not have any options to include/exclude slots
+		 * in dumping, so all the slots must be selected.
+		 */
+		slotinfo[i].dobj.dump = DUMP_COMPONENT_DEFINITION;
+	}
+	PQclear(res);
+
+	destroyPQExpBuffer(query);
+}
+
+/*
+ * dumpLogicalReplicationSlot
+ *	  dump creation functions for the given logical replication slots
+ */
+static void
+dumpLogicalReplicationSlot(Archive *fout,
+						   const LogicalReplicationSlotInfo *slotinfo)
+{
+	Assert(fout->dopt->logical_slots_only);
+
+	if (slotinfo->dobj.dump & DUMP_COMPONENT_DEFINITION)
+	{
+		PQExpBuffer query = createPQExpBuffer();
+
+		/*
+		 * XXX: For simplification, pg_create_logical_replication_slot() is
+		 * used. Is it sufficient?
+		 */
+		appendPQExpBuffer(query, "SELECT pg_catalog.pg_create_logical_replication_slot(");
+		appendStringLiteralAH(query, slotinfo->dobj.name, fout);
+		appendPQExpBuffer(query, ", ");
+		appendStringLiteralAH(query, slotinfo->plugin, fout);
+		appendPQExpBuffer(query, ", false, %s);",
+						  slotinfo->twophase ? "true" : "false");
+
+		ArchiveEntry(fout, slotinfo->dobj.catId, slotinfo->dobj.dumpId,
+					 ARCHIVE_OPTS(.tag = slotinfo->dobj.name,
+								  .description = "REPLICATION SLOT",
+								  .section = SECTION_NONE,
+								  .createStmt = query->data));
+
+		destroyPQExpBuffer(query);
+	}
+}
diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h
index bc8f2ec36d..ed1866d9ab 100644
--- a/src/bin/pg_dump/pg_dump.h
+++ b/src/bin/pg_dump/pg_dump.h
@@ -82,7 +82,8 @@ typedef enum
 	DO_PUBLICATION,
 	DO_PUBLICATION_REL,
 	DO_PUBLICATION_TABLE_IN_SCHEMA,
-	DO_SUBSCRIPTION
+	DO_SUBSCRIPTION,
+	DO_LOGICAL_REPLICATION_SLOT
 } DumpableObjectType;
 
 /*
@@ -667,6 +668,17 @@ typedef struct _SubscriptionInfo
 	char	   *subpasswordrequired;
 } SubscriptionInfo;
 
+/*
+ * The LogicalReplicationSlotInfo struct is used to represent replication
+ * slots.
+ */
+typedef struct _LogicalReplicationSlotInfo
+{
+	DumpableObject dobj;
+	char	   *plugin;
+	bool		twophase;
+} LogicalReplicationSlotInfo;
+
 /*
  *	common utility functions
  */
diff --git a/src/bin/pg_dump/pg_dump_sort.c b/src/bin/pg_dump/pg_dump_sort.c
index 523a19c155..ae65443228 100644
--- a/src/bin/pg_dump/pg_dump_sort.c
+++ b/src/bin/pg_dump/pg_dump_sort.c
@@ -93,6 +93,7 @@ enum dbObjectTypePriorities
 	PRIO_PUBLICATION_REL,
 	PRIO_PUBLICATION_TABLE_IN_SCHEMA,
 	PRIO_SUBSCRIPTION,
+	PRIO_LOGICAL_REPLICATION_SLOT,
 	PRIO_DEFAULT_ACL,			/* done in ACL pass */
 	PRIO_EVENT_TRIGGER,			/* must be next to last! */
 	PRIO_REFRESH_MATVIEW		/* must be last! */
@@ -146,10 +147,11 @@ static const int dbObjectTypePriority[] =
 	PRIO_PUBLICATION,			/* DO_PUBLICATION */
 	PRIO_PUBLICATION_REL,		/* DO_PUBLICATION_REL */
 	PRIO_PUBLICATION_TABLE_IN_SCHEMA,	/* DO_PUBLICATION_TABLE_IN_SCHEMA */
-	PRIO_SUBSCRIPTION			/* DO_SUBSCRIPTION */
+	PRIO_SUBSCRIPTION,			/* DO_SUBSCRIPTION */
+	PRIO_LOGICAL_REPLICATION_SLOT	/* DO_LOGICAL_REPLICATION_SLOT */
 };
 
-StaticAssertDecl(lengthof(dbObjectTypePriority) == (DO_SUBSCRIPTION + 1),
+StaticAssertDecl(lengthof(dbObjectTypePriority) == (DO_LOGICAL_REPLICATION_SLOT + 1),
 				 "array length mismatch");
 
 static DumpId preDataBoundId;
@@ -1542,6 +1544,11 @@ describeDumpableObject(DumpableObject *obj, char *buf, int bufsize)
 					 "SUBSCRIPTION (ID %d OID %u)",
 					 obj->dumpId, obj->catId.oid);
 			return;
+		case DO_LOGICAL_REPLICATION_SLOT:
+			snprintf(buf, bufsize,
+					 "LOGICAL REPLICATION SLOT (ID %d NAME %s)",
+					 obj->dumpId, obj->name);
+			return;
 		case DO_PRE_DATA_BOUNDARY:
 			snprintf(buf, bufsize,
 					 "PRE-DATA BOUNDARY  (ID %d)",
diff --git a/src/bin/pg_upgrade/Makefile b/src/bin/pg_upgrade/Makefile
index 5834513add..815d1a7ca1 100644
--- a/src/bin/pg_upgrade/Makefile
+++ b/src/bin/pg_upgrade/Makefile
@@ -3,6 +3,9 @@
 PGFILEDESC = "pg_upgrade - an in-place binary upgrade utility"
 PGAPPICON = win32
 
+# required for 003_logical_replication_slots.pl
+EXTRA_INSTALL=contrib/test_decoding
+
 subdir = src/bin/pg_upgrade
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 64024e3b9e..93da9e15f3 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -30,7 +30,9 @@ static void check_for_jsonb_9_4_usage(ClusterInfo *cluster);
 static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(ClusterInfo *new_cluster);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
+static void check_for_logical_replication_slots(ClusterInfo *new_cluster);
 
+int	num_slots_on_old_cluster;
 
 /*
  * fix_path_separator
@@ -89,6 +91,12 @@ check_and_dump_old_cluster(bool live_check)
 	/* Extract a list of databases and tables from the old cluster */
 	get_db_and_rel_infos(&old_cluster);
 
+	/* Extract a list of logical replication slots */
+	if (!user_opts.exclude_logical_slots)
+		num_slots_on_old_cluster = get_logical_slot_infos(&old_cluster);
+	else
+		num_slots_on_old_cluster = 0;
+
 	init_tablespaces();
 
 	get_loadable_libraries();
@@ -189,6 +197,17 @@ check_new_cluster(void)
 {
 	get_db_and_rel_infos(&new_cluster);
 
+	/*
+	 * Checking for logical slots must be done before
+	 * check_new_cluster_is_empty() because the slot_arr attribute of the
+	 * new_cluster will be checked in that function.
+	 */
+	if (!user_opts.exclude_logical_slots && num_slots_on_old_cluster)
+	{
+		(void) get_logical_slot_infos(&new_cluster);
+		check_for_logical_replication_slots(&new_cluster);
+	}
+
 	check_new_cluster_is_empty();
 
 	check_loadable_libraries();
@@ -364,6 +383,21 @@ check_new_cluster_is_empty(void)
 						 rel_arr->rels[relnum].nspname,
 						 rel_arr->rels[relnum].relname);
 		}
+
+		/*
+		 * Check the existence of logical replication slots.
+		 */
+		if (!user_opts.exclude_logical_slots)
+		{
+			DbInfo	   *pDbInfo = &new_cluster.dbarr.dbs[dbnum];
+			LogicalSlotInfoArr *slot_arr = &pDbInfo->slot_arr;
+
+			/* if nslots > 0, report just first entry and exit */
+			if (slot_arr->nslots)
+				pg_fatal("New cluster database \"%s\" is not empty: found logical replication slot \"%s\"",
+						 pDbInfo->db_name,
+						 slot_arr->slots[0].slotname);
+		}
 	}
 }
 
@@ -1402,3 +1436,46 @@ check_for_user_defined_encoding_conversions(ClusterInfo *cluster)
 	else
 		check_ok();
 }
+
+/*
+ * Verify the parameter settings necessary for creating logical replication
+ * slots.
+ */
+static void
+check_for_logical_replication_slots(ClusterInfo *new_cluster)
+{
+	PGresult   *res;
+	PGconn	   *conn = connectToServer(new_cluster, "template1");
+	int			max_replication_slots;
+	char	   *wal_level;
+
+	/* logical replication slots can be dumped since PG17. */
+	if (GET_MAJOR_VERSION(new_cluster->major_version) <= 1600)
+		return;
+
+	prep_status("Checking parameter settings for logical replication slots");
+
+	res = executeQueryOrDie(conn, "SHOW max_replication_slots;");
+	max_replication_slots = atoi(PQgetvalue(res, 0, 0));
+
+	if (max_replication_slots == 0)
+		pg_fatal("max_replication_slots must be greater than 0");
+	else if (num_slots_on_old_cluster > max_replication_slots)
+		pg_fatal("max_replication_slots must be greater than or equal to the number of "
+				 "logical replication slots on the old node");
+
+	PQclear(res);
+
+	res = executeQueryOrDie(conn, "SHOW wal_level;");
+	wal_level = PQgetvalue(res, 0, 0);
+
+	if (strcmp(wal_level, "logical") != 0)
+		pg_fatal("wal_level must be \"logical\", but is set to \"%s\"",
+				 wal_level);
+
+	PQclear(res);
+
+	PQfinish(conn);
+
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/dump.c b/src/bin/pg_upgrade/dump.c
index 6c8c82dca8..433d530486 100644
--- a/src/bin/pg_upgrade/dump.c
+++ b/src/bin/pg_upgrade/dump.c
@@ -59,6 +59,30 @@ generate_old_dump(void)
 						   log_opts.dumpdir,
 						   sql_file_name, escaped_connstr.data);
 
+		/*
+		 * Dump logical replication slots.
+		 *
+		 * XXX We cannot dump replication slots at the same time as the schema
+		 * dump because we need to separate the timing of restoring
+		 * replication slots and other objects. Replication slots, in
+		 * particular, should not be restored before executing the pg_resetwal
+		 * command because it will remove WALs that are required by the slots.
+		 */
+		if (!user_opts.exclude_logical_slots)
+		{
+			char		slots_file_name[MAXPGPATH];
+
+			snprintf(slots_file_name, sizeof(slots_file_name),
+					 DB_DUMP_LOGICAL_SLOTS_FILE_MASK, old_db->db_oid);
+			parallel_exec_prog(log_file_name, NULL,
+							   "\"%s/pg_dump\" %s --logical-replication-slots-only "
+							   "--quote-all-identifiers --binary-upgrade %s "
+							   "--file=\"%s/%s\" %s",
+							   new_cluster.bindir, cluster_conn_opts(&old_cluster),
+							   log_opts.verbose ? "--verbose" : "",
+							   log_opts.dumpdir,
+							   slots_file_name, escaped_connstr.data);
+		}
 		termPQExpBuffer(&escaped_connstr);
 	}
 
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index a9988abfe1..8bc0ad2e10 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -26,6 +26,7 @@ static void get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo);
 static void free_rel_infos(RelInfoArr *rel_arr);
 static void print_db_infos(DbInfoArr *db_arr);
 static void print_rel_infos(RelInfoArr *rel_arr);
+static void print_slot_infos(LogicalSlotInfoArr *slot_arr);
 
 
 /*
@@ -394,7 +395,7 @@ get_db_infos(ClusterInfo *cluster)
 	i_spclocation = PQfnumber(res, "spclocation");
 
 	ntups = PQntuples(res);
-	dbinfos = (DbInfo *) pg_malloc(sizeof(DbInfo) * ntups);
+	dbinfos = (DbInfo *) pg_malloc0(sizeof(DbInfo) * ntups);
 
 	for (tupnum = 0; tupnum < ntups; tupnum++)
 	{
@@ -600,6 +601,96 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 	dbinfo->rel_arr.nrels = num_rels;
 }
 
+/*
+ * get_logical_slot_infos_per_db()
+ *
+ * gets the LogicalSlotInfos for all the logical replication slots of the database
+ * referred to by "dbinfo".
+ */
+static void
+get_logical_slot_infos_per_db(ClusterInfo *cluster, DbInfo *dbinfo)
+{
+	PGconn	   *conn = connectToServer(cluster,
+									   dbinfo->db_name);
+	PGresult   *res;
+	LogicalSlotInfo *slotinfos = NULL;
+
+	int			num_slots;
+
+	char		query[QUERY_ALLOC];
+
+	snprintf(query, sizeof(query),
+			 "SELECT slot_name, plugin, two_phase "
+			 "FROM pg_catalog.pg_replication_slots "
+			 "WHERE database = current_database() AND temporary = false "
+			 "AND wal_status IN ('reserved', 'extended');");
+
+	res = executeQueryOrDie(conn, "%s", query);
+
+	num_slots = PQntuples(res);
+
+	if (num_slots)
+	{
+		int			slotnum;
+		int			i_slotname;
+		int			i_plugin;
+		int			i_twophase;
+
+		slotinfos = (LogicalSlotInfo *) pg_malloc(sizeof(LogicalSlotInfo) * num_slots);
+
+		i_slotname = PQfnumber(res, "slot_name");
+		i_plugin = PQfnumber(res, "plugin");
+		i_twophase = PQfnumber(res, "two_phase");
+
+		for (slotnum = 0; slotnum < num_slots; slotnum++)
+		{
+			LogicalSlotInfo *curr = &slotinfos[slotnum];
+
+			curr->slotname = pg_strdup(PQgetvalue(res, slotnum, i_slotname));
+			curr->plugin = pg_strdup(PQgetvalue(res, slotnum, i_plugin));
+			curr->two_phase = (strcmp(PQgetvalue(res, slotnum, i_twophase), "t") == 0);
+		}
+	}
+
+	PQclear(res);
+	PQfinish(conn);
+
+	dbinfo->slot_arr.slots = slotinfos;
+	dbinfo->slot_arr.nslots = num_slots;
+}
+
+/*
+ * get_logical_slot_infos()
+ *
+ * Higher level routine to generate LogicalSlotInfoArr for all databases.
+ */
+int
+get_logical_slot_infos(ClusterInfo *cluster)
+{
+	int			dbnum;
+	int			slotnum = 0;
+
+	if (cluster == &old_cluster)
+		pg_log(PG_VERBOSE, "\nsource databases:");
+	else
+		pg_log(PG_VERBOSE, "\ntarget databases:");
+
+	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
+	{
+		DbInfo	   *pDbInfo = &cluster->dbarr.dbs[dbnum];
+
+		get_logical_slot_infos_per_db(cluster, pDbInfo);
+		slotnum += pDbInfo->slot_arr.nslots;
+
+		if (log_opts.verbose)
+		{
+			pg_log(PG_VERBOSE, "Database: \"%s\"", pDbInfo->db_name);
+			print_slot_infos(&pDbInfo->slot_arr);
+		}
+	}
+
+	return slotnum;
+}
 
 static void
 free_db_and_rel_infos(DbInfoArr *db_arr)
@@ -610,6 +701,12 @@ free_db_and_rel_infos(DbInfoArr *db_arr)
 	{
 		free_rel_infos(&db_arr->dbs[dbnum].rel_arr);
 		pg_free(db_arr->dbs[dbnum].db_name);
+
+		/*
+		 * Logical replication slots must not exist on the new cluster before
+		 * doing create_logical_replication_slots().
+		 */
+		Assert(db_arr->dbs[dbnum].slot_arr.slots == NULL);
 	}
 	pg_free(db_arr->dbs);
 	db_arr->dbs = NULL;
@@ -660,3 +757,15 @@ print_rel_infos(RelInfoArr *rel_arr)
 			   rel_arr->rels[relnum].reloid,
 			   rel_arr->rels[relnum].tablespace);
 }
+
+static void
+print_slot_infos(LogicalSlotInfoArr *slot_arr)
+{
+	int			slotnum;
+
+	for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\", two_phase: %d",
+			   slot_arr->slots[slotnum].slotname,
+			   slot_arr->slots[slotnum].plugin,
+			   slot_arr->slots[slotnum].two_phase);
+}
diff --git a/src/bin/pg_upgrade/meson.build b/src/bin/pg_upgrade/meson.build
index 12a97f84e2..228f29b688 100644
--- a/src/bin/pg_upgrade/meson.build
+++ b/src/bin/pg_upgrade/meson.build
@@ -42,6 +42,7 @@ tests += {
     'tests': [
       't/001_basic.pl',
       't/002_pg_upgrade.pl',
+      't/003_logical_replication_slots.pl',
     ],
     'test_kwargs': {'priority': 40}, # pg_upgrade tests are slow
   },
diff --git a/src/bin/pg_upgrade/option.c b/src/bin/pg_upgrade/option.c
index 640361009e..3b1c7c9ddf 100644
--- a/src/bin/pg_upgrade/option.c
+++ b/src/bin/pg_upgrade/option.c
@@ -57,6 +57,7 @@ parseCommandLine(int argc, char *argv[])
 		{"verbose", no_argument, NULL, 'v'},
 		{"clone", no_argument, NULL, 1},
 		{"copy", no_argument, NULL, 2},
+		{"exclude-logical-replication-slots", no_argument, NULL, 3},
 
 		{NULL, 0, NULL, 0}
 	};
@@ -199,6 +200,10 @@ parseCommandLine(int argc, char *argv[])
 				user_opts.transfer_mode = TRANSFER_MODE_COPY;
 				break;
 
+			case 3:
+				user_opts.exclude_logical_slots = true;
+				break;
+
 			default:
 				fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
 						os_info.progname);
@@ -289,6 +294,8 @@ usage(void)
 	printf(_("  -V, --version                 display version information, then exit\n"));
 	printf(_("  --clone                       clone instead of copying files to new cluster\n"));
 	printf(_("  --copy                        copy files to new cluster (default)\n"));
+	printf(_("  --exclude-logical-replication-slots\n"
+			 "                                do not upgrade logical replication slots\n"));
 	printf(_("  -?, --help                    show this help, then exit\n"));
 	printf(_("\n"
 			 "Before running pg_upgrade you must:\n"
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 4562dafcff..00066fc4c1 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -59,6 +59,7 @@ static void copy_xact_xlog_xid(void);
 static void set_frozenxids(bool minmxid_only);
 static void make_outputdirs(char *pgdata);
 static void setup(char *argv0, bool *live_check);
+static void create_logical_replication_slots(void);
 
 ClusterInfo old_cluster,
 			new_cluster;
@@ -188,6 +189,19 @@ main(int argc, char **argv)
 			  new_cluster.pgdata);
 	check_ok();
 
+	/*
+	 * Create logical replication slots.
+	 *
+	 * Note: This must be done after running the pg_resetwal command because
+	 * pg_resetwal would remove the WAL required by the slots.
+	 */
+	if (!user_opts.exclude_logical_slots && num_slots_on_old_cluster)
+	{
+		start_postmaster(&new_cluster, true);
+		create_logical_replication_slots();
+		stop_postmaster(false);
+	}
+
 	if (user_opts.do_sync)
 	{
 		prep_status("Sync data directory to disk");
@@ -860,3 +874,50 @@ set_frozenxids(bool minmxid_only)
 
 	check_ok();
 }
+
+/*
+ * create_logical_replication_slots()
+ *
+ * Similar to create_new_objects() but only restores logical replication slots.
+ */
+static void
+create_logical_replication_slots(void)
+{
+	int			dbnum;
+
+	prep_status_progress("Restoring logical replication slots in the new cluster");
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		char		slots_file_name[MAXPGPATH],
+					log_file_name[MAXPGPATH];
+		DbInfo	   *old_db = &old_cluster.dbarr.dbs[dbnum];
+
+		pg_log(PG_STATUS, "%s", old_db->db_name);
+
+		snprintf(slots_file_name, sizeof(slots_file_name),
+				 DB_DUMP_LOGICAL_SLOTS_FILE_MASK, old_db->db_oid);
+		snprintf(log_file_name, sizeof(log_file_name),
+				 DB_DUMP_LOG_FILE_MASK, old_db->db_oid);
+
+		parallel_exec_prog(log_file_name,
+						   NULL,
+						   "\"%s/psql\" %s --echo-queries --set ON_ERROR_STOP=on "
+						   "--no-psqlrc --dbname %s -f \"%s/%s\"",
+						   new_cluster.bindir,
+						   cluster_conn_opts(&new_cluster),
+						   old_db->db_name,
+						   log_opts.dumpdir,
+						   slots_file_name);
+	}
+
+	/* reap all children */
+	while (reap_child(true) == true)
+		;
+
+	end_progress_output();
+	check_ok();
+
+	/* update new_cluster info again */
+	get_logical_slot_infos(&new_cluster);
+}
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 3eea0139c7..209061b4e0 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -29,6 +29,7 @@
 /* contains both global db information and CREATE DATABASE commands */
 #define GLOBALS_DUMP_FILE	"pg_upgrade_dump_globals.sql"
 #define DB_DUMP_FILE_MASK	"pg_upgrade_dump_%u.custom"
+#define DB_DUMP_LOGICAL_SLOTS_FILE_MASK	"pg_upgrade_dump_%u_logical_slots.sql"
 
 /*
  * Base directories that include all the files generated internally, from the
@@ -46,6 +47,7 @@
 #define INTERNAL_LOG_FILE	"pg_upgrade_internal.log"
 
 extern char *output_files[];
+extern int	num_slots_on_old_cluster;
 
 /*
  * WIN32 files do not accept writes from multiple processes
@@ -150,6 +152,22 @@ typedef struct
 	int			nrels;
 } RelInfoArr;
 
+/*
+ * Structure to store logical replication slot information
+ */
+typedef struct
+{
+	char	   *slotname;		/* slot name */
+	char	   *plugin;			/* plugin */
+	bool		two_phase;		/* Can the slot decode 2PC? */
+} LogicalSlotInfo;
+
+typedef struct
+{
+	int			nslots;
+	LogicalSlotInfo *slots;
+} LogicalSlotInfoArr;
+
 /*
  * The following structure represents a relation mapping.
  */
@@ -176,6 +194,7 @@ typedef struct
 	char		db_tablespace[MAXPGPATH];	/* database default tablespace
 											 * path */
 	RelInfoArr	rel_arr;		/* array of all user relinfos */
+	LogicalSlotInfoArr slot_arr;	/* array of all LogicalSlotInfo */
 } DbInfo;
 
 /*
@@ -304,6 +323,8 @@ typedef struct
 	transferMode transfer_mode; /* copy files or link them? */
 	int			jobs;			/* number of processes/threads to use */
 	char	   *socketdir;		/* directory to use for Unix sockets */
+	bool		exclude_logical_slots;	/* true -> do not dump and restore
+										 * logical replication slots */
 } UserOpts;
 
 typedef struct
@@ -400,6 +421,7 @@ FileNameMap *gen_db_file_maps(DbInfo *old_db,
 							  DbInfo *new_db, int *nmaps, const char *old_pgdata,
 							  const char *new_pgdata);
 void		get_db_and_rel_infos(ClusterInfo *cluster);
+int			get_logical_slot_infos(ClusterInfo *cluster);
 
 /* option.c */
 
diff --git a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
new file mode 100644
index 0000000000..f015b5d363
--- /dev/null
+++ b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
@@ -0,0 +1,152 @@
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+# Tests for upgrading replication slots
+
+use strict;
+use warnings;
+
+use File::Path qw(rmtree);
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Can be changed to test the other modes
+my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
+
+# Initialize old node
+my $old_node = PostgreSQL::Test::Cluster->new('old_node');
+$old_node->init(allows_streaming => 'logical');
+$old_node->start;
+
+# Initialize new node
+my $new_node = PostgreSQL::Test::Cluster->new('new_node');
+$new_node->init(allows_streaming => 1);
+
+my $bindir = $new_node->config_data('--bindir');
+
+$old_node->stop;
+
+# Create a slot on old node
+$old_node->start;
+$old_node->safe_psql(
+	'postgres', "SELECT pg_create_logical_replication_slot('test_slot1', 'test_decoding', false, true);"
+);
+$old_node->stop;
+
+# Cause a failure at the start of pg_upgrade because wal_level is replica
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_node->data_dir,
+		'-D',         $new_node->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_node->host,
+		'-p',         $old_node->port,
+		'-P',         $new_node->port,
+		$mode,
+	],
+	'run of pg_upgrade of old node with wrong wal_level');
+ok( -d $new_node->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_node->data_dir . "/pg_upgrade_output.d");
+
+# Preparations for the subsequent test. The case max_replication_slots is set
+# to 0 is prohibited.
+$new_node->append_conf('postgresql.conf', "wal_level = 'logical'");
+$new_node->append_conf('postgresql.conf', "max_replication_slots = 0");
+
+# Cause a failure at the start of pg_upgrade because max_replication_slots is 0
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_node->data_dir,
+		'-D',         $new_node->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_node->host,
+		'-p',         $old_node->port,
+		'-P',         $new_node->port,
+		$mode,
+	],
+	'run of pg_upgrade of old node with wrong max_replication_slots');
+ok( -d $new_node->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_node->data_dir . "/pg_upgrade_output.d");
+
+# Preparations for the subsequent test. max_replication_slots is set to
+# non-zero value
+$new_node->append_conf('postgresql.conf', "max_replication_slots = 1");
+
+# Create a slot on old node, and generate WALs
+$old_node->start;
+$old_node->safe_psql(
+	'postgres', qq[
+	SELECT pg_create_logical_replication_slot('test_slot2', 'test_decoding', false, true);
+	CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;
+]);
+
+$old_node->stop;
+
+# Cause a failure at the start of pg_upgrade because max_replication_slots is
+# smaller than existing slots on old node
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_node->data_dir,
+		'-D',         $new_node->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_node->host,
+		'-p',         $old_node->port,
+		'-P',         $new_node->port,
+		$mode,
+	],
+	'run of pg_upgrade of old node with small max_replication_slots');
+ok( -d $new_node->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_node->data_dir . "/pg_upgrade_output.d");
+
+# Preparations for the subsequent test. max_replication_slots is set to
+# appropriate value
+$new_node->append_conf('postgresql.conf', "max_replication_slots = 10");
+
+# Remove an unnecessary slot and consume WALs
+$old_node->start;
+$old_node->safe_psql(
+	'postgres', qq[
+	SELECT pg_drop_replication_slot('test_slot1');
+	SELECT count(*) FROM pg_logical_slot_get_changes('test_slot2', NULL, NULL)
+]);
+$old_node->stop;
+
+# Actual run, pg_upgrade_output.d is removed at the end
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_node->data_dir,
+		'-D',         $new_node->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_node->host,
+		'-p',         $old_node->port,
+		'-P',         $new_node->port,
+		$mode,
+	],
+	'run of pg_upgrade of old node');
+ok( !-d $new_node->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ removed after pg_upgrade success");
+
+$new_node->start;
+my $result = $new_node->safe_psql('postgres',
+	"SELECT slot_name, two_phase FROM pg_replication_slots");
+is($result, qq(test_slot2|t), 'check the slot exists on new node');
+
+done_testing();
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 66823bc2a7..5a34e351ef 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1500,7 +1500,10 @@ LogicalRepStreamAbortData
 LogicalRepTupleData
 LogicalRepTyp
 LogicalRepWorker
+LogicalReplicationSlotInfo
 LogicalRewriteMappingData
+LogicalSlotInfo
+LogicalSlotInfoArr
 LogicalTape
 LogicalTapeSet
 LsnReadQueue
-- 
2.27.0

v19-0003-pg_upgrade-Add-check-function-for-logical-replic.patchapplication/octet-stream; name=v19-0003-pg_upgrade-Add-check-function-for-logical-replic.patchDownload
From 54eb780374816cf97a82a7fbbb7e2a6c165c99e5 Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Fri, 14 Apr 2023 08:59:03 +0000
Subject: [PATCH v19 3/3] pg_upgrade: Add check function for logical
 replication slots

pg_upgrade fails if the old node has slots whose status is 'lost' or which have
not consumed all WAL records. These checks are needed to prevent data loss.

To implement this, some functions are ported from pg_walinspect to core.

Author: Hayato Kuroda
Reviewed-by: Wang Wei, Vignesh C
---
 contrib/pg_walinspect/pg_walinspect.c         |  94 -----------
 src/backend/access/transam/xlogutils.c        |  92 +++++++++++
 src/backend/catalog/system_functions.sql      |   4 +
 src/backend/utils/adt/pg_upgrade_support.c    |  62 ++++++-
 src/bin/pg_upgrade/check.c                    |  83 ++++++++++
 .../t/003_logical_replication_slots.pl        | 155 ++++++++++++------
 src/include/access/xlogutils.h                |   3 +
 src/include/catalog/pg_proc.dat               |   8 +
 src/test/regress/sql/misc_functions.sql       |   2 +-
 9 files changed, 354 insertions(+), 149 deletions(-)

diff --git a/contrib/pg_walinspect/pg_walinspect.c b/contrib/pg_walinspect/pg_walinspect.c
index 796a74f322..49f4f92e98 100644
--- a/contrib/pg_walinspect/pg_walinspect.c
+++ b/contrib/pg_walinspect/pg_walinspect.c
@@ -40,8 +40,6 @@ PG_FUNCTION_INFO_V1(pg_get_wal_stats_till_end_of_wal);
 
 static void ValidateInputLSNs(XLogRecPtr start_lsn, XLogRecPtr *end_lsn);
 static XLogRecPtr GetCurrentLSN(void);
-static XLogReaderState *InitXLogReaderState(XLogRecPtr lsn);
-static XLogRecord *ReadNextXLogRecord(XLogReaderState *xlogreader);
 static void GetWALRecordInfo(XLogReaderState *record, Datum *values,
 							 bool *nulls, uint32 ncols);
 static void GetWALRecordsInfo(FunctionCallInfo fcinfo,
@@ -84,98 +82,6 @@ GetCurrentLSN(void)
 	return curr_lsn;
 }
 
-/*
- * Initialize WAL reader and identify first valid LSN.
- */
-static XLogReaderState *
-InitXLogReaderState(XLogRecPtr lsn)
-{
-	XLogReaderState *xlogreader;
-	ReadLocalXLogPageNoWaitPrivate *private_data;
-	XLogRecPtr	first_valid_record;
-
-	/*
-	 * Reading WAL below the first page of the first segments isn't allowed.
-	 * This is a bootstrap WAL page and the page_read callback fails to read
-	 * it.
-	 */
-	if (lsn < XLOG_BLCKSZ)
-		ereport(ERROR,
-				(errmsg("could not read WAL at LSN %X/%X",
-						LSN_FORMAT_ARGS(lsn))));
-
-	private_data = (ReadLocalXLogPageNoWaitPrivate *)
-		palloc0(sizeof(ReadLocalXLogPageNoWaitPrivate));
-
-	xlogreader = XLogReaderAllocate(wal_segment_size, NULL,
-									XL_ROUTINE(.page_read = &read_local_xlog_page_no_wait,
-											   .segment_open = &wal_segment_open,
-											   .segment_close = &wal_segment_close),
-									private_data);
-
-	if (xlogreader == NULL)
-		ereport(ERROR,
-				(errcode(ERRCODE_OUT_OF_MEMORY),
-				 errmsg("out of memory"),
-				 errdetail("Failed while allocating a WAL reading processor.")));
-
-	/* first find a valid recptr to start from */
-	first_valid_record = XLogFindNextRecord(xlogreader, lsn);
-
-	if (XLogRecPtrIsInvalid(first_valid_record))
-		ereport(ERROR,
-				(errmsg("could not find a valid record after %X/%X",
-						LSN_FORMAT_ARGS(lsn))));
-
-	return xlogreader;
-}
-
-/*
- * Read next WAL record.
- *
- * By design, to be less intrusive in a running system, no slot is allocated
- * to reserve the WAL we're about to read. Therefore this function can
- * encounter read errors for historical WAL.
- *
- * We guard against ordinary errors trying to read WAL that hasn't been
- * written yet by limiting end_lsn to the flushed WAL, but that can also
- * encounter errors if the flush pointer falls in the middle of a record. In
- * that case we'll return NULL.
- */
-static XLogRecord *
-ReadNextXLogRecord(XLogReaderState *xlogreader)
-{
-	XLogRecord *record;
-	char	   *errormsg;
-
-	record = XLogReadRecord(xlogreader, &errormsg);
-
-	if (record == NULL)
-	{
-		ReadLocalXLogPageNoWaitPrivate *private_data;
-
-		/* return NULL, if end of WAL is reached */
-		private_data = (ReadLocalXLogPageNoWaitPrivate *)
-			xlogreader->private_data;
-
-		if (private_data->end_of_wal)
-			return NULL;
-
-		if (errormsg)
-			ereport(ERROR,
-					(errcode_for_file_access(),
-					 errmsg("could not read WAL at %X/%X: %s",
-							LSN_FORMAT_ARGS(xlogreader->EndRecPtr), errormsg)));
-		else
-			ereport(ERROR,
-					(errcode_for_file_access(),
-					 errmsg("could not read WAL at %X/%X",
-							LSN_FORMAT_ARGS(xlogreader->EndRecPtr))));
-	}
-
-	return record;
-}
-
 /*
  * Output values that make up a row describing caller's WAL record.
  *
diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c
index e174a2a891..574797fbbd 100644
--- a/src/backend/access/transam/xlogutils.c
+++ b/src/backend/access/transam/xlogutils.c
@@ -1048,3 +1048,95 @@ WALReadRaiseError(WALReadError *errinfo)
 						errinfo->wre_req)));
 	}
 }
+
+/*
+ * Initialize WAL reader and identify first valid LSN.
+ */
+XLogReaderState *
+InitXLogReaderState(XLogRecPtr lsn)
+{
+	XLogReaderState *xlogreader;
+	ReadLocalXLogPageNoWaitPrivate *private_data;
+	XLogRecPtr	first_valid_record;
+
+	/*
+	 * Reading WAL below the first page of the first segments isn't allowed.
+	 * This is a bootstrap WAL page and the page_read callback fails to read
+	 * it.
+	 */
+	if (lsn < XLOG_BLCKSZ)
+		ereport(ERROR,
+				(errmsg("could not read WAL at LSN %X/%X",
+						LSN_FORMAT_ARGS(lsn))));
+
+	private_data = (ReadLocalXLogPageNoWaitPrivate *)
+		palloc0(sizeof(ReadLocalXLogPageNoWaitPrivate));
+
+	xlogreader = XLogReaderAllocate(wal_segment_size, NULL,
+									XL_ROUTINE(.page_read = &read_local_xlog_page_no_wait,
+											   .segment_open = &wal_segment_open,
+											   .segment_close = &wal_segment_close),
+									private_data);
+
+	if (xlogreader == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("out of memory"),
+				 errdetail("Failed while allocating a WAL reading processor.")));
+
+	/* first find a valid recptr to start from */
+	first_valid_record = XLogFindNextRecord(xlogreader, lsn);
+
+	if (XLogRecPtrIsInvalid(first_valid_record))
+		ereport(ERROR,
+				(errmsg("could not find a valid record after %X/%X",
+						LSN_FORMAT_ARGS(lsn))));
+
+	return xlogreader;
+}
+
+/*
+ * Read next WAL record.
+ *
+ * By design, to be less intrusive in a running system, no slot is allocated
+ * to reserve the WAL we're about to read. Therefore this function can
+ * encounter read errors for historical WAL.
+ *
+ * We guard against ordinary errors trying to read WAL that hasn't been
+ * written yet by limiting end_lsn to the flushed WAL, but that can also
+ * encounter errors if the flush pointer falls in the middle of a record. In
+ * that case we'll return NULL.
+ */
+XLogRecord *
+ReadNextXLogRecord(XLogReaderState *xlogreader)
+{
+	XLogRecord *record;
+	char	   *errormsg;
+
+	record = XLogReadRecord(xlogreader, &errormsg);
+
+	if (record == NULL)
+	{
+		ReadLocalXLogPageNoWaitPrivate *private_data;
+
+		/* return NULL, if end of WAL is reached */
+		private_data = (ReadLocalXLogPageNoWaitPrivate *)
+			xlogreader->private_data;
+
+		if (private_data->end_of_wal)
+			return NULL;
+
+		if (errormsg)
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not read WAL at %X/%X: %s",
+							LSN_FORMAT_ARGS(xlogreader->EndRecPtr), errormsg)));
+		else
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not read WAL at %X/%X",
+							LSN_FORMAT_ARGS(xlogreader->EndRecPtr))));
+	}
+
+	return record;
+}
diff --git a/src/backend/catalog/system_functions.sql b/src/backend/catalog/system_functions.sql
index 07c0d89c4f..7d52750ae9 100644
--- a/src/backend/catalog/system_functions.sql
+++ b/src/backend/catalog/system_functions.sql
@@ -616,6 +616,8 @@ REVOKE EXECUTE ON FUNCTION pg_backup_stop(boolean) FROM public;
 
 REVOKE EXECUTE ON FUNCTION pg_create_restore_point(text) FROM public;
 
+REVOKE EXECUTE ON FUNCTION validate_wal_record_types_after(pg_lsn) FROM public;
+
 REVOKE EXECUTE ON FUNCTION pg_switch_wal() FROM public;
 
 REVOKE EXECUTE ON FUNCTION pg_log_standby_snapshot() FROM public;
@@ -726,6 +728,8 @@ REVOKE EXECUTE ON FUNCTION pg_ls_replslotdir(text) FROM PUBLIC;
 -- We also set up some things as accessible to standard roles.
 --
 
+GRANT EXECUTE ON FUNCTION validate_wal_record_types_after(pg_lsn) TO pg_read_server_files;
+
 GRANT EXECUTE ON FUNCTION pg_ls_logdir() TO pg_monitor;
 
 GRANT EXECUTE ON FUNCTION pg_ls_waldir() TO pg_monitor;
diff --git a/src/backend/utils/adt/pg_upgrade_support.c b/src/backend/utils/adt/pg_upgrade_support.c
index 0186636d9f..c3a9c4a83b 100644
--- a/src/backend/utils/adt/pg_upgrade_support.c
+++ b/src/backend/utils/adt/pg_upgrade_support.c
@@ -11,15 +11,20 @@
 
 #include "postgres.h"
 
+#include "access/xlog.h"
+#include "access/xlog_internal.h"
+#include "access/xlogutils.h"
 #include "catalog/binary_upgrade.h"
 #include "catalog/heap.h"
+#include "catalog/pg_control.h"
 #include "catalog/namespace.h"
 #include "catalog/pg_type.h"
 #include "commands/extension.h"
+#include "storage/standbydefs.h"
 #include "miscadmin.h"
 #include "utils/array.h"
 #include "utils/builtins.h"
-
+#include "utils/pg_lsn.h"
 
 #define CHECK_IS_BINARY_UPGRADE									\
 do {															\
@@ -261,3 +266,58 @@ binary_upgrade_set_missing_value(PG_FUNCTION_ARGS)
 
 	PG_RETURN_VOID();
 }
+
+Datum
+validate_wal_record_types_after(PG_FUNCTION_ARGS)
+{
+	XLogRecPtr	start_lsn = PG_GETARG_LSN(0);
+	XLogRecPtr	curr_lsn = GetFlushRecPtr(NULL);
+	XLogReaderState *xlogreader;
+	bool		initial_record = true;
+
+	/* Quick exit if the given lsn is larger than current one */
+	if (start_lsn > curr_lsn)
+		PG_RETURN_BOOL(true);
+
+	xlogreader = InitXLogReaderState(start_lsn);
+
+	/* Read records till end of WAL */
+	while (ReadNextXLogRecord(xlogreader))
+	{
+		uint8		info;
+
+		/*
+		 * XXX: check the type of WAL. Currently XLOG info is directly
+		 * extracted, but it may be better to use the descriptor instead.
+		 */
+		info = XLogRecGetInfo(xlogreader) & ~XLR_INFO_MASK;
+
+		if (initial_record)
+		{
+			/* Initial record must be XLOG_CHECKPOINT_SHUTDOWN */
+			if (info != XLOG_CHECKPOINT_SHUTDOWN)
+				PG_RETURN_BOOL(false);
+
+			initial_record = false;
+		}
+
+		else
+		{
+			/*
+			 * XXX: There is a possibility that following records may be
+			 * generated during the upgrade.
+			 */
+			if (info != XLOG_RUNNING_XACTS &&
+				info != XLOG_CHECKPOINT_ONLINE &&
+				info != XLOG_FPI_FOR_HINT)
+				PG_RETURN_BOOL(false);
+		}
+
+		CHECK_FOR_INTERRUPTS();
+	}
+
+	pfree(xlogreader->private_data);
+	XLogReaderFree(xlogreader);
+
+	PG_RETURN_BOOL(true);
+}
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 93da9e15f3..e772629c42 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -31,6 +31,7 @@ static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(ClusterInfo *new_cluster);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
 static void check_for_logical_replication_slots(ClusterInfo *new_cluster);
+static void check_for_confirmed_flush_lsn(ClusterInfo *cluster);
 
 int	num_slots_on_old_cluster;
 
@@ -111,6 +112,8 @@ check_and_dump_old_cluster(bool live_check)
 	check_for_composite_data_type_usage(&old_cluster);
 	check_for_reg_data_type_usage(&old_cluster);
 	check_for_isn_and_int8_passing_mismatch(&old_cluster);
+	if (!user_opts.exclude_logical_slots)
+		check_for_confirmed_flush_lsn(&old_cluster);
 
 	/*
 	 * PG 16 increased the size of the 'aclitem' type, which breaks the
@@ -1479,3 +1482,83 @@ check_for_logical_replication_slots(ClusterInfo *new_cluster)
 
 	check_ok();
 }
+
+/*
+ * Verify that all logical replication slots consumed all WALs, except a
+ * CHECKPOINT_SHUTDOWN record.
+ */
+static void
+check_for_confirmed_flush_lsn(ClusterInfo *cluster)
+{
+	int			i,
+				ntups,
+				i_slotname;
+	bool		is_error = false;
+	PGresult   *res;
+	DbInfo	   *active_db = &cluster->dbarr.dbs[0];
+	PGconn	   *conn = connectToServer(cluster, active_db->db_name);
+
+	Assert(!user_opts.exclude_logical_slots);
+
+	/* logical slots can be dumped since PG17. */
+	if (GET_MAJOR_VERSION(cluster->major_version) <= 1600)
+		return;
+
+	prep_status("Checking confirmed_flush_lsn for logical replication slots");
+
+	/* Check that all logical slots are not in 'lost' state. */
+	res = executeQueryOrDie(conn,
+							"SELECT slot_name FROM pg_catalog.pg_replication_slots "
+							"WHERE temporary = false AND wal_status = 'lost';");
+
+	ntups = PQntuples(res);
+	i_slotname = PQfnumber(res, "slot_name");
+
+	for (i = 0; i < ntups; i++)
+	{
+		is_error = true;
+
+		pg_log(PG_WARNING,
+			   "\nWARNING: logical replication slot \"%s\" is obsolete.",
+			   PQgetvalue(res, i, i_slotname));
+	}
+
+	PQclear(res);
+
+	if (is_error)
+		pg_fatal("logical replication slots must not be in 'lost' state. "
+				 "Please use --exclude-logical-replication-slots if this is "
+				 "expected.");
+
+	/*
+	 * Check that all logical replication slots have reached the current WAL
+	 * position.
+	 */
+	res = executeQueryOrDie(conn,
+							"SELECT slot_name FROM pg_catalog.pg_replication_slots "
+							"WHERE (SELECT pg_catalog.validate_wal_record_types_after(confirmed_flush_lsn)) IS FALSE "
+							"AND temporary = false;");
+
+	ntups = PQntuples(res);
+	i_slotname = PQfnumber(res, "slot_name");
+
+	for (i = 0; i < ntups; i++)
+	{
+		is_error = true;
+
+		pg_log(PG_WARNING,
+			   "\nWARNING: logical replication slot \"%s\" has not consumed WALs yet",
+			   PQgetvalue(res, i, i_slotname));
+	}
+
+	PQclear(res);
+	PQfinish(conn);
+
+	if (is_error)
+		pg_fatal("not all logical replication slots have consumed the WAL. "
+				 "Please use --exclude-logical-replication-slots if this is "
+				 "expected.");
+
+
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
index f015b5d363..59e2c2f209 100644
--- a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
+++ b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
@@ -15,138 +15,187 @@ use Test::More;
 my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
 
 # Initialize old node
-my $old_node = PostgreSQL::Test::Cluster->new('old_node');
-$old_node->init(allows_streaming => 'logical');
-$old_node->start;
+my $old_publisher = PostgreSQL::Test::Cluster->new('old_publisher');
+$old_publisher->init(allows_streaming => 'logical');
 
 # Initialize new node
-my $new_node = PostgreSQL::Test::Cluster->new('new_node');
-$new_node->init(allows_streaming => 1);
+my $new_publisher = PostgreSQL::Test::Cluster->new('new_publisher');
+$new_publisher->init(allows_streaming => 1);
 
-my $bindir = $new_node->config_data('--bindir');
+# Initialize subscriber node
+my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
+$subscriber->init(allows_streaming => 'logical');
 
-$old_node->stop;
+my $bindir = $new_publisher->config_data('--bindir');
 
 # Create a slot on old node
-$old_node->start;
-$old_node->safe_psql(
+$old_publisher->start;
+$old_publisher->safe_psql(
 	'postgres', "SELECT pg_create_logical_replication_slot('test_slot1', 'test_decoding', false, true);"
 );
-$old_node->stop;
+$old_publisher->stop;
 
 # Cause a failure at the start of pg_upgrade because wal_level is replica
 command_fails(
 	[
 		'pg_upgrade', '--no-sync',
-		'-d',         $old_node->data_dir,
-		'-D',         $new_node->data_dir,
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
 		'-b',         $bindir,
 		'-B',         $bindir,
-		'-s',         $new_node->host,
-		'-p',         $old_node->port,
-		'-P',         $new_node->port,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
 		$mode,
 	],
 	'run of pg_upgrade of old node with wrong wal_level');
-ok( -d $new_node->data_dir . "/pg_upgrade_output.d",
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
 	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
 
 # Clean up
-rmtree($new_node->data_dir . "/pg_upgrade_output.d");
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
 
 # Preparations for the subsequent test. The case max_replication_slots is set
 # to 0 is prohibited.
-$new_node->append_conf('postgresql.conf', "wal_level = 'logical'");
-$new_node->append_conf('postgresql.conf', "max_replication_slots = 0");
+$new_publisher->append_conf('postgresql.conf', "wal_level = 'logical'");
+$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 0");
 
 # Cause a failure at the start of pg_upgrade because max_replication_slots is 0
 command_fails(
 	[
 		'pg_upgrade', '--no-sync',
-		'-d',         $old_node->data_dir,
-		'-D',         $new_node->data_dir,
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
 		'-b',         $bindir,
 		'-B',         $bindir,
-		'-s',         $new_node->host,
-		'-p',         $old_node->port,
-		'-P',         $new_node->port,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
 		$mode,
 	],
 	'run of pg_upgrade of old node with wrong max_replication_slots');
-ok( -d $new_node->data_dir . "/pg_upgrade_output.d",
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
 	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
 
 # Clean up
-rmtree($new_node->data_dir . "/pg_upgrade_output.d");
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
 
 # Preparations for the subsequent test. max_replication_slots is set to
 # non-zero value
-$new_node->append_conf('postgresql.conf', "max_replication_slots = 1");
+$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 1");
 
-# Create a slot on old node, and generate WALs
-$old_node->start;
-$old_node->safe_psql(
+# Create a slot on old node
+$old_publisher->start;
+$old_publisher->safe_psql(
 	'postgres', qq[
 	SELECT pg_create_logical_replication_slot('test_slot2', 'test_decoding', false, true);
 	CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;
 ]);
 
-$old_node->stop;
+$old_publisher->stop;
 
 # Cause a failure at the start of pg_upgrade because max_replication_slots is
 # smaller than existing slots on old node
 command_fails(
 	[
 		'pg_upgrade', '--no-sync',
-		'-d',         $old_node->data_dir,
-		'-D',         $new_node->data_dir,
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
 		'-b',         $bindir,
 		'-B',         $bindir,
-		'-s',         $new_node->host,
-		'-p',         $old_node->port,
-		'-P',         $new_node->port,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
 		$mode,
 	],
 	'run of pg_upgrade of old node with small max_replication_slots');
-ok( -d $new_node->data_dir . "/pg_upgrade_output.d",
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
 	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
 
 # Clean up
-rmtree($new_node->data_dir . "/pg_upgrade_output.d");
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
 
 # Preparations for the subsequent test. max_replication_slots is set to
 # appropriate value
-$new_node->append_conf('postgresql.conf', "max_replication_slots = 10");
+$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 10");
 
-# Remove an unnecessary slot and consume WALs
-$old_node->start;
-$old_node->safe_psql(
+# Cause a failure at the start of pg_upgrade because slots have not finished
+# consuming all the WALs
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade of old node with idle replication slots');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+
+$old_publisher->start;
+$old_publisher->safe_psql(
 	'postgres', qq[
 	SELECT pg_drop_replication_slot('test_slot1');
-	SELECT count(*) FROM pg_logical_slot_get_changes('test_slot2', NULL, NULL)
+	SELECT pg_drop_replication_slot('test_slot2');
+]);
+
+# Setup logical replication
+my $old_connstr = $old_publisher->connstr . ' dbname=postgres';
+$old_publisher->safe_psql('postgres', "CREATE PUBLICATION pub FOR ALL TABLES");
+$subscriber->start;
+$subscriber->safe_psql(
+	'postgres', qq[
+	CREATE TABLE tbl (a int);
+	CREATE SUBSCRIPTION sub CONNECTION '$old_connstr' PUBLICATION pub WITH (two_phase = 'true')
 ]);
-$old_node->stop;
+
+# Wait for initial table sync to finish
+$subscriber->wait_for_subscription_sync($old_publisher, 'sub');
+
+$subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION sub DISABLE");
+$old_publisher->stop;
 
 # Actual run, pg_upgrade_output.d is removed at the end
 command_ok(
 	[
 		'pg_upgrade', '--no-sync',
-		'-d',         $old_node->data_dir,
-		'-D',         $new_node->data_dir,
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
 		'-b',         $bindir,
 		'-B',         $bindir,
-		'-s',         $new_node->host,
-		'-p',         $old_node->port,
-		'-P',         $new_node->port,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
 		$mode,
 	],
 	'run of pg_upgrade of old node');
-ok( !-d $new_node->data_dir . "/pg_upgrade_output.d",
+ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
 	"pg_upgrade_output.d/ removed after pg_upgrade success");
 
-$new_node->start;
-my $result = $new_node->safe_psql('postgres',
+$new_publisher->start;
+my $result = $new_publisher->safe_psql('postgres',
 	"SELECT slot_name, two_phase FROM pg_replication_slots");
-is($result, qq(test_slot2|t), 'check the slot exists on new node');
+is($result, qq(sub|t), 'check the slot exists on new node');
+
+# Change the connection
+my $new_connstr = $new_publisher->connstr . ' dbname=postgres';
+$subscriber->safe_psql('postgres',
+	"ALTER SUBSCRIPTION sub CONNECTION '$new_connstr'");
+$subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION sub ENABLE");
+
+# Check whether changes on the new publisher get replicated to the subscriber
+$new_publisher->safe_psql('postgres',
+	"INSERT INTO tbl VALUES (generate_series(11, 20))");
+$new_publisher->wait_for_catchup('sub');
+$result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
+is($result, qq(20), 'check changes are shipped to subscriber');
 
 done_testing();
diff --git a/src/include/access/xlogutils.h b/src/include/access/xlogutils.h
index 5b77b11f50..1cf31aa24f 100644
--- a/src/include/access/xlogutils.h
+++ b/src/include/access/xlogutils.h
@@ -115,4 +115,7 @@ extern void XLogReadDetermineTimeline(XLogReaderState *state,
 
 extern void WALReadRaiseError(WALReadError *errinfo);
 
+extern XLogReaderState *InitXLogReaderState(XLogRecPtr lsn);
+extern XLogRecord *ReadNextXLogRecord(XLogReaderState *xlogreader);
+
 #endif
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 6996073989..aaa474476c 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -6489,6 +6489,14 @@
   proargnames => '{rm_id, rm_name, rm_builtin}',
   prosrc => 'pg_get_wal_resource_managers' },
 
+{ oid => '8046', descr => 'Info of the WAL content',
+  proname => 'validate_wal_record_types_after', prorows => '10', proretset => 't',
+  provolatile => 's', prorettype => 'bool', proargtypes => 'pg_lsn',
+  proallargtypes => '{pg_lsn,bool}',
+  proargmodes => '{i,o}',
+  proargnames => '{start_lsn,is_ok}',
+  prosrc => 'validate_wal_record_types_after' },
+
 { oid => '2621', descr => 'reload configuration files',
   proname => 'pg_reload_conf', provolatile => 'v', prorettype => 'bool',
   proargtypes => '', prosrc => 'pg_reload_conf' },
diff --git a/src/test/regress/sql/misc_functions.sql b/src/test/regress/sql/misc_functions.sql
index b57f01f3e9..ffe7d5b4ce 100644
--- a/src/test/regress/sql/misc_functions.sql
+++ b/src/test/regress/sql/misc_functions.sql
@@ -236,4 +236,4 @@ SELECT * FROM pg_split_walfile_name('invalid');
 SELECT segment_number > 0 AS ok_segment_number, timeline_id
   FROM pg_split_walfile_name('000000010000000100000000');
 SELECT segment_number > 0 AS ok_segment_number, timeline_id
-  FROM pg_split_walfile_name('ffffffFF00000001000000af');
+  FROM pg_split_walfile_name('ffffffFF00000001000000af');
\ No newline at end of file
-- 
2.27.0

#98Bruce Momjian
bruce@momjian.us
In reply to: Masahiko Sawada (#96)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Thu, Aug 10, 2023 at 10:37:04PM +0900, Masahiko Sawada wrote:

On Thu, Aug 10, 2023 at 12:52 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

Are you suggesting doing this before we start the old cluster or after
we stop the old cluster? I was thinking about the pros and cons of
doing this check when the server is 'on' (along with other upgrade
checks something like the patch is doing now) versus when the server
is 'off'. I think the advantage of doing it when the server is 'off'
(after check_and_dump_old_cluster()) is that it will be ensured that
there is no extra WAL that could be generated during the upgrade and
has not been verified against confirmed_flush_lsn location. But OTOH,
to retrieve slot information when the server is 'off', we need a
separate utility or probably a functionality for the same in
pg_upgrade and also some WAL reading stuff which sounds to me like a
larger change that may not be warranted here. I think anyway the extra
WAL (if any got generated during the upgrade) won't be required after
the upgrade so not convinced to make such a check while the server is
'off'. Are there reasons which make it better to do this while the old
cluster is 'off'?

What I imagined is that we do this check before
check_and_dump_old_cluster() while the server is 'off'. Reading the
slot state file would be simple and I guess we would not need a tool
or cli program for that.

Agreed.

BTW this check would not be able to support live-check but I think
it's not a problem as this check with a running server will never be
able to pass.

Agreed.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

Only you can decide what is important to you.

#99Julien Rouhaud
rjuju123@gmail.com
In reply to: Masahiko Sawada (#95)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

Hi,

On Thu, Aug 10, 2023 at 04:30:40PM +0900, Masahiko Sawada wrote:

On Thu, Aug 10, 2023 at 2:27 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

Sawada-San, Julien, and others, do you have any thoughts on the above point?

IIUC, while the old cluster is running in the middle of pg_upgrade, it
doesn't accept TCP connections. I'm not sure we need to worry about
the case where someone on the same server attempts to create
replication slots during the upgrade.

AFAICS this is only true for non-Windows platform, so we would still need some
extra safeguards on Windows. Having those on all platforms will probably be
simpler and won't hurt otherwise.

The same is true for other objects, as Amit mentioned.

I disagree. As I mentioned before, any module registered in
shared_preload_libraries can spawn background workers which can perform any
activity. There were previous reports of corruption because of multi-xacts
being generated by such bgworkers during pg_upgrade, and I'm pretty sure that
there are some modules that create objects (automatic partitioning tools, for
instance). It's also unclear to me what would happen if some writes are
performed by such a module at various points of the pg_upgrade process. Couldn't
that lead to either data loss or a broken slot (as it couldn't stream changes
from an older major version)?

#100Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#96)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Thu, Aug 10, 2023 at 7:07 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Aug 10, 2023 at 12:52 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

Are you suggesting doing this before we start the old cluster or after
we stop the old cluster? I was thinking about the pros and cons of
doing this check when the server is 'on' (along with other upgrade
checks something like the patch is doing now) versus when the server
is 'off'. I think the advantage of doing it when the server is 'off'
(after check_and_dump_old_cluster()) is that it will be ensured that
there is no extra WAL that could be generated during the upgrade and
has not been verified against confirmed_flush_lsn location. But OTOH,
to retrieve slot information when the server is 'off', we need a
separate utility or probably a functionality for the same in
pg_upgrade and also some WAL reading stuff which sounds to me like a
larger change that may not be warranted here. I think anyway the extra
WAL (if any got generated during the upgrade) won't be required after
the upgrade so not convinced to make such a check while the server is
'off'. Are there reasons which make it better to do this while the old
cluster is 'off'?

What I imagined is that we do this check before
check_and_dump_old_cluster() while the server is 'off'. Reading the
slot state file would be simple and I guess we would not need a tool
or cli program for that. We need to expose ReplicationSlotOnDisk,
though.

Won't that require a lot of version-specific checks as across versions
the file format could be different? For the case of the control file,
we use version-specific pg_controldata (for the old cluster, the
corresponding version's pg_controldata) utility to read the old
version control file. I thought we need something similar here if we
want to do what you are suggesting.

After reading the control file and the slots' state files we
check if slot's confirmed_flush_lsn matches the latest checkpoint LSN
in the control file (BTW maybe we can get slot name and plugin name
here instead of using pg_dump?).

But isn't the advantage of doing it via pg_dump (in binary_mode) that we
allow some outside-core in-place upgrade tool to also use it if
required? If we don't think that would be required then we can
probably use the info we retrieve in pg_upgrade.

the first commit. Or another idea would be to allow users to mark
replication slots "upgradable" so that pg_upgrade skips the
confirmed_flush_lsn check.

I guess for that we need to ask users to ensure that confirmed_flush_lsn
is up-to-date and then provide some slot-level API to mark the slots
with the required status. If so, that sounds a bit complicated for
users.

--
With Regards,
Amit Kapila.

#101Amit Kapila
amit.kapila16@gmail.com
In reply to: Julien Rouhaud (#99)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Fri, Aug 11, 2023 at 10:43 AM Julien Rouhaud <rjuju123@gmail.com> wrote:

On Thu, Aug 10, 2023 at 04:30:40PM +0900, Masahiko Sawada wrote:

On Thu, Aug 10, 2023 at 2:27 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

Sawada-San, Julien, and others, do you have any thoughts on the above point?

IIUC, while the old cluster is running in the middle of pg_upgrade, it
doesn't accept TCP connections. I'm not sure we need to worry about
the case where someone on the same server attempts to create
replication slots during the upgrade.

AFAICS this is only true for non-Windows platform, so we would still need some
extra safeguards on Windows. Having those on all platforms will probably be
simpler and won't hurt otherwise.

The same is true for other objects, as Amit mentioned.

I disagree. As I mentioned before, any module registered in
shared_preload_libraries can spawn background workers which can perform any
activity. There were previous reports of corruption because of multi-xacts
being generated by such bgworkers during pg_upgrade, and I'm pretty sure that
there are some modules that create objects (automatic partitioning tools, for
instance). It's also unclear to me what would happen if some writes are
performed by such a module at various points of the pg_upgrade process. Couldn't
that lead to either data loss or a broken slot (as it couldn't stream changes
from an older major version)?

It won't be any worse than what can happen to tables. If we know that
such bgworkers can cause corruption if they do writes during the
upgrade, I don't think it is the job of this patch to prevent the
related scenarios. We can probably disallow the creation of new slots
during the binary upgrade, but even that I am not sure about. I guess it
would be better to document such hazards as a first step and then
probably write a patch to prevent WAL writes or something along those
lines.

--
With Regards,
Amit Kapila.

#102Bruce Momjian
bruce@momjian.us
In reply to: Amit Kapila (#101)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Fri, Aug 11, 2023 at 11:18:09AM +0530, Amit Kapila wrote:

On Fri, Aug 11, 2023 at 10:43 AM Julien Rouhaud <rjuju123@gmail.com> wrote:

I disagree. As I mentioned before, any module registered in
shared_preload_libraries can spawn background workers which can perform any
activity. There were previous reports of corruption because of multi-xacts
being generated by such bgworkers during pg_upgrade, and I'm pretty sure that
there are some modules that create objects (automatic partitioning tools, for
instance). It's also unclear to me what would happen if some writes are
performed by such a module at various points of the pg_upgrade process. Couldn't
that lead to either data loss or a broken slot (as it couldn't stream changes
from an older major version)?

It won't be any worse than what can happen to tables. If we know that
such bgworkers can cause corruption if they do writes during the
upgrade, I don't think it is the job of this patch to prevent the
related scenarios. We can probably disallow the creation of new slots
during the binary upgrade, but even that I am not sure about. I guess it
would be better to document such hazards as a first step and then
probably write a patch to prevent WAL writes or something along those
lines.

Yes, if users are connecting to the clusters during pg_upgrade, we have
many more problems than slots.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

Only you can decide what is important to you.

#103Bruce Momjian
bruce@momjian.us
In reply to: Amit Kapila (#100)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Fri, Aug 11, 2023 at 10:46:31AM +0530, Amit Kapila wrote:

On Thu, Aug 10, 2023 at 7:07 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

What I imagined is that we do this check before
check_and_dump_old_cluster() while the server is 'off'. Reading the
slot state file would be simple and I guess we would not need a tool
or cli program for that. We need to expose ReplicationSlotOnDisk,
though.

Won't that require a lot of version-specific checks as across versions
the file format could be different? For the case of the control file,
we use version-specific pg_controldata (for the old cluster, the
corresponding version's pg_controldata) utility to read the old
version control file. I thought we need something similar here if we
want to do what you are suggesting.

You mean the slot file format? We will need that complexity somewhere,
so why not in pg_upgrade?

After reading the control file and the slots' state files we
check if slot's confirmed_flush_lsn matches the latest checkpoint LSN
in the control file (BTW maybe we can get slot name and plugin name
here instead of using pg_dump?).

But isn't the advantage of doing it via pg_dump (in binary_mode) that we
allow some outside-core in-place upgrade tool to also use it if
required? If we don't think that would be required then we can
probably use the info we retrieve in pg_upgrade.

You mean the code reading the slot file? I don't see the point of
adding user complexity to enable some hypothetical external usage.

the first commit. Or another idea would be to allow users to mark
replication slots "upgradable" so that pg_upgrade skips the
confirmed_flush_lsn check.

I guess for that we need to ask users to ensure that confirmed_flush_lsn
is up-to-date and then provide some slot-level API to mark the slots
with the required status. If so, that sounds a bit complicated for
users.

Agreed, not worth it.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

Only you can decide what is important to you.

#104Amit Kapila
amit.kapila16@gmail.com
In reply to: Bruce Momjian (#103)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Fri, Aug 11, 2023 at 11:38 PM Bruce Momjian <bruce@momjian.us> wrote:

On Fri, Aug 11, 2023 at 10:46:31AM +0530, Amit Kapila wrote:

On Thu, Aug 10, 2023 at 7:07 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

What I imagined is that we do this check before
check_and_dump_old_cluster() while the server is 'off'. Reading the
slot state file would be simple and I guess we would not need a tool
or cli program for that. We need to expose ReplicationSlotOnDisk,
though.

Won't that require a lot of version-specific checks as across versions
the file format could be different? For the case of the control file,
we use version-specific pg_controldata (for the old cluster, the
corresponding version's pg_controldata) utility to read the old
version control file. I thought we need something similar here if we
want to do what you are suggesting.

You mean the slot file format?

Yes.

We will need that complexity somewhere,
so why not in pg_upgrade?

I don't think we need the complexity of version-specific checks if we
do what we do in get_control_data(). Basically, invoke
version-specific pg_replslotdata to get version-specific slot
information. There has been a proposal for a tool like that [1]. Do
you have something better in mind? If so, can you please explain the
same a bit more?

After reading the control file and the slots' state files we
check if slot's confirmed_flush_lsn matches the latest checkpoint LSN
in the control file (BTW maybe we can get slot name and plugin name
here instead of using pg_dump?).

But isn't the advantage of doing it via pg_dump (in binary_mode) that we
allow some outside-core in-place upgrade tool to also use it if
required? If we don't think that would be required then we can
probably use the info we retrieve in pg_upgrade.

You mean the code reading the slot file? I don't see the point of
adding user complexity to enable some hypothetical external usage.

It is not just that we need a slot reading facility but rather to mimic
something like pg_get_replication_slots() where we have to know the
walstate (WALAVAIL_REMOVED, etc.) as well. I am not against it but am
not sure that we do it for any other object in the upgrade. Can you
please point me to any such prior usage? Even if we don't
do it today, we can start doing it now if that makes sense but it appears
to me that we are accessing contents of data-dir/WAL by invoking some
other utilities like pg_controldata, pg_resetwal, so something similar
would make sense here. Actually, what we do here also somewhat depends
on what we decide for the other point we are discussing above in the
email.
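
For context, the per-slot information being discussed here (including the
WAL availability state) is all exposed by the existing view while the old
server is still running; an illustrative query (not taken from the patch)
would be:

    SELECT slot_name, plugin, two_phase, wal_status, confirmed_flush_lsn
      FROM pg_catalog.pg_replication_slots
     WHERE slot_type = 'logical' AND temporary = false;

Retrieving the same fields with the server shut down is what would require a
new utility or slot-file/WAL reading code, which is the trade-off being
discussed.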

[1]: /messages/by-id/CALj2ACW0rV5gWK8A3m6_X62qH+Vfaq5hznC=i0R5Wojt5+yhyw@mail.gmail.com

--
With Regards,
Amit Kapila.

#105Bruce Momjian
bruce@momjian.us
In reply to: Amit Kapila (#104)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Sat, Aug 12, 2023 at 11:50:36AM +0530, Amit Kapila wrote:

We will need that complexity somewhere,
so why not in pg_upgrade?

I don't think we need the complexity of version-specific checks if we
do what we do in get_control_data(). Basically, invoke
version-specific pg_replslotdata to get version-specific slot
information. There has been a proposal for a tool like that [1]. Do
you have something better in mind? If so, can you please explain the
same a bit more?

Yes, if you want to break it out into a separate tool and then have
pg_upgrade call/parse it like it calls/parses pg_controldata, that seems
fine.

After reading the control file and the slots' state files we
check if slot's confirmed_flush_lsn matches the latest checkpoint LSN
in the control file (BTW maybe we can get slot name and plugin name
here instead of using pg_dump?).

But isn't the advantage of doing it via pg_dump (in binary_mode) that we
allow some outside-core in-place upgrade tool to also use it if
required? If we don't think that would be required then we can
probably use the info we retrieve in pg_upgrade.

You mean the code reading the slot file? I don't see the point of
adding user complexity to enable some hypothetical external usage.

It is not just that we need a slot reading facility but rather to mimic
something like pg_get_replication_slots() where we have to know the
walstate (WALAVAIL_REMOVED, etc.) as well. I am not against it but am
not sure that we do it for any other object in the upgrade. Can you
please point me to any such prior usage? Even if we don't
do it today, we can start doing it now if that makes sense but it appears
to me that we are accessing contents of data-dir/WAL by invoking some
other utilities like pg_controldata, pg_resetwal, so something similar
would make sense here. Actually, what we do here also somewhat depends
on what we decide for the other point we are discussing above in the
email.

Yes, if there is value in having that information available via the
command-line tool, it makes sense to add it.

Let me add that developers have complained about how pg_upgrade scrapes the
output of pg_controldata rather than reading the file, and we are basically
doing that some more with this. However, I think that is an appropriate
approach.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

Only you can decide what is important to you.

#106Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#104)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Sat, Aug 12, 2023, 15:20 Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Aug 11, 2023 at 11:38 PM Bruce Momjian <bruce@momjian.us> wrote:

On Fri, Aug 11, 2023 at 10:46:31AM +0530, Amit Kapila wrote:

On Thu, Aug 10, 2023 at 7:07 PM Masahiko Sawada <sawada.mshk@gmail.com>

wrote:

What I imagined is that we do this check before
check_and_dump_old_cluster() while the server is 'off'. Reading the
slot state file would be simple and I guess we would not need a tool
or cli program for that. We need to expose ReplicationSlotOnDisk,
though.

Won't that require a lot of version-specific checks as across versions
the file format could be different? For the case of the control file,
we use version-specific pg_controldata (for the old cluster, the
corresponding version's pg_controldata) utility to read the old
version control file. I thought we need something similar here if we
want to do what you are suggesting.

You mean the slot file format?

Yes.

We will need that complexity somewhere,
so why not in pg_upgrade?

I don't think we need the complexity of version-specific checks if we
do what we do in get_control_data(). Basically, invoke
version-specific pg_replslotdata to get version-specific slot
information. There has been a proposal for a tool like that [1]. Do
you have something better in mind? If so, can you please explain the
same a bit more?

Yeah, we need something like pg_replslotdata. If there are other useful
use cases for this tool, it would be good to have it. But I'm not sure there
are any other than the pg_upgrade use case.

Another idea (which might have already been discussed though) is that we check
if the latest shutdown checkpoint LSN in the control file matches the
confirmed_flush_lsn in the pg_replication_slots view. That way, we can ensure
that the slot has consumed all WAL records before the last shutdown. We
don't need to worry about WAL records generated after starting the old
cluster during the upgrade, at least for logical replication slots.
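
As a rough sketch of that idea (illustrative only, not the patch's code):
assuming the "Latest checkpoint location" has already been read from
pg_controldata before the old cluster is started, and using 0/1A2B3C4D as a
placeholder for that value, the check amounts to something like:

    -- Slots returned here have not confirmed up to the shutdown checkpoint.
    SELECT slot_name
      FROM pg_catalog.pg_replication_slots
     WHERE slot_type = 'logical'
       AND temporary = false
       AND confirmed_flush_lsn <> '0/1A2B3C4D'::pg_lsn;

An empty result means every logical slot's confirmed_flush_lsn matches the
latest shutdown checkpoint LSN.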

Regards,

#107Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#106)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Mon, Aug 14, 2023 at 7:57 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Sat, Aug 12, 2023, 15:20 Amit Kapila <amit.kapila16@gmail.com> wrote:

I don't think we need the complexity of version-specific checks if we
do what we do in get_control_data(). Basically, invoke
version-specific pg_replslotdata to get version-specific slot
information. There has been a proposal for a tool like that [1]. Do
you have something better in mind? If so, can you please explain the
same a bit more?

Yeah, we need something like pg_replslotdata. If there are other useful use cases for this tool, it would be good to have it. But I'm not sure there are any other than the pg_upgrade use case.

Another idea (which might have already been discussed though) is that we check if the latest shutdown checkpoint LSN in the control file matches the confirmed_flush_lsn in the pg_replication_slots view. That way, we can ensure that the slot has consumed all WAL records before the last shutdown. We don't need to worry about WAL records generated after starting the old cluster during the upgrade, at least for logical replication slots.

Right, this is somewhat closer to what the patch is already doing. But
remember in this case we need to remember and use the latest
checkpoint from the control file before the old cluster is started
because otherwise the latest checkpoint location could even be updated
during the upgrade. So, instead of reading from WAL, we need to change
it so that we rely on the control file's latest LSN. I would prefer this
idea over inventing a new API/tool like pg_replslotdata.

The other point you and Bruce seem to be favoring is that instead of
dumping/restoring slots via pg_dump, we remember the required
information of slots retrieved during their validation in pg_upgrade
itself and use that to create the slots in the new cluster. Though I
am not aware of a similar treatment for other objects we restore,
in this case it seems reasonable, especially because slots are not
stored in the catalog and we anyway already need to retrieve the
required information to validate them, so trying to again retrieve it
via pg_dump doesn't seem useful unless I am missing something. Does
this match your understanding?

Yet another thing I am trying to consider is whether we can allow to
upgrade slots from 16 or 15 to later versions. As of now, the patch
has the following check:
getLogicalReplicationSlots()
{
...
+ /* Check whether we should dump or not */
+ if (fout->remoteVersion < 170000)
+ return;
...
}

If we decide to use the existing view pg_replication_slots then can we
consider upgrading slots from the prior version to 17? Now, if we want
to invent any new API similar to pg_replslotdata then we can't do this
because it won't exist in prior versions but OTOH using existing view
pg_replication_slots can allow us to fetch slot info from older
versions as well. So, I think it is worth considering.

Thoughts?

--
With Regards,
Amit Kapila.

#108Amit Kapila
amit.kapila16@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#97)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Thu, Aug 10, 2023 at 8:32 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

Based on recent discussions, I updated the patch set. I did not reply one by one
because there are many posts, but thank you for giving many suggestions!

The following shows what I changed.

1.
This feature is now enabled by default. Instead "--exclude-logical-replication-slots"
was added. (Per suggestions like [1])

AFAICS, we don't have any concrete agreement on such an option but my
vote is to not have such an option as we don't have any similar option
for any other object. I understand that it could be convenient for
some use cases where some of the logical slots are not yet caught up
w.r.t. WAL and users want to upgrade without the slots, but I'm not sure if
that is really the case. Does anyone else have an opinion on this
point?

--
With Regards,
Amit Kapila.

#109Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#107)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Mon, Aug 14, 2023 at 2:07 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Aug 14, 2023 at 7:57 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Sat, Aug 12, 2023, 15:20 Amit Kapila <amit.kapila16@gmail.com> wrote:

I don't think we need the complexity of version-specific checks if we
do what we do in get_control_data(). Basically, invoke
version-specific pg_replslotdata to get version-specific slot
information. There has been a proposal for a tool like that [1]. Do
you have something better in mind? If so, can you please explain the
same a bit more?

Yeah, we need something like pg_replslotdata. If there are other useful use cases for this tool, it would be good to have it. But I'm not sure there are any other than the pg_upgrade use case.

Another idea (which might have already been discussed though) is that we check if the latest shutdown checkpoint LSN in the control file matches the confirmed_flush_lsn in the pg_replication_slots view. That way, we can ensure that the slot has consumed all WAL records before the last shutdown. We don't need to worry about WAL records generated after starting the old cluster during the upgrade, at least for logical replication slots.

Right, this is somewhat closer to what Patch is already doing. But
remember in this case we need to remember and use the latest
checkpoint from the control file before the old cluster is started
because otherwise the latest checkpoint location could be even updated
during the upgrade. So, instead of reading from WAL, we need to change
so that we rely on the control file's latest LSN.

Yes, I was thinking the same idea.

But it works only for replication slots used for logical replication. Do we
want to check that no meaningful WAL records are generated after the
latest shutdown checkpoint for manually created slots (or non-logical
replication slots)? If so, we would need to have something reading WAL
records in the end.
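
(For reference, an earlier version of the patch in this thread already did
this kind of WAL reading through a backend function, and its pg_upgrade
check boiled down to roughly the query below; an equivalent run from a
standalone tool is what the offline case would need.)

    SELECT slot_name
      FROM pg_catalog.pg_replication_slots
     WHERE temporary = false
       AND (SELECT pg_catalog.validate_wal_record_types_after(confirmed_flush_lsn)) IS FALSE;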

I would prefer this
idea over inventing a new API/tool like pg_replslotdata.

+1

The other point you and Bruce seem to be favoring is that instead of
dumping/restoring slots via pg_dump, we remember the required
information of slots retrieved during their validation in pg_upgrade
itself and use that to create the slots in the new cluster. Though I
am not aware of doing similar treatment for other objects we restore
in this case it seems reasonable especially because slots are not
stored in the catalog and we anyway already need to retrieve the
required information to validate them, so trying to again retrieve it
via pg_dump doesn't seem useful unless I am missing something. Does
this match your understanding?

If there are use cases for --logical-replication-slots-only option
other than pg_upgrade, it would be good to have it in pg_dump. I was
just not sure of other use cases.

Yet another thing I am trying to consider is whether we can allow to
upgrade slots from 16 or 15 to later versions. As of now, the patch
has the following check:
getLogicalReplicationSlots()
{
...
+ /* Check whether we should dump or not */
+ if (fout->remoteVersion < 170000)
+ return;
...
}

If we decide to use the existing view pg_replication_slots then can we
consider upgrading slots from the prior version to 17? Now, if we want
to invent any new API similar to pg_replslotdata then we can't do this
because it won't exist in prior versions but OTOH using existing view
pg_replication_slots can allow us to fetch slot info from older
versions as well. So, I think it is worth considering.

I think that without 0001 patch the replication slots will not be able
to pass the confirmed_flush_lsn check.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#110Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#109)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Tue, Aug 15, 2023 at 7:51 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Aug 14, 2023 at 2:07 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Aug 14, 2023 at 7:57 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Another idea is (which might have already discussed thoguh) that we check if the latest shutdown checkpoint LSN in the control file matches the confirmed_flush_lsn in pg_replication_slots view. That way, we can ensure that the slot has consumed all WAL records before the last shutdown. We don't need to worry about WAL records generated after starting the old cluster during the upgrade, at least for logical replication slots.

Right, this is somewhat closer to what Patch is already doing. But
remember in this case we need to remember and use the latest
checkpoint from the control file before the old cluster is started
because otherwise the latest checkpoint location could be even updated
during the upgrade. So, instead of reading from WAL, we need to change
so that we rely on the control file's latest LSN.

Yes, I was thinking the same idea.

But it works for only replication slots for logical replication. Do we
want to check if no meaningful WAL records are generated after the
latest shutdown checkpoint, for manually created slots (or non-logical
replication slots)? If so, we would need to have something reading WAL
records in the end.

This feature only targets logical replication slots. I don't see a
reason to treat manually created logical replication slots differently.
Is there something particular that you think we could be missing?

I would prefer this
idea over inventing a new API/tool like pg_replslotdata.

+1

The other point you and Bruce seem to be favoring is that instead of
dumping/restoring slots via pg_dump, we remember the required
information of slots retrieved during their validation in pg_upgrade
itself and use that to create the slots in the new cluster. Though I
am not aware of doing similar treatment for other objects we restore
in this case it seems reasonable especially because slots are not
stored in the catalog and we anyway already need to retrieve the
required information to validate them, so trying to again retrieve it
via pg_dump doesn't seem useful unless I am missing something. Does
this match your understanding?

If there are use cases for --logical-replication-slots-only option
other than pg_upgrade, it would be good to have it in pg_dump. I was
just not sure of other use cases.

It was primarily for upgrade purposes only. So, as we can't see a good
reason to go via pg_dump, let's do it in pg_upgrade unless someone thinks
otherwise.

Yet another thing I am trying to consider is whether we can allow to
upgrade slots from 16 or 15 to later versions. As of now, the patch
has the following check:
getLogicalReplicationSlots()
{
...
+ /* Check whether we should dump or not */
+ if (fout->remoteVersion < 170000)
+ return;
...
}

If we decide to use the existing view pg_replication_slots then can we
consider upgrading slots from the prior version to 17? Now, if we want
to invent any new API similar to pg_replslotdata then we can't do this
because it won't exist in prior versions but OTOH using existing view
pg_replication_slots can allow us to fetch slot info from older
versions as well. So, I think it is worth considering.

I think that without 0001 patch the replication slots will not be able
to pass the confirmed_flush_lsn check.

Right, but we can think of backpatching the same. Anyway, we can do
that as a separate work by starting a new thread to see if there is a
broader agreement for backpatching such a change. For now, we can
focus on >=v17.

--
With Regards,
Amit Kapila.

#111Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Amit Kapila (#110)
3 attachment(s)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

On Tuesday, August 15, 2023 11:06 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Aug 15, 2023 at 7:51 AM Masahiko Sawada <sawada.mshk@gmail.com>
wrote:

On Mon, Aug 14, 2023 at 2:07 PM Amit Kapila <amit.kapila16@gmail.com>

wrote:

On Mon, Aug 14, 2023 at 7:57 AM Masahiko Sawada

<sawada.mshk@gmail.com> wrote:

Another idea (which might have already been discussed though) is that we

check if the latest shutdown checkpoint LSN in the control file matches the
confirmed_flush_lsn in the pg_replication_slots view. That way, we can ensure that
the slot has consumed all WAL records before the last shutdown. We don't
need to worry about WAL records generated after starting the old cluster
during the upgrade, at least for logical replication slots.

Right, this is somewhat closer to what Patch is already doing. But
remember in this case we need to remember and use the latest
checkpoint from the control file before the old cluster is started
because otherwise the latest checkpoint location could be even
updated during the upgrade. So, instead of reading from WAL, we need
to change so that we rely on the control file's latest LSN.

Yes, I was thinking the same idea.

But it works for only replication slots for logical replication. Do we
want to check if no meaningful WAL records are generated after the
latest shutdown checkpoint, for manually created slots (or non-logical
replication slots)? If so, we would need to have something reading WAL
records in the end.

I would prefer this
idea over inventing a new API/tool like pg_replslotdata.

+1

Changed the check to compare the latest checkpoint LSN from pg_controldata
with the confirmed_flush_lsn in the pg_replication_slots view.

The other point you and Bruce seem to be favoring is that instead of
dumping/restoring slots via pg_dump, we remember the required
information of slots retrieved during their validation in pg_upgrade
itself and use that to create the slots in the new cluster. Though I
am not aware of doing similar treatment for other objects we restore
in this case it seems reasonable especially because slots are not
stored in the catalog and we anyway already need to retrieve the
required information to validate them, so trying to again retrieve
it via pg_dump doesn't seem useful unless I am missing something.
Does this match your understanding?

If there are use cases for --logical-replication-slots-only option
other than pg_upgrade, it would be good to have it in pg_dump. I was
just not sure of other use cases.

It was primarily for upgrade purposes only. So, as we can't see a good reason to
go via pg_dump let's do it in upgrade unless someone thinks otherwise.

Removed the new option in pg_dump and modified pg_upgrade to
directly use the slot info to restore the slots in the new cluster.

Yet another thing I am trying to consider is whether we can allow to
upgrade slots from 16 or 15 to later versions. As of now, the patch
has the following check:
getLogicalReplicationSlots()
{
...
+ /* Check whether we should dump or not */
+ if (fout->remoteVersion < 170000)
+ return;
...
}

If we decide to use the existing view pg_replication_slots then can
we consider upgrading slots from the prior version to 17? Now, if we
want to invent any new API similar to pg_replslotdata then we can't
do this because it won't exist in prior versions but OTOH using
existing view pg_replication_slots can allow us to fetch slot info
from older versions as well. So, I think it is worth considering.

I think that without 0001 patch the replication slots will not be able
to pass the confirmed_flush_lsn check.

Right, but we can think of backpatching the same. Anyway, we can do that as a
separate work by starting a new thread to see if there is a broader agreement
for backpatching such a change. For now, we can focus on >=v17.

Here is the new version of the patch, which addresses the above points.
It also removes the --exclude-logical-replication-slots option
per a recent comment.
Thanks Kuroda-san for addressing most of the points.

Best Regards,
Hou zj

Attachments:

v20-0003-pg_upgrade-Add-check-function-for-logical-replic.patchapplication/octet-stream; name=v20-0003-pg_upgrade-Add-check-function-for-logical-replic.patchDownload
From 285953f835928a8bb74e4260527541aa5dd8324a Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Fri, 14 Apr 2023 08:59:03 +0000
Subject: [PATCH v20 3/3] pg_upgrade: Add check function for logical
 replication slots

pg_upgrade fails if the old node has slots whose status is 'lost' or which have
not consumed all WAL records. These checks are needed to prevent data loss.

Author: Hayato Kuroda
Reviewed-by: Wang Wei, Vignesh C
---
 src/bin/pg_upgrade/check.c                    |  82 +++++++++
 src/bin/pg_upgrade/controldata.c              |  31 ++++
 src/bin/pg_upgrade/pg_upgrade.h               |   3 +
 .../t/003_logical_replication_slots.pl        | 155 ++++++++++++------
 src/test/regress/sql/misc_functions.sql       |   2 +-
 5 files changed, 219 insertions(+), 54 deletions(-)

diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index d5af0dcbc7..2ad1249cf8 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -31,6 +31,7 @@ static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(ClusterInfo *new_cluster);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
 static void check_for_logical_replication_slots(ClusterInfo *new_cluster);
+static void check_for_confirmed_flush_lsn(ClusterInfo *cluster, bool live_check);
 
 int	num_slots_on_old_cluster;
 
@@ -108,6 +109,8 @@ check_and_dump_old_cluster(bool live_check)
 	check_for_composite_data_type_usage(&old_cluster);
 	check_for_reg_data_type_usage(&old_cluster);
 	check_for_isn_and_int8_passing_mismatch(&old_cluster);
+	if (num_slots_on_old_cluster)
+		check_for_confirmed_flush_lsn(&old_cluster, live_check);
 
 	/*
 	 * PG 16 increased the size of the 'aclitem' type, which breaks the
@@ -1471,3 +1474,82 @@ check_for_logical_replication_slots(ClusterInfo *new_cluster)
 
 	check_ok();
 }
+
+/*
+ * Verify that all logical replication slots consumed all WALs, except a
+ * CHECKPOINT_SHUTDOWN record.
+ */
+static void
+check_for_confirmed_flush_lsn(ClusterInfo *cluster, bool live_check)
+{
+	int			i,
+				ntups,
+				i_slotname;
+	bool		is_error = false;
+	PGresult   *res;
+	DbInfo	   *active_db = &cluster->dbarr.dbs[0];
+	PGconn	   *conn = connectToServer(cluster, active_db->db_name);
+
+	/* logical slots can be dumped since PG17. */
+	if (GET_MAJOR_VERSION(cluster->major_version) <= 1600)
+		return;
+
+	prep_status("Checking confirmed_flush_lsn for logical replication slots");
+
+	/* Check that all logical slots are not in 'lost' state. */
+	res = executeQueryOrDie(conn,
+							"SELECT slot_name FROM pg_catalog.pg_replication_slots "
+							"WHERE temporary = false AND wal_status = 'lost';");
+
+	ntups = PQntuples(res);
+	i_slotname = PQfnumber(res, "slot_name");
+
+	for (i = 0; i < ntups; i++)
+	{
+		is_error = true;
+
+		pg_log(PG_WARNING,
+			   "\nWARNING: logical replication slot \"%s\" is obsolete.",
+			   PQgetvalue(res, i, i_slotname));
+	}
+
+	PQclear(res);
+
+	if (is_error)
+		pg_fatal("logical replication slots must not be in the 'lost' state.");
+
+	/*
+	 * Check that all logical replication slots have reached the latest
+	 * checkpoint position (SHUTDOWN_CHECKPOINT record). This check cannot be
+	 * done in the live_check case because the server has not yet written the
+	 * SHUTDOWN_CHECKPOINT record.
+	 */
+	if (!live_check)
+	{
+		res = executeQueryOrDie(conn,
+								"SELECT slot_name FROM pg_catalog.pg_replication_slots "
+								"WHERE confirmed_flush_lsn != '%X/%X' AND temporary = false;",
+								old_cluster.controldata.chkpnt_latest_upper,
+								old_cluster.controldata.chkpnt_latest_lower);
+
+		ntups = PQntuples(res);
+		i_slotname = PQfnumber(res, "slot_name");
+
+		for (i = 0; i < ntups; i++)
+		{
+			is_error = true;
+
+			pg_log(PG_WARNING,
+				   "\nWARNING: logical replication slot \"%s\" has not consumed WALs yet",
+				   PQgetvalue(res, i, i_slotname));
+		}
+
+		PQclear(res);
+		PQfinish(conn);
+
+		if (is_error)
+			pg_fatal("Not all logical replication slots have consumed the WALs yet.");
+	}
+
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c
index 4beb65ab22..e00543e74a 100644
--- a/src/bin/pg_upgrade/controldata.c
+++ b/src/bin/pg_upgrade/controldata.c
@@ -169,6 +169,37 @@ get_control_data(ClusterInfo *cluster, bool live_check)
 				}
 				got_cluster_state = true;
 			}
+
+			else if ((p = strstr(bufin, "Latest checkpoint location:")) != NULL)
+			{
+				/*
+				 * Gather latest checkpoint location if the cluster is newer or
+				 * equal to 17. This is used for upgrading logical replication
+				 * slots.
+				 */
+				if (GET_MAJOR_VERSION(cluster->major_version) >= 17)
+				{
+					char *slash = NULL;
+
+					p = strchr(p, ':');
+
+					if (p == NULL || strlen(p) <= 1)
+						pg_fatal("%d: controldata retrieval problem", __LINE__);
+
+					p++;			/* remove ':' char */
+
+					p = strpbrk(p, "01234567890ABCDEF");
+
+					/*
+					 * Upper and lower part of LSN must be read and stored
+					 * separately because it is reported as %X/%X format.
+					 */
+					cluster->controldata.chkpnt_latest_upper =
+						strtoul(p, &slash, 16);
+					cluster->controldata.chkpnt_latest_lower =
+						strtoul(++slash, NULL, 16);
+				}
+			}
 		}
 
 		rc = pclose(output);
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 2a3a178cde..475bbeb30b 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -245,6 +245,9 @@ typedef struct
 	bool		date_is_int;
 	bool		float8_pass_by_value;
 	uint32		data_checksum_version;
+
+	uint32		chkpnt_latest_upper;
+	uint32		chkpnt_latest_lower;
 } ControlData;
 
 /*
diff --git a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
index f015b5d363..59e2c2f209 100644
--- a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
+++ b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
@@ -15,138 +15,187 @@ use Test::More;
 my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
 
 # Initialize old node
-my $old_node = PostgreSQL::Test::Cluster->new('old_node');
-$old_node->init(allows_streaming => 'logical');
-$old_node->start;
+my $old_publisher = PostgreSQL::Test::Cluster->new('old_publisher');
+$old_publisher->init(allows_streaming => 'logical');
 
 # Initialize new node
-my $new_node = PostgreSQL::Test::Cluster->new('new_node');
-$new_node->init(allows_streaming => 1);
+my $new_publisher = PostgreSQL::Test::Cluster->new('new_publisher');
+$new_publisher->init(allows_streaming => 1);
 
-my $bindir = $new_node->config_data('--bindir');
+# Initialize subscriber node
+my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
+$subscriber->init(allows_streaming => 'logical');
 
-$old_node->stop;
+my $bindir = $new_publisher->config_data('--bindir');
 
 # Create a slot on old node
-$old_node->start;
-$old_node->safe_psql(
+$old_publisher->start;
+$old_publisher->safe_psql(
 	'postgres', "SELECT pg_create_logical_replication_slot('test_slot1', 'test_decoding', false, true);"
 );
-$old_node->stop;
+$old_publisher->stop;
 
 # Cause a failure at the start of pg_upgrade because wal_level is replica
 command_fails(
 	[
 		'pg_upgrade', '--no-sync',
-		'-d',         $old_node->data_dir,
-		'-D',         $new_node->data_dir,
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
 		'-b',         $bindir,
 		'-B',         $bindir,
-		'-s',         $new_node->host,
-		'-p',         $old_node->port,
-		'-P',         $new_node->port,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
 		$mode,
 	],
 	'run of pg_upgrade of old node with wrong wal_level');
-ok( -d $new_node->data_dir . "/pg_upgrade_output.d",
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
 	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
 
 # Clean up
-rmtree($new_node->data_dir . "/pg_upgrade_output.d");
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
 
 # Preparations for the subsequent test. The case max_replication_slots is set
 # to 0 is prohibited.
-$new_node->append_conf('postgresql.conf', "wal_level = 'logical'");
-$new_node->append_conf('postgresql.conf', "max_replication_slots = 0");
+$new_publisher->append_conf('postgresql.conf', "wal_level = 'logical'");
+$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 0");
 
 # Cause a failure at the start of pg_upgrade because max_replication_slots is 0
 command_fails(
 	[
 		'pg_upgrade', '--no-sync',
-		'-d',         $old_node->data_dir,
-		'-D',         $new_node->data_dir,
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
 		'-b',         $bindir,
 		'-B',         $bindir,
-		'-s',         $new_node->host,
-		'-p',         $old_node->port,
-		'-P',         $new_node->port,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
 		$mode,
 	],
 	'run of pg_upgrade of old node with wrong max_replication_slots');
-ok( -d $new_node->data_dir . "/pg_upgrade_output.d",
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
 	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
 
 # Clean up
-rmtree($new_node->data_dir . "/pg_upgrade_output.d");
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
 
 # Preparations for the subsequent test. max_replication_slots is set to
 # non-zero value
-$new_node->append_conf('postgresql.conf', "max_replication_slots = 1");
+$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 1");
 
-# Create a slot on old node, and generate WALs
-$old_node->start;
-$old_node->safe_psql(
+# Create a slot on old node
+$old_publisher->start;
+$old_publisher->safe_psql(
 	'postgres', qq[
 	SELECT pg_create_logical_replication_slot('test_slot2', 'test_decoding', false, true);
 	CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;
 ]);
 
-$old_node->stop;
+$old_publisher->stop;
 
 # Cause a failure at the start of pg_upgrade because max_replication_slots is
 # smaller than existing slots on old node
 command_fails(
 	[
 		'pg_upgrade', '--no-sync',
-		'-d',         $old_node->data_dir,
-		'-D',         $new_node->data_dir,
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
 		'-b',         $bindir,
 		'-B',         $bindir,
-		'-s',         $new_node->host,
-		'-p',         $old_node->port,
-		'-P',         $new_node->port,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
 		$mode,
 	],
 	'run of pg_upgrade of old node with small max_replication_slots');
-ok( -d $new_node->data_dir . "/pg_upgrade_output.d",
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
 	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
 
 # Clean up
-rmtree($new_node->data_dir . "/pg_upgrade_output.d");
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
 
 # Preparations for the subsequent test. max_replication_slots is set to
 # appropriate value
-$new_node->append_conf('postgresql.conf', "max_replication_slots = 10");
+$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 10");
 
-# Remove an unnecessary slot and consume WALs
-$old_node->start;
-$old_node->safe_psql(
+# Cause a failure at the start of pg_upgrade because slots have not finished
+# consuming all the WALs
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade of old node with idle replication slots');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+
+$old_publisher->start;
+$old_publisher->safe_psql(
 	'postgres', qq[
 	SELECT pg_drop_replication_slot('test_slot1');
-	SELECT count(*) FROM pg_logical_slot_get_changes('test_slot2', NULL, NULL)
+	SELECT pg_drop_replication_slot('test_slot2');
+]);
+
+# Set up logical replication
+my $old_connstr = $old_publisher->connstr . ' dbname=postgres';
+$old_publisher->safe_psql('postgres', "CREATE PUBLICATION pub FOR ALL TABLES");
+$subscriber->start;
+$subscriber->safe_psql(
+	'postgres', qq[
+	CREATE TABLE tbl (a int);
+	CREATE SUBSCRIPTION sub CONNECTION '$old_connstr' PUBLICATION pub WITH (two_phase = 'true')
 ]);
-$old_node->stop;
+
+# Wait for initial table sync to finish
+$subscriber->wait_for_subscription_sync($old_publisher, 'sub');
+
+$subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION sub DISABLE");
+$old_publisher->stop;
 
 # Actual run, pg_upgrade_output.d is removed at the end
 command_ok(
 	[
 		'pg_upgrade', '--no-sync',
-		'-d',         $old_node->data_dir,
-		'-D',         $new_node->data_dir,
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
 		'-b',         $bindir,
 		'-B',         $bindir,
-		'-s',         $new_node->host,
-		'-p',         $old_node->port,
-		'-P',         $new_node->port,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
 		$mode,
 	],
 	'run of pg_upgrade of old node');
-ok( !-d $new_node->data_dir . "/pg_upgrade_output.d",
+ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
 	"pg_upgrade_output.d/ removed after pg_upgrade success");
 
-$new_node->start;
-my $result = $new_node->safe_psql('postgres',
+$new_publisher->start;
+my $result = $new_publisher->safe_psql('postgres',
 	"SELECT slot_name, two_phase FROM pg_replication_slots");
-is($result, qq(test_slot2|t), 'check the slot exists on new node');
+is($result, qq(sub|t), 'check the slot exists on new node');
+
+# Change the connection
+my $new_connstr = $new_publisher->connstr . ' dbname=postgres';
+$subscriber->safe_psql('postgres',
+	"ALTER SUBSCRIPTION sub CONNECTION '$new_connstr'");
+$subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION sub ENABLE");
+
+# Check whether changes on the new publisher get replicated to the subscriber
+$new_publisher->safe_psql('postgres',
+	"INSERT INTO tbl VALUES (generate_series(11, 20))");
+$new_publisher->wait_for_catchup('sub');
+$result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
+is($result, qq(20), 'check changes are shipped to subscriber');
 
 done_testing();
diff --git a/src/test/regress/sql/misc_functions.sql b/src/test/regress/sql/misc_functions.sql
index b57f01f3e9..ffe7d5b4ce 100644
--- a/src/test/regress/sql/misc_functions.sql
+++ b/src/test/regress/sql/misc_functions.sql
@@ -236,4 +236,4 @@ SELECT * FROM pg_split_walfile_name('invalid');
 SELECT segment_number > 0 AS ok_segment_number, timeline_id
   FROM pg_split_walfile_name('000000010000000100000000');
 SELECT segment_number > 0 AS ok_segment_number, timeline_id
-  FROM pg_split_walfile_name('ffffffFF00000001000000af');
+  FROM pg_split_walfile_name('ffffffFF00000001000000af');
\ No newline at end of file
-- 
2.30.0.windows.2

v20-0001-Always-persist-to-disk-logical-slots-during-a-sh.patchapplication/octet-stream; name=v20-0001-Always-persist-to-disk-logical-slots-during-a-sh.patchDownload
From f98031a1a4619d73ae96701b0b22893166337b3c Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Fri, 14 Apr 2023 13:49:09 +0800
Subject: [PATCH v20 1/3] Always persist to disk logical slots during a
 shutdown checkpoint.

It's entirely possible for a logical slot to have a confirmed_flush_lsn higher
than the last value saved on disk while not being marked as dirty.  It's
currently not a problem to lose that value during a clean shutdown / restart
cycle, but a later patch adding support for pg_upgrade of publications and
logical slots will rely on that value being properly persisted to disk.

Author: Julien Rouhaud
Reviewed-by: Wang Wei
Discussion: FIXME
---
 src/backend/access/transam/xlog.c |  2 +-
 src/backend/replication/slot.c    | 26 ++++++++++++++++----------
 src/include/replication/slot.h    |  2 +-
 3 files changed, 18 insertions(+), 12 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 60c0b7ec3a..6dced61cf4 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7026,7 +7026,7 @@ static void
 CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
 {
 	CheckPointRelationMap();
-	CheckPointReplicationSlots();
+	CheckPointReplicationSlots(flags & CHECKPOINT_IS_SHUTDOWN);
 	CheckPointSnapBuild();
 	CheckPointLogicalRewriteHeap();
 	CheckPointReplicationOrigin();
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 1dc27264f6..5aed7cd190 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -109,7 +109,8 @@ static void ReplicationSlotDropPtr(ReplicationSlot *slot);
 /* internal persistency functions */
 static void RestoreSlotFromDisk(const char *name);
 static void CreateSlotOnDisk(ReplicationSlot *slot);
-static void SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel);
+static void SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel,
+						   bool is_shutdown);
 
 /*
  * Report shared-memory space needed by ReplicationSlotsShmemInit.
@@ -783,7 +784,7 @@ ReplicationSlotSave(void)
 	Assert(MyReplicationSlot != NULL);
 
 	sprintf(path, "pg_replslot/%s", NameStr(MyReplicationSlot->data.name));
-	SaveSlotToPath(MyReplicationSlot, path, ERROR);
+	SaveSlotToPath(MyReplicationSlot, path, ERROR, false);
 }
 
 /*
@@ -1565,11 +1566,10 @@ restart:
 /*
  * Flush all replication slots to disk.
  *
- * This needn't actually be part of a checkpoint, but it's a convenient
- * location.
+ * is_shutdown is true in case of a shutdown checkpoint.
  */
 void
-CheckPointReplicationSlots(void)
+CheckPointReplicationSlots(bool is_shutdown)
 {
 	int			i;
 
@@ -1594,7 +1594,7 @@ CheckPointReplicationSlots(void)
 
 		/* save the slot to disk, locking is handled in SaveSlotToPath() */
 		sprintf(path, "pg_replslot/%s", NameStr(s->data.name));
-		SaveSlotToPath(s, path, LOG);
+		SaveSlotToPath(s, path, LOG, is_shutdown);
 	}
 	LWLockRelease(ReplicationSlotAllocationLock);
 }
@@ -1700,7 +1700,7 @@ CreateSlotOnDisk(ReplicationSlot *slot)
 
 	/* Write the actual state file. */
 	slot->dirty = true;			/* signal that we really need to write */
-	SaveSlotToPath(slot, tmppath, ERROR);
+	SaveSlotToPath(slot, tmppath, ERROR, false);
 
 	/* Rename the directory into place. */
 	if (rename(tmppath, path) != 0)
@@ -1726,7 +1726,8 @@ CreateSlotOnDisk(ReplicationSlot *slot)
  * Shared functionality between saving and creating a replication slot.
  */
 static void
-SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel)
+SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel,
+			   bool is_shutdown)
 {
 	char		tmppath[MAXPGPATH];
 	char		path[MAXPGPATH];
@@ -1740,8 +1741,13 @@ SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel)
 	slot->just_dirtied = false;
 	SpinLockRelease(&slot->mutex);
 
-	/* and don't do anything if there's nothing to write */
-	if (!was_dirty)
+	/*
+	 * and don't do anything if there's nothing to write, unless this is
+	 * called for a logical slot during a shutdown checkpoint, as we want to
+	 * persist the confirmed_flush_lsn in that case, even if that's the only
+	 * modification.
+	 */
+	if (!was_dirty && (SlotIsPhysical(slot) || !is_shutdown))
 		return;
 
 	LWLockAcquire(&slot->io_in_progress_lock, LW_EXCLUSIVE);
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index a8a89dc784..7ca37c9f70 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -241,7 +241,7 @@ extern void ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslo
 extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok);
 
 extern void StartupReplicationSlots(void);
-extern void CheckPointReplicationSlots(void);
+extern void CheckPointReplicationSlots(bool is_shutdown);
 
 extern void CheckSlotRequirements(void);
 extern void CheckSlotPermissions(void);
-- 
2.30.0.windows.2

v20-0002-pg_upgrade-Allow-to-replicate-logical-replicatio.patchapplication/octet-stream; name=v20-0002-pg_upgrade-Allow-to-replicate-logical-replicatio.patchDownload
From f9dec821dd5c09ad2486438877aff4e8260f2e73 Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Tue, 4 Apr 2023 05:49:34 +0000
Subject: [PATCH v20 2/3] pg_upgrade: Allow to replicate logical replication
 slots to new node

This commit allows nodes with logical replication slots to be upgraded.

For pg_upgrade, it queries the logical replication slot information from the old
cluster and restores the slots by executing pg_create_logical_replication_slot()
statements. Note that we need to separate the timing of restoring replication
slots and other objects. Replication slots, in particular, should not be
restored before executing the pg_resetwal command because it will remove WALs
that are required by the slots.

The significant advantage of this commit is that it makes it easy to continue
logical replication even after upgrading the publisher node. Previously,
pg_upgrade allowed copying publications to a new node. With this new commit,
adjusting the connection string to the new publisher will cause the apply
worker on the subscriber to connect to the new publisher automatically. This
enables seamless continuation of logical replication, even after an upgrade.

Author: Hayato Kuroda
Co-authored-by: Hou Zhijie
Reviewed-by: Peter Smith, Julien Rouhaud, Vignesh C, Wang Wei
---
 doc/src/sgml/ref/pgupgrade.sgml               |  30 ++++
 src/bin/pg_upgrade/Makefile                   |   3 +
 src/bin/pg_upgrade/check.c                    |  69 ++++++++
 src/bin/pg_upgrade/dump.c                     |   1 +
 src/bin/pg_upgrade/info.c                     | 117 +++++++++++++-
 src/bin/pg_upgrade/meson.build                |   1 +
 src/bin/pg_upgrade/pg_upgrade.c               |  68 ++++++++
 src/bin/pg_upgrade/pg_upgrade.h               |  21 +++
 .../t/003_logical_replication_slots.pl        | 152 ++++++++++++++++++
 src/tools/pgindent/typedefs.list              |   3 +
 10 files changed, 464 insertions(+), 1 deletion(-)
 create mode 100644 src/bin/pg_upgrade/t/003_logical_replication_slots.pl

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 7816b4c685..af776c4ceb 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -402,6 +402,36 @@ NET STOP postgresql-&majorversion;
     </para>
    </step>
 
+   <step>
+    <title>Prepare for publisher upgrades</title>
+
+    <para>
+     <application>pg_upgrade</application> attempts to dump and restore logical
+     replication slots. This helps avoid the need for manually defining the
+     same replication slots on the new publisher.
+    </para>
+
+    <para>
+     Before you start upgrading the publisher node, ensure that the
+     subscription is temporarily disabled by executing
+     <link linkend="sql-altersubscription"><command>ALTER SUBSCRIPTION ... DISABLE</command></link>.
+     After the upgrade is complete, update the connection string, and then
+     re-enable the subscription.
+    </para>
+
+    <para>
+     Upgrading slots has some prerequisites. First, none of the slots may be
+     in the <literal>lost</literal> state, and all of them must have consumed
+     the WAL generated on the old node. Furthermore, on the new node
+     <link linkend="guc-max-replication-slots"><varname>max_replication_slots</varname></link>
+     must be set to at least the number of existing slots on the old node, and
+     <link linkend="guc-wal-level"><varname>wal_level</varname></link> must be
+     <literal>logical</literal>. <application>pg_upgrade</application> will
+     raise an error if these conditions are not met.
+    </para>
+   </step>
+
    <step>
     <title>Run <application>pg_upgrade</application></title>
 
diff --git a/src/bin/pg_upgrade/Makefile b/src/bin/pg_upgrade/Makefile
index 5834513add..815d1a7ca1 100644
--- a/src/bin/pg_upgrade/Makefile
+++ b/src/bin/pg_upgrade/Makefile
@@ -3,6 +3,9 @@
 PGFILEDESC = "pg_upgrade - an in-place binary upgrade utility"
 PGAPPICON = win32
 
+# required for 003_logical_replication_slots.pl
+EXTRA_INSTALL=contrib/test_decoding
+
 subdir = src/bin/pg_upgrade
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 64024e3b9e..d5af0dcbc7 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -30,7 +30,9 @@ static void check_for_jsonb_9_4_usage(ClusterInfo *cluster);
 static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(ClusterInfo *new_cluster);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
+static void check_for_logical_replication_slots(ClusterInfo *new_cluster);
 
+int	num_slots_on_old_cluster;
 
 /*
  * fix_path_separator
@@ -89,6 +91,9 @@ check_and_dump_old_cluster(bool live_check)
 	/* Extract a list of databases and tables from the old cluster */
 	get_db_and_rel_infos(&old_cluster);
 
+	/* Extract a list of logical replication slots */
+	num_slots_on_old_cluster = get_logical_slot_infos(&old_cluster);
+
 	init_tablespaces();
 
 	get_loadable_libraries();
@@ -189,6 +194,17 @@ check_new_cluster(void)
 {
 	get_db_and_rel_infos(&new_cluster);
 
+	/*
+	 * Checking for logical slots must be done before
+	 * check_new_cluster_is_empty() because the slot_arr attribute of the
+	 * new_cluster will be checked in that function.
+	 */
+	if (num_slots_on_old_cluster)
+	{
+		(void) get_logical_slot_infos(&new_cluster);
+		check_for_logical_replication_slots(&new_cluster);
+	}
+
 	check_new_cluster_is_empty();
 
 	check_loadable_libraries();
@@ -353,6 +369,8 @@ check_new_cluster_is_empty(void)
 	{
 		int			relnum;
 		RelInfoArr *rel_arr = &new_cluster.dbarr.dbs[dbnum].rel_arr;
+		DbInfo     *pDbInfo = &new_cluster.dbarr.dbs[dbnum];
+		LogicalSlotInfoArr *slot_arr = &pDbInfo->slot_arr;
 
 		for (relnum = 0; relnum < rel_arr->nrels;
 			 relnum++)
@@ -364,6 +382,14 @@ check_new_cluster_is_empty(void)
 						 rel_arr->rels[relnum].nspname,
 						 rel_arr->rels[relnum].relname);
 		}
+
+		/*
+		 * Check the existence of logical replication slots.
+		 */
+		if (slot_arr->nslots)
+			pg_fatal("New cluster database \"%s\" is not empty: found logical replication slot \"%s\"",
+					 pDbInfo->db_name,
+					 slot_arr->slots[0].slotname);
 	}
 }
 
@@ -1402,3 +1428,46 @@ check_for_user_defined_encoding_conversions(ClusterInfo *cluster)
 	else
 		check_ok();
 }
+
+/*
+ * Verify the parameter settings necessary for creating logical replication
+ * slots.
+ */
+static void
+check_for_logical_replication_slots(ClusterInfo *new_cluster)
+{
+	PGresult   *res;
+	PGconn	   *conn = connectToServer(new_cluster, "template1");
+	int			max_replication_slots;
+	char	   *wal_level;
+
+	/* logical replication slots can be dumped since PG17. */
+	if (GET_MAJOR_VERSION(new_cluster->major_version) <= 1600)
+		return;
+
+	prep_status("Checking parameter settings for logical replication slots");
+
+	res = executeQueryOrDie(conn, "SHOW max_replication_slots;");
+	max_replication_slots = atoi(PQgetvalue(res, 0, 0));
+
+	if (max_replication_slots == 0)
+		pg_fatal("max_replication_slots must be greater than 0");
+	else if (num_slots_on_old_cluster > max_replication_slots)
+		pg_fatal("max_replication_slots must be greater than existing logical "
+				 "replication slots on old node.");
+
+	PQclear(res);
+
+	res = executeQueryOrDie(conn, "SHOW wal_level;");
+	wal_level = PQgetvalue(res, 0, 0);
+
+	if (strcmp(wal_level, "logical") != 0)
+		pg_fatal("wal_level must be \"logical\", but is set to \"%s\"",
+				 wal_level);
+
+	PQclear(res);
+
+	PQfinish(conn);
+
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/dump.c b/src/bin/pg_upgrade/dump.c
index 6c8c82dca8..a46562639b 100644
--- a/src/bin/pg_upgrade/dump.c
+++ b/src/bin/pg_upgrade/dump.c
@@ -36,6 +36,7 @@ generate_old_dump(void)
 	{
 		char		sql_file_name[MAXPGPATH],
 					log_file_name[MAXPGPATH];
+
 		DbInfo	   *old_db = &old_cluster.dbarr.dbs[dbnum];
 		PQExpBufferData connstr,
 					escaped_connstr;
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index a9988abfe1..a39c739a9a 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -26,6 +26,7 @@ static void get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo);
 static void free_rel_infos(RelInfoArr *rel_arr);
 static void print_db_infos(DbInfoArr *db_arr);
 static void print_rel_infos(RelInfoArr *rel_arr);
+static void print_slot_infos(LogicalSlotInfoArr *slot_arr);
 
 
 /*
@@ -394,7 +395,7 @@ get_db_infos(ClusterInfo *cluster)
 	i_spclocation = PQfnumber(res, "spclocation");
 
 	ntups = PQntuples(res);
-	dbinfos = (DbInfo *) pg_malloc(sizeof(DbInfo) * ntups);
+	dbinfos = (DbInfo *) pg_malloc0(sizeof(DbInfo) * ntups);
 
 	for (tupnum = 0; tupnum < ntups; tupnum++)
 	{
@@ -600,6 +601,102 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 	dbinfo->rel_arr.nrels = num_rels;
 }
 
+/*
+ * get_logical_slot_infos_per_db()
+ *
+ * gets the LogicalSlotInfos for all the logical replication slots of the database
+ * referred to by "dbinfo".
+ */
+static void
+get_logical_slot_infos_per_db(ClusterInfo *cluster, DbInfo *dbinfo)
+{
+	PGconn	   *conn = connectToServer(cluster,
+									   dbinfo->db_name);
+	PGresult   *res;
+	LogicalSlotInfo *slotinfos = NULL;
+
+	int			num_slots;
+
+	char		query[QUERY_ALLOC];
+	const char *std_strings;
+
+	dbinfo->slot_arr.encoding = PQclientEncoding(conn);
+
+	std_strings = PQparameterStatus(conn, "standard_conforming_strings");
+	dbinfo->slot_arr.std_strings = (std_strings && strcmp(std_strings, "on") == 0);
+
+	snprintf(query, sizeof(query),
+			 "SELECT slot_name, plugin, two_phase "
+			 "FROM pg_catalog.pg_replication_slots "
+			 "WHERE database = current_database() AND temporary = false "
+			 "AND wal_status IN ('reserved', 'extended');");
+
+	res = executeQueryOrDie(conn, "%s", query);
+
+	num_slots = PQntuples(res);
+
+	if (num_slots)
+	{
+		int			slotnum;
+		int			i_slotname;
+		int			i_plugin;
+		int			i_twophase;
+
+		slotinfos = (LogicalSlotInfo *) pg_malloc(sizeof(LogicalSlotInfo) * num_slots);
+
+		i_slotname = PQfnumber(res, "slot_name");
+		i_plugin = PQfnumber(res, "plugin");
+		i_twophase = PQfnumber(res, "two_phase");
+
+		for (slotnum = 0; slotnum < num_slots; slotnum++)
+		{
+			LogicalSlotInfo *curr = &slotinfos[slotnum];
+
+			curr->slotname = pg_strdup(PQgetvalue(res, slotnum, i_slotname));
+			curr->plugin = pg_strdup(PQgetvalue(res, slotnum, i_plugin));
+			curr->two_phase = (strcmp(PQgetvalue(res, slotnum, i_twophase), "t") == 0);
+		}
+	}
+
+	PQclear(res);
+	PQfinish(conn);
+
+	dbinfo->slot_arr.slots = slotinfos;
+	dbinfo->slot_arr.nslots = num_slots;
+}
+
+/*
+ * get_logical_slot_infos()
+ *
+ * Higher level routine to generate LogicalSlotInfoArr for all databases.
+ */
+int
+get_logical_slot_infos(ClusterInfo *cluster)
+{
+	int			dbnum;
+	int			slotnum = 0;
+
+	if (cluster == &old_cluster)
+		pg_log(PG_VERBOSE, "\nsource databases:");
+	else
+		pg_log(PG_VERBOSE, "\ntarget databases:");
+
+	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
+	{
+		DbInfo	   *pDbInfo = &cluster->dbarr.dbs[dbnum];
+
+		get_logical_slot_infos_per_db(cluster, pDbInfo);
+		slotnum += pDbInfo->slot_arr.nslots;
+
+		if (log_opts.verbose)
+		{
+			pg_log(PG_VERBOSE, "Database: \"%s\"", pDbInfo->db_name);
+			print_slot_infos(&pDbInfo->slot_arr);
+		}
+	}
+
+	return slotnum;
+}
 
 static void
 free_db_and_rel_infos(DbInfoArr *db_arr)
@@ -610,6 +707,12 @@ free_db_and_rel_infos(DbInfoArr *db_arr)
 	{
 		free_rel_infos(&db_arr->dbs[dbnum].rel_arr);
 		pg_free(db_arr->dbs[dbnum].db_name);
+
+		/*
+		 * Logical replication slots must not exist on the new cluster before
+		 * doing create_logical_replication_slots().
+		 */
+		Assert(db_arr->dbs[dbnum].slot_arr.slots == NULL);
 	}
 	pg_free(db_arr->dbs);
 	db_arr->dbs = NULL;
@@ -660,3 +763,15 @@ print_rel_infos(RelInfoArr *rel_arr)
 			   rel_arr->rels[relnum].reloid,
 			   rel_arr->rels[relnum].tablespace);
 }
+
+static void
+print_slot_infos(LogicalSlotInfoArr *slot_arr)
+{
+	int			slotnum;
+
+	for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\", two_phase: %d",
+			   slot_arr->slots[slotnum].slotname,
+			   slot_arr->slots[slotnum].plugin,
+			   slot_arr->slots[slotnum].two_phase);
+}
diff --git a/src/bin/pg_upgrade/meson.build b/src/bin/pg_upgrade/meson.build
index 12a97f84e2..228f29b688 100644
--- a/src/bin/pg_upgrade/meson.build
+++ b/src/bin/pg_upgrade/meson.build
@@ -42,6 +42,7 @@ tests += {
     'tests': [
       't/001_basic.pl',
       't/002_pg_upgrade.pl',
+      't/003_logical_replication_slots.pl',
     ],
     'test_kwargs': {'priority': 40}, # pg_upgrade tests are slow
   },
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 4562dafcff..ec8c3ff42c 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -59,6 +59,7 @@ static void copy_xact_xlog_xid(void);
 static void set_frozenxids(bool minmxid_only);
 static void make_outputdirs(char *pgdata);
 static void setup(char *argv0, bool *live_check);
+static void create_logical_replication_slots(void);
 
 ClusterInfo old_cluster,
 			new_cluster;
@@ -188,6 +189,19 @@ main(int argc, char **argv)
 			  new_cluster.pgdata);
 	check_ok();
 
+	/*
+	 * Create logical replication slots.
+	 *
+	 * Note: This must be done after doing pg_resetwal command because
+	 * pg_resetwal would remove required WALs.
+	 */
+	if (num_slots_on_old_cluster)
+	{
+		start_postmaster(&new_cluster, true);
+		create_logical_replication_slots();
+		stop_postmaster(false);
+	}
+
 	if (user_opts.do_sync)
 	{
 		prep_status("Sync data directory to disk");
@@ -860,3 +874,57 @@ set_frozenxids(bool minmxid_only)
 
 	check_ok();
 }
+
+/*
+ * create_logical_replication_slots()
+ *
+ * Similar to create_new_objects() but only restores logical replication slots.
+ */
+static void
+create_logical_replication_slots(void)
+{
+	int			dbnum;
+	int			slotnum;
+
+	prep_status_progress("Restoring logical replication slots in the new cluster");
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		DbInfo	   *old_db = &old_cluster.dbarr.dbs[dbnum];
+		LogicalSlotInfoArr *slot_arr = &old_db->slot_arr;
+		PQExpBuffer query = createPQExpBuffer();
+		PGconn	   *conn = connectToServer(&new_cluster, old_db->db_name);
+
+		pg_log(PG_STATUS, "%s", old_db->db_name);
+
+		for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		{
+			/*
+			 * XXX: For simplification, pg_create_logical_replication_slot() is
+			 * used. Is it sufficient?
+			 */
+			appendPQExpBuffer(query, "SELECT pg_catalog.pg_create_logical_replication_slot(");
+			appendStringLiteral(query, slot_arr->slots[slotnum].slotname,
+								slot_arr->encoding, slot_arr->std_strings);
+			appendPQExpBuffer(query, ", ");
+			appendStringLiteral(query, slot_arr->slots[slotnum].plugin,
+								slot_arr->encoding, slot_arr->std_strings);
+			appendPQExpBuffer(query, ", false, %s);",
+							  slot_arr->slots[slotnum].two_phase ? "true" : "false");
+
+			PQclear(executeQueryOrDie(conn, "%s", query->data));
+
+			resetPQExpBuffer(query);
+		}
+
+		PQfinish(conn);
+
+		destroyPQExpBuffer(query);
+	}
+
+	end_progress_output();
+	check_ok();
+
+	/* update new_cluster info again */
+	get_logical_slot_infos(&new_cluster);
+}
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 3eea0139c7..2a3a178cde 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -46,6 +46,7 @@
 #define INTERNAL_LOG_FILE	"pg_upgrade_internal.log"
 
 extern char *output_files[];
+extern int	num_slots_on_old_cluster;
 
 /*
  * WIN32 files do not accept writes from multiple processes
@@ -150,6 +151,24 @@ typedef struct
 	int			nrels;
 } RelInfoArr;
 
+/*
+ * Structure to store logical replication slot information
+ */
+typedef struct
+{
+	char	   *slotname;		/* slot name */
+	char	   *plugin;			/* plugin */
+	bool		two_phase;		/* Can the slot decode 2PC? */
+} LogicalSlotInfo;
+
+typedef struct
+{
+	int			nslots;
+	LogicalSlotInfo *slots;
+	int			encoding;
+	bool		std_strings;
+} LogicalSlotInfoArr;
+
 /*
  * The following structure represents a relation mapping.
  */
@@ -176,6 +195,7 @@ typedef struct
 	char		db_tablespace[MAXPGPATH];	/* database default tablespace
 											 * path */
 	RelInfoArr	rel_arr;		/* array of all user relinfos */
+	LogicalSlotInfoArr slot_arr;	/* array of all LogicalSlotInfo */
 } DbInfo;
 
 /*
@@ -400,6 +420,7 @@ FileNameMap *gen_db_file_maps(DbInfo *old_db,
 							  DbInfo *new_db, int *nmaps, const char *old_pgdata,
 							  const char *new_pgdata);
 void		get_db_and_rel_infos(ClusterInfo *cluster);
+int			get_logical_slot_infos(ClusterInfo *cluster);
 
 /* option.c */
 
diff --git a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
new file mode 100644
index 0000000000..f015b5d363
--- /dev/null
+++ b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
@@ -0,0 +1,152 @@
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+# Tests for upgrading replication slots
+
+use strict;
+use warnings;
+
+use File::Path qw(rmtree);
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Can be changed to test the other modes
+my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
+
+# Initialize old node
+my $old_node = PostgreSQL::Test::Cluster->new('old_node');
+$old_node->init(allows_streaming => 'logical');
+$old_node->start;
+
+# Initialize new node
+my $new_node = PostgreSQL::Test::Cluster->new('new_node');
+$new_node->init(allows_streaming => 1);
+
+my $bindir = $new_node->config_data('--bindir');
+
+$old_node->stop;
+
+# Create a slot on old node
+$old_node->start;
+$old_node->safe_psql(
+	'postgres', "SELECT pg_create_logical_replication_slot('test_slot1', 'test_decoding', false, true);"
+);
+$old_node->stop;
+
+# Cause a failure at the start of pg_upgrade because wal_level is replica
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_node->data_dir,
+		'-D',         $new_node->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_node->host,
+		'-p',         $old_node->port,
+		'-P',         $new_node->port,
+		$mode,
+	],
+	'run of pg_upgrade of old node with wrong wal_level');
+ok( -d $new_node->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_node->data_dir . "/pg_upgrade_output.d");
+
+# Preparations for the subsequent test. The case where max_replication_slots
+# is set to 0 is prohibited.
+$new_node->append_conf('postgresql.conf', "wal_level = 'logical'");
+$new_node->append_conf('postgresql.conf', "max_replication_slots = 0");
+
+# Cause a failure at the start of pg_upgrade because max_replication_slots is 0
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_node->data_dir,
+		'-D',         $new_node->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_node->host,
+		'-p',         $old_node->port,
+		'-P',         $new_node->port,
+		$mode,
+	],
+	'run of pg_upgrade of old node with wrong max_replication_slots');
+ok( -d $new_node->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_node->data_dir . "/pg_upgrade_output.d");
+
+# Preparations for the subsequent test. max_replication_slots is set to
+# non-zero value
+$new_node->append_conf('postgresql.conf', "max_replication_slots = 1");
+
+# Create a slot on old node, and generate WALs
+$old_node->start;
+$old_node->safe_psql(
+	'postgres', qq[
+	SELECT pg_create_logical_replication_slot('test_slot2', 'test_decoding', false, true);
+	CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;
+]);
+
+$old_node->stop;
+
+# Cause a failure at the start of pg_upgrade because max_replication_slots is
+# smaller than existing slots on old node
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_node->data_dir,
+		'-D',         $new_node->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_node->host,
+		'-p',         $old_node->port,
+		'-P',         $new_node->port,
+		$mode,
+	],
+	'run of pg_upgrade of old node with small max_replication_slots');
+ok( -d $new_node->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_node->data_dir . "/pg_upgrade_output.d");
+
+# Preparations for the subsequent test. max_replication_slots is set to
+# appropriate value
+$new_node->append_conf('postgresql.conf', "max_replication_slots = 10");
+
+# Remove an unnecessary slot and consume WALs
+$old_node->start;
+$old_node->safe_psql(
+	'postgres', qq[
+	SELECT pg_drop_replication_slot('test_slot1');
+	SELECT count(*) FROM pg_logical_slot_get_changes('test_slot2', NULL, NULL)
+]);
+$old_node->stop;
+
+# Actual run, pg_upgrade_output.d is removed at the end
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_node->data_dir,
+		'-D',         $new_node->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_node->host,
+		'-p',         $old_node->port,
+		'-P',         $new_node->port,
+		$mode,
+	],
+	'run of pg_upgrade of old node');
+ok( !-d $new_node->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ removed after pg_upgrade success");
+
+$new_node->start;
+my $result = $new_node->safe_psql('postgres',
+	"SELECT slot_name, two_phase FROM pg_replication_slots");
+is($result, qq(test_slot2|t), 'check the slot exists on new node');
+
+done_testing();
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 51b7951ad8..0071efef1c 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1501,7 +1501,10 @@ LogicalRepTupleData
 LogicalRepTyp
 LogicalRepWorker
 LogicalRepWorkerType
+LogicalReplicationSlotInfo
 LogicalRewriteMappingData
+LogicalSlotInfo
+LogicalSlotInfoArr
 LogicalTape
 LogicalTapeSet
 LsnReadQueue
-- 
2.30.0.windows.2

#112Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Zhijie Hou (Fujitsu) (#111)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Hou,

Thanks for posting the patch! I want to open a question to gather opinions from others.

It was primarily for upgrade purposes only. So, as we can't see a good reason to
go via pg_dump let's do it in upgrade unless someone thinks otherwise.

Removed the new option in pg_dump and modified pg_upgrade to
directly use the slot info to restore the slots in the new cluster.

In this version, creation of logical slots is serialized, whereas the old versions
parallelized it per database. Do you think it should be parallelized again? I have
tested locally and it seems harmless. Also, this approach allows logging the executed SQLs.
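To make this concrete, the restore boils down to running one statement of the
following form per slot on the new cluster (the slot name, plugin, and two_phase
value below are only illustrative; the real values are read from
pg_replication_slots on the old cluster, and the third argument, temporary, is
always false):

    SELECT pg_catalog.pg_create_logical_replication_slot('sub', 'pgoutput', false, true);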

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#113Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Hayato Kuroda (Fujitsu) (#112)
3 attachment(s)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear hackers,

It was primarily for upgrade purposes only. So, as we can't see a good reason to
go via pg_dump let's do it in upgrade unless someone thinks otherwise.

Removed the new option in pg_dump and modified pg_upgrade to
directly use the slot info to restore the slots in the new cluster.

In this version, creation of logical slots is serialized, whereas the old versions
parallelized it per database. Do you think it should be parallelized again? I have
tested locally and it seems harmless. Also, this approach allows logging the executed SQLs.

I updated the patch to allow parallel execution. Workers are launched per slot;
each one connects to the new node via psql and executes pg_create_logical_replication_slot().
Moreover, the following points were changed for 0002:

* Ensured that the SQLs executed for creating slots are logged.
* Fixed an issue where 'unreserved' slots could not be upgraded. This change was
not an expected one. The related discussion is [1]/messages/by-id/TYAPR01MB5866FD3F7992A46D0457F0E6F50BA@TYAPR01MB5866.jpnprd01.prod.outlook.com.
* Added checks for output plugin libraries. pg_upgrade ensures that the plugins
referred to by old slots are installed in the new installation (see the query sketch below).
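For reference, the library check extends the query in get_loadable_libraries()
so that the output plugins of usable slots are collected together with the
C-function libraries, roughly as follows (the %u placeholders are filled in by
pg_upgrade with ClanguageId and FirstNormalObjectId; see v21-0002):

    SELECT DISTINCT probin
      FROM pg_catalog.pg_proc
     WHERE prolang = %u AND probin IS NOT NULL AND oid >= %u
    UNION
    SELECT DISTINCT plugin
      FROM pg_catalog.pg_replication_slots
     WHERE wal_status <> 'lost' AND database = current_database()
       AND temporary IS FALSE;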

[1]: /messages/by-id/TYAPR01MB5866FD3F7992A46D0457F0E6F50BA@TYAPR01MB5866.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:

v21-0001-Always-persist-to-disk-logical-slots-during-a-sh.patchapplication/octet-stream; name=v21-0001-Always-persist-to-disk-logical-slots-during-a-sh.patchDownload
From bddf5645240cd4f00cbef36186af59dfa50d333c Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Fri, 14 Apr 2023 13:49:09 +0800
Subject: [PATCH v21 1/3] Always persist to disk logical slots during a
 shutdown checkpoint.

It's entirely possible for a logical slot to have a confirmed_flush_lsn higher
than the last value saved on disk while not being marked as dirty.  It's
currently not a problem to lose that value during a clean shutdown / restart
cycle, but a later patch adding support for pg_upgrade of publications and
logical slots will rely on that value being properly persisted to disk.

Author: Julien Rouhaud
Reviewed-by: Wang Wei
Discussion: FIXME
---
 src/backend/access/transam/xlog.c |  2 +-
 src/backend/replication/slot.c    | 26 ++++++++++++++++----------
 src/include/replication/slot.h    |  2 +-
 3 files changed, 18 insertions(+), 12 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 60c0b7ec3a..6dced61cf4 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7026,7 +7026,7 @@ static void
 CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
 {
 	CheckPointRelationMap();
-	CheckPointReplicationSlots();
+	CheckPointReplicationSlots(flags & CHECKPOINT_IS_SHUTDOWN);
 	CheckPointSnapBuild();
 	CheckPointLogicalRewriteHeap();
 	CheckPointReplicationOrigin();
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 1dc27264f6..5aed7cd190 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -109,7 +109,8 @@ static void ReplicationSlotDropPtr(ReplicationSlot *slot);
 /* internal persistency functions */
 static void RestoreSlotFromDisk(const char *name);
 static void CreateSlotOnDisk(ReplicationSlot *slot);
-static void SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel);
+static void SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel,
+						   bool is_shutdown);
 
 /*
  * Report shared-memory space needed by ReplicationSlotsShmemInit.
@@ -783,7 +784,7 @@ ReplicationSlotSave(void)
 	Assert(MyReplicationSlot != NULL);
 
 	sprintf(path, "pg_replslot/%s", NameStr(MyReplicationSlot->data.name));
-	SaveSlotToPath(MyReplicationSlot, path, ERROR);
+	SaveSlotToPath(MyReplicationSlot, path, ERROR, false);
 }
 
 /*
@@ -1565,11 +1566,10 @@ restart:
 /*
  * Flush all replication slots to disk.
  *
- * This needn't actually be part of a checkpoint, but it's a convenient
- * location.
+ * is_shutdown is true in case of a shutdown checkpoint.
  */
 void
-CheckPointReplicationSlots(void)
+CheckPointReplicationSlots(bool is_shutdown)
 {
 	int			i;
 
@@ -1594,7 +1594,7 @@ CheckPointReplicationSlots(void)
 
 		/* save the slot to disk, locking is handled in SaveSlotToPath() */
 		sprintf(path, "pg_replslot/%s", NameStr(s->data.name));
-		SaveSlotToPath(s, path, LOG);
+		SaveSlotToPath(s, path, LOG, is_shutdown);
 	}
 	LWLockRelease(ReplicationSlotAllocationLock);
 }
@@ -1700,7 +1700,7 @@ CreateSlotOnDisk(ReplicationSlot *slot)
 
 	/* Write the actual state file. */
 	slot->dirty = true;			/* signal that we really need to write */
-	SaveSlotToPath(slot, tmppath, ERROR);
+	SaveSlotToPath(slot, tmppath, ERROR, false);
 
 	/* Rename the directory into place. */
 	if (rename(tmppath, path) != 0)
@@ -1726,7 +1726,8 @@ CreateSlotOnDisk(ReplicationSlot *slot)
  * Shared functionality between saving and creating a replication slot.
  */
 static void
-SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel)
+SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel,
+			   bool is_shutdown)
 {
 	char		tmppath[MAXPGPATH];
 	char		path[MAXPGPATH];
@@ -1740,8 +1741,13 @@ SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel)
 	slot->just_dirtied = false;
 	SpinLockRelease(&slot->mutex);
 
-	/* and don't do anything if there's nothing to write */
-	if (!was_dirty)
+	/*
+	 * and don't do anything if there's nothing to write, unless this is
+	 * called for a logical slot during a shutdown checkpoint, as we want to
+	 * persist the confirmed_flush_lsn in that case, even if that's the only
+	 * modification.
+	 */
+	if (!was_dirty && (SlotIsPhysical(slot) || !is_shutdown))
 		return;
 
 	LWLockAcquire(&slot->io_in_progress_lock, LW_EXCLUSIVE);
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index a8a89dc784..7ca37c9f70 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -241,7 +241,7 @@ extern void ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslo
 extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok);
 
 extern void StartupReplicationSlots(void);
-extern void CheckPointReplicationSlots(void);
+extern void CheckPointReplicationSlots(bool is_shutdown);
 
 extern void CheckSlotRequirements(void);
 extern void CheckSlotPermissions(void);
-- 
2.27.0

v21-0002-pg_upgrade-Allow-to-replicate-logical-replicatio.patchapplication/octet-stream; name=v21-0002-pg_upgrade-Allow-to-replicate-logical-replicatio.patchDownload
From c42c1cfcd6773084043a6250ce06a806c0b7b841 Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Tue, 4 Apr 2023 05:49:34 +0000
Subject: [PATCH v21 2/3] pg_upgrade: Allow to replicate logical replication
 slots to new node

This commit allows nodes with logical replication slots to be upgraded.

For pg_upgrade, it queries the logical replication slot information from the old
cluster and restores the slots by executing pg_create_logical_replication_slot()
statements. Note that we need to separate the timing of restoring replication
slots and other objects. Replication slots, in particular, should not be
restored before executing the pg_resetwal command because it will remove WALs
that are required by the slots.

The significant advantage of this commit is that it makes it easy to continue
logical replication even after upgrading the publisher node. Previously,
pg_upgrade allowed copying publications to a new node. With this new commit,
adjusting the connection string to the new publisher will cause the apply
worker on the subscriber to connect to the new publisher automatically. This
enables seamless continuation of logical replication, even after an upgrade.

Author: Hayato Kuroda
Co-authored-by: Hou Zhijie
Reviewed-by: Peter Smith, Julien Rouhaud, Vignesh C, Wang Wei
---
 doc/src/sgml/ref/pgupgrade.sgml               |  30 ++++
 src/bin/pg_upgrade/Makefile                   |   3 +
 src/bin/pg_upgrade/check.c                    |  69 ++++++++
 src/bin/pg_upgrade/dump.c                     |   1 +
 src/bin/pg_upgrade/function.c                 |  14 +-
 src/bin/pg_upgrade/info.c                     | 117 +++++++++++++-
 src/bin/pg_upgrade/meson.build                |   1 +
 src/bin/pg_upgrade/pg_upgrade.c               | 104 ++++++++++++
 src/bin/pg_upgrade/pg_upgrade.h               |  21 +++
 .../t/003_logical_replication_slots.pl        | 152 ++++++++++++++++++
 src/tools/pgindent/typedefs.list              |   3 +
 11 files changed, 511 insertions(+), 4 deletions(-)
 create mode 100644 src/bin/pg_upgrade/t/003_logical_replication_slots.pl

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 7816b4c685..af776c4ceb 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -402,6 +402,36 @@ NET STOP postgresql-&majorversion;
     </para>
    </step>
 
+   <step>
+    <title>Prepare for publisher upgrades</title>
+
+    <para>
+     <application>pg_upgrade</application> attempts to dump and restore logical
+     replication slots. This helps avoid the need for manually defining the
+     same replication slots on the new publisher.
+    </para>
+
+    <para>
+     Before you start upgrading the publisher node, ensure that the
+     subscription is temporarily disabled by executing
+     <link linkend="sql-altersubscription"><command>ALTER SUBSCRIPTION ... DISABLE</command></link>.
+     After the upgrade is complete, update the connection string, and then
+     re-enable the subscription.
+    </para>
+
+    <para>
+     Upgrading slots has some prerequisites. First, none of the slots may be
+     in the <literal>lost</literal> state, and all of them must have consumed
+     the WAL generated on the old node. Furthermore, on the new node
+     <link linkend="guc-max-replication-slots"><varname>max_replication_slots</varname></link>
+     must be set to at least the number of existing slots on the old node, and
+     <link linkend="guc-wal-level"><varname>wal_level</varname></link> must be
+     <literal>logical</literal>. <application>pg_upgrade</application> will
+     raise an error if these conditions are not met.
+    </para>
+   </step>
+
    <step>
     <title>Run <application>pg_upgrade</application></title>
 
diff --git a/src/bin/pg_upgrade/Makefile b/src/bin/pg_upgrade/Makefile
index 5834513add..815d1a7ca1 100644
--- a/src/bin/pg_upgrade/Makefile
+++ b/src/bin/pg_upgrade/Makefile
@@ -3,6 +3,9 @@
 PGFILEDESC = "pg_upgrade - an in-place binary upgrade utility"
 PGAPPICON = win32
 
+# required for 003_logical_replication_slots.pl
+EXTRA_INSTALL=contrib/test_decoding
+
 subdir = src/bin/pg_upgrade
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 64024e3b9e..d5af0dcbc7 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -30,7 +30,9 @@ static void check_for_jsonb_9_4_usage(ClusterInfo *cluster);
 static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(ClusterInfo *new_cluster);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
+static void check_for_logical_replication_slots(ClusterInfo *new_cluster);
 
+int	num_slots_on_old_cluster;
 
 /*
  * fix_path_separator
@@ -89,6 +91,9 @@ check_and_dump_old_cluster(bool live_check)
 	/* Extract a list of databases and tables from the old cluster */
 	get_db_and_rel_infos(&old_cluster);
 
+	/* Extract a list of logical replication slots */
+	num_slots_on_old_cluster = get_logical_slot_infos(&old_cluster);
+
 	init_tablespaces();
 
 	get_loadable_libraries();
@@ -189,6 +194,17 @@ check_new_cluster(void)
 {
 	get_db_and_rel_infos(&new_cluster);
 
+	/*
+	 * Checking for logical slots must be done before
+	 * check_new_cluster_is_empty() because the slot_arr attribute of the
+	 * new_cluster will be checked in that function.
+	 */
+	if (num_slots_on_old_cluster)
+	{
+		(void) get_logical_slot_infos(&new_cluster);
+		check_for_logical_replication_slots(&new_cluster);
+	}
+
 	check_new_cluster_is_empty();
 
 	check_loadable_libraries();
@@ -353,6 +369,8 @@ check_new_cluster_is_empty(void)
 	{
 		int			relnum;
 		RelInfoArr *rel_arr = &new_cluster.dbarr.dbs[dbnum].rel_arr;
+		DbInfo     *pDbInfo = &new_cluster.dbarr.dbs[dbnum];
+		LogicalSlotInfoArr *slot_arr = &pDbInfo->slot_arr;
 
 		for (relnum = 0; relnum < rel_arr->nrels;
 			 relnum++)
@@ -364,6 +382,14 @@ check_new_cluster_is_empty(void)
 						 rel_arr->rels[relnum].nspname,
 						 rel_arr->rels[relnum].relname);
 		}
+
+		/*
+		 * Check the existence of logical replication slots.
+		 */
+		if (slot_arr->nslots)
+			pg_fatal("New cluster database \"%s\" is not empty: found logical replication slot \"%s\"",
+					 pDbInfo->db_name,
+					 slot_arr->slots[0].slotname);
 	}
 }
 
@@ -1402,3 +1428,46 @@ check_for_user_defined_encoding_conversions(ClusterInfo *cluster)
 	else
 		check_ok();
 }
+
+/*
+ * Verify the parameter settings necessary for creating logical replication
+ * slots.
+ */
+static void
+check_for_logical_replication_slots(ClusterInfo *new_cluster)
+{
+	PGresult   *res;
+	PGconn	   *conn = connectToServer(new_cluster, "template1");
+	int			max_replication_slots;
+	char	   *wal_level;
+
+	/* logical replication slots can be dumped since PG17. */
+	if (GET_MAJOR_VERSION(new_cluster->major_version) <= 1600)
+		return;
+
+	prep_status("Checking parameter settings for logical replication slots");
+
+	res = executeQueryOrDie(conn, "SHOW max_replication_slots;");
+	max_replication_slots = atoi(PQgetvalue(res, 0, 0));
+
+	if (max_replication_slots == 0)
+		pg_fatal("max_replication_slots must be greater than 0");
+	else if (num_slots_on_old_cluster > max_replication_slots)
+		pg_fatal("max_replication_slots must be greater than existing logical "
+				 "replication slots on old node.");
+
+	PQclear(res);
+
+	res = executeQueryOrDie(conn, "SHOW wal_level;");
+	wal_level = PQgetvalue(res, 0, 0);
+
+	if (strcmp(wal_level, "logical") != 0)
+		pg_fatal("wal_level must be \"logical\", but is set to \"%s\"",
+				 wal_level);
+
+	PQclear(res);
+
+	PQfinish(conn);
+
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/dump.c b/src/bin/pg_upgrade/dump.c
index 6c8c82dca8..a46562639b 100644
--- a/src/bin/pg_upgrade/dump.c
+++ b/src/bin/pg_upgrade/dump.c
@@ -36,6 +36,7 @@ generate_old_dump(void)
 	{
 		char		sql_file_name[MAXPGPATH],
 					log_file_name[MAXPGPATH];
+
 		DbInfo	   *old_db = &old_cluster.dbarr.dbs[dbnum];
 		PQExpBufferData connstr,
 					escaped_connstr;
diff --git a/src/bin/pg_upgrade/function.c b/src/bin/pg_upgrade/function.c
index dc8800c7cd..c929b92ff6 100644
--- a/src/bin/pg_upgrade/function.c
+++ b/src/bin/pg_upgrade/function.c
@@ -46,7 +46,8 @@ library_name_compare(const void *p1, const void *p2)
 /*
  * get_loadable_libraries()
  *
- *	Fetch the names of all old libraries containing C-language functions.
+ *	Fetch the names of all old libraries containing C-language functions, and
+ *	output plugins used by existing logical replication slots.
  *	We will later check that they all exist in the new installation.
  */
 void
@@ -66,14 +67,21 @@ get_loadable_libraries(void)
 		PGconn	   *conn = connectToServer(&old_cluster, active_db->db_name);
 
 		/*
-		 * Fetch all libraries containing non-built-in C functions in this DB.
+		 * Fetch all libraries containing non-built-in C functions and
+		 * output plugins in this DB.
 		 */
 		ress[dbnum] = executeQueryOrDie(conn,
 										"SELECT DISTINCT probin "
 										"FROM pg_catalog.pg_proc "
 										"WHERE prolang = %u AND "
 										"probin IS NOT NULL AND "
-										"oid >= %u;",
+										"oid >= %u "
+										"UNION "
+										"SELECT DISTINCT plugin "
+										"FROM pg_catalog.pg_replication_slots "
+										"WHERE wal_status <> 'lost' AND "
+										"database = current_database() AND "
+										"temporary IS FALSE;",
 										ClanguageId,
 										FirstNormalObjectId);
 		totaltups += PQntuples(ress[dbnum]);
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index a9988abfe1..aea3e5fbf3 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -26,6 +26,7 @@ static void get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo);
 static void free_rel_infos(RelInfoArr *rel_arr);
 static void print_db_infos(DbInfoArr *db_arr);
 static void print_rel_infos(RelInfoArr *rel_arr);
+static void print_slot_infos(LogicalSlotInfoArr *slot_arr);
 
 
 /*
@@ -394,7 +395,7 @@ get_db_infos(ClusterInfo *cluster)
 	i_spclocation = PQfnumber(res, "spclocation");
 
 	ntups = PQntuples(res);
-	dbinfos = (DbInfo *) pg_malloc(sizeof(DbInfo) * ntups);
+	dbinfos = (DbInfo *) pg_malloc0(sizeof(DbInfo) * ntups);
 
 	for (tupnum = 0; tupnum < ntups; tupnum++)
 	{
@@ -600,6 +601,102 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 	dbinfo->rel_arr.nrels = num_rels;
 }
 
+/*
+ * get_logical_slot_infos_per_db()
+ *
+ * gets the LogicalSlotInfos for all the logical replication slots of the database
+ * referred to by "dbinfo".
+ */
+static void
+get_logical_slot_infos_per_db(ClusterInfo *cluster, DbInfo *dbinfo)
+{
+	PGconn	   *conn = connectToServer(cluster,
+									   dbinfo->db_name);
+	PGresult   *res;
+	LogicalSlotInfo *slotinfos = NULL;
+
+	int			num_slots;
+
+	char		query[QUERY_ALLOC];
+	const char *std_strings;
+
+	dbinfo->slot_arr.encoding = PQclientEncoding(conn);
+
+	std_strings = PQparameterStatus(conn, "standard_conforming_strings");
+	dbinfo->slot_arr.std_strings = (std_strings && strcmp(std_strings, "on") == 0);
+
+	snprintf(query, sizeof(query),
+			 "SELECT slot_name, plugin, two_phase "
+			 "FROM pg_catalog.pg_replication_slots "
+			 "WHERE database = current_database() AND temporary = false "
+			 "AND wal_status <> 'lost';");
+
+	res = executeQueryOrDie(conn, "%s", query);
+
+	num_slots = PQntuples(res);
+
+	if (num_slots)
+	{
+		int			slotnum;
+		int			i_slotname;
+		int			i_plugin;
+		int			i_twophase;
+
+		slotinfos = (LogicalSlotInfo *) pg_malloc(sizeof(LogicalSlotInfo) * num_slots);
+
+		i_slotname = PQfnumber(res, "slot_name");
+		i_plugin = PQfnumber(res, "plugin");
+		i_twophase = PQfnumber(res, "two_phase");
+
+		for (slotnum = 0; slotnum < num_slots; slotnum++)
+		{
+			LogicalSlotInfo *curr = &slotinfos[slotnum];
+
+			curr->slotname = pg_strdup(PQgetvalue(res, slotnum, i_slotname));
+			curr->plugin = pg_strdup(PQgetvalue(res, slotnum, i_plugin));
+			curr->two_phase = (strcmp(PQgetvalue(res, slotnum, i_twophase), "t") == 0);
+		}
+	}
+
+	PQclear(res);
+	PQfinish(conn);
+
+	dbinfo->slot_arr.slots = slotinfos;
+	dbinfo->slot_arr.nslots = num_slots;
+}
+
+/*
+ * get_logical_slot_infos()
+ *
+ * Higher level routine to generate LogicalSlotInfoArr for all databases.
+ */
+int
+get_logical_slot_infos(ClusterInfo *cluster)
+{
+	int			dbnum;
+	int			slotnum = 0;
+
+	if (cluster == &old_cluster)
+		pg_log(PG_VERBOSE, "\nsource databases:");
+	else
+		pg_log(PG_VERBOSE, "\ntarget databases:");
+
+	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
+	{
+		DbInfo	   *pDbInfo = &cluster->dbarr.dbs[dbnum];
+
+		get_logical_slot_infos_per_db(cluster, pDbInfo);
+		slotnum += pDbInfo->slot_arr.nslots;
+
+		if (log_opts.verbose)
+		{
+			pg_log(PG_VERBOSE, "Database: \"%s\"", pDbInfo->db_name);
+			print_slot_infos(&pDbInfo->slot_arr);
+		}
+	}
+
+	return slotnum;
+}
 
 static void
 free_db_and_rel_infos(DbInfoArr *db_arr)
@@ -610,6 +707,12 @@ free_db_and_rel_infos(DbInfoArr *db_arr)
 	{
 		free_rel_infos(&db_arr->dbs[dbnum].rel_arr);
 		pg_free(db_arr->dbs[dbnum].db_name);
+
+		/*
+		 * Logical replication slots must not exist on the new cluster before
+		 * doing create_logical_replication_slots().
+		 */
+		Assert(db_arr->dbs[dbnum].slot_arr.slots == NULL);
 	}
 	pg_free(db_arr->dbs);
 	db_arr->dbs = NULL;
@@ -660,3 +763,15 @@ print_rel_infos(RelInfoArr *rel_arr)
 			   rel_arr->rels[relnum].reloid,
 			   rel_arr->rels[relnum].tablespace);
 }
+
+static void
+print_slot_infos(LogicalSlotInfoArr *slot_arr)
+{
+	int			slotnum;
+
+	for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\", two_phase: %d",
+			   slot_arr->slots[slotnum].slotname,
+			   slot_arr->slots[slotnum].plugin,
+			   slot_arr->slots[slotnum].two_phase);
+}
diff --git a/src/bin/pg_upgrade/meson.build b/src/bin/pg_upgrade/meson.build
index 12a97f84e2..228f29b688 100644
--- a/src/bin/pg_upgrade/meson.build
+++ b/src/bin/pg_upgrade/meson.build
@@ -42,6 +42,7 @@ tests += {
     'tests': [
       't/001_basic.pl',
       't/002_pg_upgrade.pl',
+      't/003_logical_replication_slots.pl',
     ],
     'test_kwargs': {'priority': 40}, # pg_upgrade tests are slow
   },
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 4562dafcff..688b84d62e 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -59,6 +59,7 @@ static void copy_xact_xlog_xid(void);
 static void set_frozenxids(bool minmxid_only);
 static void make_outputdirs(char *pgdata);
 static void setup(char *argv0, bool *live_check);
+static void create_logical_replication_slots(void);
 
 ClusterInfo old_cluster,
 			new_cluster;
@@ -188,6 +189,19 @@ main(int argc, char **argv)
 			  new_cluster.pgdata);
 	check_ok();
 
+	/*
+	 * Create logical replication slots.
+	 *
+	 * Note: This must be done after doing pg_resetwal command because
+	 * pg_resetwal would remove required WALs.
+	 */
+	if (num_slots_on_old_cluster)
+	{
+		start_postmaster(&new_cluster, true);
+		create_logical_replication_slots();
+		stop_postmaster(false);
+	}
+
 	if (user_opts.do_sync)
 	{
 		prep_status("Sync data directory to disk");
@@ -860,3 +874,93 @@ set_frozenxids(bool minmxid_only)
 
 	check_ok();
 }
+
+/*
+ * create_logical_replication_slots()
+ *
+ * Similar to create_new_objects() but only restores logical replication slots.
+ */
+static void
+create_logical_replication_slots(void)
+{
+	int			dbnum;
+	int			slotnum;
+
+	prep_status_progress("Restoring logical replication slots in the new cluster");
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		DbInfo	   *old_db = &old_cluster.dbarr.dbs[dbnum];
+		LogicalSlotInfoArr *slot_arr = &old_db->slot_arr;
+		PQExpBuffer query,
+					escaped;
+		PGconn	   *conn;
+		char		log_file_name[MAXPGPATH];
+
+		/* Quick exit if there are no slots */
+		if (!slot_arr->nslots)
+			continue;
+
+		query = createPQExpBuffer();
+		escaped = createPQExpBuffer();
+		conn = connectToServer(&new_cluster, old_db->db_name);
+
+		snprintf(log_file_name, sizeof(log_file_name),
+				 DB_DUMP_LOG_FILE_MASK, old_db->db_oid);
+
+		pg_log(PG_STATUS, "%s", old_db->db_name);
+
+		for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		{
+			/*
+			 * Construct a query for creating a logical replication slot.
+			 *
+			 * XXX: For simplification, pg_create_logical_replication_slot() is
+			 * used. Is it sufficient?
+			 */
+			appendPQExpBuffer(query, "SELECT pg_catalog.pg_create_logical_replication_slot(");
+			appendStringLiteral(query, slot_arr->slots[slotnum].slotname,
+								slot_arr->encoding, slot_arr->std_strings);
+			appendPQExpBuffer(query, ", ");
+			appendStringLiteral(query, slot_arr->slots[slotnum].plugin,
+								slot_arr->encoding, slot_arr->std_strings);
+			appendPQExpBuffer(query, ", false, %s);",
+							  slot_arr->slots[slotnum].two_phase ? "true" : "false");
+
+			/*
+			 * The string must be escaped shell-style, because the output plugin
+			 * name could contain quotes. appendShellString() wraps the result in
+			 * single quotes, so it does not have to be wrapped in any additional
+			 * quotes when it is passed to parallel_exec_prog().
+			 */
+			appendShellString(escaped, query->data);
+
+			parallel_exec_prog(log_file_name,
+							   NULL,
+							   "\"%s/psql\" %s --echo-queries --set ON_ERROR_STOP=on "
+							   "--no-psqlrc --dbname %s -c %s",
+							   new_cluster.bindir,
+							   cluster_conn_opts(&new_cluster),
+							   old_db->db_name,
+							   escaped->data);
+			resetPQExpBuffer(escaped);
+			resetPQExpBuffer(query);
+		}
+
+		PQfinish(conn);
+
+		destroyPQExpBuffer(escaped);
+		destroyPQExpBuffer(query);
+
+		/* reap all children */
+		while (reap_child(true) == true)
+			;
+	}
+
+	end_progress_output();
+	check_ok();
+
+	/* update new_cluster info again */
+	get_logical_slot_infos(&new_cluster);
+}
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 3eea0139c7..2a3a178cde 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -46,6 +46,7 @@
 #define INTERNAL_LOG_FILE	"pg_upgrade_internal.log"
 
 extern char *output_files[];
+extern int	num_slots_on_old_cluster;
 
 /*
  * WIN32 files do not accept writes from multiple processes
@@ -150,6 +151,24 @@ typedef struct
 	int			nrels;
 } RelInfoArr;
 
+/*
+ * Structure to store logical replication slot information
+ */
+typedef struct
+{
+	char	   *slotname;		/* slot name */
+	char	   *plugin;			/* plugin */
+	bool		two_phase;		/* Can the slot decode 2PC? */
+} LogicalSlotInfo;
+
+typedef struct
+{
+	int			nslots;
+	LogicalSlotInfo *slots;
+	int			encoding;
+	bool		std_strings;
+} LogicalSlotInfoArr;
+
 /*
  * The following structure represents a relation mapping.
  */
@@ -176,6 +195,7 @@ typedef struct
 	char		db_tablespace[MAXPGPATH];	/* database default tablespace
 											 * path */
 	RelInfoArr	rel_arr;		/* array of all user relinfos */
+	LogicalSlotInfoArr slot_arr;	/* array of all LogicalSlotInfo */
 } DbInfo;
 
 /*
@@ -400,6 +420,7 @@ FileNameMap *gen_db_file_maps(DbInfo *old_db,
 							  DbInfo *new_db, int *nmaps, const char *old_pgdata,
 							  const char *new_pgdata);
 void		get_db_and_rel_infos(ClusterInfo *cluster);
+int			get_logical_slot_infos(ClusterInfo *cluster);
 
 /* option.c */
 
diff --git a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
new file mode 100644
index 0000000000..f015b5d363
--- /dev/null
+++ b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
@@ -0,0 +1,152 @@
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+# Tests for upgrading replication slots
+
+use strict;
+use warnings;
+
+use File::Path qw(rmtree);
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Can be changed to test the other modes
+my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
+
+# Initialize old node
+my $old_node = PostgreSQL::Test::Cluster->new('old_node');
+$old_node->init(allows_streaming => 'logical');
+$old_node->start;
+
+# Initialize new node
+my $new_node = PostgreSQL::Test::Cluster->new('new_node');
+$new_node->init(allows_streaming => 1);
+
+my $bindir = $new_node->config_data('--bindir');
+
+$old_node->stop;
+
+# Create a slot on old node
+$old_node->start;
+$old_node->safe_psql(
+	'postgres', "SELECT pg_create_logical_replication_slot('test_slot1', 'test_decoding', false, true);"
+);
+$old_node->stop;
+
+# Cause a failure at the start of pg_upgrade because wal_level is replica
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_node->data_dir,
+		'-D',         $new_node->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_node->host,
+		'-p',         $old_node->port,
+		'-P',         $new_node->port,
+		$mode,
+	],
+	'run of pg_upgrade of old node with wrong wal_level');
+ok( -d $new_node->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_node->data_dir . "/pg_upgrade_output.d");
+
+# Preparations for the subsequent test. The case max_replication_slots is set
+# to 0 is prohibited.
+$new_node->append_conf('postgresql.conf', "wal_level = 'logical'");
+$new_node->append_conf('postgresql.conf', "max_replication_slots = 0");
+
+# Cause a failure at the start of pg_upgrade because max_replication_slots is 0
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_node->data_dir,
+		'-D',         $new_node->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_node->host,
+		'-p',         $old_node->port,
+		'-P',         $new_node->port,
+		$mode,
+	],
+	'run of pg_upgrade of old node with wrong max_replication_slots');
+ok( -d $new_node->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_node->data_dir . "/pg_upgrade_output.d");
+
+# Preparations for the subsequent test. max_replication_slots is set to
+# non-zero value
+$new_node->append_conf('postgresql.conf', "max_replication_slots = 1");
+
+# Create a slot on old node, and generate WALs
+$old_node->start;
+$old_node->safe_psql(
+	'postgres', qq[
+	SELECT pg_create_logical_replication_slot('test_slot2', 'test_decoding', false, true);
+	CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;
+]);
+
+$old_node->stop;
+
+# Cause a failure at the start of pg_upgrade because max_replication_slots is
+# smaller than existing slots on old node
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_node->data_dir,
+		'-D',         $new_node->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_node->host,
+		'-p',         $old_node->port,
+		'-P',         $new_node->port,
+		$mode,
+	],
+	'run of pg_upgrade of old node with small max_replication_slots');
+ok( -d $new_node->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_node->data_dir . "/pg_upgrade_output.d");
+
+# Preparations for the subsequent test. max_replication_slots is set to
+# appropriate value
+$new_node->append_conf('postgresql.conf', "max_replication_slots = 10");
+
+# Remove an unnecessary slot and consume WALs
+$old_node->start;
+$old_node->safe_psql(
+	'postgres', qq[
+	SELECT pg_drop_replication_slot('test_slot1');
+	SELECT count(*) FROM pg_logical_slot_get_changes('test_slot2', NULL, NULL)
+]);
+$old_node->stop;
+
+# Actual run, pg_upgrade_output.d is removed at the end
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_node->data_dir,
+		'-D',         $new_node->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_node->host,
+		'-p',         $old_node->port,
+		'-P',         $new_node->port,
+		$mode,
+	],
+	'run of pg_upgrade of old node');
+ok( !-d $new_node->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ removed after pg_upgrade success");
+
+$new_node->start;
+my $result = $new_node->safe_psql('postgres',
+	"SELECT slot_name, two_phase FROM pg_replication_slots");
+is($result, qq(test_slot2|t), 'check the slot exists on new node');
+
+done_testing();
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 51b7951ad8..0071efef1c 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1501,7 +1501,10 @@ LogicalRepTupleData
 LogicalRepTyp
 LogicalRepWorker
 LogicalRepWorkerType
+LogicalReplicationSlotInfo
 LogicalRewriteMappingData
+LogicalSlotInfo
+LogicalSlotInfoArr
 LogicalTape
 LogicalTapeSet
 LsnReadQueue
-- 
2.27.0

v21-0003-pg_upgrade-Add-check-function-for-logical-replic.patch (application/octet-stream)
From 3b2241f9379fc8cb72b6f4f14acf9bda1e26dfa0 Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Fri, 14 Apr 2023 08:59:03 +0000
Subject: [PATCH v21 3/3] pg_upgrade: Add check function for logical
 replication slots

pg_upgrade fails if the old node has slots whose status is 'lost' or which have not
consumed all WAL records. These checks are needed to prevent data loss.

Author: Hayato Kuroda
Reviewed-by: Wang Wei, Vignesh C
---
 src/bin/pg_upgrade/check.c                    |  82 +++++++++
 src/bin/pg_upgrade/controldata.c              |  31 ++++
 src/bin/pg_upgrade/pg_upgrade.h               |   3 +
 .../t/003_logical_replication_slots.pl        | 155 ++++++++++++------
 src/test/regress/sql/misc_functions.sql       |   2 +-
 5 files changed, 219 insertions(+), 54 deletions(-)

diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index d5af0dcbc7..2ad1249cf8 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -31,6 +31,7 @@ static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(ClusterInfo *new_cluster);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
 static void check_for_logical_replication_slots(ClusterInfo *new_cluster);
+static void check_for_confirmed_flush_lsn(ClusterInfo *cluster, bool live_check);
 
 int	num_slots_on_old_cluster;
 
@@ -108,6 +109,8 @@ check_and_dump_old_cluster(bool live_check)
 	check_for_composite_data_type_usage(&old_cluster);
 	check_for_reg_data_type_usage(&old_cluster);
 	check_for_isn_and_int8_passing_mismatch(&old_cluster);
+	if (num_slots_on_old_cluster)
+		check_for_confirmed_flush_lsn(&old_cluster, live_check);
 
 	/*
 	 * PG 16 increased the size of the 'aclitem' type, which breaks the
@@ -1471,3 +1474,82 @@ check_for_logical_replication_slots(ClusterInfo *new_cluster)
 
 	check_ok();
 }
+
+/*
+ * Verify that all logical replication slots consumed all WALs, except a
+ * CHECKPOINT_SHUTDOWN record.
+ */
+static void
+check_for_confirmed_flush_lsn(ClusterInfo *cluster, bool live_check)
+{
+	int			i,
+				ntups,
+				i_slotname;
+	bool		is_error = false;
+	PGresult   *res;
+	DbInfo	   *active_db = &cluster->dbarr.dbs[0];
+	PGconn	   *conn = connectToServer(cluster, active_db->db_name);
+
+	/* logical slots can be dumped since PG17. */
+	if (GET_MAJOR_VERSION(cluster->major_version) <= 1600)
+		return;
+
+	prep_status("Checking confirmed_flush_lsn for logical replication slots");
+
+	/* Check that all logical slots are not in 'lost' state. */
+	res = executeQueryOrDie(conn,
+							"SELECT slot_name FROM pg_catalog.pg_replication_slots "
+							"WHERE temporary = false AND wal_status = 'lost';");
+
+	ntups = PQntuples(res);
+	i_slotname = PQfnumber(res, "slot_name");
+
+	for (i = 0; i < ntups; i++)
+	{
+		is_error = true;
+
+		pg_log(PG_WARNING,
+			   "\nWARNING: logical replication slot \"%s\" is obsolete.",
+			   PQgetvalue(res, i, i_slotname));
+	}
+
+	PQclear(res);
+
+	if (is_error)
+		pg_fatal("logical replication slots must not be in the 'lost' state.");
+
+	/*
+	 * Check that all logical replication slots have reached the latest
+	 * checkpoint position (SHUTDOWN_CHECKPOINT record). This check cannot be
+	 * done in the case of live_check because the server has not yet written
+	 * the SHUTDOWN_CHECKPOINT record.
+	 */
+	if (!live_check)
+	{
+		res = executeQueryOrDie(conn,
+								"SELECT slot_name FROM pg_catalog.pg_replication_slots "
+								"WHERE confirmed_flush_lsn != '%X/%X' AND temporary = false;",
+								old_cluster.controldata.chkpnt_latest_upper,
+								old_cluster.controldata.chkpnt_latest_lower);
+
+		ntups = PQntuples(res);
+		i_slotname = PQfnumber(res, "slot_name");
+
+		for (i = 0; i < ntups; i++)
+		{
+			is_error = true;
+
+			pg_log(PG_WARNING,
+				   "\nWARNING: logical replication slot \"%s\" has not consumed WALs yet",
+				   PQgetvalue(res, i, i_slotname));
+		}
+
+		PQclear(res);
+		PQfinish(conn);
+
+		if (is_error)
+			pg_fatal("logical replication slots must have consumed all the WALs.");
+	}
+
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c
index 4beb65ab22..e00543e74a 100644
--- a/src/bin/pg_upgrade/controldata.c
+++ b/src/bin/pg_upgrade/controldata.c
@@ -169,6 +169,37 @@ get_control_data(ClusterInfo *cluster, bool live_check)
 				}
 				got_cluster_state = true;
 			}
+
+			else if ((p = strstr(bufin, "Latest checkpoint location:")) != NULL)
+			{
+				/*
+				 * Gather latest checkpoint location if the cluster is newer or
+				 * equal to 17. This is used for upgrading logical replication
+				 * slots.
+				 */
+				if (GET_MAJOR_VERSION(cluster->major_version) >= 17)
+				{
+					char *slash = NULL;
+
+					p = strchr(p, ':');
+
+					if (p == NULL || strlen(p) <= 1)
+						pg_fatal("%d: controldata retrieval problem", __LINE__);
+
+					p++;			/* remove ':' char */
+
+					p = strpbrk(p, "01234567890ABCDEF");
+
+					/*
+					 * Upper and lower part of LSN must be read and stored
+					 * separately because it is reported as %X/%X format.
+					 */
+					cluster->controldata.chkpnt_latest_upper =
+						strtoul(p, &slash, 16);
+					cluster->controldata.chkpnt_latest_lower =
+						strtoul(++slash, NULL, 16);
+				}
+			}
 		}
 
 		rc = pclose(output);
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 2a3a178cde..475bbeb30b 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -245,6 +245,9 @@ typedef struct
 	bool		date_is_int;
 	bool		float8_pass_by_value;
 	uint32		data_checksum_version;
+
+	uint32		chkpnt_latest_upper;
+	uint32		chkpnt_latest_lower;
 } ControlData;
 
 /*
diff --git a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
index f015b5d363..59e2c2f209 100644
--- a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
+++ b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
@@ -15,138 +15,187 @@ use Test::More;
 my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
 
 # Initialize old node
-my $old_node = PostgreSQL::Test::Cluster->new('old_node');
-$old_node->init(allows_streaming => 'logical');
-$old_node->start;
+my $old_publisher = PostgreSQL::Test::Cluster->new('old_publisher');
+$old_publisher->init(allows_streaming => 'logical');
 
 # Initialize new node
-my $new_node = PostgreSQL::Test::Cluster->new('new_node');
-$new_node->init(allows_streaming => 1);
+my $new_publisher = PostgreSQL::Test::Cluster->new('new_publisher');
+$new_publisher->init(allows_streaming => 1);
 
-my $bindir = $new_node->config_data('--bindir');
+# Initialize subscriber node
+my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
+$subscriber->init(allows_streaming => 'logical');
 
-$old_node->stop;
+my $bindir = $new_publisher->config_data('--bindir');
 
 # Create a slot on old node
-$old_node->start;
-$old_node->safe_psql(
+$old_publisher->start;
+$old_publisher->safe_psql(
 	'postgres', "SELECT pg_create_logical_replication_slot('test_slot1', 'test_decoding', false, true);"
 );
-$old_node->stop;
+$old_publisher->stop;
 
 # Cause a failure at the start of pg_upgrade because wal_level is replica
 command_fails(
 	[
 		'pg_upgrade', '--no-sync',
-		'-d',         $old_node->data_dir,
-		'-D',         $new_node->data_dir,
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
 		'-b',         $bindir,
 		'-B',         $bindir,
-		'-s',         $new_node->host,
-		'-p',         $old_node->port,
-		'-P',         $new_node->port,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
 		$mode,
 	],
 	'run of pg_upgrade of old node with wrong wal_level');
-ok( -d $new_node->data_dir . "/pg_upgrade_output.d",
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
 	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
 
 # Clean up
-rmtree($new_node->data_dir . "/pg_upgrade_output.d");
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
 
 # Preparations for the subsequent test. The case max_replication_slots is set
 # to 0 is prohibited.
-$new_node->append_conf('postgresql.conf', "wal_level = 'logical'");
-$new_node->append_conf('postgresql.conf', "max_replication_slots = 0");
+$new_publisher->append_conf('postgresql.conf', "wal_level = 'logical'");
+$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 0");
 
 # Cause a failure at the start of pg_upgrade because max_replication_slots is 0
 command_fails(
 	[
 		'pg_upgrade', '--no-sync',
-		'-d',         $old_node->data_dir,
-		'-D',         $new_node->data_dir,
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
 		'-b',         $bindir,
 		'-B',         $bindir,
-		'-s',         $new_node->host,
-		'-p',         $old_node->port,
-		'-P',         $new_node->port,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
 		$mode,
 	],
 	'run of pg_upgrade of old node with wrong max_replication_slots');
-ok( -d $new_node->data_dir . "/pg_upgrade_output.d",
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
 	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
 
 # Clean up
-rmtree($new_node->data_dir . "/pg_upgrade_output.d");
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
 
 # Preparations for the subsequent test. max_replication_slots is set to
 # non-zero value
-$new_node->append_conf('postgresql.conf', "max_replication_slots = 1");
+$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 1");
 
-# Create a slot on old node, and generate WALs
-$old_node->start;
-$old_node->safe_psql(
+# Create a slot on old node
+$old_publisher->start;
+$old_publisher->safe_psql(
 	'postgres', qq[
 	SELECT pg_create_logical_replication_slot('test_slot2', 'test_decoding', false, true);
 	CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;
 ]);
 
-$old_node->stop;
+$old_publisher->stop;
 
 # Cause a failure at the start of pg_upgrade because max_replication_slots is
 # smaller than existing slots on old node
 command_fails(
 	[
 		'pg_upgrade', '--no-sync',
-		'-d',         $old_node->data_dir,
-		'-D',         $new_node->data_dir,
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
 		'-b',         $bindir,
 		'-B',         $bindir,
-		'-s',         $new_node->host,
-		'-p',         $old_node->port,
-		'-P',         $new_node->port,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
 		$mode,
 	],
 	'run of pg_upgrade of old node with small max_replication_slots');
-ok( -d $new_node->data_dir . "/pg_upgrade_output.d",
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
 	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
 
 # Clean up
-rmtree($new_node->data_dir . "/pg_upgrade_output.d");
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
 
 # Preparations for the subsequent test. max_replication_slots is set to
 # appropriate value
-$new_node->append_conf('postgresql.conf', "max_replication_slots = 10");
+$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 10");
 
-# Remove an unnecessary slot and consume WALs
-$old_node->start;
-$old_node->safe_psql(
+# Cause a failure at the start of pg_upgrade because slot do not finish
+# consuming all the WALs
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade of old node with idle replication slots');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+
+$old_publisher->start;
+$old_publisher->safe_psql(
 	'postgres', qq[
 	SELECT pg_drop_replication_slot('test_slot1');
-	SELECT count(*) FROM pg_logical_slot_get_changes('test_slot2', NULL, NULL)
+	SELECT pg_drop_replication_slot('test_slot2');
+]);
+
+# Setup logical replication
+my $old_connstr = $old_publisher->connstr . ' dbname=postgres';
+$old_publisher->safe_psql('postgres', "CREATE PUBLICATION pub FOR ALL TABLES");
+$subscriber->start;
+$subscriber->safe_psql(
+	'postgres', qq[
+	CREATE TABLE tbl (a int);
+	CREATE SUBSCRIPTION sub CONNECTION '$old_connstr' PUBLICATION pub WITH (two_phase = 'true')
 ]);
-$old_node->stop;
+
+# Wait for initial table sync to finish
+$subscriber->wait_for_subscription_sync($old_publisher, 'sub');
+
+$subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION sub DISABLE");
+$old_publisher->stop;
 
 # Actual run, pg_upgrade_output.d is removed at the end
 command_ok(
 	[
 		'pg_upgrade', '--no-sync',
-		'-d',         $old_node->data_dir,
-		'-D',         $new_node->data_dir,
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
 		'-b',         $bindir,
 		'-B',         $bindir,
-		'-s',         $new_node->host,
-		'-p',         $old_node->port,
-		'-P',         $new_node->port,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
 		$mode,
 	],
 	'run of pg_upgrade of old node');
-ok( !-d $new_node->data_dir . "/pg_upgrade_output.d",
+ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
 	"pg_upgrade_output.d/ removed after pg_upgrade success");
 
-$new_node->start;
-my $result = $new_node->safe_psql('postgres',
+$new_publisher->start;
+my $result = $new_publisher->safe_psql('postgres',
 	"SELECT slot_name, two_phase FROM pg_replication_slots");
-is($result, qq(test_slot2|t), 'check the slot exists on new node');
+is($result, qq(sub|t), 'check the slot exists on new node');
+
+# Change the connection
+my $new_connstr = $new_publisher->connstr . ' dbname=postgres';
+$subscriber->safe_psql('postgres',
+	"ALTER SUBSCRIPTION sub CONNECTION '$new_connstr'");
+$subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION sub ENABLE");
+
+# Check whether changes on the new publisher get replicated to the subscriber
+$new_publisher->safe_psql('postgres',
+	"INSERT INTO tbl VALUES (generate_series(11, 20))");
+$new_publisher->wait_for_catchup('sub');
+$result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
+is($result, qq(20), 'check changes are shipped to subscriber');
 
 done_testing();
diff --git a/src/test/regress/sql/misc_functions.sql b/src/test/regress/sql/misc_functions.sql
index b57f01f3e9..ffe7d5b4ce 100644
--- a/src/test/regress/sql/misc_functions.sql
+++ b/src/test/regress/sql/misc_functions.sql
@@ -236,4 +236,4 @@ SELECT * FROM pg_split_walfile_name('invalid');
 SELECT segment_number > 0 AS ok_segment_number, timeline_id
   FROM pg_split_walfile_name('000000010000000100000000');
 SELECT segment_number > 0 AS ok_segment_number, timeline_id
-  FROM pg_split_walfile_name('ffffffFF00000001000000af');
+  FROM pg_split_walfile_name('ffffffFF00000001000000af');
\ No newline at end of file
-- 
2.27.0

#114Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Hayato Kuroda (Fujitsu) (#113)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

On Wednesday, August 16, 2023 6:25 PM Kuroda, Hayato/黒田 隼人 wrote:

Dear hackers,

It was primarily for upgrade purposes only. So, as we can't see a good reason to
go via pg_dump let's do it in upgrade unless someone thinks otherwise.

Removed the new option in pg_dump and modified pg_upgrade to directly use the
slot info to restore the slots in the new cluster.

In this version, creations of logical slots are serialized, whereas old ones were
parallelised per db. Do you think it should be parallelized again? I have tested
locally and it seemed harmless. Also, this approach allows logging the executed SQLs.

I updated the patch to allow parallel executions. Workers are launched per
slot; each one connects to the new node via psql and executes
pg_create_logical_replication_slot().
Moreover, the following points were changed for 0002.

* Ensured to log the executed SQLs for creating slots.
* Fixed an issue that 'unreserved' slots could not be upgraded. This change was
not an expected one. The related discussion is at [1].
* Added checks for output plugin libraries. pg_upgrade ensures that the plugins
referenced by old slots are installed in the new executable directory.

Thanks for updating the patch! Here are a few comments:

+static void
+create_logical_replication_slots(void)
...
+		query = createPQExpBuffer();
+		escaped = createPQExpBuffer();
+		conn = connectToServer(&new_cluster, old_db->db_name);

Since the connection here is not used anymore, I think we can remove it.

2.

+static void
+create_logical_replication_slots(void)
...
+	/* update new_cluster info again */
+	get_logical_slot_infos(&new_cluster);
+}

Do we need to get the new slots again after restoring?

3.

+	snprintf(query, sizeof(query),
+			 "SELECT slot_name, plugin, two_phase "
+			 "FROM pg_catalog.pg_replication_slots "
+			 "WHERE database = current_database() AND temporary = false "
+			 "AND wal_status <> 'lost';");
+
+	res = executeQueryOrDie(conn, "%s", query);
+

Instead of building the query in a new variable, can we directly put the SQL in executeQueryOrDie()?
E.g.:
executeQueryOrDie(conn, "SELECT slot_name, plugin, two_phase ...");
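
To spell that out, a minimal sketch (reusing the query text already in the patch; only the call form changes):

```
/* Sketch only: pass the constant SQL straight to executeQueryOrDie(). */
res = executeQueryOrDie(conn,
						"SELECT slot_name, plugin, two_phase "
						"FROM pg_catalog.pg_replication_slots "
						"WHERE database = current_database() AND temporary = false "
						"AND wal_status <> 'lost';");
```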

4.
+int num_slots_on_old_cluster;

Instead of a new global variable, would it be better to record this in the cluster info?

5.

char sql_file_name[MAXPGPATH],
log_file_name[MAXPGPATH];
+
DbInfo *old_db = &old_cluster.dbarr.dbs[dbnum];

There is an extra change here.

6.
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
..
+		/* reap all children */
+		while (reap_child(true) == true)
+			;
+	}

Maybe we can move the "while (reap_child(true) == true)" out of the for() loop?
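
A rough sketch of that restructuring (illustrative only; it just moves the existing reap loop):

```
/* Sketch: dispatch workers for every database first ... */
for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
{
	/* ... build the queries and call parallel_exec_prog() per slot ... */
}

/* ... then reap all children once, after the per-database loop */
while (reap_child(true) == true)
	;
```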

Best Regards,
Hou zj

#115Amit Kapila
amit.kapila16@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#113)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Wed, Aug 16, 2023 at 3:55 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

It was primarily for upgrade purposes only. So, as we can't see a good reason to
go via pg_dump let's do it in upgrade unless someone thinks otherwise.

Removed the new option in pg_dump and modified pg_upgrade to directly use the
slot info to restore the slots in the new cluster.

In this version, creations of logical slots are serialized, whereas old ones were
parallelised per db. Do you think it should be parallelized again? I have tested
locally and it seemed harmless. Also, this approach allows logging the executed SQLs.

I updated the patch to allow parallel executions. Workers are launched per slot;
each one connects to the new node via psql and executes pg_create_logical_replication_slot().

Will it be beneficial for slots? Invoking a separate process each time
could be costlier than the slot creation itself. The other thing is that
during slot creation, the snapbuild waits for parallel transactions to
finish, which can also hurt the patch. I think we can test it by having 50,
100, or 500 slots on the old cluster and seeing if doing parallel
execution for the creation of those on the new cluster has any benefit
over serial execution.

Moreover, the following points were changed for 0002.

* Ensured to log the executed SQLs for creating slots.
* Fixed an issue that 'unreserved' slots could not be upgraded. This change was
not an expected one. The related discussion is at [1].
* Added checks for output plugin libraries. pg_upgrade ensures that the plugins
referenced by old slots are installed in the new executable directory.

I think this is a good idea but did you test it with out-of-core
plugins, if so, can you please share the results? Also, let's update
this information in docs as well.

A few minor comments:
1. Why does the patch update the slot info at the end of
create_logical_replication_slots()? Can you please update the comments
for the same?

2.
@@ -36,6 +36,7 @@ generate_old_dump(void)
{
char sql_file_name[MAXPGPATH],
log_file_name[MAXPGPATH];
+
DbInfo *old_db = &old_cluster.dbarr.dbs[dbnum];

Spurious line change.

--
With Regards,
Amit Kapila.

#116Amit Kapila
amit.kapila16@gmail.com
In reply to: Zhijie Hou (Fujitsu) (#114)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Wed, Aug 16, 2023 at 4:51 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

4.
+int num_slots_on_old_cluster;

Instead of a new global variable, would it be better to record this in the cluster info ?

I was wondering whether we can go a step further and remove this variable
altogether. In the old cluster handling, we can get and check it at
the same place, and for the new cluster, a function that
returns the slot count by traversing the old cluster info should be
sufficient. If you have other better ideas to eliminate this variable,
that is also fine. I think this will make the patch a bit cleaner w.r.t.
this new variable.
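
Just to illustrate the idea, a possible shape of such a helper (the function name is invented here):

```
/*
 * Sketch only: derive the old cluster's slot count on demand by walking
 * the per-database info, instead of keeping a separate global counter.
 */
static int
count_old_cluster_logical_slots(void)
{
	int			dbnum;
	int			slot_count = 0;

	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
		slot_count += old_cluster.dbarr.dbs[dbnum].slot_arr.nslots;

	return slot_count;
}
```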

--
With Regards,
Amit Kapila.

#117Peter Smith
smithpb2250@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#113)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

Here are some review comments for the first 2 patches.

(There are a couple of overlaps with what Hou-san already wrote review
comments about)

For patch v21-0001...

======
1. SaveSlotToPath

- /* and don't do anything if there's nothing to write */
- if (!was_dirty)
+ /*
+ * and don't do anything if there's nothing to write, unless it's this is
+ * called for a logical slot during a shutdown checkpoint, as we want to
+ * persist the confirmed_flush_lsn in that case, even if that's the only
+ * modification.
+ */
+ if (!was_dirty && (SlotIsPhysical(slot) || !is_shutdown))
  return;

The condition seems to be coded in a slightly awkward way when
compared to how the comment was worded.

How about:
if (!was_dirty && !(SlotIsLogical(slot) && is_shutdown))

//////////

For patch v21-0002...

======
Commit Message

1.

For pg_upgrade, it query the logical replication slots information from the old
cluter and restores the slots using the pg_create_logical_replication_slots()
statements. Note that we need to separate the timing of restoring replication
slots and other objects. Replication slots, in particular, should not be
restored before executing the pg_resetwal command because it will remove WALs
that are required by the slots.

~

Revisit this paragraph. There are lots of typos etc.

1a.
"For pg_upgrade". I think this wording is a hangover from back when
the patch was split into two parts for pg_dump and pg_upgrade, but now
it seems strange.

~
1b.
/cluter/cluster/

~
1c
/because it/because pg_resetwal/

======
src/sgml/ref/pgupgrade.sgml

2.

+   <step>
+    <title>Prepare for publisher upgrades</title>
+
+    <para>
+     <application>pg_upgrade</application> try to dump and restore logical
+     replication slots. This helps avoid the need for manually defining the
+     same replication slot on the new publisher.
+    </para>
+

2a.
/try/attempts to/ ??

~
2b.
Is "dump" the right word here? I didn't see dumping happening in the
patch anymore.

~~~

3.

+    <para>
+     Before you start upgrading the publisher node, ensure that the
+     subscription is temporarily disabled. After the upgrade is complete,
+     execute the
+     <link linkend="sql-altersubscription"><command>ALTER
SUBSCRIPTION ... DISABLE</command></link>
+     command to update the connection string, and then re-enable the
+     subscription.
+    </para>

3a.
That link made no sense in this context.

Don't you mean to say:
<command>ALTER SUBSCRIPTION ... CONNECTION ...</command>

~

3b.
Hmm. I wonder now did you *also* mean to describe how to disable? For example:

Before you start upgrading the publisher node, ensure that the
subscription is temporarily disabled, by executing
<link linkend="sql-altersubscription"><command>ALTER SUBSCRIPTION ...
DISABLE</command></link>.

~~~

4.

+
+    <para>
+     Upgrading slots has some settings. At first, all the slots must not be in
+     <literal>lost</literal>, and they must have consumed all the WALs on old
+     node. Furthermore, new node must have larger
+     <link linkend="guc-max-replication-slots"><varname>max_replication_slots</varname></link>
+     than existing slots on old node, and
+     <link linkend="guc-wal-level"><varname>wal_level</varname></link> must be
+     <literal>logical</literal>. <application>pg_upgrade</application> will
+     run error if something wrong.
+    </para>
+   </step>
+

4a.
"At first, all the slots must not be in lost"

Apart from being strangely worded, I was not familiar with what it
meant to say "must not be in lost". Will this be meaningful to the
user?

IMO this should have more description, e.g. including mentioning the
"wal_status" attribute with the appropriate link to
https://www.postgresql.org/docs/current/view-pg-replication-slots.html

~

4b.
BEFORE
Upgrading slots has some settings. ...
<application>pg_upgrade</application> will run error if something
wrong.

SUGGESTION
There are some prerequisites for <application>pg_upgrade</application>
to be able to upgrade the replication slots. If these are not met an
error will be reported.

~

4c.
Wondered if this list of prerequisites might be better presented as an
SGML list.

======
src/bin/pg_upgrade/check.c

5.
extern char *output_files[];
+extern int num_slots_on_old_cluster;

~

IMO something feels not quite right about having this counter floating
around as a global variable.

Shouldn't this instead be a field member of the old_cluster? That
seems to be the normal way to hold cluster-wide info.

~~~

6. check_new_cluster_is_empty

  RelInfoArr *rel_arr = &new_cluster.dbarr.dbs[dbnum].rel_arr;
+ DbInfo     *pDbInfo = &new_cluster.dbarr.dbs[dbnum];
+ LogicalSlotInfoArr *slot_arr = &pDbInfo->slot_arr;

IIRC I previously suggested adding this 'pDbInfo' variable because
there are several places that can make use of it.

You are using it only in the NEW code, but did not replace the
existing other code to make use of it:
pg_fatal("New cluster database \"%s\" is not empty: found relation \"%s.%s\"",
new_cluster.dbarr.dbs[dbnum].db_name,

~~~

7. check_for_logical_replication_slots

+
+/*
+ * Verify the parameter settings necessary for creating logical replication
+ * slots.
+ */
+static void
+check_for_logical_replication_slots(ClusterInfo *new_cluster)
+{
+ PGresult   *res;
+ PGconn    *conn = connectToServer(new_cluster, "template1");
+ int max_replication_slots;
+ char    *wal_level;
+
+ /* logical replication slots can be dumped since PG17. */
+ if (GET_MAJOR_VERSION(new_cluster->major_version) <= 1600)
+ return;
+
+ prep_status("Checking parameter settings for logical replication slots");
+
+ res = executeQueryOrDie(conn, "SHOW max_replication_slots;");
+ max_replication_slots = atoi(PQgetvalue(res, 0, 0));
+
+ if (max_replication_slots == 0)
+ pg_fatal("max_replication_slots must be greater than 0");
+ else if (num_slots_on_old_cluster > max_replication_slots)
+ pg_fatal("max_replication_slots must be greater than existing logical "
+ "replication slots on old node.");
+
+ PQclear(res);
+
+ res = executeQueryOrDie(conn, "SHOW wal_level;");
+ wal_level = PQgetvalue(res, 0, 0);
+
+ if (strcmp(wal_level, "logical") != 0)
+ pg_fatal("wal_level must be \"logical\", but is set to \"%s\"",
+ wal_level);
+
+ PQclear(res);
+
+ PQfinish(conn);
+
+ check_ok();

~

7a.
+check_for_logical_replication_slots(ClusterInfo *new_cluster)

IMO it is bad practice to name this argument 'new_cluster'. You will
end up shadowing the global variable of the same name. It seems in
other similar code where &new_cluster is passed as a parameter the
function arg there is called just 'cluster'.

~

7b.
"/* logical replication slots can be dumped since PG17. */"

Is "dumped" the correct word to be used here? Where is the "dump"?

~

7c.

+ if (max_replication_slots == 0)
+ pg_fatal("max_replication_slots must be greater than 0");
+ else if (num_slots_on_old_cluster > max_replication_slots)
+ pg_fatal("max_replication_slots must be greater than existing logical "
+ "replication slots on old node.");

Why is the 1st condition here even needed? Isn't it sufficient just to
have that 2nd condition to check that max_replication_slots is big enough?
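
In other words, something like the following single check might be enough (a sketch, not the patch's exact wording):

```
/* Sketch: one comparison covers both the zero and the too-small cases. */
if (num_slots_on_old_cluster > max_replication_slots)
	pg_fatal("max_replication_slots (%d) must be greater than or equal to the "
			 "number of logical replication slots on the old node (%d)",
			 max_replication_slots, num_slots_on_old_cluster);
```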

======

8. src/bin/pg_upgrade/dump.c

{
char sql_file_name[MAXPGPATH],
log_file_name[MAXPGPATH];
+
DbInfo *old_db = &old_cluster.dbarr.dbs[dbnum];
~

Unnecessary whitespace change.

======
src/bin/pg_upgrade/function.c

9. get_loadable_libraries -- GENERAL

@@ -46,7 +46,8 @@ library_name_compare(const void *p1, const void *p2)
 /*
  * get_loadable_libraries()
  *
- * Fetch the names of all old libraries containing C-language functions.
+ * Fetch the names of all old libraries containing C-language functions, and
+ * output plugins used by existing logical replication slots.
  * We will later check that they all exist in the new installation.
  */
 void
@@ -66,14 +67,21 @@ get_loadable_libraries(void)
  PGconn    *conn = connectToServer(&old_cluster, active_db->db_name);
  /*
- * Fetch all libraries containing non-built-in C functions in this DB.
+ * Fetch all libraries containing non-built-in C functions and
+ * output plugins in this DB.
  */
  ress[dbnum] = executeQueryOrDie(conn,
  "SELECT DISTINCT probin "
  "FROM pg_catalog.pg_proc "
  "WHERE prolang = %u AND "
  "probin IS NOT NULL AND "
- "oid >= %u;",
+ "oid >= %u "
+ "UNION "
+ "SELECT DISTINCT plugin "
+ "FROM pg_catalog.pg_replication_slots "
+ "WHERE wal_status <> 'lost' AND "
+ "database = current_database() AND "
+ "temporary IS FALSE;",
  ClanguageId,
  FirstNormalObjectId);
  totaltups += PQntuples(ress[dbnum]);

~

Maybe it is OK, but it somehow seems like the new logic has been
jammed into the get_loadable_libraries() function for coding
convenience. For example, all the names (function names, variable
names, structure field names) are referring to "libraries", so the
plugin seems a bit out of place.

~~~

10. get_loadable_libraries

/* Fetch all library names, removing duplicates within each DB */
for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
~

This code comment still refers only to library names.

~~~
10. get_loadable_libraries

+ "UNION "
+ "SELECT DISTINCT plugin "
+ "FROM pg_catalog.pg_replication_slots "
+ "WHERE wal_status <> 'lost' AND "
+ "database = current_database() AND "
+ "temporary IS FALSE;",

IMO this SQL might be more readable if it uses an alias (like 'rs')
for the catalog. Then rs.wal_status, rs.database, rs.temporary etc.

======
src/bin/pg_upgrade/info.c

11. get_logical_slot_infos_per_db

+ snprintf(query, sizeof(query),
+ "SELECT slot_name, plugin, two_phase "
+ "FROM pg_catalog.pg_replication_slots "
+ "WHERE database = current_database() AND temporary = false "
+ "AND wal_status <> 'lost';");

There was similar SQL in get_loadable_libraries() but there you wrote:

+ "FROM pg_catalog.pg_replication_slots "
+ "WHERE wal_status <> 'lost' AND "
+ "database = current_database() AND "
+ "temporary IS FALSE;",

The WHERE condition order and case are all slightly different. IMO it
would be better for both SQL fragments to be exactly the same.

~~~

12. get_logical_slot_infos

+int
+get_logical_slot_infos(ClusterInfo *cluster)
+{
+ int dbnum;
+ int slotnum = 0;
+

I think 'slotnum' is not a good name. In other nearby code (e.g.
print_slot_infos) 'slotnum' is used to mean the index of each slot,
but here it means the total number of slots. How about a name like
'slot_count' or 'nslots' something where the name is more meaningful?

~~~

13. free_db_and_rel_infos

+
+ /*
+ * Logical replication slots must not exist on the new cluster before
+ * doing create_logical_replication_slots().
+ */
+ Assert(db_arr->dbs[dbnum].slot_arr.slots == NULL);

Isn't it more natural to do: Assert(db_arr->dbs[dbnum].slot_arr.nslots == 0);

======
src/bin/pg_upgrade/pg_upgrade.c

14. create_logical_replication_slots

+create_logical_replication_slots(void)
+{
+ int dbnum;
+ int slotnum;

The 'slotnum' can be declared at a lower scope than this to be closer
to where it is actually used.

~~~

15. create_logical_replication_slots

+ for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+ {
+ DbInfo    *old_db = &old_cluster.dbarr.dbs[dbnum];
+ LogicalSlotInfoArr *slot_arr = &old_db->slot_arr;
+ PQExpBuffer query,
+ escaped;
+ PGconn    *conn;
+ char log_file_name[MAXPGPATH];
+
+ /* Quick exit if there are no slots */
+ if (!slot_arr->nslots)
+ continue;

The comment is misleading. There is no exiting. Maybe better to say
something like "Skip this DB if there are no slots".

~~~

16. create_logical_replication_slots

+ appendPQExpBuffer(query, "SELECT
pg_catalog.pg_create_logical_replication_slot(");
+ appendStringLiteral(query, slot_arr->slots[slotnum].slotname,
+ slot_arr->encoding, slot_arr->std_strings);
+ appendPQExpBuffer(query, ", ");
+ appendStringLiteral(query, slot_arr->slots[slotnum].plugin,
+ slot_arr->encoding, slot_arr->std_strings);
+ appendPQExpBuffer(query, ", false, %s);",
+   slot_arr->slots[slotnum].two_phase ? "true" : "false");

I noticed that the function comment for appendStringLiteral() says:
"We need it in situations where we do not have a PGconn available.
Where we do, appendStringLiteralConn is a better choice.".

But in this code, we *do* have PGconn available. So, shouldn't we be
following the advice of the appendStringLiteral() function comment and
use the other API instead?
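
For reference, a sketch of how the connection-aware variant could look here (illustrative only; it reuses the patch's variables and the conn that is already open):

```
/* Sketch: appendStringLiteralConn() derives encoding/std_strings from conn. */
appendPQExpBufferStr(query, "SELECT pg_catalog.pg_create_logical_replication_slot(");
appendStringLiteralConn(query, slot_arr->slots[slotnum].slotname, conn);
appendPQExpBufferStr(query, ", ");
appendStringLiteralConn(query, slot_arr->slots[slotnum].plugin, conn);
appendPQExpBuffer(query, ", false, %s);",
				  slot_arr->slots[slotnum].two_phase ? "true" : "false");
```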

~~~

17. create_logical_replication_slots

+ /*
+ * The string must be escaped to shell-style, because there is a
+ * possibility that output plugin name contains quotes. The output
+ * string would be sandwiched by the single quotes, so it does not have
+ * to be wrapped by any quotes when it is passed to
+ * parallel_exec_prog().
+ */
+ appendShellString(escaped, query->data);

/sandwiched by/enclosed by/ ???

======
src/bin/pg_upgrade/pg_upgrade.h

18. LogicalSlotInfo

+/*
+ * Structure to store logical replication slot information
+ */
+typedef struct
+{
+ char    *slotname; /* slot name */
+ char    *plugin; /* plugin */
+ bool two_phase; /* Can the slot decode 2PC? */
+} LogicalSlotInfo;

Looks a bit strange when only the last field comment starts with an uppercase
letter but the others do not. Maybe lowercase everything like for other nearby
structs.

~~~

19. LogicalSlotInfoArr

+
+typedef struct
+{
+ int nslots;
+ LogicalSlotInfo *slots;
+ int encoding;
+ bool std_strings;
+} LogicalSlotInfoArr;
+

The meaning of those fields is not always obvious. IMO they can all be
commented on.

======
.../pg_upgrade/t/003_logical_replication_slots.pl

20.

# Cause a failure at the start of pg_upgrade because wal_level is replica

~

I wondered if it would be clearer if you had to explicitly set the
new_node to "replica" initially, instead of leaving it default.

~~~

21.

# Cause a failure at the start of pg_upgrade because max_replication_slots is 0

~

This relates to my earlier code comment in this post -- I didn't
understand the need to specially test for 0. IIUC, we are really
interested only in knowing whether there are *sufficient*
max_replication_slots.

~~~

22.

'run of pg_upgrade of old node with small max_replication_slots');

~

SUGGESTION
run of pg_upgrade where the new node has insufficient max_replication_slots

~~~

23.

# Preparations for the subsequent test. max_replication_slots is set to
# appropriate value
$new_node->append_conf('postgresql.conf', "max_replication_slots = 10");

# Remove an unnecessary slot and consume WALs
$old_node->start;
$old_node->safe_psql(
'postgres', qq[
SELECT pg_drop_replication_slot('test_slot1');
SELECT count(*) FROM pg_logical_slot_get_changes('test_slot2', NULL, NULL)
]);
$old_node->stop;

~

Some of that preparation seems unnecessary. I think the new node's
max_replication_slots is 1 already, so if you are going to drop
test_slot1 here then there is only ONE slot left, right? So the
max_replication_slots on the new node should be OK now. Not only will
less test code be needed here, but you will also be testing the
boundary condition of max_replication_slots (which is probably a good
thing to do).

------
Kind Regards,
Peter Smith.
Fujitsu Australia

#118Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Amit Kapila (#115)
3 attachment(s)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Amit,

I updated the patch to allow parallel executions. Workers are launched per slot;
each one connects to the new node via psql and executes
pg_create_logical_replication_slot().

Will it be beneficial for slots? Invoking a separate process each time
could be costlier than the slot creation itself. The other thing is that
during slot creation, the snapbuild waits for parallel transactions to
finish, which can also hurt the patch. I think we can test it by having 50,
100, or 500 slots on the old cluster and seeing if doing parallel
execution for the creation of those on the new cluster has any benefit
over serial execution.

Indeed. I have tested based on the comment and found that serial execution was
faster. PSA the graphs and tables. The x-axis shows the number of upgraded slots,
and the y-axis shows the execution time. The parallelism of pg_upgrade (-j) was
also varied during the test.

I plan to revert the change in an upcoming version.

# compared source code

For the parallel execution case, the v21 patch set was used.
For the serial execution case, the logic in create_logical_replication_slots() was changed,
which is basically the same as v20 (I can share it if needed).

Moreover, in both cases, debug logs for measuring time were added.

# method

PSA the script. A given number of slots is created and then pg_upgrade is executed.

# consideration

* Under all conditions, the serial execution was faster than the parallel one. Maybe
launching the processes was more costly than I expected.
* Another reason I thought of was that in the case of serial execution, the connection
to the new node is established only once. In the parallel case, however, workers must
establish a connection every time. IIUC this takes a long time.
* (very trivial) The number of workers did not affect the serial execution results. This
means the coding seems right.

* Added checks for output plugin libraries. pg_upgrade ensures that plugins
referred by old slots were installed to the new executable directory.

I think this is a good idea but did you test it with out-of-core
plugins, if so, can you please share the results? Also, let's update
this information in docs as well.

I have not used other plugins, but forcibly renamed the shared object file.
I would test with plugins like wal2json [1] if more cases are needed.

1. created a logical replication slot on the old node
SELECT * FROM pg_create_logical_replication_slot('test', 'test_decoding')
2. stopped the old node
3. forcibly renamed the .so file. I used the following command:
sudo mv /path/to/test_decoding.so /path/to//test\"_decoding.so
4. executed pg_upgrade, which failed. The output I got was:

```
Checking for presence of required libraries fatal

Your installation references loadable libraries that are missing from the
new installation. You can add these libraries to the new installation,
or remove the functions using them from the old installation. A list of
problem libraries is in the file:
data_N3/pg_upgrade_output.d/20230817T100926.979/loadable_libraries.txt
Failure, exiting
```

And the contents of loadable_libraries.txt were as below:

```
could not load library "test_decoding": ERROR: could not access file "test_decoding": No such file or directory
In database: postgres
```

[1]: https://github.com/eulerto/wal2json

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:

perf_test_graph.png (image/png)
result_parallel.xlsx (application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)
do_one_test.sh (application/octet-stream)
#119Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#110)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Tue, Aug 15, 2023 at 12:06 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Aug 15, 2023 at 7:51 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Aug 14, 2023 at 2:07 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Aug 14, 2023 at 7:57 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Another idea is (which might have already discussed thoguh) that we check if the latest shutdown checkpoint LSN in the control file matches the confirmed_flush_lsn in pg_replication_slots view. That way, we can ensure that the slot has consumed all WAL records before the last shutdown. We don't need to worry about WAL records generated after starting the old cluster during the upgrade, at least for logical replication slots.

Right, this is somewhat closer to what Patch is already doing. But
remember in this case we need to remember and use the latest
checkpoint from the control file before the old cluster is started
because otherwise the latest checkpoint location could be even updated
during the upgrade. So, instead of reading from WAL, we need to change
so that we rely on the control file's latest LSN.

Yes, I was thinking the same idea.

But it works for only replication slots for logical replication. Do we
want to check if no meaningful WAL records are generated after the
latest shutdown checkpoint, for manually created slots (or non-logical
replication slots)? If so, we would need to have something reading WAL
records in the end.

This feature only targets logical replication slots. I don't see a
reason to be different for manually created logical replication slots.
Is there something particular that you think we could be missing?

Sorry, I was not clear. I meant the logical replication slots that are
*not* used by logical replication, i.e., that are created manually and used
by third-party tools that periodically consume decoded changes. As we
discussed before, these slots will never be able to pass that
confirmed_flush_lsn check. After some thought, one thing we might
need to consider is that in practice, the upgrade project is performed
during a maintenance window and has a backup plan that reverts the
upgrade process, in case something bad happens. If we require the
users to drop such logical replication slots, they cannot resume
using the old cluster in that case, since they would need to create new
slots, missing some changes. Other checks in pg_upgrade seem to be
compatibility checks that would eventually be required for the upgrade
anyway. Do we need to consider this case? For example, we could do that
confirmed_flush_lsn check only for the slots with the pgoutput plugin.
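
If that direction were taken, the check could conceivably be narrowed along these lines (purely a hypothetical sketch built on the 0003 patch's query; whether to do this at all is the open question):

```
/* Hypothetical variant: verify confirmed_flush_lsn only for pgoutput slots. */
res = executeQueryOrDie(conn,
						"SELECT slot_name FROM pg_catalog.pg_replication_slots "
						"WHERE plugin = 'pgoutput' AND temporary = false "
						"AND confirmed_flush_lsn != '%X/%X';",
						old_cluster.controldata.chkpnt_latest_upper,
						old_cluster.controldata.chkpnt_latest_lower);
```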

Yet another thing I am trying to consider is whether we can allow to
upgrade slots from 16 or 15 to later versions. As of now, the patch
has the following check:
getLogicalReplicationSlots()
{
...
+ /* Check whether we should dump or not */
+ if (fout->remoteVersion < 170000)
+ return;
...
}

If we decide to use the existing view pg_replication_slots then can we
consider upgrading slots from the prior version to 17? Now, if we want
to invent any new API similar to pg_replslotdata then we can't do this
because it won't exist in prior versions but OTOH using existing view
pg_replication_slots can allow us to fetch slot info from older
versions as well. So, I think it is worth considering.

I think that without 0001 patch the replication slots will not be able
to pass the confirmed_flush_lsn check.

Right, but we can think of backpatching the same. Anyway, we can do
that as a separate work by starting a new thread to see if there is a
broader agreement for backpatching such a change. For now, we can
focus on >=v17.

Agreed.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#120Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#119)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Thu, Aug 17, 2023 at 6:07 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Aug 15, 2023 at 12:06 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Aug 15, 2023 at 7:51 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Aug 14, 2023 at 2:07 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Aug 14, 2023 at 7:57 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Another idea is (which might have already discussed thoguh) that we check if the latest shutdown checkpoint LSN in the control file matches the confirmed_flush_lsn in pg_replication_slots view. That way, we can ensure that the slot has consumed all WAL records before the last shutdown. We don't need to worry about WAL records generated after starting the old cluster during the upgrade, at least for logical replication slots.

Right, this is somewhat closer to what Patch is already doing. But
remember in this case we need to remember and use the latest
checkpoint from the control file before the old cluster is started
because otherwise the latest checkpoint location could be even updated
during the upgrade. So, instead of reading from WAL, we need to change
so that we rely on the control file's latest LSN.

Yes, I was thinking the same idea.

But it works for only replication slots for logical replication. Do we
want to check if no meaningful WAL records are generated after the
latest shutdown checkpoint, for manually created slots (or non-logical
replication slots)? If so, we would need to have something reading WAL
records in the end.

This feature only targets logical replication slots. I don't see a
reason to be different for manually created logical replication slots.
Is there something particular that you think we could be missing?

Sorry I was not clear. I meant the logical replication slots that are
*not* used by logical replication, i.e., are created manually and used
by third party tools that periodically consume decoded changes. As we
discussed before, these slots will never be able to pass that
confirmed_flush_lsn check.

I think normally one would have a background process to periodically
consume changes. Can't one use the walsender infrastructure for
their plugins to consume changes, probably by using the replication
protocol? Also, I feel it is the plugin author's responsibility to
consume changes or advance the slot to the required position before
shutdown.

After some thoughts, one thing we might
need to consider is that in practice, the upgrade project is performed
during the maintenance window and has a backup plan that revert the
upgrade process, in case something bad happens. If we require the
users to drop such logical replication slots, they cannot resume to
use the old cluster in that case, since they would need to create new
slots, missing some changes.

Can't one keep the backup before removing slots?

Other checks in pg_upgrade seem to be
compatibility checks that would eventually be required for the upgrade
anyway. Do we need to consider this case? For example, we do that
confirmed_flush_lsn check for only the slots with pgoutput plugin.

I think one is allowed to use pgoutput plugin even for manually
created slots. So, such a check may not work.

--
With Regards,
Amit Kapila.

#121Amit Kapila
amit.kapila16@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#118)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Thu, Aug 17, 2023 at 3:48 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

* Added checks for output plugin libraries. pg_upgrade ensures that plugins
referred by old slots were installed to the new executable directory.

I think this is a good idea but did you test it with out-of-core
plugins, if so, can you please share the results? Also, let's update
this information in docs as well.

I have not used other plugins, but forcibly renamed the shared object file.
I would test by plugins like wal2json[1] if more cases are needed.

1. created a logical replication slot on the old node
SELECT * FROM pg_create_logical_replication_slot('test', 'test_decoding')
2. stopped the old node
3. forcibly renamed the .so file. I used the following command:
sudo mv /path/to/test_decoding.so /path/to//test\"_decoding.so
4. executed pg_upgrade, which failed. The output I got was:

```
Checking for presence of required libraries fatal

Your test sounds reasonable but there is no harm in testing wal2json
or some other plugin just to mimic the actual production scenario.
Additionally, it would give us better coverage for the patch by
testing out-of-core plugins for some other tests as well.

--
With Regards,
Amit Kapila.

#122Amit Kapila
amit.kapila16@gmail.com
In reply to: Peter Smith (#117)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Thu, Aug 17, 2023 at 2:10 PM Peter Smith <smithpb2250@gmail.com> wrote:

Here are some review comments for the first 2 patches.

3.

+    <para>
+     Before you start upgrading the publisher node, ensure that the
+     subscription is temporarily disabled. After the upgrade is complete,
+     execute the
+     <link linkend="sql-altersubscription"><command>ALTER
SUBSCRIPTION ... DISABLE</command></link>
+     command to update the connection string, and then re-enable the
+     subscription.
+    </para>

3a.
That link made no sense in this context.

Don't you mean to say:
<command>ALTER SUBSCRIPTION ... CONNECTION ...</command>

I think the command is correct here but the wording should mention
disabling the subscription.

/*
- * Fetch all libraries containing non-built-in C functions in this DB.
+ * Fetch all libraries containing non-built-in C functions and
+ * output plugins in this DB.
*/
ress[dbnum] = executeQueryOrDie(conn,
"SELECT DISTINCT probin "
"FROM pg_catalog.pg_proc "
"WHERE prolang = %u AND "
"probin IS NOT NULL AND "
- "oid >= %u;",
+ "oid >= %u "
+ "UNION "
+ "SELECT DISTINCT plugin "
+ "FROM pg_catalog.pg_replication_slots "
+ "WHERE wal_status <> 'lost' AND "
+ "database = current_database() AND "
+ "temporary IS FALSE;",
ClanguageId,
FirstNormalObjectId);
totaltups += PQntuples(ress[dbnum]);

~

Maybe it is OK, but it somehow seems like the new logic has been
jammed into the get_loadable_libraries() function for coding
convenience. For example, all the names (function names, variable
names, structure field names) are referring to "libraries", so the
plugin seems a bit out of place.

But a library with the same name (as the plugin) should exist for the upgrade of
slots. I feel doing it separately could either lead to redundant
code or a different way to achieve the same thing. Do you envision any
problem that we are not seeing?

~~~
10. get_loadable_libraries

+ "UNION "
+ "SELECT DISTINCT plugin "
+ "FROM pg_catalog.pg_replication_slots "
+ "WHERE wal_status <> 'lost' AND "
+ "database = current_database() AND "
+ "temporary IS FALSE;",

IMO this SQL might be more readable if it uses an alias (like 'rs')
for the catalog. Then rs.wal_status, rs.database, rs.temporary etc.

Then it will become inconsistent with the existing query, which doesn't
use any alias. So, I think we should either change the existing query to use an
alias or not use one at all, as the patch does. I would prefer the latter.

16. create_logical_replication_slots

+ appendPQExpBuffer(query, "SELECT
pg_catalog.pg_create_logical_replication_slot(");
+ appendStringLiteral(query, slot_arr->slots[slotnum].slotname,
+ slot_arr->encoding, slot_arr->std_strings);
+ appendPQExpBuffer(query, ", ");
+ appendStringLiteral(query, slot_arr->slots[slotnum].plugin,
+ slot_arr->encoding, slot_arr->std_strings);
+ appendPQExpBuffer(query, ", false, %s);",
+   slot_arr->slots[slotnum].two_phase ? "true" : "false");

I noticed that the function comment for appendStringLiteral() says:
"We need it in situations where we do not have a PGconn available.
Where we do, appendStringLiteralConn is a better choice.".

But in this code, we *do* have PGconn available. So, shouldn't we be
following the advice of the appendStringLiteral() function comment and
use the other API instead?

I think that will avoid maintaining encoding and std_strings in the
slot's array. So, this sounds like a good idea to me.
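
For reference, the statement being assembled here ends up having roughly this
shape once the literals are escaped (the slot name, plugin, and two_phase value
are only illustrative):

```
SELECT pg_catalog.pg_create_logical_replication_slot('test_slot', 'test_decoding', false, true);
```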

--
With Regards,
Amit Kapila.

#123Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Amit Kapila (#121)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Amit,

I have not used other plugins, but forcibly renamed the shared object file.
I would test by plugins like wal2json[1] if more cases are needed.

1. created logical replication slots on old node
SELECT * FROM pg_create_logical_replication_slot('test', 'test_decoding')
2. stopped the old node
3. forcibly renamed the .so file. I used the following script:
sudo mv /path/to/test_decoding.so /path/to//test\"_decoding.so
4. executed pg_upgrade and failed. Outputs what I got were:

```
Checking for presence of required libraries fatal

Your test sounds reasonable but there is no harm in testing wal2json
or some other plugin just to mimic the actual production scenario.
Additionally, it would give us better coverage for the patch by
testing out-of-core plugins for some other tests as well.

I've tested using wal2json [1], decoder_raw [1], and my small decoder. The results were
the same: pg_upgrade correctly raised an ERROR. The following demo shows the wal2json case.

In this test, the plugin was installed only on the old node and a slot was created.
Below shows the created slot:

```
(Old)=# SELECT slot_name, plugin FROM pg_replication_slots
slot_name | plugin
-----------+----------
test | wal2json
(1 row)
```

And I confirmed that the plugin worked well via pg_logical_slot_get_changes()
(this was needed to advance the confirmed_flush_lsn):

```
(Old)=# INSERT INTO foo VALUES (1)
INSERT 0 1
(Old)=# SELECT * FROM pg_logical_slot_get_changes('test', NULL, NULL);
lsn | xid | data

----------+-----+-------------------------------------------------------------------------------------------------------------
---------------------
0/63C8A8 | 731 | {"change":[{"kind":"insert","schema":"public","table":"foo","columnnames":["id"],"columntypes":["integer"],"
columnvalues":[1]}]}
(1 row)
```

Then the pg_upgrade was executed but failed, same as the previous example.

```
Checking for presence of required libraries fatal

Your installation references loadable libraries that are missing from the
new installation. You can add these libraries to the new installation,
or remove the functions using them from the old installation. A list of
problem libraries is in the file:
data_N3/pg_upgrade_output.d/20230818T030006.675/loadable_libraries.txt
Failure, exiting
```

In loadable_libraries.txt, it was mentioned that wal2json was not installed in the new directory.

```
could not load library "wal2json": ERROR: could not access file "wal2json": No such file or directory
In database: postgres
```

Note that the upgrade succeeded when the plugin was also installed in the new binary directory.
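
For anyone wanting to check this up front, the output plugins that the new
installation must provide can be listed on the old cluster with the same query
the patch adds to get_loadable_libraries(), run per database:

```
SELECT DISTINCT plugin
FROM pg_catalog.pg_replication_slots
WHERE wal_status <> 'lost' AND
      database = current_database() AND
      temporary IS FALSE;
```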

Acknowledgement: Thank you Michael and Euler for creating great plugins!

[1]: https://github.com/michaelpq/pg_plugins

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#124Peter Smith
smithpb2250@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#113)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

Here are some review comments for the patch v21-0003

======
Commit message

1.
pg_upgrade fails if the old node has slots which status is 'lost' or they do not
consume all WAL records. These are needed for prevent the data loss.

~

Maybe some minor brush-up like:

SUGGESTION
In order to prevent data loss, pg_upgrade will fail if the old node
has slots with the status 'lost', or with unconsumed WAL records.

======
src/bin/pg_upgrade/check.c

2. check_for_confirmed_flush_lsn

+ /* Check that all logical slots are not in 'lost' state. */
+ res = executeQueryOrDie(conn,
+ "SELECT slot_name FROM pg_catalog.pg_replication_slots "
+ "WHERE temporary = false AND wal_status = 'lost';");
+
+ ntups = PQntuples(res);
+ i_slotname = PQfnumber(res, "slot_name");
+
+ for (i = 0; i < ntups; i++)
+ {
+ is_error = true;
+
+ pg_log(PG_WARNING,
+    "\nWARNING: logical replication slot \"%s\" is obsolete.",
+    PQgetvalue(res, i, i_slotname));
+ }
+
+ PQclear(res);
+
+ if (is_error)
+ pg_fatal("logical replication slots not to be in 'lost' state.");
+

2a. (GENERAL)
The above code for checking lost state seems out of place in this
function which is meant for checking confirmed flush lsn.

Maybe you jammed both kinds of logic into one function to save on the
extra PGconn or something but IMO two separate functions would be
better. e.g.
- check_for_lost_slots
- check_for_confirmed_flush_lsn

~

2b.
+ /* Check that all logical slots are not in 'lost' state. */

SUGGESTION
/* Check there are no logical replication slots with a 'lost' state. */

~

2c.
+ res = executeQueryOrDie(conn,
+ "SELECT slot_name FROM pg_catalog.pg_replication_slots "
+ "WHERE temporary = false AND wal_status = 'lost';");

This SQL fragment is very much like others in previous patches. Be
sure to make all the cases and clauses consistent with all those
similar SQL fragments.

~

2d.
+ is_error = true;

That doesn't need to be in the loop. Better to just say:
is_error = (ntups > 0);

~

2e.
There is a mix of terms in the WARNING and in the pg_fatal -- e.g.
"obsolete" versus "lost". Is it OK?

~

2f.
+ pg_fatal("logical replication slots not to be in 'lost' state.");

English? And maybe it should be much more verbose...

"Upgrade of this installation is not allowed because one or more
logical replication slots with a state of 'lost' were detected."

~~~

3. check_for_confirmed_flush_lsn

+ /*
+ * Check that all logical replication slots have reached the latest
+ * checkpoint position (SHUTDOWN_CHECKPOINT record). This checks cannot be
+ * done in case of live_check because the server has not been written the
+ * SHUTDOWN_CHECKPOINT record yet.
+ */
+ if (!live_check)
+ {
+ res = executeQueryOrDie(conn,
+ "SELECT slot_name FROM pg_catalog.pg_replication_slots "
+ "WHERE confirmed_flush_lsn != '%X/%X' AND temporary = false;",
+ old_cluster.controldata.chkpnt_latest_upper,
+ old_cluster.controldata.chkpnt_latest_lower);
+
+ ntups = PQntuples(res);
+ i_slotname = PQfnumber(res, "slot_name");
+
+ for (i = 0; i < ntups; i++)
+ {
+ is_error = true;
+
+ pg_log(PG_WARNING,
+    "\nWARNING: logical replication slot \"%s\" has not consumed WALs yet",
+    PQgetvalue(res, i, i_slotname));
+ }
+
+ PQclear(res);
+ PQfinish(conn);
+
+ if (is_error)
+ pg_fatal("All logical replication slots consumed all the WALs.");

~

3a.
/This checks/This check/

~

3b.
I don't think the separation of
chkpnt_latest_upper/chkpnt_latest_lower is needed like this. AFAIK
there is an LSN_FORMAT_ARGS(lsn) macro designed for handling exactly
this kind of parameter substitution.

~

3c.
+ is_error = true;

That doesn't need to be in the loop. Better to just say:
is_error = (ntups > 0);

~

3d.
+ pg_fatal("All logical replication slots consumed all the WALs.");

The message seems backward. shouldn't it say something like:
"Upgrade of this installation is not allowed because one or more
logical replication slots still have unconsumed WAL records."
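
(As a rough illustration only: the patch performs this comparison against the
shutdown checkpoint location parsed from pg_controldata of the stopped old
cluster, so running the query below on a live server is not an exact
substitute, but the condition being enforced is essentially:)

```
SELECT s.slot_name, s.confirmed_flush_lsn, c.checkpoint_lsn
  FROM pg_catalog.pg_replication_slots s,
       pg_catalog.pg_control_checkpoint() c
 WHERE s.temporary IS FALSE
   AND s.confirmed_flush_lsn <> c.checkpoint_lsn;
```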

======
src/bin/pg_upgrade/controldata.c

4. get_control_data

+ /*
+ * Upper and lower part of LSN must be read and stored
+ * separately because it is reported as %X/%X format.
+ */
+ cluster->controldata.chkpnt_latest_upper =
+ strtoul(p, &slash, 16);
+ cluster->controldata.chkpnt_latest_lower =
+ strtoul(++slash, NULL, 16);

I felt that this field separation code is maybe not necessary. Please
refer to other review comments in this post.

======
src/bin/pg_upgrade/pg_upgrade.h

5. ControlData

+
+ uint32 chkpnt_latest_upper;
+ uint32 chkpnt_latest_lower;
 } ControlData;

~

Actually, I did not recognise the reason why this cannot be stored
properly as a single XLogRecPtr field. Please see other review
comments in this post.

======
.../t/003_logical_replication_slots.pl

6. GENERAL

Many of the changes to this file are just renaming the
'old_node'/'new_node' to 'old_publisher'/'new_publisher'.

This seems a basic change not really associated with this patch 0003.
To reduce the code churn, this change should be moved into the earlier
patch where this test file (003_logical_replication_slots.pl) was
first introduced,

~~~

7.

# Cause a failure at the start of pg_upgrade because slot do not finish
# consuming all the WALs

~

Can you give a more detailed explanation in the comment of how this
test case achieves what it says?

======
src/test/regress/sql/misc_functions.sql

8.
@@ -236,4 +236,4 @@ SELECT * FROM pg_split_walfile_name('invalid');
 SELECT segment_number > 0 AS ok_segment_number, timeline_id
   FROM pg_split_walfile_name('000000010000000100000000');
 SELECT segment_number > 0 AS ok_segment_number, timeline_id
-  FROM pg_split_walfile_name('ffffffFF00000001000000af');
+  FROM pg_split_walfile_name('ffffffFF00000001000000af');
\ No newline at end of file

~

What is this change for? It looks like maybe some accidental
whitespace change happened.

------
Kind Regards,
Peter Smith.
Fujitsu Australia

#125Peter Smith
smithpb2250@gmail.com
In reply to: Amit Kapila (#122)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Fri, Aug 18, 2023 at 12:47 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Aug 17, 2023 at 2:10 PM Peter Smith <smithpb2250@gmail.com> wrote:

Here are some review comments for the first 2 patches.

/*
- * Fetch all libraries containing non-built-in C functions in this DB.
+ * Fetch all libraries containing non-built-in C functions and
+ * output plugins in this DB.
*/
ress[dbnum] = executeQueryOrDie(conn,
"SELECT DISTINCT probin "
"FROM pg_catalog.pg_proc "
"WHERE prolang = %u AND "
"probin IS NOT NULL AND "
- "oid >= %u;",
+ "oid >= %u "
+ "UNION "
+ "SELECT DISTINCT plugin "
+ "FROM pg_catalog.pg_replication_slots "
+ "WHERE wal_status <> 'lost' AND "
+ "database = current_database() AND "
+ "temporary IS FALSE;",
ClanguageId,
FirstNormalObjectId);
totaltups += PQntuples(ress[dbnum]);

~

Maybe it is OK, but it somehow seems like the new logic has been
jammed into the get_loadable_libraries() function for coding
convenience. For example, all the names (function names, variable
names, structure field names) are referring to "libraries", so the
plugin seems a bit out of place.

But the same name library (as plugin) should exist for the upgrade of
slots. I feel doing it separately could either lead to a redundant
code or a different way to achieve the same thing. Do you envision any
problem which we are not seeing?

No problem. I'd misunderstood that the "plugin" referred to here is a
shared object file (aka library) name, so it does belong here after
all. I think the new comments could be made more clear about this
point though.

------
Kind Regards,
Peter Smith.
Fujitsu Australia

#126Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Amit Kapila (#116)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Amit,

I was thinking whether we can go a step ahead and remove this variable
altogether. In old cluster handling, we can get and check together at
the same place and for the new cluster, if we have a function that
returns slot_count by traversing old clusterinfo that should be
sufficient. If you have other better ideas to eliminate this variable
that is also fine. I think this will make the patch bit clean w.r.t
this new variable.

Seems better; removed the variable. Also, the timing of the checks was changed
to the end of get_logical_slot_infos(). The check of whether we are in live_check
was moved into the function, so the argument was removed again.

The whole set of changes can be checked in the upcoming e-mail.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#127Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Zhijie Hou (Fujitsu) (#114)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Hou,

Thank you for reviewing!

+static void
+create_logical_replication_slots(void)
...
+		query = createPQExpBuffer();
+		escaped = createPQExpBuffer();
+		conn = connectToServer(&new_cluster, old_db->db_name);

Since the connection here is not used anymore, so I think we can remove it.

Per discussion [1], pg_upgrade must use the connection again, so I kept it.

2.

+static void
+create_logical_replication_slots(void)
...
+	/* update new_cluster info again */
+	get_logical_slot_infos(&new_cluster);
+}

Do we need to get new slots again after restoring ?

I checked again and thought that it was not needed, so removed it.
A similar function, create_new_objects(), updates the information at the end.
That was needed because the information was used to compare objects between the
old and new clusters in transfer_all_new_tablespaces(). For logical replication
slots, however, no such comparison is done, and no function uses the updated information.

3.

+	snprintf(query, sizeof(query),
+			 "SELECT slot_name, plugin, two_phase "
+			 "FROM pg_catalog.pg_replication_slots "
+			 "WHERE database = current_database() AND
temporary = false "
+			 "AND wal_status <> 'lost';");
+
+	res = executeQueryOrDie(conn, "%s", query);
+

Instead of building the query in a new variable, can we directly put the SQL in
executeQueryOrDie()
e.g.
executeQueryOrDie(conn, "SELECT slot_name, plugin, two_phase ...");

Right, fixed.

4.
+int num_slots_on_old_cluster;

Instead of a new global variable, would it be better to record this in the cluster
info ?

Per suggestion [2], the variable was removed.

5.

char sql_file_name[MAXPGPATH],
log_file_name[MAXPGPATH];
+
DbInfo *old_db = &old_cluster.dbarr.dbs[dbnum];

There is an extra change here.

Removed.

6.
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
..
+		/* reap all children */
+		while (reap_child(true) == true)
+			;
+	}

Maybe we can move the "while (reap_child(true) == true)" out of the for() loop ?

Per discussion [1], I stopped doing this in parallel, so this part was not needed anymore.

The patch would be available in upcoming posts.

[1]: /messages/by-id/TYCPR01MB58701DAEE5E61B07AC84ADBBF51AA@TYCPR01MB5870.jpnprd01.prod.outlook.com
[2]: /messages/by-id/TYAPR01MB5866691219B9CB280B709600F51BA@TYAPR01MB5866.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#128Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Amit Kapila (#115)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Amit,

Few minor comments
1. Why the patch updates the slots info at the end of
create_logical_replication_slots()? Can you please update the comments
for the same?

I checked and agreed that it was not needed. For more detail, please see [1].

2.
@@ -36,6 +36,7 @@ generate_old_dump(void)
{
char sql_file_name[MAXPGPATH],
log_file_name[MAXPGPATH];
+
DbInfo *old_db = &old_cluster.dbarr.dbs[dbnum];

Spurious line change.

Removed.

Next patch set would be available in upcoming posts.

[1]: /messages/by-id/TYAPR01MB5866F384AC62E12E9638BEC1F51BA@TYAPR01MB5866.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#129Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Peter Smith (#117)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Peter,

Thanks for reviewing!

For patch v21-0001...

======
1. SaveSlotToPath

- /* and don't do anything if there's nothing to write */
- if (!was_dirty)
+ /*
+ * and don't do anything if there's nothing to write, unless it's this is
+ * called for a logical slot during a shutdown checkpoint, as we want to
+ * persist the confirmed_flush_lsn in that case, even if that's the only
+ * modification.
+ */
+ if (!was_dirty && (SlotIsPhysical(slot) || !is_shutdown))
return;

The condition seems to be coded in a slightly awkward way when
compared to how the comment was worded.

How about:
if (!was_dirty && !(SlotIsLogical(slot) && is_shutdown))

Changed.

For patch v21-0002...

======
Commit Message

1.

For pg_upgrade, it query the logical replication slots information from the old
cluter and restores the slots using the pg_create_logical_replication_slots()
statements. Note that we need to separate the timing of restoring replication
slots and other objects. Replication slots, in particular, should not be
restored before executing the pg_resetwal command because it will remove WALs
that are required by the slots.

~

Revisit this paragraph. There are lots of typos etc.

Maybe I sent the patch before finalizing the commit message; sorry for that.
I reworded that part, and Grammarly says the new wording is OK.

1a.
"For pg_upgrade". I think this wording is a hangover from back when
the patch was split into two parts for pg_dump and pg_upgrade, but now
it seems strange.

Yeah, so removed the word.

1b.
/cluter/cluster/

Changed.

1c
/because it/because pg_resetwal/

Changed.

src/sgml/ref/pgupgrade.sgml

2.

+   <step>
+    <title>Prepare for publisher upgrades</title>
+
+    <para>
+     <application>pg_upgrade</application> try to dump and restore logical
+     replication slots. This helps avoid the need for manually defining the
+     same replication slot on the new publisher.
+    </para>
+

2a.
/try/attempts to/ ??

Changed.

2b.
Is "dump" the right word here? I didn't see dumping happening in the
patch anymore.

I replaced "dump and restore" with "migrate". What do you think?

3.

+    <para>
+     Before you start upgrading the publisher node, ensure that the
+     subscription is temporarily disabled. After the upgrade is complete,
+     execute the
+     <link linkend="sql-altersubscription"><command>ALTER
SUBSCRIPTION ... DISABLE</command></link>
+     command to update the connection string, and then re-enable the
+     subscription.
+    </para>

3a.
That link made no sense in this context.

Don't you mean to say:
<command>ALTER SUBSCRIPTION ... CONNECTION ...</command>
3b.
Hmm. I wonder now did you *also* mean to describe how to disable? For example:

Before you start upgrading the publisher node, ensure that the
subscription is temporarily disabled, by executing
<link linkend="sql-altersubscription"><command>ALTER SUBSCRIPTION ...
DISABLE</command></link>.

I wondered which statement should be referenced, and ended up doing it incompletely.
Both ALTER SUBSCRIPTION statements were cited, and a link was added to the
DISABLE clause. Is that OK?
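
For readers following along, the documented sequence on the subscriber is
roughly the following (subscription name and connection string are made-up
examples):

```
ALTER SUBSCRIPTION mysub DISABLE;
-- ... run pg_upgrade on the publisher ...
ALTER SUBSCRIPTION mysub CONNECTION 'host=new_publisher_host port=5432 dbname=postgres';
ALTER SUBSCRIPTION mysub ENABLE;
```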

4.

+
+    <para>
+     Upgrading slots has some settings. At first, all the slots must not be in
+     <literal>lost</literal>, and they must have consumed all the WALs on old
+     node. Furthermore, new node must have larger
+     <link
linkend="guc-max-replication-slots"><varname>max_replication_slots</varna
me></link>
+     than existing slots on old node, and
+     <link linkend="guc-wal-level"><varname>wal_level</varname></link>
must be
+     <literal>logical</literal>. <application>pg_upgrade</application> will
+     run error if something wrong.
+    </para>
+   </step>
+

4a.
"At first, all the slots must not be in lost"

Apart from being strangely worded, I was not familiar with what it
meant to say "must not be in lost". Will this be meaningful to the
user?

IMO this should have more description, e.g. including mentioning the
"wal_status" attribute with the appropriate link to
https://www.postgresql.org/docs/current/view-pg-replication-slots.html

Added the reference.

4b.
BEFORE
Upgrading slots has some settings. ...
<application>pg_upgrade</application> will run error if something
wrong.

SUGGESTION
There are some prerequisites for <application>pg_upgrade</application>
to be able to upgrade the replication slots. If these are not met an
error will be reported.

Changed.

4c.
Wondered if this list of prerequisites might be better presented as an
SGML list.

Changed to <itemizedlist> style.
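
As a quick, hand-run sketch of those prerequisites (exact expectations depend
on the clusters involved):

```
-- on the old cluster: every slot should be usable and fully consumed
SELECT slot_name, wal_status, confirmed_flush_lsn
FROM pg_catalog.pg_replication_slots
WHERE temporary IS FALSE;

-- on the new cluster: required settings
SHOW wal_level;              -- must be 'logical'
SHOW max_replication_slots;  -- must not be smaller than the number of old slots
```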

src/bin/pg_upgrade/check.c

5.
extern char *output_files[];
+extern int num_slots_on_old_cluster;

~

IMO something feels not quite right about having this counter floating
around as a global variable.

Shouldn't this instead be a field member of the old_cluster. That
seems to be the normal way to hold the cluster-wise info.

Per comment from Amit, the variable was removed.

6. check_new_cluster_is_empty

RelInfoArr *rel_arr = &new_cluster.dbarr.dbs[dbnum].rel_arr;
+ DbInfo     *pDbInfo = &new_cluster.dbarr.dbs[dbnum];
+ LogicalSlotInfoArr *slot_arr = &pDbInfo->slot_arr;

IIRC I previously suggested adding this 'pDbInfo' variable because
there are several places that can make use of it.

You are using it only in the NEW code, but did not replace the
existing other code to make use of it:
pg_fatal("New cluster database \"%s\" is not empty: found relation \"%s.%s\"",
new_cluster.dbarr.dbs[dbnum].db_name,

Right, switched to using it. Additionally, it is also used for the definition of rel_arr.

7. check_for_logical_replication_slots

+
+/*
+ * Verify the parameter settings necessary for creating logical replication
+ * slots.
+ */
+static void
+check_for_logical_replication_slots(ClusterInfo *new_cluster)
+{
+ PGresult   *res;
+ PGconn    *conn = connectToServer(new_cluster, "template1");
+ int max_replication_slots;
+ char    *wal_level;
+
+ /* logical replication slots can be dumped since PG17. */
+ if (GET_MAJOR_VERSION(new_cluster->major_version) <= 1600)
+ return;
+
+ prep_status("Checking parameter settings for logical replication slots");
+
+ res = executeQueryOrDie(conn, "SHOW max_replication_slots;");
+ max_replication_slots = atoi(PQgetvalue(res, 0, 0));
+
+ if (max_replication_slots == 0)
+ pg_fatal("max_replication_slots must be greater than 0");
+ else if (num_slots_on_old_cluster > max_replication_slots)
+ pg_fatal("max_replication_slots must be greater than existing logical "
+ "replication slots on old node.");
+
+ PQclear(res);
+
+ res = executeQueryOrDie(conn, "SHOW wal_level;");
+ wal_level = PQgetvalue(res, 0, 0);
+
+ if (strcmp(wal_level, "logical") != 0)
+ pg_fatal("wal_level must be \"logical\", but is set to \"%s\"",
+ wal_level);
+
+ PQclear(res);
+
+ PQfinish(conn);
+
+ check_ok();

~

7a.
+check_for_logical_replication_slots(ClusterInfo *new_cluster)

IMO it is bad practice to name this argument 'new_cluster'. You will
end up shadowing the global variable of the same name. It seems in
other similar code where &new_cluster is passed as a parameter the
function arg there is called just 'cluster'.

Hmm, but check_for_new_tablespace_dir() has an argument 'new_cluster'.
AFAICS, a check function called only for the new cluster takes an argument named
"new_cluster", whereas a function called for both clusters or only the old cluster
takes "cluster". Am I missing something, or should it be fixed anyway? For now I kept it.

7b.
"/* logical replication slots can be dumped since PG17. */"

Is "dumped" the correct word to be used here? Where is the "dump"?

Changed to "migrated"

7c.

+ if (max_replication_slots == 0)
+ pg_fatal("max_replication_slots must be greater than 0");
+ else if (num_slots_on_old_cluster > max_replication_slots)
+ pg_fatal("max_replication_slots must be greater than existing logical "
+ "replication slots on old node.");

Why is the 1st condition here even needed? Isn't it sufficient just to
have that 2nd condition to check max_replication_slot is big enough?

Yeah, that is sufficient. This was leftover garbage from previous changes. Fixed.

8. src/bin/pg_upgrade/dump.c

{
char sql_file_name[MAXPGPATH],
log_file_name[MAXPGPATH];
+
DbInfo *old_db = &old_cluster.dbarr.dbs[dbnum];
~

Removed.

======
src/bin/pg_upgrade/function.c

9. get_loadable_libraries -- GENERAL

@@ -46,7 +46,8 @@ library_name_compare(const void *p1, const void *p2)
/*
* get_loadable_libraries()
*
- * Fetch the names of all old libraries containing C-language functions.
+ * Fetch the names of all old libraries containing C-language functions, and
+ * output plugins used by existing logical replication slots.
* We will later check that they all exist in the new installation.
*/
void
@@ -66,14 +67,21 @@ get_loadable_libraries(void)
PGconn    *conn = connectToServer(&old_cluster, active_db->db_name);
/*
- * Fetch all libraries containing non-built-in C functions in this DB.
+ * Fetch all libraries containing non-built-in C functions and
+ * output plugins in this DB.
*/
ress[dbnum] = executeQueryOrDie(conn,
"SELECT DISTINCT probin "
"FROM pg_catalog.pg_proc "
"WHERE prolang = %u AND "
"probin IS NOT NULL AND "
- "oid >= %u;",
+ "oid >= %u "
+ "UNION "
+ "SELECT DISTINCT plugin "
+ "FROM pg_catalog.pg_replication_slots "
+ "WHERE wal_status <> 'lost' AND "
+ "database = current_database() AND "
+ "temporary IS FALSE;",
ClanguageId,
FirstNormalObjectId);
totaltups += PQntuples(ress[dbnum]);

~

Maybe it is OK, but it somehow seems like the new logic has been
jammed into the get_loadable_libraries() function for coding
convenience. For example, all the names (function names, variable
names, structure field names) are referring to "libraries", so the
plugin seems a bit out of place.

Per discussion with Amit and you [1], I kept the style. The comments atop and in
the function were changed instead.

10. get_loadable_libraries

/* Fetch all library names, removing duplicates within each DB */
for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
~

This code comment still refers only to library names.

I think this is right, because output plugins are also libraries.

10. get_loadable_libraries

+ "UNION "
+ "SELECT DISTINCT plugin "
+ "FROM pg_catalog.pg_replication_slots "
+ "WHERE wal_status <> 'lost' AND "
+ "database = current_database() AND "
+ "temporary IS FALSE;",

IMO this SQL might be more readable if it uses an alias (like 'rs')
for the catalog. Then rs.wal_status, rs.database, rs.temporary etc.

Per discussion with Amit and you [1], this comment was ignored.

src/bin/pg_upgrade/info.c

11. get_logical_slot_infos_per_db

+ snprintf(query, sizeof(query),
+ "SELECT slot_name, plugin, two_phase "
+ "FROM pg_catalog.pg_replication_slots "
+ "WHERE database = current_database() AND temporary = false "
+ "AND wal_status <> 'lost';");

There was similar SQL in get_loadable_libraries() but there you wrote:

+ "FROM pg_catalog.pg_replication_slots "
+ "WHERE wal_status <> 'lost' AND "
+ "database = current_database() AND "
+ "temporary IS FALSE;",

The WHERE condition order and case are all slightly different. IMO it
would be better for both SQL fragments to be exactly the same.

Unified to later one.

12. get_logical_slot_infos

+int
+get_logical_slot_infos(ClusterInfo *cluster)
+{
+ int dbnum;
+ int slotnum = 0;
+

I think 'slotnum' is not a good name. In other nearby code (e.g.
print_slot_infos) 'slotnum' is used to mean the index of each slot,
but here it means the total number of slots. How about a name like
'slot_count' or 'nslots' something where the name is more meaningful?

Changed to slot_count.

13. free_db_and_rel_infos

+
+ /*
+ * Logical replication slots must not exist on the new cluster before
+ * doing create_logical_replication_slots().
+ */
+ Assert(db_arr->dbs[dbnum].slot_arr.slots == NULL);

Isn't it more natural to do: Assert(db_arr->dbs[dbnum].slot_arr.nslots == 0);

Changed.
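
(For context, the asserted invariant is that the new cluster has no logical
replication slots before create_logical_replication_slots() runs; on a running
new cluster that corresponds to the following query returning zero:)

```
SELECT count(*) AS nslots
FROM pg_catalog.pg_replication_slots
WHERE temporary IS FALSE;
```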

src/bin/pg_upgrade/pg_upgrade.c

14. create_logical_replication_slots

+create_logical_replication_slots(void)
+{
+ int dbnum;
+ int slotnum;

The 'slotnum' can be declared at a lower scope than this to be closer
to where it is actually used.

Moved.

15. create_logical_replication_slots

+ for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+ {
+ DbInfo    *old_db = &old_cluster.dbarr.dbs[dbnum];
+ LogicalSlotInfoArr *slot_arr = &old_db->slot_arr;
+ PQExpBuffer query,
+ escaped;
+ PGconn    *conn;
+ char log_file_name[MAXPGPATH];
+
+ /* Quick exit if there are no slots */
+ if (!slot_arr->nslots)
+ continue;

The comment is misleading. There is no exiting. Maybe better to say
something like "Skip this DB if there are no slots".

Changed.

16. create_logical_replication_slots

+ appendPQExpBuffer(query, "SELECT
pg_catalog.pg_create_logical_replication_slot(");
+ appendStringLiteral(query, slot_arr->slots[slotnum].slotname,
+ slot_arr->encoding, slot_arr->std_strings);
+ appendPQExpBuffer(query, ", ");
+ appendStringLiteral(query, slot_arr->slots[slotnum].plugin,
+ slot_arr->encoding, slot_arr->std_strings);
+ appendPQExpBuffer(query, ", false, %s);",
+   slot_arr->slots[slotnum].two_phase ? "true" : "false");

I noticed that the function comment for appendStringLiteral() says:
"We need it in situations where we do not have a PGconn available.
Where we do, appendStringLiteralConn is a better choice.".

But in this code, we *do* have PGconn available. So, shouldn't we be
following the advice of the appendStringLiteral() function comment and
use the other API instead?

Changed to use appendStringLiteralConn.

17. create_logical_replication_slots

+ /*
+ * The string must be escaped to shell-style, because there is a
+ * possibility that output plugin name contains quotes. The output
+ * string would be sandwiched by the single quotes, so it does not have
+ * to be wrapped by any quotes when it is passed to
+ * parallel_exec_prog().
+ */
+ appendShellString(escaped, query->data);

/sandwiched by/enclosed by/ ???

This part is no longer needed because we no longer pass strings to the shell.
The initial motivation for the change was to execute in parallel, so the string was
escaped shell-style and passed to the psql -c option. But I found that this caused a
huge performance degradation, so I reverted the change. See my report [2].

src/bin/pg_upgrade/pg_upgrade.h

18. LogicalSlotInfo

+/*
+ * Structure to store logical replication slot information
+ */
+typedef struct
+{
+ char    *slotname; /* slot name */
+ char    *plugin; /* plugin */
+ bool two_phase; /* Can the slot decode 2PC? */
+} LogicalSlotInfo;

Looks a bit strange when the only last field comment is uppercase but
the others are not. Maybe lowercase everything like for other nearby
structs.

Changed.

19. LogicalSlotInfoArr

+
+typedef struct
+{
+ int nslots;
+ LogicalSlotInfo *slots;
+ int encoding;
+ bool std_strings;
+} LogicalSlotInfoArr;
+

The meaning of those fields is not always obvious. IMO they can all be
commented on.

Added. Note that encoding and std_strings were removed because they were only
needed by appendStringLiteral(), which is no longer used.

.../pg_upgrade/t/003_logical_replication_slots.pl

20.

# Cause a failure at the start of pg_upgrade because wal_level is replica

~

I wondered if it would be clearer if you had to explicitly set the
new_node to "replica" initially, instead of leaving it default.

Changed.

21.

# Cause a failure at the start of pg_upgrade because max_replication_slots is 0

~

This related to my earlier code comment in this post -- I didn't
understand the need to specially test for 0. IIUC, we really are
interested only to know if there are *sufficient*
max_replication_slots.

Agreed, removed.

22.

'run of pg_upgrade of old node with small max_replication_slots');

~

SUGGESTION
run of pg_upgrade where the new node has insufficient max_replication_slots

Changed.

23.

# Preparations for the subsequent test. max_replication_slots is set to
# appropriate value
$new_node->append_conf('postgresql.conf', "max_replication_slots = 10");

# Remove an unnecessary slot and consume WALs
$old_node->start;
$old_node->safe_psql(
'postgres', qq[
SELECT pg_drop_replication_slot('test_slot1');
SELECT count(*) FROM pg_logical_slot_get_changes('test_slot2', NULL, NULL)
]);
$old_node->stop;

~

Some of that preparation seems unnecessary. I think the new node
max_replication_slots is 1 already, so if you are going to remove one
of test_slot1 here then there is only ONE slot left, right? So the
max_replication_slots on the new node should be OK now. Not only will
there be less test code needed here, but you will be testing the
boundary condition of max_replication_slots (which is probably a good
thing to do).

Removed.

Next version would be available in the upcoming post.

[1]: /messages/by-id/CAA4eK1LhEwxQmK2ZepYTYDOKp6F8JCFbiBcw5EoQFbs-CjmY7Q@mail.gmail.com
[2]: /messages/by-id/TYCPR01MB58701DAEE5E61B07AC84ADBBF51AA@TYCPR01MB5870.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#130Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Peter Smith (#124)
3 attachment(s)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Peter,

PSA new version patch set.

Here are some review comments for the patch v21-0003

======
Commit message

1.
pg_upgrade fails if the old node has slots which status is 'lost' or they do not
consume all WAL records. These are needed for prevent the data loss.

~

Maybe some minor brush-up like:

SUGGESTION
In order to prevent data loss, pg_upgrade will fail if the old node
has slots with the status 'lost', or with unconsumed WAL records.

Improved.

src/bin/pg_upgrade/check.c

2. check_for_confirmed_flush_lsn

+ /* Check that all logical slots are not in 'lost' state. */
+ res = executeQueryOrDie(conn,
+ "SELECT slot_name FROM pg_catalog.pg_replication_slots "
+ "WHERE temporary = false AND wal_status = 'lost';");
+
+ ntups = PQntuples(res);
+ i_slotname = PQfnumber(res, "slot_name");
+
+ for (i = 0; i < ntups; i++)
+ {
+ is_error = true;
+
+ pg_log(PG_WARNING,
+    "\nWARNING: logical replication slot \"%s\" is obsolete.",
+    PQgetvalue(res, i, i_slotname));
+ }
+
+ PQclear(res);
+
+ if (is_error)
+ pg_fatal("logical replication slots not to be in 'lost' state.");
+

2a. (GENERAL)
The above code for checking lost state seems out of place in this
function which is meant for checking confirmed flush lsn.

Maybe you jammed both kinds of logic into one function to save on the
extra PGconn or something but IMO two separate functions would be
better. e.g.
- check_for_lost_slots
- check_for_confirmed_flush_lsn

Separated into check_for_lost_slots and check_for_confirmed_flush_lsn.

2b.
+ /* Check that all logical slots are not in 'lost' state. */

SUGGESTION
/* Check there are no logical replication slots with a 'lost' state. */

Changed.

2c.
+ res = executeQueryOrDie(conn,
+ "SELECT slot_name FROM pg_catalog.pg_replication_slots "
+ "WHERE temporary = false AND wal_status = 'lost';");

This SQL fragment is very much like others in previous patches. Be
sure to make all the cases and clauses consistent with all those
similar SQL fragments.

Unified the order. Note that they could not be made completely the same.

2d.
+ is_error = true;

That doesn't need to be in the loop. Better to just say:
is_error = (ntups > 0);

Removed the variable.

2e.
There is a mix of terms in the WARNING and in the pg_fatal -- e.g.
"obsolete" versus "lost". Is it OK?

Unified to 'lost'.

2f.
+ pg_fatal("logical replication slots not to be in 'lost' state.");

English? And maybe it should be much more verbose...

"Upgrade of this installation is not allowed because one or more
logical replication slots with a state of 'lost' were detected."

I checked other pg_fatal() calls and could not find a statement like "Upgrade of this
installation is not allowed", so I used only the latter part.

3. check_for_confirmed_flush_lsn

+ /*
+ * Check that all logical replication slots have reached the latest
+ * checkpoint position (SHUTDOWN_CHECKPOINT record). This checks cannot
be
+ * done in case of live_check because the server has not been written the
+ * SHUTDOWN_CHECKPOINT record yet.
+ */
+ if (!live_check)
+ {
+ res = executeQueryOrDie(conn,
+ "SELECT slot_name FROM pg_catalog.pg_replication_slots "
+ "WHERE confirmed_flush_lsn != '%X/%X' AND temporary = false;",
+ old_cluster.controldata.chkpnt_latest_upper,
+ old_cluster.controldata.chkpnt_latest_lower);
+
+ ntups = PQntuples(res);
+ i_slotname = PQfnumber(res, "slot_name");
+
+ for (i = 0; i < ntups; i++)
+ {
+ is_error = true;
+
+ pg_log(PG_WARNING,
+    "\nWARNING: logical replication slot \"%s\" has not consumed WALs yet",
+    PQgetvalue(res, i, i_slotname));
+ }
+
+ PQclear(res);
+ PQfinish(conn);
+
+ if (is_error)
+ pg_fatal("All logical replication slots consumed all the WALs.");

~

3a.
/This checks/This check/

The comment is no longer needed because the caller checks the live_check variable.
For more detail, please see my other post [1].

3b.
I don't think the separation of
chkpnt_latest_upper/chkpnt_latest_lower is needed like this. AFAIK
there is an LSN_FORMAT_ARGS(lsn) macro designed for handling exactly
this kind of parameter substitution.

Fixed to use the macro.

Previously I thought that the header "access/xlogdefs.h" could not be included from
pg_upgrade, which was the reason I did not use the macro. But that was my
misunderstanding - I could include the file.

3c.
+ is_error = true;

That doesn't need to be in the loop. Better to just say:
is_error = (ntups > 0);

Removed.

3d.
+ pg_fatal("All logical replication slots consumed all the WALs.");

The message seems backward. shouldn't it say something like:
"Upgrade of this installation is not allowed because one or more
logical replication slots still have unconsumed WAL records."

I used only the latter part; see the reply above.

src/bin/pg_upgrade/controldata.c

4. get_control_data

+ /*
+ * Upper and lower part of LSN must be read and stored
+ * separately because it is reported as %X/%X format.
+ */
+ cluster->controldata.chkpnt_latest_upper =
+ strtoul(p, &slash, 16);
+ cluster->controldata.chkpnt_latest_lower =
+ strtoul(++slash, NULL, 16);

I felt that this field separation code is maybe not necessary. Please
refer to other review comments in this post.

Hmm. I thought they must be read separately even if we store the value as an
XLogRecPtr (uint64), because pg_controldata reports the LSN in %X/%X style. Am I
missing something?

```
$ pg_controldata -D data_N1/ | grep "Latest checkpoint location"
Latest checkpoint location: 0/153C8D0
```
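
Just as an aside, on a running server the same value is also available as a
single pg_lsn via SQL, although pg_upgrade itself parses the pg_controldata
output of the stopped cluster:

```
SELECT checkpoint_lsn FROM pg_catalog.pg_control_checkpoint();
```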

src/bin/pg_upgrade/pg_upgrade.h

5. ControlData

+
+ uint32 chkpnt_latest_upper;
+ uint32 chkpnt_latest_lower;
} ControlData;

~

Actually, I did not recognise the reason why this cannot be stored
properly as a single XLogRecPtr field. Please see other review
comments in this post.

Changed to use XLogRecPtr. See above comment.

.../t/003_logical_replication_slots.pl

6. GENERAL

Many of the changes to this file are just renaming the
'old_node'/'new_node' to 'old_publisher'/'new_publisher'.

This seems a basic change not really associated with this patch 0003.
To reduce the code churn, this change should be moved into the earlier
patch where this test file (003_logical_replication_slots.pl) was
first introduced,

Moved these renaming to 0002.

7.

# Cause a failure at the start of pg_upgrade because slot do not finish
# consuming all the WALs

~

Can you give a more detailed explanation in the comment of how this
test case achieves what it says?

Slightly reworded the above and this comment. What do you think?
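
To spell out the scenario the test sets up: a slot ends up with unconsumed WAL
when changes are generated after the last call that advanced its
confirmed_flush_lsn, e.g. (the table name is made up; the slot name matches the
test):

```
SELECT count(*) FROM pg_logical_slot_get_changes('test_slot2', NULL, NULL);
INSERT INTO tbl VALUES (1);   -- this change is now pending for test_slot2
```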

src/test/regress/sql/misc_functions.sql

8.
@@ -236,4 +236,4 @@ SELECT * FROM pg_split_walfile_name('invalid');
SELECT segment_number > 0 AS ok_segment_number, timeline_id
FROM pg_split_walfile_name('000000010000000100000000');
SELECT segment_number > 0 AS ok_segment_number, timeline_id
-  FROM pg_split_walfile_name('ffffffFF00000001000000af');
+  FROM pg_split_walfile_name('ffffffFF00000001000000af');
\ No newline at end of file

~

What is this change for? It looks like maybe some accidental
whitespace change happened.

It was unexpected, removed.

[1]: /messages/by-id/TYAPR01MB5866691219B9CB280B709600F51BA@TYAPR01MB5866.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:

v22-0001-Always-persist-to-disk-logical-slots-during-a-sh.patchapplication/octet-stream; name=v22-0001-Always-persist-to-disk-logical-slots-during-a-sh.patchDownload
From dc0f5281f618b293aa7165f9986272f29d4bad60 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Fri, 14 Apr 2023 13:49:09 +0800
Subject: [PATCH v22 1/3] Always persist to disk logical slots during a
 shutdown checkpoint.

It's entirely possible for a logical slot to have a confirmed_flush_lsn higher
than the last value saved on disk while not being marked as dirty.  It's
currently not a problem to lose that value during a clean shutdown / restart
cycle, but a later patch adding support for pg_upgrade of publications and
logical slots will rely on that value being properly persisted to disk.

Author: Julien Rouhaud
Reviewed-by: Wang Wei, Peter Smith
Discussion: FIXME
---
 src/backend/access/transam/xlog.c |  2 +-
 src/backend/replication/slot.c    | 26 ++++++++++++++++----------
 src/include/replication/slot.h    |  2 +-
 3 files changed, 18 insertions(+), 12 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 60c0b7ec3a..6dced61cf4 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7026,7 +7026,7 @@ static void
 CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
 {
 	CheckPointRelationMap();
-	CheckPointReplicationSlots();
+	CheckPointReplicationSlots(flags & CHECKPOINT_IS_SHUTDOWN);
 	CheckPointSnapBuild();
 	CheckPointLogicalRewriteHeap();
 	CheckPointReplicationOrigin();
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 1dc27264f6..cc55840d16 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -109,7 +109,8 @@ static void ReplicationSlotDropPtr(ReplicationSlot *slot);
 /* internal persistency functions */
 static void RestoreSlotFromDisk(const char *name);
 static void CreateSlotOnDisk(ReplicationSlot *slot);
-static void SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel);
+static void SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel,
+						   bool is_shutdown);
 
 /*
  * Report shared-memory space needed by ReplicationSlotsShmemInit.
@@ -783,7 +784,7 @@ ReplicationSlotSave(void)
 	Assert(MyReplicationSlot != NULL);
 
 	sprintf(path, "pg_replslot/%s", NameStr(MyReplicationSlot->data.name));
-	SaveSlotToPath(MyReplicationSlot, path, ERROR);
+	SaveSlotToPath(MyReplicationSlot, path, ERROR, false);
 }
 
 /*
@@ -1565,11 +1566,10 @@ restart:
 /*
  * Flush all replication slots to disk.
  *
- * This needn't actually be part of a checkpoint, but it's a convenient
- * location.
+ * is_shutdown is true in case of a shutdown checkpoint.
  */
 void
-CheckPointReplicationSlots(void)
+CheckPointReplicationSlots(bool is_shutdown)
 {
 	int			i;
 
@@ -1594,7 +1594,7 @@ CheckPointReplicationSlots(void)
 
 		/* save the slot to disk, locking is handled in SaveSlotToPath() */
 		sprintf(path, "pg_replslot/%s", NameStr(s->data.name));
-		SaveSlotToPath(s, path, LOG);
+		SaveSlotToPath(s, path, LOG, is_shutdown);
 	}
 	LWLockRelease(ReplicationSlotAllocationLock);
 }
@@ -1700,7 +1700,7 @@ CreateSlotOnDisk(ReplicationSlot *slot)
 
 	/* Write the actual state file. */
 	slot->dirty = true;			/* signal that we really need to write */
-	SaveSlotToPath(slot, tmppath, ERROR);
+	SaveSlotToPath(slot, tmppath, ERROR, false);
 
 	/* Rename the directory into place. */
 	if (rename(tmppath, path) != 0)
@@ -1726,7 +1726,8 @@ CreateSlotOnDisk(ReplicationSlot *slot)
  * Shared functionality between saving and creating a replication slot.
  */
 static void
-SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel)
+SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel,
+			   bool is_shutdown)
 {
 	char		tmppath[MAXPGPATH];
 	char		path[MAXPGPATH];
@@ -1740,8 +1741,13 @@ SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel)
 	slot->just_dirtied = false;
 	SpinLockRelease(&slot->mutex);
 
-	/* and don't do anything if there's nothing to write */
-	if (!was_dirty)
+	/*
+	 * and don't do anything if there's nothing to write, unless it's this is
+	 * called for a logical slot during a shutdown checkpoint, as we want to
+	 * persist the confirmed_flush_lsn in that case, even if that's the only
+	 * modification.
+	 */
+	if (!was_dirty && !(SlotIsLogical(slot) && is_shutdown))
 		return;
 
 	LWLockAcquire(&slot->io_in_progress_lock, LW_EXCLUSIVE);
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index a8a89dc784..7ca37c9f70 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -241,7 +241,7 @@ extern void ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslo
 extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok);
 
 extern void StartupReplicationSlots(void);
-extern void CheckPointReplicationSlots(void);
+extern void CheckPointReplicationSlots(bool is_shutdown);
 
 extern void CheckSlotRequirements(void);
 extern void CheckSlotPermissions(void);
-- 
2.27.0

v22-0002-pg_upgrade-Allow-to-replicate-logical-replicatio.patchapplication/octet-stream; name=v22-0002-pg_upgrade-Allow-to-replicate-logical-replicatio.patchDownload
From 617127bbd11a2aad31231f18d53a4c54dbc802e7 Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Tue, 4 Apr 2023 05:49:34 +0000
Subject: [PATCH v22 2/3] pg_upgrade: Allow to replicate logical replication
 slots to new node

This commit allows nodes with logical replication slots to be upgraded. While
reading information from the old cluster, a list of logical replication slots is
newly extracted. At the later part of upgrading, pg_upgrade revisits the list
and restores slots by using the pg_create_logical_replication_slots() on the new
cluster.

Note that it must be done after the final pg_resetwal command during the upgrade
because pg_resetwal will remove WALs that are required by the slots. Due to the
restriction, the timing of restoring replication slots is different from other
objects.

The significant advantage of this commit is that it makes it easy to continue
logical replication even after upgrading the publisher node. Previously,
pg_upgrade allowed copying publications to a new node. With this new commit,
adjusting the connection string to the new publisher will cause the apply
worker on the subscriber to connect to the new publisher automatically. This
enables seamless continuation of logical replication, even after an upgrade.

Author: Hayato Kuroda
Co-authored-by: Hou Zhijie
Reviewed-by: Peter Smith, Julien Rouhaud, Vignesh C, Wang Wei
---
 doc/src/sgml/ref/pgupgrade.sgml               |  56 ++++++++
 src/bin/pg_upgrade/Makefile                   |   3 +
 src/bin/pg_upgrade/check.c                    |  70 +++++++++-
 src/bin/pg_upgrade/function.c                 |  18 ++-
 src/bin/pg_upgrade/info.c                     | 122 ++++++++++++++++-
 src/bin/pg_upgrade/meson.build                |   1 +
 src/bin/pg_upgrade/pg_upgrade.c               |  78 +++++++++++
 src/bin/pg_upgrade/pg_upgrade.h               |  19 +++
 .../t/003_logical_replication_slots.pl        | 123 ++++++++++++++++++
 src/tools/pgindent/typedefs.list              |   3 +
 10 files changed, 487 insertions(+), 6 deletions(-)
 create mode 100644 src/bin/pg_upgrade/t/003_logical_replication_slots.pl

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 7816b4c685..848f7e8432 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -402,6 +402,62 @@ NET STOP postgresql-&majorversion;
     </para>
    </step>
 
+   <step>
+    <title>Prepare for publisher upgrades</title>
+
+    <para>
+     <application>pg_upgrade</application> attempts to migrate logical
+     replication slots. This helps avoid the need for manually defining the
+     same replication slot on the new publisher.
+    </para>
+
+    <para>
+     Before you start upgrading the publisher cluster, ensure that the
+     subscription is temporarily disabled, by executing
+     <link linkend="sql-altersubscription"><command>ALTER SUBSCRIPTION ... DISABLE</command></link>.
+     After the upgrade is complete, execute the
+     <command>ALTER SUBSCRIPTION ... CONNECTION</command> command to update the
+     connection string, and then re-enable the subscription.
+    </para>
+
+    <para>
+     There are some prerequisites for <application>pg_upgrade</application> to
+     be able to upgrade the replication slots. If these are not met an error
+     will be reported.
+    </para>
+
+    <itemizedlist>
+     <listitem>
+      <para>
+       All slots on old cluster must be usable, i.e., there are no slots which
+       <structfield>wal_status</structfield> is <literal>lost</literal> (see
+       <xref linkend="view-pg-replication-slots"/>).
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The output plugin referred by slots on old cluster must be installed on
+       the new PostgreSQL executable directory.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       New cluster must have larger
+       <link linkend="guc-max-replication-slots"><varname>max_replication_slots</varname></link>
+       than existing slots on old cluster.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       New cluster must be set
+       <link linkend="guc-wal-level"><varname>wal_level</varname></link> as
+       <literal>logical</literal>.
+      </para>
+     </listitem>
+    </itemizedlist>
+
+   </step>
+
    <step>
     <title>Run <application>pg_upgrade</application></title>
 
diff --git a/src/bin/pg_upgrade/Makefile b/src/bin/pg_upgrade/Makefile
index 5834513add..815d1a7ca1 100644
--- a/src/bin/pg_upgrade/Makefile
+++ b/src/bin/pg_upgrade/Makefile
@@ -3,6 +3,9 @@
 PGFILEDESC = "pg_upgrade - an in-place binary upgrade utility"
 PGAPPICON = win32
 
+# required for 003_logical_replication_slots.pl
+EXTRA_INSTALL=contrib/test_decoding
+
 subdir = src/bin/pg_upgrade
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 64024e3b9e..ed5b07fbb7 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -30,6 +30,7 @@ static void check_for_jsonb_9_4_usage(ClusterInfo *cluster);
 static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(ClusterInfo *new_cluster);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
+static void check_for_logical_replication_slots(ClusterInfo *new_cluster);
 
 
 /*
@@ -89,6 +90,9 @@ check_and_dump_old_cluster(bool live_check)
 	/* Extract a list of databases and tables from the old cluster */
 	get_db_and_rel_infos(&old_cluster);
 
+	/* Extract a list of logical replication slots */
+	get_logical_slot_infos(&old_cluster, live_check);
+
 	init_tablespaces();
 
 	get_loadable_libraries();
@@ -189,6 +193,17 @@ check_new_cluster(void)
 {
 	get_db_and_rel_infos(&new_cluster);
 
+	/*
+	 * Checking for logical slots must be done before
+	 * check_new_cluster_is_empty() because the slot_arr attribute of the
+	 * new_cluster will be checked in that function.
+	 */
+	if (count_logical_slots(&old_cluster))
+	{
+		get_logical_slot_infos(&new_cluster, false);
+		check_for_logical_replication_slots(&new_cluster);
+	}
+
 	check_new_cluster_is_empty();
 
 	check_loadable_libraries();
@@ -352,7 +367,9 @@ check_new_cluster_is_empty(void)
 	for (dbnum = 0; dbnum < new_cluster.dbarr.ndbs; dbnum++)
 	{
 		int			relnum;
-		RelInfoArr *rel_arr = &new_cluster.dbarr.dbs[dbnum].rel_arr;
+		DbInfo     *pDbInfo = &new_cluster.dbarr.dbs[dbnum];
+		RelInfoArr *rel_arr = &pDbInfo->rel_arr;
+		LogicalSlotInfoArr *slot_arr = &pDbInfo->slot_arr;
 
 		for (relnum = 0; relnum < rel_arr->nrels;
 			 relnum++)
@@ -360,10 +377,18 @@ check_new_cluster_is_empty(void)
 			/* pg_largeobject and its index should be skipped */
 			if (strcmp(rel_arr->rels[relnum].nspname, "pg_catalog") != 0)
 				pg_fatal("New cluster database \"%s\" is not empty: found relation \"%s.%s\"",
-						 new_cluster.dbarr.dbs[dbnum].db_name,
+						 pDbInfo->db_name,
 						 rel_arr->rels[relnum].nspname,
 						 rel_arr->rels[relnum].relname);
 		}
+
+		/*
+		 * Check the existence of logical replication slots.
+		 */
+		if (slot_arr->nslots)
+			pg_fatal("New cluster database \"%s\" is not empty: found logical replication slot \"%s\"",
+					 pDbInfo->db_name,
+					 slot_arr->slots[0].slotname);
 	}
 }
 
@@ -1402,3 +1427,44 @@ check_for_user_defined_encoding_conversions(ClusterInfo *cluster)
 	else
 		check_ok();
 }
+
+/*
+ * Verify the parameter settings necessary for creating logical replication
+ * slots.
+ */
+static void
+check_for_logical_replication_slots(ClusterInfo *new_cluster)
+{
+	PGresult   *res;
+	PGconn	   *conn = connectToServer(new_cluster, "template1");
+	int			max_replication_slots;
+	char	   *wal_level;
+
+	/* logical replication slots can be migrated since PG17. */
+	if (GET_MAJOR_VERSION(new_cluster->major_version) <= 1600)
+		return;
+
+	prep_status("Checking parameter settings for logical replication slots");
+
+	res = executeQueryOrDie(conn, "SHOW max_replication_slots;");
+	max_replication_slots = atoi(PQgetvalue(res, 0, 0));
+
+	if (count_logical_slots(&old_cluster) > max_replication_slots)
+		pg_fatal("max_replication_slots must be greater than existing logical "
+				 "replication slots on old node.");
+
+	PQclear(res);
+
+	res = executeQueryOrDie(conn, "SHOW wal_level;");
+	wal_level = PQgetvalue(res, 0, 0);
+
+	if (strcmp(wal_level, "logical") != 0)
+		pg_fatal("wal_level must be \"logical\", but is set to \"%s\"",
+				 wal_level);
+
+	PQclear(res);
+
+	PQfinish(conn);
+
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/function.c b/src/bin/pg_upgrade/function.c
index dc8800c7cd..e197d1f043 100644
--- a/src/bin/pg_upgrade/function.c
+++ b/src/bin/pg_upgrade/function.c
@@ -46,7 +46,12 @@ library_name_compare(const void *p1, const void *p2)
 /*
  * get_loadable_libraries()
  *
- *	Fetch the names of all old libraries containing C-language functions.
+ *	Fetch the names of all old libraries:
+ *	1. Name of library files containing C-language functions (for non-built-in
+ *	   functions), and
+ *	2. Shared object (library) names containing the logical replication output
+ *	   plugins
+ *
  *	We will later check that they all exist in the new installation.
  */
 void
@@ -66,14 +71,21 @@ get_loadable_libraries(void)
 		PGconn	   *conn = connectToServer(&old_cluster, active_db->db_name);
 
 		/*
-		 * Fetch all libraries containing non-built-in C functions in this DB.
+		 * Fetch all libraries containing non-built-in C functions or referred
+		 * by logical replication slots in this DB.
 		 */
 		ress[dbnum] = executeQueryOrDie(conn,
 										"SELECT DISTINCT probin "
 										"FROM pg_catalog.pg_proc "
 										"WHERE prolang = %u AND "
 										"probin IS NOT NULL AND "
-										"oid >= %u;",
+										"oid >= %u "
+										"UNION "
+										"SELECT DISTINCT plugin "
+										"FROM pg_catalog.pg_replication_slots "
+										"WHERE wal_status <> 'lost' AND "
+										"database = current_database() AND "
+										"temporary IS FALSE;",
 										ClanguageId,
 										FirstNormalObjectId);
 		totaltups += PQntuples(ress[dbnum]);
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index aa5faca4d6..b8505ae65b 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -26,6 +26,7 @@ static void get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo);
 static void free_rel_infos(RelInfoArr *rel_arr);
 static void print_db_infos(DbInfoArr *db_arr);
 static void print_rel_infos(RelInfoArr *rel_arr);
+static void print_slot_infos(LogicalSlotInfoArr *slot_arr);
 
 
 /*
@@ -394,7 +395,7 @@ get_db_infos(ClusterInfo *cluster)
 	i_spclocation = PQfnumber(res, "spclocation");
 
 	ntups = PQntuples(res);
-	dbinfos = (DbInfo *) pg_malloc(sizeof(DbInfo) * ntups);
+	dbinfos = (DbInfo *) pg_malloc0(sizeof(DbInfo) * ntups);
 
 	for (tupnum = 0; tupnum < ntups; tupnum++)
 	{
@@ -600,6 +601,107 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 	dbinfo->rel_arr.nrels = num_rels;
 }
 
+/*
+ * get_logical_slot_infos_per_db()
+ *
+ * gets the LogicalSlotInfos for all the logical replication slots of the database
+ * referred to by "dbinfo".
+ */
+static void
+get_logical_slot_infos_per_db(ClusterInfo *cluster, DbInfo *dbinfo)
+{
+	PGconn	   *conn = connectToServer(cluster,
+									   dbinfo->db_name);
+	PGresult   *res;
+	LogicalSlotInfo *slotinfos = NULL;
+
+	int			num_slots;
+
+	res = executeQueryOrDie(conn, "SELECT slot_name, plugin, two_phase "
+							"FROM pg_catalog.pg_replication_slots "
+							"WHERE wal_status <> 'lost' AND "
+							"database = current_database() AND "
+							"temporary IS FALSE;");
+
+	num_slots = PQntuples(res);
+
+	if (num_slots)
+	{
+		int			slotnum;
+		int			i_slotname;
+		int			i_plugin;
+		int			i_twophase;
+
+		slotinfos = (LogicalSlotInfo *) pg_malloc(sizeof(LogicalSlotInfo) * num_slots);
+
+		i_slotname = PQfnumber(res, "slot_name");
+		i_plugin = PQfnumber(res, "plugin");
+		i_twophase = PQfnumber(res, "two_phase");
+
+		for (slotnum = 0; slotnum < num_slots; slotnum++)
+		{
+			LogicalSlotInfo *curr = &slotinfos[slotnum];
+
+			curr->slotname = pg_strdup(PQgetvalue(res, slotnum, i_slotname));
+			curr->plugin = pg_strdup(PQgetvalue(res, slotnum, i_plugin));
+			curr->two_phase = (strcmp(PQgetvalue(res, slotnum, i_twophase), "t") == 0);
+		}
+	}
+
+	PQclear(res);
+	PQfinish(conn);
+
+	dbinfo->slot_arr.slots = slotinfos;
+	dbinfo->slot_arr.nslots = num_slots;
+}
+
+/*
+ * get_logical_slot_infos()
+ *
+ * Higher level routine to generate LogicalSlotInfoArr for all databases.
+ */
+void
+get_logical_slot_infos(ClusterInfo *cluster, bool live_check)
+{
+	int			dbnum;
+	int			slot_count = 0;
+
+	if (cluster == &old_cluster)
+		pg_log(PG_VERBOSE, "\nsource databases:");
+	else
+		pg_log(PG_VERBOSE, "\ntarget databases:");
+
+	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
+	{
+		DbInfo	   *pDbInfo = &cluster->dbarr.dbs[dbnum];
+
+		get_logical_slot_infos_per_db(cluster, pDbInfo);
+		slot_count += pDbInfo->slot_arr.nslots;
+
+		if (log_opts.verbose)
+		{
+			pg_log(PG_VERBOSE, "Database: \"%s\"", pDbInfo->db_name);
+			print_slot_infos(&pDbInfo->slot_arr);
+		}
+	}
+}
+
+/*
+ * count_logical_slots()
+ *
+ * Sum up and return the number of logical replication slots for all databases.
+ */
+int
+count_logical_slots(ClusterInfo *cluster)
+{
+	int			dbnum;
+	int			slotnum = 0;
+
+	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
+		slotnum += cluster->dbarr.dbs[dbnum].slot_arr.nslots;
+
+	return slotnum;
+}
 
 static void
 free_db_and_rel_infos(DbInfoArr *db_arr)
@@ -610,6 +712,12 @@ free_db_and_rel_infos(DbInfoArr *db_arr)
 	{
 		free_rel_infos(&db_arr->dbs[dbnum].rel_arr);
 		pg_free(db_arr->dbs[dbnum].db_name);
+
+		/*
+		 * Logical replication slots must not exist on the new cluster before
+		 * doing create_logical_replication_slots().
+		 */
+		Assert(db_arr->dbs[dbnum].slot_arr.nslots == 0);
 	}
 	pg_free(db_arr->dbs);
 	db_arr->dbs = NULL;
@@ -660,3 +768,15 @@ print_rel_infos(RelInfoArr *rel_arr)
 			   rel_arr->rels[relnum].reloid,
 			   rel_arr->rels[relnum].tablespace);
 }
+
+static void
+print_slot_infos(LogicalSlotInfoArr *slot_arr)
+{
+	int			slotnum;
+
+	for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\", two_phase: %d",
+			   slot_arr->slots[slotnum].slotname,
+			   slot_arr->slots[slotnum].plugin,
+			   slot_arr->slots[slotnum].two_phase);
+}
diff --git a/src/bin/pg_upgrade/meson.build b/src/bin/pg_upgrade/meson.build
index 12a97f84e2..228f29b688 100644
--- a/src/bin/pg_upgrade/meson.build
+++ b/src/bin/pg_upgrade/meson.build
@@ -42,6 +42,7 @@ tests += {
     'tests': [
       't/001_basic.pl',
       't/002_pg_upgrade.pl',
+      't/003_logical_replication_slots.pl',
     ],
     'test_kwargs': {'priority': 40}, # pg_upgrade tests are slow
   },
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 4562dafcff..5bfd17160b 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -59,6 +59,7 @@ static void copy_xact_xlog_xid(void);
 static void set_frozenxids(bool minmxid_only);
 static void make_outputdirs(char *pgdata);
 static void setup(char *argv0, bool *live_check);
+static void create_logical_replication_slots(void);
 
 ClusterInfo old_cluster,
 			new_cluster;
@@ -188,6 +189,19 @@ main(int argc, char **argv)
 			  new_cluster.pgdata);
 	check_ok();
 
+	/*
+	 * Create logical replication slots.
+	 *
+	 * Note: This must be done after doing pg_resetwal command because
+	 * pg_resetwal would remove required WALs.
+	 */
+	if (count_logical_slots(&old_cluster))
+	{
+		start_postmaster(&new_cluster, true);
+		create_logical_replication_slots();
+		stop_postmaster(false);
+	}
+
 	if (user_opts.do_sync)
 	{
 		prep_status("Sync data directory to disk");
@@ -860,3 +874,67 @@ set_frozenxids(bool minmxid_only)
 
 	check_ok();
 }
+
+/*
+ * create_logical_replication_slots()
+ *
+ * Similar to create_new_objects() but only restores logical replication slots.
+ */
+static void
+create_logical_replication_slots(void)
+{
+	int			dbnum;
+
+	prep_status_progress("Restoring logical replication slots in the new cluster");
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		DbInfo	   *old_db = &old_cluster.dbarr.dbs[dbnum];
+		LogicalSlotInfoArr *slot_arr = &old_db->slot_arr;
+		PGconn     *conn;
+		PQExpBuffer query;
+		int			slotnum;
+		char		log_file_name[MAXPGPATH];
+
+		/* Skip this database if there are no slots */
+		if (slot_arr->nslots == 0)
+			continue;
+
+		conn = connectToServer(&new_cluster, old_db->db_name);
+		query = createPQExpBuffer();
+
+		snprintf(log_file_name, sizeof(log_file_name),
+				 DB_DUMP_LOG_FILE_MASK, old_db->db_oid);
+
+		pg_log(PG_STATUS, "%s", old_db->db_name);
+
+		for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		{
+			/*
+			 * Constructs query for creating logical replication slots.
+			 *
+			 * XXX: For simplification, pg_create_logical_replication_slot() is
+			 * used. Is it sufficient?
+			 */
+			appendPQExpBuffer(query, "SELECT pg_catalog.pg_create_logical_replication_slot(");
+			appendStringLiteralConn(query, slot_arr->slots[slotnum].slotname,
+									conn);
+			appendPQExpBuffer(query, ", ");
+			appendStringLiteralConn(query, slot_arr->slots[slotnum].plugin,
+									conn);
+			appendPQExpBuffer(query, ", false, %s);",
+							  slot_arr->slots[slotnum].two_phase ? "true" : "false");
+
+			PQclear(executeQueryOrDie(conn, "%s", query->data));
+
+			resetPQExpBuffer(query);
+		}
+
+		PQfinish(conn);
+
+		destroyPQExpBuffer(query);
+	}
+
+	end_progress_output();
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 3eea0139c7..6ba2efe1b3 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -150,6 +150,22 @@ typedef struct
 	int			nrels;
 } RelInfoArr;
 
+/*
+ * Structure to store logical replication slot information
+ */
+typedef struct
+{
+	char	   *slotname;		/* slot name */
+	char	   *plugin;			/* plugin */
+	bool		two_phase;		/* can the slot decode 2PC? */
+} LogicalSlotInfo;
+
+typedef struct
+{
+	int			nslots;			/* number of logical slot infos */
+	LogicalSlotInfo *slots;		/* array of logical slot infos */
+} LogicalSlotInfoArr;
+
 /*
  * The following structure represents a relation mapping.
  */
@@ -176,6 +192,7 @@ typedef struct
 	char		db_tablespace[MAXPGPATH];	/* database default tablespace
 											 * path */
 	RelInfoArr	rel_arr;		/* array of all user relinfos */
+	LogicalSlotInfoArr slot_arr;	/* array of all LogicalSlotInfo */
 } DbInfo;
 
 /*
@@ -400,6 +417,8 @@ FileNameMap *gen_db_file_maps(DbInfo *old_db,
 							  DbInfo *new_db, int *nmaps, const char *old_pgdata,
 							  const char *new_pgdata);
 void		get_db_and_rel_infos(ClusterInfo *cluster);
+void		get_logical_slot_infos(ClusterInfo *cluster, bool live_check);
+int			count_logical_slots(ClusterInfo *cluster);
 
 /* option.c */
 
diff --git a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
new file mode 100644
index 0000000000..3df2d3c284
--- /dev/null
+++ b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
@@ -0,0 +1,123 @@
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+# Tests for upgrading replication slots
+
+use strict;
+use warnings;
+
+use File::Path qw(rmtree);
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Can be changed to test the other modes
+my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
+
+# Initialize old node
+my $old_publisher = PostgreSQL::Test::Cluster->new('old_publisher');
+$old_publisher->init(allows_streaming => 'logical');
+$old_publisher->start;
+
+# Initialize new node
+my $new_publisher = PostgreSQL::Test::Cluster->new('new_publisher');
+$new_publisher->init(allows_streaming => 'replica');
+
+my $bindir = $new_publisher->config_data('--bindir');
+
+$old_publisher->stop;
+
+# Create a slot on old node
+$old_publisher->start;
+$old_publisher->safe_psql(
+	'postgres', "SELECT pg_create_logical_replication_slot('test_slot1', 'test_decoding', false, true);"
+);
+$old_publisher->stop;
+
+# Cause a failure at the start of pg_upgrade because wal_level is replica
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade of old node with wrong wal_level');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+
+# Create an unnecessary slot on old node
+$old_publisher->start;
+$old_publisher->safe_psql(
+	'postgres', qq[
+	SELECT pg_create_logical_replication_slot('test_slot2', 'test_decoding', false, true);
+]);
+
+$old_publisher->stop;
+
+# Preparations for the subsequent test. max_replication_slots is set to
+# smaller than existing slots on old node
+$new_publisher->append_conf('postgresql.conf', "wal_level = 'logical'");
+$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 1");
+
+# Cause a failure at the start of pg_upgrade because the new node has
+# insufficient max_replication_slots
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade where the new node has insufficient max_replication_slots');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+
+# Remove an unnecessary slot and consume WAL records
+$old_publisher->start;
+$old_publisher->safe_psql(
+	'postgres', qq[
+	SELECT pg_drop_replication_slot('test_slot2');
+	SELECT count(*) FROM pg_logical_slot_get_changes('test_slot1', NULL, NULL)
+]);
+$old_publisher->stop;
+
+# Actual run, pg_upgrade_output.d is removed at the end
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade of old node');
+ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ removed after pg_upgrade success");
+
+$new_publisher->start;
+my $result = $new_publisher->safe_psql('postgres',
+	"SELECT slot_name, two_phase FROM pg_replication_slots");
+is($result, qq(test_slot1|t), 'check the slot exists on new node');
+
+done_testing();
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 51b7951ad8..0071efef1c 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1501,7 +1501,10 @@ LogicalRepTupleData
 LogicalRepTyp
 LogicalRepWorker
 LogicalRepWorkerType
+LogicalReplicationSlotInfo
 LogicalRewriteMappingData
+LogicalSlotInfo
+LogicalSlotInfoArr
 LogicalTape
 LogicalTapeSet
 LsnReadQueue
-- 
2.27.0

v22-0003-pg_upgrade-Add-check-function-for-logical-replic.patch (application/octet-stream)
From c022357ec0b3bae5fc9bd3d2b18262eeba39c2a9 Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Fri, 18 Aug 2023 11:57:37 +0000
Subject: [PATCH v22 3/3] pg_upgrade: Add check function for logical
 replication slots

In order to prevent data loss, pg_upgrade will fail if the old node has slots with
the status 'lost', or with unconsumed WAL records.

Author: Hayato Kuroda
Reviewed-by: Wang Wei, Vignesh C, Peter Smith
---
 src/bin/pg_upgrade/check.c                    | 93 +++++++++++++++++++
 src/bin/pg_upgrade/controldata.c              | 34 +++++++
 src/bin/pg_upgrade/info.c                     | 14 +++
 src/bin/pg_upgrade/pg_upgrade.h               |  5 +
 .../t/003_logical_replication_slots.pl        | 69 ++++++++++++--
 5 files changed, 209 insertions(+), 6 deletions(-)

diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index ed5b07fbb7..859fbb7cdc 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -9,6 +9,7 @@
 
 #include "postgres_fe.h"
 
+#include "access/xlogdefs.h"
 #include "catalog/pg_authid_d.h"
 #include "catalog/pg_collation.h"
 #include "fe_utils/string_utils.h"
@@ -1468,3 +1469,95 @@ check_for_logical_replication_slots(ClusterInfo *new_cluster)
 
 	check_ok();
 }
+
+/*
+ * Verify that all logical replication slots are usable.
+ */
+void
+check_for_lost_slots(ClusterInfo *cluster)
+{
+	int			i,
+				ntups,
+				i_slotname;
+	PGresult   *res;
+	DbInfo	   *active_db = &cluster->dbarr.dbs[0];
+	PGconn	   *conn = connectToServer(cluster, active_db->db_name);
+
+	/* logical slots can be migrated since PG17. */
+	if (GET_MAJOR_VERSION(cluster->major_version) <= 1600)
+		return;
+
+	prep_status("Checking wal_status for logical replication slots");
+
+	/* Check there are no logical replication slots with a 'lost' state. */
+	res = executeQueryOrDie(conn,
+							"SELECT slot_name FROM pg_catalog.pg_replication_slots "
+							"WHERE wal_status = 'lost' AND "
+							"temporary IS FALSE;");
+
+	ntups = PQntuples(res);
+	i_slotname = PQfnumber(res, "slot_name");
+
+	for (i = 0; i < ntups; i++)
+	{
+		pg_log(PG_WARNING,
+			   "\nWARNING: logical replication slot \"%s\" is in 'lost' state.",
+			   PQgetvalue(res, i, i_slotname));
+	}
+
+	PQclear(res);
+	PQfinish(conn);
+
+	if (ntups)
+		pg_fatal("One or more logical replication slots with a state of 'lost' were detected.");
+
+	check_ok();
+}
+
+/*
+ * Verify that all logical replication slots consumed all WALs, except a
+ * CHECKPOINT_SHUTDOWN record.
+ */
+void
+check_for_confirmed_flush_lsn(ClusterInfo *cluster)
+{
+	int			i,
+				ntups,
+				i_slotname;
+	PGresult   *res;
+	DbInfo	   *active_db = &cluster->dbarr.dbs[0];
+	PGconn	   *conn = connectToServer(cluster, active_db->db_name);
+
+	/* logical slots can be migrated since PG17. */
+	if (GET_MAJOR_VERSION(cluster->major_version) <= 1600)
+		return;
+
+	prep_status("Checking confirmed_flush_lsn for logical replication slots");
+
+	/*
+	 * Check that all logical replication slots have reached the latest
+	 * checkpoint position (SHUTDOWN_CHECKPOINT record).
+	 */
+	res = executeQueryOrDie(conn,
+							"SELECT slot_name FROM pg_catalog.pg_replication_slots "
+							"WHERE confirmed_flush_lsn != '%X/%X' AND temporary IS FALSE;",
+							LSN_FORMAT_ARGS(old_cluster.controldata.chkpnt_latest));
+
+	ntups = PQntuples(res);
+	i_slotname = PQfnumber(res, "slot_name");
+
+	for (i = 0; i < ntups; i++)
+	{
+		pg_log(PG_WARNING,
+				"\nWARNING: logical replication slot \"%s\" has not consumed WALs yet",
+				PQgetvalue(res, i, i_slotname));
+	}
+
+	PQclear(res);
+	PQfinish(conn);
+
+	if (ntups)
+		pg_fatal("One or more logical replication slots still have unconsumed WAL records.");
+
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c
index 4beb65ab22..30ad6ba1e0 100644
--- a/src/bin/pg_upgrade/controldata.c
+++ b/src/bin/pg_upgrade/controldata.c
@@ -169,6 +169,40 @@ get_control_data(ClusterInfo *cluster, bool live_check)
 				}
 				got_cluster_state = true;
 			}
+
+			else if ((p = strstr(bufin, "Latest checkpoint location:")) != NULL)
+			{
+				/*
+				 * Gather latest checkpoint location if the cluster is newer or
+				 * equal to 17. This is used for upgrading logical replication
+				 * slots.
+				 */
+				if (GET_MAJOR_VERSION(cluster->major_version) >= 17)
+				{
+					char *slash = NULL;
+					uint64 upper_lsn, lower_lsn;
+
+					p = strchr(p, ':');
+
+					if (p == NULL || strlen(p) <= 1)
+						pg_fatal("%d: controldata retrieval problem", __LINE__);
+
+					p++;			/* remove ':' char */
+
+					p = strpbrk(p, "01234567890ABCDEF");
+
+					/*
+					 * Upper and lower part of LSN must be read separately
+					 * because it is reported as %X/%X format.
+					 */
+					upper_lsn = strtoul(p, &slash, 16);
+					lower_lsn = strtoul(++slash, NULL, 16);
+
+					/* And combine them */
+					cluster->controldata.chkpnt_latest =
+										(upper_lsn << 32) | lower_lsn;
+				}
+			}
 		}
 
 		rc = pclose(output);
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index b8505ae65b..08e002db28 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -684,6 +684,20 @@ get_logical_slot_infos(ClusterInfo *cluster, bool live_check)
 			print_slot_infos(&pDbInfo->slot_arr);
 		}
 	}
+
+	/*
+	 * Do additional checks if slots are found on the old node. If something is
+	 * found on the new node, a subsequent function
+	 * check_new_cluster_is_empty() would report the name of slots and raise a
+	 * fatal error.
+	 */
+	if (cluster == &old_cluster && slot_count)
+	{
+		check_for_lost_slots(cluster);
+
+		if (!live_check)
+			check_for_confirmed_flush_lsn(cluster);
+	}
 }
 
 /*
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 6ba2efe1b3..6df948ac73 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -10,6 +10,7 @@
 #include <sys/stat.h>
 #include <sys/time.h>
 
+#include "access/xlogdefs.h"
 #include "common/relpath.h"
 #include "libpq-fe.h"
 
@@ -242,6 +243,8 @@ typedef struct
 	bool		date_is_int;
 	bool		float8_pass_by_value;
 	uint32		data_checksum_version;
+
+	XLogRecPtr	chkpnt_latest;
 } ControlData;
 
 /*
@@ -366,6 +369,8 @@ void		output_completion_banner(char *deletion_script_file_name);
 void		check_cluster_versions(void);
 void		check_cluster_compatibility(bool live_check);
 void		create_script_for_old_cluster_deletion(char **deletion_script_file_name);
+void		check_for_lost_slots(ClusterInfo *cluster);
+void		check_for_confirmed_flush_lsn(ClusterInfo *cluster);
 
 
 /* controldata.c */
diff --git a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
index 3df2d3c284..3828307577 100644
--- a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
+++ b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
@@ -17,15 +17,16 @@ my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
 # Initialize old node
 my $old_publisher = PostgreSQL::Test::Cluster->new('old_publisher');
 $old_publisher->init(allows_streaming => 'logical');
-$old_publisher->start;
 
 # Initialize new node
 my $new_publisher = PostgreSQL::Test::Cluster->new('new_publisher');
 $new_publisher->init(allows_streaming => 'replica');
 
-my $bindir = $new_publisher->config_data('--bindir');
+# Initialize subscriber node
+my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
+$subscriber->init(allows_streaming => 'logical');
 
-$old_publisher->stop;
+my $bindir = $new_publisher->config_data('--bindir');
 
 # Create a slot on old node
 $old_publisher->start;
@@ -59,6 +60,7 @@ $old_publisher->start;
 $old_publisher->safe_psql(
 	'postgres', qq[
 	SELECT pg_create_logical_replication_slot('test_slot2', 'test_decoding', false, true);
+	SELECT count(*) FROM pg_logical_slot_get_changes('test_slot1', NULL, NULL);
 ]);
 
 $old_publisher->stop;
@@ -89,15 +91,57 @@ ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
 # Clean up
 rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
 
-# Remove an unnecessary slot and consume WAL records
+# Remove an unnecessary slot and generate WALs. These records would not be
+# consumed before doing pg_upgrade, so that the upcoming test would fail.
 $old_publisher->start;
 $old_publisher->safe_psql(
 	'postgres', qq[
 	SELECT pg_drop_replication_slot('test_slot2');
-	SELECT count(*) FROM pg_logical_slot_get_changes('test_slot1', NULL, NULL)
+	CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;
 ]);
 $old_publisher->stop;
 
+# Cause a failure at the start of pg_upgrade because the slot still have
+# unconsumed WAL records
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade of old node with idle replication slots');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+
+# Setup logical replication
+my $old_connstr = $old_publisher->connstr . ' dbname=postgres';
+$old_publisher->start();
+$old_publisher->safe_psql('postgres', qq[
+	SELECT pg_drop_replication_slot('test_slot1');
+	CREATE PUBLICATION pub FOR ALL TABLES;
+]);
+$subscriber->start;
+$subscriber->safe_psql(
+	'postgres', qq[
+	CREATE TABLE tbl (a int);
+	CREATE SUBSCRIPTION sub CONNECTION '$old_connstr' PUBLICATION pub WITH (two_phase = 'true')
+]);
+
+# Wait for initial table sync to finish
+$subscriber->wait_for_subscription_sync($old_publisher, 'sub');
+
+$subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION sub DISABLE");
+$old_publisher->stop;
+
 # Actual run, pg_upgrade_output.d is removed at the end
 command_ok(
 	[
@@ -118,6 +162,19 @@ ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
 $new_publisher->start;
 my $result = $new_publisher->safe_psql('postgres',
 	"SELECT slot_name, two_phase FROM pg_replication_slots");
-is($result, qq(test_slot1|t), 'check the slot exists on new node');
+is($result, qq(sub|t), 'check the slot exists on new node');
+
+# Change the connection
+my $new_connstr = $new_publisher->connstr . ' dbname=postgres';
+$subscriber->safe_psql('postgres',
+	"ALTER SUBSCRIPTION sub CONNECTION '$new_connstr'");
+$subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION sub ENABLE");
+
+# Check whether changes on the new publisher get replicated to the subscriber
+$new_publisher->safe_psql('postgres',
+	"INSERT INTO tbl VALUES (generate_series(11, 20))");
+$new_publisher->wait_for_catchup('sub');
+$result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
+is($result, qq(20), 'check changes are shipped to subscriber');
 
 done_testing();
-- 
2.27.0

#131Amit Kapila
amit.kapila16@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#130)
1 attachment(s)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Fri, Aug 18, 2023 at 7:21 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

Few comments on new patches:
1.
+     <link linkend="sql-altersubscription"><command>ALTER
SUBSCRIPTION ... DISABLE</command></link>.
+     After the upgrade is complete, execute the
+     <command>ALTER SUBSCRIPTION ... CONNECTION</command> command to update the
+     connection string, and then re-enable the subscription.

Why does one need to update the connection string?

2.
+ /*
+ * Checking for logical slots must be done before
+ * check_new_cluster_is_empty() because the slot_arr attribute of the
+ * new_cluster will be checked in that function.
+ */
+ if (count_logical_slots(&old_cluster))
+ {
+ get_logical_slot_infos(&new_cluster, false);
+ check_for_logical_replication_slots(&new_cluster);
+ }
+
  check_new_cluster_is_empty();

Can't we simplify this checking by simply querying
pg_replication_slots for any usable slot, similar to what we
are doing in check_for_prepared_transactions()? We can add this check
in the function check_for_logical_replication_slots(). Also, do we
need a count function, or can we instead have a simple function like
is_logical_slot_present() that returns true as soon as one slot is
found?
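
For illustration, a minimal sketch of what such a check could look like,
modelled on the style of the existing pg_upgrade checks and reusing the
connectToServer()/executeQueryOrDie() helpers already used in the patch
(the function name and exact query are only placeholders, not a final
implementation):

static void
check_new_cluster_logical_replication_slots(void)
{
	PGconn	   *conn = connectToServer(&new_cluster, "template1");
	PGresult   *res;

	prep_status("Checking for logical replication slots");

	/* pg_replication_slots is cluster-wide, so one connection is enough */
	res = executeQueryOrDie(conn,
							"SELECT slot_name "
							"FROM pg_catalog.pg_replication_slots "
							"WHERE slot_type = 'logical' AND "
							"temporary IS FALSE;");

	if (PQntuples(res) != 0)
		pg_fatal("New cluster must not have logical replication slots, "
				 "but found \"%s\"",
				 PQgetvalue(res, 0, 0));

	PQclear(res);
	PQfinish(conn);

	check_ok();
}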

Apart from this, (a) I have made a few changes (changed comments) in
patch 0001 as shared in the email [1]; (b) some modifications in the
docs as you can see in the attached. Please include those changes in
the next version if you think they are okay.

[1]: /messages/by-id/CAA4eK1JzJagMmb_E8D4au=GYQkxox0AfNBm1FbP7sy7t4YWXPQ@mail.gmail.com

--
With Regards,
Amit Kapila.

Attachments:

mod_amit_1.patch (application/octet-stream)
diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 848f7e8432..f6bd36ea1c 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -429,27 +429,27 @@ NET STOP postgresql-&majorversion;
     <itemizedlist>
      <listitem>
       <para>
-       All slots on old cluster must be usable, i.e., there are no slots which
-       <structfield>wal_status</structfield> is <literal>lost</literal> (see
+       All slots on the old cluster must be usable, i.e., there are no slots
+       whose <structfield>wal_status</structfield> is <literal>lost</literal> (see
        <xref linkend="view-pg-replication-slots"/>).
       </para>
      </listitem>
      <listitem>
       <para>
-       The output plugin referred by slots on old cluster must be installed on
-       the new PostgreSQL executable directory.
+       The output plugins referenced by the slots on the old cluster must be
+       installed on the new PostgreSQL executable directory.
       </para>
      </listitem>
      <listitem>
       <para>
-       New cluster must have larger
+       The new cluster must have
        <link linkend="guc-max-replication-slots"><varname>max_replication_slots</varname></link>
-       than existing slots on old cluster.
+       configured to value larger than the existing slots on the old cluster.
       </para>
      </listitem>
      <listitem>
       <para>
-       New cluster must be set
+       The new cluster must have
        <link linkend="guc-wal-level"><varname>wal_level</varname></link> as
        <literal>logical</literal>.
       </para>
#132Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#120)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Thu, Aug 17, 2023 at 10:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Aug 17, 2023 at 6:07 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Aug 15, 2023 at 12:06 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Aug 15, 2023 at 7:51 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Aug 14, 2023 at 2:07 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Aug 14, 2023 at 7:57 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Another idea is (which might have already been discussed though) that we check if the latest shutdown checkpoint LSN in the control file matches the confirmed_flush_lsn in pg_replication_slots view. That way, we can ensure that the slot has consumed all WAL records before the last shutdown. We don't need to worry about WAL records generated after starting the old cluster during the upgrade, at least for logical replication slots.

Right, this is somewhat closer to what the patch is already doing. But
remember in this case we need to remember and use the latest
checkpoint from the control file before the old cluster is started,
because otherwise the latest checkpoint location could even be updated
during the upgrade. So, instead of reading from WAL, we need to change
things so that we rely on the control file's latest LSN.

Yes, I was thinking the same idea.

But it works only for logical replication slots. Do we
want to check that no meaningful WAL records are generated after the
latest shutdown checkpoint, for manually created slots (or non-logical
replication slots)? If so, we would need to have something reading WAL
records in the end.

This feature only targets logical replication slots. I don't see a
reason to be different for manually created logical replication slots.
Is there something particular that you think we could be missing?

Sorry I was not clear. I meant the logical replication slots that are
*not* used by logical replication, i.e., are created manually and used
by third party tools that periodically consume decoded changes. As we
discussed before, these slots will never be able to pass that
confirmed_flush_lsn check.

I think normally one would have a background process to periodically
consume changes. Can't one use the walsender infrastructure for
their plugins to consume changes, probably by using the replication
protocol?

Not sure.

Also, I feel it is the plugin author's responsibility to
consume changes or advance slot to the required position before
shutdown.

How does the plugin author ensure that the slot consumes all WAL
records including shutdown_checkpoint before shutdown?

After some thoughts, one thing we might
need to consider is that in practice, the upgrade project is performed
during the maintenance window and has a backup plan that reverts the
upgrade process, in case something bad happens. If we require the
users to drop such logical replication slots, they cannot resume using
the old cluster in that case, since they would need to create new
slots, missing some changes.

Can't one keep the backup before removing slots?

Yes, but restoring the backup could take time.

Other checks in pg_upgrade seem to be
compatibility checks that would eventually be required for the upgrade
anyway. Do we need to consider this case? For example, we do that
confirmed_flush_lsn check for only the slots with pgoutput plugin.

I think one is allowed to use pgoutput plugin even for manually
created slots. So, such a check may not work.

Right, but I thought it's a very rare case.

Since the slot's confirmed_flush_lsn check is not a compatibility
check, unlike the existing checks, I wonder if we can make it optional.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#133Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#130)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Fri, Aug 18, 2023 at 10:51 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

Dear Peter,

PSA new version patch set.

I've looked at the v22 patch set, and here are some comments:

0001:

Do we need regression tests to make sure that the slot's
confirmed_flush_lsn matches the LSN of the latest shutdown_checkpoint
record?

0002:

+   <step>
+    <title>Prepare for publisher upgrades</title>
+

Should this step be done before "8. Stop both servers" as it might
require to disable subscriptions and to drop 'lost' replication slots?

Why is there no explanation about the slots' confirmed_flush_lsn check
as prerequisites?

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#134Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#132)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Sun, Aug 20, 2023 at 6:49 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Aug 17, 2023 at 10:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

Sorry I was not clear. I meant the logical replication slots that are
*not* used by logical replication, i.e., are created manually and used
by third party tools that periodically consume decoded changes. As we
discussed before, these slots will never be able to pass that
confirmed_flush_lsn check.

I think normally one would have a background process to periodically
consume changes. Can't one use the walsender infrastructure for
their plugins to consume changes, probably by using the replication
protocol?

Not sure.

I think one can use Streaming Replication Protocol to achieve it [1].

Also, I feel it is the plugin author's responsibility to
consume changes or advance slot to the required position before
shutdown.

How does the plugin author ensure that the slot consumes all WAL
records including shutdown_checkpoint before shutdown?

By using "Streaming Replication Protocol" so that walsender can take
care of it. If not, I think users should drop such slots before the
upgrade because anyway, they won't be usable after the upgrade.

After some thoughts, one thing we might
need to consider is that in practice, the upgrade project is performed
during the maintenance window and has a backup plan that reverts the
upgrade process, in case something bad happens. If we require the
users to drop such logical replication slots, they cannot resume using
the old cluster in that case, since they would need to create new
slots, missing some changes.

Can't one keep the backup before removing slots?

Yes, but restoring the backup could take time.

Other checks in pg_upgrade seem to be
compatibility checks that would eventually be required for the upgrade
anyway. Do we need to consider this case? For example, we do that
confirmed_flush_lsn check for only the slots with pgoutput plugin.

I think one is allowed to use pgoutput plugin even for manually
created slots. So, such a check may not work.

Right, but I thought it's a very rare case.

Okay, but not sure that we can ignore it.

Since the slot's confirmed_flush_lsn check is not a compatibility
check, unlike the existing checks, I wonder if we can make it optional.

There are arguments both ways. Initially, the patch proposed to make
them optional by having an option like
--include-logical-replication-slots but Jonathan raised a point that
it will be more work for users and should be the default. Then we also
discussed having an option like --exclude-logical-replication-slots
but as we don't have any other similar option, it doesn't seem natural
to add such an option. Also, I am afraid, if there is no user of such
an option, it won't be worth it. BTW, how would you like to see it as
an optional (via --include or via --exclude switch)?

Personally, I am okay to make it optional if we have a broader
consensus. My preference would be to have an --exclude kind of option.
How about first getting the main patch reviewed and committed, then
based on consensus, we can decide whether to make it optional and if
so, what is the preferred way?

[1]: https://www.postgresql.org/docs/current/protocol-replication.html

--
With Regards,
Amit Kapila.

#135Peter Smith
smithpb2250@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#130)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

Here are some review comments for v22-0002

======
Commit Message

1.
This commit allows nodes with logical replication slots to be upgraded. While
reading information from the old cluster, a list of logical replication slots is
newly extracted. At the later part of upgrading, pg_upgrade revisits the list
and restores slots by using the pg_create_logical_replication_slots() on the new
clushter.

~

1a
/is newly extracted/is fetched/

~

1b.
/using the pg_create_logical_replication_slots()/executing
pg_create_logical_replication_slots()/

~

1c.
/clushter/cluster/

~~~

2.
Note that it must be done after the final pg_resetwal command during the upgrade
because pg_resetwal will remove WALs that are required by the slots. Due to the
restriction, the timing of restoring replication slots is different from other
objects.

~

2a.
/it must/slot restoration/

~

2b.
/the restriction/this restriction/

======
doc/src/sgml/ref/pgupgrade.sgml

3.
+    <para>
+     <application>pg_upgrade</application> attempts to migrate logical
+     replication slots. This helps avoid the need for manually defining the
+     same replication slot on the new publisher.
+    </para>

/same replication slot/same replication slots/

~~~

4.
+    <para>
+     Before you start upgrading the publisher cluster, ensure that the
+     subscription is temporarily disabled, by executing
+     <link linkend="sql-altersubscription"><command>ALTER
SUBSCRIPTION ... DISABLE</command></link>.
+     After the upgrade is complete, execute the
+     <command>ALTER SUBSCRIPTION ... CONNECTION</command> command to update the
+     connection string, and then re-enable the subscription.
+    </para>

On the rendered page, it looks a bit strange that DISABLE has a link
but CONNECTION does not have a link.

~~~

5.
+    <para>
+     There are some prerequisites for <application>pg_upgrade</application> to
+     be able to upgrade the replication slots. If these are not met an error
+     will be reported.
+    </para>
+
+    <itemizedlist>

+1 to use all the itemizedlist changes that Amit suggested [1] in his
attachment.

======
src/bin/pg_upgrade/check.c

6.
+static void check_for_logical_replication_slots(ClusterInfo *new_cluster);

IMO the arg name should not shadow a global with the same name. See
other review comment for this function signature.

~~~

7.
+ /* Extract a list of logical replication slots */
+ get_logical_slot_infos(&old_cluster, live_check);

But 'live_check' is never used?

~~~

8. check_for_logical_replication_slots
+
+/*
+ * Verify the parameter settings necessary for creating logical replication
+ * slots.
+ */
+static void
+check_for_logical_replication_slots(ClusterInfo *new_cluster)

IMO the arg name should not shadow a global with the same name. If
this is never going to be called with any param other than
&new_cluster then probably it is better not even to have that
argument at all. Just refer to the global new_cluster inside the
function.

You can't say that 'check_for_new_tablespace_dir' does it already so
it must be OK -- I think that the existing function has the same issue
and it also ought to be fixed to avoid shadowing!

~~~

9. check_for_logical_replication_slots

+ /* logical replication slots can be migrated since PG17. */
+ if (GET_MAJOR_VERSION(new_cluster->major_version) <= 1600)
+ return;

IMO the code matches the comment better if you say < 1700 instead of <= 1600.

======
src/bin/pg_upgrade/function.c

10. get_loadable_libraries
  /*
- * Fetch all libraries containing non-built-in C functions in this DB.
+ * Fetch all libraries containing non-built-in C functions or referred
+ * by logical replication slots in this DB.
  */
  ress[dbnum] = executeQueryOrDie(conn,
~

/referred by/referred to by/

======
src/bin/pg_upgrade/info.c

11.
+/*
+ * get_logical_slot_infos()
+ *
+ * Higher level routine to generate LogicalSlotInfoArr for all databases.
+ */
+void
+get_logical_slot_infos(ClusterInfo *cluster, bool live_check)
+{
+ int dbnum;
+ int slot_count = 0;
+
+ if (cluster == &old_cluster)
+ pg_log(PG_VERBOSE, "\nsource databases:");
+ else
+ pg_log(PG_VERBOSE, "\ntarget databases:");
+
+ for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
+ {
+ DbInfo    *pDbInfo = &cluster->dbarr.dbs[dbnum];
+
+ get_logical_slot_infos_per_db(cluster, pDbInfo);
+ slot_count += pDbInfo->slot_arr.nslots;
+
+ if (log_opts.verbose)
+ {
+ pg_log(PG_VERBOSE, "Database: \"%s\"", pDbInfo->db_name);
+ print_slot_infos(&pDbInfo->slot_arr);
+ }
+ }
+}
+

11a.
Now the variable 'slot_count' is no longer being returned so it seems redundant.

~

11b.
What is the 'live_check' parameter for? Nobody is using it.

~~~

12. count_logical_slots

+int
+count_logical_slots(ClusterInfo *cluster)
+{
+ int dbnum;
+ int slotnum = 0;
+
+ for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
+ slotnum += cluster->dbarr.dbs[dbnum].slot_arr.nslots;
+
+ return slotnum;
+}

IMO this variable should be called something like 'slot_count'. This
is the same review comment also made in a previous review. (See [2]
comment#12).

~~~

13. print_slot_infos

+
+static void
+print_slot_infos(LogicalSlotInfoArr *slot_arr)
+{
+ int slotnum;
+
+ for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+ pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\", two_phase: %d",
+    slot_arr->slots[slotnum].slotname,
+    slot_arr->slots[slotnum].plugin,
+    slot_arr->slots[slotnum].two_phase);
+}

It might be nicer to introduce a variable, instead of all those array
dereferences:

LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];

~~~

14.
+ for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+ {
+ /*
+ * Constructs query for creating logical replication slots.
+ *
+ * XXX: For simplification, pg_create_logical_replication_slot() is
+ * used. Is it sufficient?
+ */
+ appendPQExpBuffer(query, "SELECT
pg_catalog.pg_create_logical_replication_slot(");
+ appendStringLiteralConn(query, slot_arr->slots[slotnum].slotname,
+ conn);
+ appendPQExpBuffer(query, ", ");
+ appendStringLiteralConn(query, slot_arr->slots[slotnum].plugin,
+ conn);
+ appendPQExpBuffer(query, ", false, %s);",
+   slot_arr->slots[slotnum].two_phase ? "true" : "false");
+
+ PQclear(executeQueryOrDie(conn, "%s", query->data));
+
+ resetPQExpBuffer(query);
+ }
+
+ PQfinish(conn);
+
+ destroyPQExpBuffer(query);
+ }
+
+ end_progress_output();
+ check_ok();

14a
Similar to the previous comment (#13). It might be nicer to introduce
a variable, instead of all those array dereferences:

LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
~

14b.
It was not clear to me why this command is not being built using
executeQueryOrDie directly instead of using the query buffer. Is there
some reason?

~

14c.
I think it would be cleaner to have a separate res variable like you
used elsewhere:
res = executeQueryOrDie(...)

instead of doing PQclear(executeQueryOrDie(conn, "%s", query->data));
in one line

======
src/bin/pg_upgrade/pg_upgrade.h

15.
+void get_logical_slot_infos(ClusterInfo *cluster, bool live_check);

I didn't see a reason for that 'live_check' parameter.

======
.../pg_upgrade/t/003_logical_replication_slots.pl

16.
IMO this would be much easier to read if there were BIG comments
between the actual TEST parts

For example

# ------------------------------
# TEST: Confirm pg_upgrade fails if the new node wal_level is not 'logical'
<preparation>
<test>
<cleanup>

# ------------------------------
# TEST: Confirm pg_upgrade fails if max_replication_slots on the new node is too low
<preparation>
<test>
<cleanup>

# ------------------------------
# TEST: Successful upgrade
<preparation>
<test>
<cleanup>

~~~

17.
+# Cause a failure at the start of pg_upgrade because wal_level is replica
+command_fails(
+ [
+ 'pg_upgrade', '--no-sync',
+ '-d',         $old_publisher->data_dir,
+ '-D',         $new_publisher->data_dir,
+ '-b',         $bindir,
+ '-B',         $bindir,
+ '-s',         $new_publisher->host,
+ '-p',         $old_publisher->port,
+ '-P',         $new_publisher->port,
+ $mode,
+ ],
+ 'run of pg_upgrade of old node with wrong wal_level');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+ "pg_upgrade_output.d/ not removed after pg_upgrade failure");

The message is ambiguous

BEFORE
'run of pg_upgrade of old node with wrong wal_level'

SUGGESTION
'run of pg_upgrade where the new node has the wrong wal_level'

~~~

18.
+# Create an unnecessary slot on old node
+$old_publisher->start;
+$old_publisher->safe_psql(
+ 'postgres', qq[
+ SELECT pg_create_logical_replication_slot('test_slot2',
'test_decoding', false, true);
+]);
+
+$old_publisher->stop;
+
+# Preparations for the subsequent test. max_replication_slots is set to
+# smaller than existing slots on old node
+$new_publisher->append_conf('postgresql.conf', "wal_level = 'logical'");
+$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 1");

IMO the comment is misleading. It is not an "unnecessary slot", it is
just a 2nd slot. And this is all part of the preparation for the next
test so it should be under the other comment.

For example SUGGESTION changes like this:

# Preparations for the subsequent test.
# 1. Create an unnecessary slot on the old node
$old_publisher->start;
$old_publisher->safe_psql(
'postgres', qq[
SELECT pg_create_logical_replication_slot('test_slot2',
'test_decoding', false, true);
]);
$old_publisher->stop;
# 2. max_replication_slots is set to smaller than the number of slots
(2) present on the old node
$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 1");
# 3. new node wal_level is set correctly
$new_publisher->append_conf('postgresql.conf', "wal_level = 'logical'");

~~~

19.
+# Remove an unnecessary slot and consume WAL records
+$old_publisher->start;
+$old_publisher->safe_psql(
+ 'postgres', qq[
+ SELECT pg_drop_replication_slot('test_slot2');
+ SELECT count(*) FROM pg_logical_slot_get_changes('test_slot1', NULL, NULL)
+]);
+$old_publisher->stop;
+

This comment should say more like:

# Preparations for the subsequent test.

~~~

20.
+# Actual run, pg_upgrade_output.d is removed at the end

This comment should mention that "successful upgrade is expected"
because all the other prerequisites are now satisfied.

~~~

21.
+$new_publisher->start;
+my $result = $new_publisher->safe_psql('postgres',
+ "SELECT slot_name, two_phase FROM pg_replication_slots");
+is($result, qq(test_slot1|t), 'check the slot exists on new node');

Should there be a matching new_publisher->stop;?

------
[1]: /messages/by-id/CAA4eK1+dT2g8gmerguNd_TA=XMnm00nLzuEJ_Sddw6Pj-bvKVQ@mail.gmail.com
[2]: /messages/by-id/TYAPR01MB586604802ABE42E11866762FF51BA@TYAPR01MB5866.jpnprd01.prod.outlook.com

Kind Regards,
Peter Smith.
Fujitsu Australia

#136Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Amit Kapila (#134)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

On Monday, August 21, 2023 11:21 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Sun, Aug 20, 2023 at 6:49 PM Masahiko Sawada
<sawada.mshk@gmail.com> wrote:

On Thu, Aug 17, 2023 at 10:31 PM Amit Kapila <amit.kapila16@gmail.com>

wrote:

Sorry I was not clear. I meant the logical replication slots that
are
*not* used by logical replication, i.e., are created manually and
used by third party tools that periodically consume decoded
changes. As we discussed before, these slots will never be able to
pass that confirmed_flush_lsn check.

I think normally one would have a background process to periodically
consume changes. Can't one use the walsender infrastructure for
their plugins to consume changes, probably by using the replication
protocol?

Not sure.

I think one can use Streaming Replication Protocol to achieve it [1].

Also, I feel it is the plugin author's responsibility to consume
changes or advance slot to the required position before shutdown.

How does the plugin author ensure that the slot consumes all WAL
records including shutdown_checkpoint before shutdown?

By using "Streaming Replication Protocol" so that walsender can take care of it.
If not, I think users should drop such slots before the upgrade because anyway,
they won't be usable after the upgrade.

Yes, I think pglogical is one example which starts a bgworker (apply worker) on the client
to consume changes, and it also uses the Streaming Replication Protocol IIRC. And
pg_recvlogical is another example which connects to the walsender and consumes changes.

Best Regards,
Hou zj

#137Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Hayato Kuroda (Fujitsu) (#130)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

On Friday, August 18, 2023 9:52 PM Kuroda, Hayato/黒田 隼人 <kuroda.hayato@fujitsu.com> wrote:

Dear Peter,

PSA new version patch set.

Thanks for updating the patch!
Here are a few comments about the 0003 patch.

1.

+check_for_lost_slots(ClusterInfo *cluster)
+{
+	int			i,
+				ntups,
+				i_slotname;
+	PGresult   *res;
+	DbInfo	   *active_db = &cluster->dbarr.dbs[0];
+	PGconn	   *conn = connectToServer(cluster, active_db->db_name);
+ 
+	/* logical slots can be migrated since PG17. */
+	if (GET_MAJOR_VERSION(cluster->major_version) <= 1600)
+		return;

I think we should build the connection after this check; otherwise the connection
may be left open after returning.
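
Something like the following reordering would avoid that (a sketch only;
the rest of the function stays unchanged):

void
check_for_lost_slots(ClusterInfo *cluster)
{
	int			i,
				ntups,
				i_slotname;
	PGresult   *res;
	DbInfo	   *active_db = &cluster->dbarr.dbs[0];
	PGconn	   *conn;

	/* logical slots can be migrated since PG17. */
	if (GET_MAJOR_VERSION(cluster->major_version) <= 1600)
		return;

	/* Connect only after passing the version check */
	conn = connectToServer(cluster, active_db->db_name);

	/* ... rest of the function unchanged ... */
}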

2.
+check_for_confirmed_flush_lsn(ClusterInfo *cluster)
+{
+	int			i,
+				ntups,
+				i_slotname;
+	PGresult   *res;
+	DbInfo	   *active_db = &cluster->dbarr.dbs[0];
+	PGconn	   *conn = connectToServer(cluster, active_db->db_name);
+
+	/* logical slots can be migrated since PG17. */
+	if (GET_MAJOR_VERSION(cluster->major_version) <= 1600)
+		return;

Same as above.

3.
+				if (GET_MAJOR_VERSION(cluster->major_version) >= 17)
+				{

I think you mean 1700 here.

4.
+					p = strpbrk(p, "01234567890ABCDEF");
+
+					/*
+					 * Upper and lower part of LSN must be read separately
+					 * because it is reported as %X/%X format.
+					 */
+					upper_lsn = strtoul(p, &slash, 16);
+					lower_lsn = strtoul(++slash, NULL, 16);

Maybe we'd better add a sanity check after strpbrk like "if (p == NULL ||
strlen(p) <= 1)" to be consistent with other similar code.
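
For example (a sketch of the suggested placement, reusing the existing
error message style):

					p = strpbrk(p, "01234567890ABCDEF");

					/* Sanity check, as done for the ':' handling above */
					if (p == NULL || strlen(p) <= 1)
						pg_fatal("%d: controldata retrieval problem", __LINE__);

					/*
					 * Upper and lower part of LSN must be read separately
					 * because it is reported as %X/%X format.
					 */
					upper_lsn = strtoul(p, &slash, 16);
					lower_lsn = strtoul(++slash, NULL, 16);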

Best Regards,
Hou zj

#138Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Amit Kapila (#131)
3 attachment(s)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Amit,

Thank you for giving comments! PSA new version patch set.

1.
+     <link linkend="sql-altersubscription"><command>ALTER
SUBSCRIPTION ... DISABLE</command></link>.
+     After the upgrade is complete, execute the
+     <command>ALTER SUBSCRIPTION ... CONNECTION</command>
command to update the
+     connection string, and then re-enable the subscription.

Why does one need to update the connection string?

I wrote it like that because the old and new port numbers can be different. But you
are partially right - it is not always needed. Updated to clarify that.

2.
+ /*
+ * Checking for logical slots must be done before
+ * check_new_cluster_is_empty() because the slot_arr attribute of the
+ * new_cluster will be checked in that function.
+ */
+ if (count_logical_slots(&old_cluster))
+ {
+ get_logical_slot_infos(&new_cluster, false);
+ check_for_logical_replication_slots(&new_cluster);
+ }
+
check_new_cluster_is_empty();

Can't we simplify this checking by simply querying
pg_replication_slots for any usable slot, similar to what we
are doing in check_for_prepared_transactions()? We can add this check
in the function check_for_logical_replication_slots().

Some checks were moved into check_for_logical_replication_slots(), and
the get_logical_slot_infos() call for new_cluster was removed as you said.

But get_logical_slot_infos() cannot be removed completely, because the old
cluster has already been shut down by the time the new cluster is checked. We
must keep the old cluster's slot information in memory.
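
Roughly, the ordering looks like this (a sketch based on the attached
patches, not the literal main() code):

	/* While the old cluster is still running, cache its slot information */
	get_logical_slot_infos(&old_cluster, live_check);

	/* ... the old cluster is stopped, schema is restored, pg_resetwal runs ... */

	/* Afterwards, the cached info is the only record of the old slots */
	if (count_logical_slots(&old_cluster))
	{
		start_postmaster(&new_cluster, true);
		create_logical_replication_slots();
		stop_postmaster(false);
	}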

Note that the existence of slots is now checked in all cases, because such slots
cannot be used after the upgrade.

check_new_cluster_is_empty() no longer checks logical slots, so all changes to
this function were reverted.

Also, do we
need a count function, or can we instead have a simple function like
is_logical_slot_present() that returns true as soon as one slot is
found?
I think this is still needed, because max_replication_slots and the number
of existing replication slots must be compared.

Of course we could add another simple function like
is_logical_slot_present_on_old_cluster() and use it in main(), but I am not sure
that defining several similar functions is a good idea.

Apart from this, (a) I have made a few changes (changed comments) in
patch 0001 as shared in the email [1]; (b) some modifications in the
docs as you can see in the attached. Please include those changes in
the next version if you think they are okay.

I checked and your modification seems nice.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:

v23-0001-Always-persist-to-disk-logical-slots-during-a-sh.patch (application/octet-stream)
From cba667cf3da39b8cfe9aebfce251e31ea59bf639 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Fri, 14 Apr 2023 13:49:09 +0800
Subject: [PATCH v23 1/3] Always persist to disk logical slots during a
 shutdown checkpoint.

It's entirely possible for a logical slot to have a confirmed_flush_lsn higher
than the last value saved on disk while not being marked as dirty.  It's
currently not a problem to lose that value during a clean shutdown / restart
cycle, but a later patch adding support for pg_upgrade of publications and
logical slots will rely on that value being properly persisted to disk.

Author: Julien Rouhaud
Reviewed-by: Wang Wei, Peter Smith, Masahiko Sawada
---
 contrib/test_decoding/meson.build             |  1 +
 contrib/test_decoding/t/002_always_persist.pl | 74 +++++++++++++++++++
 src/backend/access/transam/xlog.c             |  2 +-
 src/backend/replication/slot.c                | 25 ++++---
 src/include/replication/slot.h                |  2 +-
 5 files changed, 92 insertions(+), 12 deletions(-)
 create mode 100644 contrib/test_decoding/t/002_always_persist.pl

diff --git a/contrib/test_decoding/meson.build b/contrib/test_decoding/meson.build
index 7b05cc25a3..12afb9ea8c 100644
--- a/contrib/test_decoding/meson.build
+++ b/contrib/test_decoding/meson.build
@@ -72,6 +72,7 @@ tests += {
   'tap': {
     'tests': [
       't/001_repl_stats.pl',
+      't/002_always_persist.pl',
     ],
   },
 }
diff --git a/contrib/test_decoding/t/002_always_persist.pl b/contrib/test_decoding/t/002_always_persist.pl
new file mode 100644
index 0000000000..cf78953eef
--- /dev/null
+++ b/contrib/test_decoding/t/002_always_persist.pl
@@ -0,0 +1,74 @@
+
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+# Test logical replication slots are always persist to disk during a shutdown
+# checkpoint.
+
+use strict;
+use warnings;
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Test set-up
+my $node = PostgreSQL::Test::Cluster->new('test');
+$node->init(allows_streaming => 'logical');
+$node->append_conf('postgresql.conf', q{
+autovacuum = off
+checkpoint_timeout = 1h
+});
+
+$node->start;
+
+# Create table
+$node->safe_psql('postgres', "CREATE TABLE test (id int)");
+
+# Create replication slot
+$node->safe_psql('postgres',
+	"SELECT pg_create_logical_replication_slot('regression_slot1', 'test_decoding');"
+);
+
+# Insert some data
+$node->safe_psql('postgres',
+	"INSERT INTO test VALUES (generate_series(1, 5));");
+
+# Consume WAL records
+$node->safe_psql('postgres',
+    "SELECT count(*) FROM pg_logical_slot_get_changes('regression_slot1', NULL, NULL);"
+);
+
+# Shutdown the node once to do shutdown checkpoint
+$node->stop();
+
+# Fetch checkPoint from the control file itself
+my ($stdout, $stderr) = run_command([ 'pg_controldata', $node->data_dir ]);
+my @control_data = split("\n", $stdout);
+my $latest_checkpoint = undef;
+foreach (@control_data)
+{
+	if ($_ =~ /^Latest checkpoint location:\s*(.*)$/mg)
+	{
+		$latest_checkpoint = $1;
+		last;
+	}
+}
+die "No checkPoint in control file found\n"
+  unless defined($latest_checkpoint);
+
+# Boot the node again and check confirmed_flush_lsn. If the slot was persisted,
+# the LSN is the same as the latest checkpoint location, i.e., the
+# SHUTDOWN_CHECKPOINT record.
+$node->start();
+my $confirmed_flush = $node->safe_psql('postgres',
+	"SELECT confirmed_flush_lsn FROM pg_replication_slots;"
+);
+
+# Compare confirmed_flush_lsn and checkPoint
+ok($confirmed_flush eq $latest_checkpoint,
+	"Check confirmed_flush is same as latest checkpoint location");
+
+# Shutdown
+$node->stop;
+
+done_testing();
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 60c0b7ec3a..6dced61cf4 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7026,7 +7026,7 @@ static void
 CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
 {
 	CheckPointRelationMap();
-	CheckPointReplicationSlots();
+	CheckPointReplicationSlots(flags & CHECKPOINT_IS_SHUTDOWN);
 	CheckPointSnapBuild();
 	CheckPointLogicalRewriteHeap();
 	CheckPointReplicationOrigin();
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 1dc27264f6..4d1e2d193e 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -109,7 +109,8 @@ static void ReplicationSlotDropPtr(ReplicationSlot *slot);
 /* internal persistency functions */
 static void RestoreSlotFromDisk(const char *name);
 static void CreateSlotOnDisk(ReplicationSlot *slot);
-static void SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel);
+static void SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel,
+						   bool is_shutdown);
 
 /*
  * Report shared-memory space needed by ReplicationSlotsShmemInit.
@@ -783,7 +784,7 @@ ReplicationSlotSave(void)
 	Assert(MyReplicationSlot != NULL);
 
 	sprintf(path, "pg_replslot/%s", NameStr(MyReplicationSlot->data.name));
-	SaveSlotToPath(MyReplicationSlot, path, ERROR);
+	SaveSlotToPath(MyReplicationSlot, path, ERROR, false);
 }
 
 /*
@@ -1565,11 +1566,10 @@ restart:
 /*
  * Flush all replication slots to disk.
  *
- * This needn't actually be part of a checkpoint, but it's a convenient
- * location.
+ * is_shutdown is true in case of a shutdown checkpoint.
  */
 void
-CheckPointReplicationSlots(void)
+CheckPointReplicationSlots(bool is_shutdown)
 {
 	int			i;
 
@@ -1594,7 +1594,7 @@ CheckPointReplicationSlots(void)
 
 		/* save the slot to disk, locking is handled in SaveSlotToPath() */
 		sprintf(path, "pg_replslot/%s", NameStr(s->data.name));
-		SaveSlotToPath(s, path, LOG);
+		SaveSlotToPath(s, path, LOG, is_shutdown);
 	}
 	LWLockRelease(ReplicationSlotAllocationLock);
 }
@@ -1700,7 +1700,7 @@ CreateSlotOnDisk(ReplicationSlot *slot)
 
 	/* Write the actual state file. */
 	slot->dirty = true;			/* signal that we really need to write */
-	SaveSlotToPath(slot, tmppath, ERROR);
+	SaveSlotToPath(slot, tmppath, ERROR, false);
 
 	/* Rename the directory into place. */
 	if (rename(tmppath, path) != 0)
@@ -1726,7 +1726,8 @@ CreateSlotOnDisk(ReplicationSlot *slot)
  * Shared functionality between saving and creating a replication slot.
  */
 static void
-SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel)
+SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel,
+			   bool is_shutdown)
 {
 	char		tmppath[MAXPGPATH];
 	char		path[MAXPGPATH];
@@ -1740,8 +1741,12 @@ SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel)
 	slot->just_dirtied = false;
 	SpinLockRelease(&slot->mutex);
 
-	/* and don't do anything if there's nothing to write */
-	if (!was_dirty)
+	/*
+	 * Don't do anything if there's nothing to write, unless this is called for
+	 * a logical slot during a shutdown checkpoint, as we want to persist the
+	 * confirmed_flush LSN in that case, even if that's the only modification.
+	 */
+	if (!was_dirty && !(SlotIsLogical(slot) && is_shutdown))
 		return;
 
 	LWLockAcquire(&slot->io_in_progress_lock, LW_EXCLUSIVE);
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index a8a89dc784..7ca37c9f70 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -241,7 +241,7 @@ extern void ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslo
 extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok);
 
 extern void StartupReplicationSlots(void);
-extern void CheckPointReplicationSlots(void);
+extern void CheckPointReplicationSlots(bool is_shutdown);
 
 extern void CheckSlotRequirements(void);
 extern void CheckSlotPermissions(void);
-- 
2.27.0

v23-0002-pg_upgrade-Allow-to-replicate-logical-replicatio.patchapplication/octet-stream; name=v23-0002-pg_upgrade-Allow-to-replicate-logical-replicatio.patchDownload
From a33e9d43de7321bf2777c99f3eda56ac1b579f10 Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Tue, 4 Apr 2023 05:49:34 +0000
Subject: [PATCH v23 2/3] pg_upgrade: Allow to replicate logical replication
 slots to new node

This commit allows nodes with logical replication slots to be upgraded. While
reading information from the old cluster, a list of logical replication slots is
fetched. In a later part of the upgrade, pg_upgrade revisits the list and
restores the slots by executing pg_create_logical_replication_slot() on the new
cluster.

Note that slot restoration must be done after the final pg_resetwal command
during the upgrade because pg_resetwal will remove WALs that are required by
the slots. Due to this restriction, the timing of restoring replication slots is
different from that of other objects.

The significant advantage of this commit is that it makes it easy to continue
logical replication even after upgrading the publisher node. Previously,
pg_upgrade allowed copying publications to a new node. With this new commit,
adjusting the connection string to the new publisher will cause the apply
worker on the subscriber to connect to the new publisher automatically. This
enables seamless continuation of logical replication, even after an upgrade.

Author: Hayato Kuroda
Co-authored-by: Hou Zhijie
Reviewed-by: Peter Smith, Julien Rouhaud, Vignesh C, Wang Wei, Masahiko Sawada
---
 doc/src/sgml/ref/pgupgrade.sgml               |  63 ++++++++
 src/bin/pg_upgrade/Makefile                   |   3 +
 src/bin/pg_upgrade/check.c                    |  74 ++++++++++
 src/bin/pg_upgrade/function.c                 |  18 ++-
 src/bin/pg_upgrade/info.c                     | 123 +++++++++++++++-
 src/bin/pg_upgrade/meson.build                |   1 +
 src/bin/pg_upgrade/pg_upgrade.c               |  78 ++++++++++
 src/bin/pg_upgrade/pg_upgrade.h               |  19 +++
 .../t/003_logical_replication_slots.pl        | 139 ++++++++++++++++++
 src/tools/pgindent/typedefs.list              |   3 +
 10 files changed, 517 insertions(+), 4 deletions(-)
 create mode 100644 src/bin/pg_upgrade/t/003_logical_replication_slots.pl

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 7816b4c685..7d7a114f53 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -360,6 +360,69 @@ make prefix=/usr/local/pgsql.new install
     </para>
    </step>
 
+   <step>
+    <title>Prepare for publisher upgrades</title>
+
+    <para>
+     <application>pg_upgrade</application> attempts to migrate logical
+     replication slots. This helps avoid the need for manually defining the
+     same replication slots on the new publisher.
+    </para>
+
+    <para>
+     Before you start upgrading the publisher cluster, ensure that the
+     subscription is temporarily disabled, by executing
+     <link linkend="sql-altersubscription"><command>ALTER SUBSCRIPTION ... DISABLE</command></link>.
+     After the upgrade is complete, re-enable the subscription. Note that if
+     the new cluster uses a different port number from the old one, the
+     <link linkend="sql-altersubscription"><command>ALTER SUBSCRIPTION ... CONNECTION</command></link>
+     command must also be executed on the subscriber.
+    </para>
+
+    <para>
+     There are some prerequisites for <application>pg_upgrade</application> to
+     be able to upgrade the replication slots. If these are not met, an error
+     will be reported.
+    </para>
+
+    <itemizedlist>
+     <listitem>
+      <para>
+       All slots on the old cluster must be usable, i.e., there are no slots
+       whose <structfield>wal_status</structfield> is <literal>lost</literal> (see
+       <xref linkend="view-pg-replication-slots"/>).
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       <structfield>confirmed_flush_lsn</structfield> (see <xref linkend="view-pg-replication-slots"/>)
+       of all slots on the old cluster must be the same as the latest checkpoint location.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The output plugins referenced by the slots on the old cluster must be
+       installed in the new PostgreSQL executable directory.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must have
+       <link linkend="guc-max-replication-slots"><varname>max_replication_slots</varname></link>
+       configured to a value greater than or equal to the number of slots present on the old cluster.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must have
+       <link linkend="guc-wal-level"><varname>wal_level</varname></link> as
+       <literal>logical</literal>.
+      </para>
+     </listitem>
+    </itemizedlist>
+
+   </step>
+
    <step>
     <title>Stop both servers</title>
 
diff --git a/src/bin/pg_upgrade/Makefile b/src/bin/pg_upgrade/Makefile
index 5834513add..815d1a7ca1 100644
--- a/src/bin/pg_upgrade/Makefile
+++ b/src/bin/pg_upgrade/Makefile
@@ -3,6 +3,9 @@
 PGFILEDESC = "pg_upgrade - an in-place binary upgrade utility"
 PGAPPICON = win32
 
+# required for 003_logical_replication_slots.pl
+EXTRA_INSTALL=contrib/test_decoding
+
 subdir = src/bin/pg_upgrade
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 64024e3b9e..d899986d79 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -30,6 +30,7 @@ static void check_for_jsonb_9_4_usage(ClusterInfo *cluster);
 static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(ClusterInfo *new_cluster);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
+static void check_for_logical_replication_slots(ClusterInfo *cluster);
 
 
 /*
@@ -89,6 +90,9 @@ check_and_dump_old_cluster(bool live_check)
 	/* Extract a list of databases and tables from the old cluster */
 	get_db_and_rel_infos(&old_cluster);
 
+	/* Extract a list of logical replication slots */
+	get_logical_slot_infos(&old_cluster);
+
 	init_tablespaces();
 
 	get_loadable_libraries();
@@ -189,6 +193,8 @@ check_new_cluster(void)
 {
 	get_db_and_rel_infos(&new_cluster);
 
+	check_for_logical_replication_slots(&new_cluster);
+
 	check_new_cluster_is_empty();
 
 	check_loadable_libraries();
@@ -1402,3 +1408,71 @@ check_for_user_defined_encoding_conversions(ClusterInfo *cluster)
 	else
 		check_ok();
 }
+
+/*
+ * check_for_logical_replication_slots()
+ *
+ * Make sure there are no logical replication slots on the new cluster,
+ * because the WAL required by such slots would be removed by the subsequent
+ * pg_resetwal commands. Also,
+ * verify the parameter settings necessary for creating logical replication
+ * slots if required.
+ */
+static void
+check_for_logical_replication_slots(ClusterInfo *cluster)
+{
+	PGresult   *res;
+	PGconn	   *conn;
+	int			nslots;
+
+	/* logical replication slots can be migrated since PG17. */
+	if (GET_MAJOR_VERSION(cluster->major_version) < 1700)
+		return;
+
+	conn = connectToServer(cluster, "template1");
+
+	prep_status("Checking for logical replication slots");
+
+	res = executeQueryOrDie(conn, "SELECT slot_name "
+								  "FROM pg_catalog.pg_replication_slots "
+								  "WHERE slot_type = 'logical' AND "
+								  "temporary IS FALSE;");
+
+	if (PQntuples(res))
+		pg_fatal("New cluster must not have logical replication slots, but found \"%s\"",
+				 PQgetvalue(res, 0, 0));
+
+	PQclear(res);
+
+	nslots = count_logical_slots(&old_cluster);
+
+	/*
+	 * Do additional checks if logical replication slots exist on the old
+	 * cluster.
+	 */
+	if (nslots)
+	{
+		int			max_replication_slots;
+		char	   *wal_level;
+
+		res = executeQueryOrDie(conn, "SHOW max_replication_slots;");
+		max_replication_slots = atoi(PQgetvalue(res, 0, 0));
+
+		if (nslots > max_replication_slots)
+			pg_fatal("max_replication_slots must be greater than or equal to the number of "
+					 "logical replication slots on the old cluster.");
+
+		PQclear(res);
+
+		res = executeQueryOrDie(conn, "SHOW wal_level;");
+		wal_level = PQgetvalue(res, 0, 0);
+
+		if (strcmp(wal_level, "logical") != 0)
+			pg_fatal("wal_level must be \"logical\", but is set to \"%s\"",
+					wal_level);
+
+		PQclear(res);
+	}
+	PQfinish(conn);
+
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/function.c b/src/bin/pg_upgrade/function.c
index dc8800c7cd..baf51d2eb1 100644
--- a/src/bin/pg_upgrade/function.c
+++ b/src/bin/pg_upgrade/function.c
@@ -46,7 +46,12 @@ library_name_compare(const void *p1, const void *p2)
 /*
  * get_loadable_libraries()
  *
- *	Fetch the names of all old libraries containing C-language functions.
+ *	Fetch the names of all old libraries:
+ *	1. Name of library files containing C-language functions (for non-built-in
+ *	   functions), and
+ *	2. Shared object (library) names containing the logical replication output
+ *	   plugins
+ *
  *	We will later check that they all exist in the new installation.
  */
 void
@@ -66,14 +71,21 @@ get_loadable_libraries(void)
 		PGconn	   *conn = connectToServer(&old_cluster, active_db->db_name);
 
 		/*
-		 * Fetch all libraries containing non-built-in C functions in this DB.
+		 * Fetch all libraries containing non-built-in C functions or referred
+		 * to by logical replication slots in this DB.
 		 */
 		ress[dbnum] = executeQueryOrDie(conn,
 										"SELECT DISTINCT probin "
 										"FROM pg_catalog.pg_proc "
 										"WHERE prolang = %u AND "
 										"probin IS NOT NULL AND "
-										"oid >= %u;",
+										"oid >= %u "
+										"UNION "
+										"SELECT DISTINCT plugin "
+										"FROM pg_catalog.pg_replication_slots "
+										"WHERE wal_status <> 'lost' AND "
+										"database = current_database() AND "
+										"temporary IS FALSE;",
 										ClanguageId,
 										FirstNormalObjectId);
 		totaltups += PQntuples(ress[dbnum]);
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index aa5faca4d6..6c2478ccbd 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -26,6 +26,7 @@ static void get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo);
 static void free_rel_infos(RelInfoArr *rel_arr);
 static void print_db_infos(DbInfoArr *db_arr);
 static void print_rel_infos(RelInfoArr *rel_arr);
+static void print_slot_infos(LogicalSlotInfoArr *slot_arr);
 
 
 /*
@@ -394,7 +395,7 @@ get_db_infos(ClusterInfo *cluster)
 	i_spclocation = PQfnumber(res, "spclocation");
 
 	ntups = PQntuples(res);
-	dbinfos = (DbInfo *) pg_malloc(sizeof(DbInfo) * ntups);
+	dbinfos = (DbInfo *) pg_malloc0(sizeof(DbInfo) * ntups);
 
 	for (tupnum = 0; tupnum < ntups; tupnum++)
 	{
@@ -600,6 +601,105 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 	dbinfo->rel_arr.nrels = num_rels;
 }
 
+/*
+ * get_logical_slot_infos_per_db()
+ *
+ * gets the LogicalSlotInfos for all the logical replication slots of the database
+ * referred to by "dbinfo".
+ */
+static void
+get_logical_slot_infos_per_db(ClusterInfo *cluster, DbInfo *dbinfo)
+{
+	PGconn	   *conn = connectToServer(cluster,
+									   dbinfo->db_name);
+	PGresult   *res;
+	LogicalSlotInfo *slotinfos = NULL;
+
+	int			num_slots;
+
+	res = executeQueryOrDie(conn, "SELECT slot_name, plugin, two_phase "
+							"FROM pg_catalog.pg_replication_slots "
+							"WHERE wal_status <> 'lost' AND "
+							"database = current_database() AND "
+							"temporary IS FALSE;");
+
+	num_slots = PQntuples(res);
+
+	if (num_slots)
+	{
+		int			slotnum;
+		int			i_slotname;
+		int			i_plugin;
+		int			i_twophase;
+
+		slotinfos = (LogicalSlotInfo *) pg_malloc(sizeof(LogicalSlotInfo) * num_slots);
+
+		i_slotname = PQfnumber(res, "slot_name");
+		i_plugin = PQfnumber(res, "plugin");
+		i_twophase = PQfnumber(res, "two_phase");
+
+		for (slotnum = 0; slotnum < num_slots; slotnum++)
+		{
+			LogicalSlotInfo *curr = &slotinfos[slotnum];
+
+			curr->slotname = pg_strdup(PQgetvalue(res, slotnum, i_slotname));
+			curr->plugin = pg_strdup(PQgetvalue(res, slotnum, i_plugin));
+			curr->two_phase = (strcmp(PQgetvalue(res, slotnum, i_twophase), "t") == 0);
+		}
+	}
+
+	PQclear(res);
+	PQfinish(conn);
+
+	dbinfo->slot_arr.slots = slotinfos;
+	dbinfo->slot_arr.nslots = num_slots;
+}
+
+/*
+ * get_logical_slot_infos()
+ *
+ * Higher level routine to generate LogicalSlotInfoArr for all databases.
+ */
+void
+get_logical_slot_infos(ClusterInfo *cluster)
+{
+	int			dbnum;
+
+	if (cluster == &old_cluster)
+		pg_log(PG_VERBOSE, "\nsource databases:");
+	else
+		pg_log(PG_VERBOSE, "\ntarget databases:");
+
+	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
+	{
+		DbInfo	   *pDbInfo = &cluster->dbarr.dbs[dbnum];
+
+		get_logical_slot_infos_per_db(cluster, pDbInfo);
+
+		if (log_opts.verbose)
+		{
+			pg_log(PG_VERBOSE, "Database: \"%s\"", pDbInfo->db_name);
+			print_slot_infos(&pDbInfo->slot_arr);
+		}
+	}
+}
+
+/*
+ * count_logical_slots()
+ *
+ * Sum up and return the number of logical replication slots for all databases.
+ */
+int
+count_logical_slots(ClusterInfo *cluster)
+{
+	int			dbnum;
+	int			slot_count = 0;
+
+	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
+		slot_count += cluster->dbarr.dbs[dbnum].slot_arr.nslots;
+
+	return slot_count;
+}
 
 static void
 free_db_and_rel_infos(DbInfoArr *db_arr)
@@ -610,6 +710,12 @@ free_db_and_rel_infos(DbInfoArr *db_arr)
 	{
 		free_rel_infos(&db_arr->dbs[dbnum].rel_arr);
 		pg_free(db_arr->dbs[dbnum].db_name);
+
+		/*
+		 * Logical replication slots must not exist on the new cluster before
+		 * doing create_logical_replication_slots().
+		 */
+		Assert(db_arr->dbs[dbnum].slot_arr.nslots == 0);
 	}
 	pg_free(db_arr->dbs);
 	db_arr->dbs = NULL;
@@ -660,3 +766,18 @@ print_rel_infos(RelInfoArr *rel_arr)
 			   rel_arr->rels[relnum].reloid,
 			   rel_arr->rels[relnum].tablespace);
 }
+
+static void
+print_slot_infos(LogicalSlotInfoArr *slot_arr)
+{
+	int			slotnum;
+
+	for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+	{
+		LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+		pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\", two_phase: %d",
+			   slot_info->slotname,
+			   slot_info->plugin,
+			   slot_info->two_phase);
+	}
+}
diff --git a/src/bin/pg_upgrade/meson.build b/src/bin/pg_upgrade/meson.build
index 12a97f84e2..228f29b688 100644
--- a/src/bin/pg_upgrade/meson.build
+++ b/src/bin/pg_upgrade/meson.build
@@ -42,6 +42,7 @@ tests += {
     'tests': [
       't/001_basic.pl',
       't/002_pg_upgrade.pl',
+      't/003_logical_replication_slots.pl',
     ],
     'test_kwargs': {'priority': 40}, # pg_upgrade tests are slow
   },
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 4562dafcff..713c67d7bd 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -59,6 +59,7 @@ static void copy_xact_xlog_xid(void);
 static void set_frozenxids(bool minmxid_only);
 static void make_outputdirs(char *pgdata);
 static void setup(char *argv0, bool *live_check);
+static void create_logical_replication_slots(void);
 
 ClusterInfo old_cluster,
 			new_cluster;
@@ -188,6 +189,19 @@ main(int argc, char **argv)
 			  new_cluster.pgdata);
 	check_ok();
 
+	/*
+	 * Create logical replication slots.
+	 *
+	 * Note: This must be done after the final pg_resetwal command because
+	 * pg_resetwal removes the WAL required by the slots.
+	 */
+	if (count_logical_slots(&old_cluster))
+	{
+		start_postmaster(&new_cluster, true);
+		create_logical_replication_slots();
+		stop_postmaster(false);
+	}
+
 	if (user_opts.do_sync)
 	{
 		prep_status("Sync data directory to disk");
@@ -860,3 +874,67 @@ set_frozenxids(bool minmxid_only)
 
 	check_ok();
 }
+
+/*
+ * create_logical_replication_slots()
+ *
+ * Similar to create_new_objects() but only restores logical replication slots.
+ */
+static void
+create_logical_replication_slots(void)
+{
+	int			dbnum;
+
+	prep_status_progress("Restoring logical replication slots in the new cluster");
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		DbInfo	   *old_db = &old_cluster.dbarr.dbs[dbnum];
+		LogicalSlotInfoArr *slot_arr = &old_db->slot_arr;
+		PGconn     *conn;
+		PQExpBuffer query;
+		int			slotnum;
+		char		log_file_name[MAXPGPATH];
+
+		/* Skip this database if there are no slots */
+		if (slot_arr->nslots == 0)
+			continue;
+
+		conn = connectToServer(&new_cluster, old_db->db_name);
+		query = createPQExpBuffer();
+
+		snprintf(log_file_name, sizeof(log_file_name),
+				 DB_DUMP_LOG_FILE_MASK, old_db->db_oid);
+
+		pg_log(PG_STATUS, "%s", old_db->db_name);
+
+		for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		{
+			LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+
+			/*
+			 * Construct a query that creates a logical replication slot.
+			 *
+			 * XXX: For simplification, pg_create_logical_replication_slot() is
+			 * used. Is it sufficient?
+			 */
+			appendPQExpBuffer(query, "SELECT pg_catalog.pg_create_logical_replication_slot(");
+			appendStringLiteralConn(query, slot_info->slotname, conn);
+			appendPQExpBuffer(query, ", ");
+			appendStringLiteralConn(query, slot_info->plugin, conn);
+			appendPQExpBuffer(query, ", false, %s);",
+							  slot_info->two_phase ? "true" : "false");
+
+			PQclear(executeQueryOrDie(conn, "%s", query->data));
+
+			resetPQExpBuffer(query);
+		}
+
+		PQfinish(conn);
+
+		destroyPQExpBuffer(query);
+	}
+
+	end_progress_output();
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 3eea0139c7..2dac266537 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -150,6 +150,22 @@ typedef struct
 	int			nrels;
 } RelInfoArr;
 
+/*
+ * Structure to store logical replication slot information
+ */
+typedef struct
+{
+	char	   *slotname;		/* slot name */
+	char	   *plugin;			/* plugin */
+	bool		two_phase;		/* can the slot decode 2PC? */
+} LogicalSlotInfo;
+
+typedef struct
+{
+	int			nslots;			/* number of logical slot infos */
+	LogicalSlotInfo *slots;		/* array of logical slot infos */
+} LogicalSlotInfoArr;
+
 /*
  * The following structure represents a relation mapping.
  */
@@ -176,6 +192,7 @@ typedef struct
 	char		db_tablespace[MAXPGPATH];	/* database default tablespace
 											 * path */
 	RelInfoArr	rel_arr;		/* array of all user relinfos */
+	LogicalSlotInfoArr slot_arr;	/* array of all LogicalSlotInfo */
 } DbInfo;
 
 /*
@@ -400,6 +417,8 @@ FileNameMap *gen_db_file_maps(DbInfo *old_db,
 							  DbInfo *new_db, int *nmaps, const char *old_pgdata,
 							  const char *new_pgdata);
 void		get_db_and_rel_infos(ClusterInfo *cluster);
+void		get_logical_slot_infos(ClusterInfo *cluster);
+int			count_logical_slots(ClusterInfo *cluster);
 
 /* option.c */
 
diff --git a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
new file mode 100644
index 0000000000..4df7edd594
--- /dev/null
+++ b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
@@ -0,0 +1,139 @@
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+# Tests for upgrading replication slots
+
+use strict;
+use warnings;
+
+use File::Path qw(rmtree);
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Can be changed to test the other modes
+my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
+
+# Initialize old cluster
+my $old_publisher = PostgreSQL::Test::Cluster->new('old_publisher');
+$old_publisher->init(allows_streaming => 'logical');
+$old_publisher->start;
+
+# Initialize new cluster
+my $new_publisher = PostgreSQL::Test::Cluster->new('new_publisher');
+$new_publisher->init(allows_streaming => 'replica');
+
+my $bindir = $new_publisher->config_data('--bindir');
+
+$old_publisher->stop;
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when new cluster wal_level is not 'logical'
+
+# Create a slot on old cluster
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT pg_create_logical_replication_slot('test_slot1', 'test_decoding', false, true);"
+);
+$old_publisher->stop;
+
+# Cause a failure at the start of pg_upgrade because wal_level is replica
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade where the new cluster has the wrong wal_level');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when max_replication_slots on new cluster is
+#		too low
+
+# Preparations for the subsequent test.
+# 1. Create an unnecessary slot on the old cluster
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT pg_create_logical_replication_slot('test_slot2', 'test_decoding', false, true);"
+);
+$old_publisher->stop;
+
+# 2. max_replication_slots is set to smaller than the number of slots (2)
+#	 present on the old cluster
+$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 1");
+
+# 3. New cluster wal_level is set correctly
+$new_publisher->append_conf('postgresql.conf', "wal_level = 'logical'");
+
+# Cause a failure at the start of pg_upgrade because the new cluster has
+# insufficient max_replication_slots
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade where the new cluster has insufficient max_replication_slots');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+
+# ------------------------------
+# TEST: Successful upgrade
+
+# Preparations for the subsequent test.
+# 1. Remove an unnecessary slot
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT * FROM pg_drop_replication_slot('test_slot2');"
+);
+
+# 2. Consume WAL records
+$old_publisher->safe_psql('postgres',
+	"SELECT count(*) FROM pg_logical_slot_get_changes('test_slot1', NULL, NULL)"
+);
+$old_publisher->stop;
+
+# Actual run, successful upgrade is expected
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade of old cluster');
+ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ removed after pg_upgrade success");
+
+$new_publisher->start;
+my $result = $new_publisher->safe_psql('postgres',
+	"SELECT slot_name, two_phase FROM pg_replication_slots");
+is($result, qq(test_slot1|t), 'check the slot exists on new cluster');
+$new_publisher->stop;
+
+done_testing();
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 51b7951ad8..0071efef1c 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1501,7 +1501,10 @@ LogicalRepTupleData
 LogicalRepTyp
 LogicalRepWorker
 LogicalRepWorkerType
+LogicalReplicationSlotInfo
 LogicalRewriteMappingData
+LogicalSlotInfo
+LogicalSlotInfoArr
 LogicalTape
 LogicalTapeSet
 LsnReadQueue
-- 
2.27.0

v23-0003-pg_upgrade-Add-check-function-for-logical-replic.patchapplication/octet-stream; name=v23-0003-pg_upgrade-Add-check-function-for-logical-replic.patchDownload
From 1e885ef66a8c2bd4e0c4cafb90e39bcc6a3d3542 Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Fri, 18 Aug 2023 11:57:37 +0000
Subject: [PATCH v23 3/3] pg_upgrade: Add check function for logical
 replication slots

In order to prevent data loss, pg_upgrade will fail if the old node has slots with
the status 'lost', or with unconsumed WAL records.

Author: Hayato Kuroda
Reviewed-by: Wang Wei, Vignesh C, Peter Smith, Hou Zhijie
---
 src/bin/pg_upgrade/check.c                    | 99 ++++++++++++++++++-
 src/bin/pg_upgrade/controldata.c              | 37 +++++++
 src/bin/pg_upgrade/info.c                     | 13 ++-
 src/bin/pg_upgrade/pg_upgrade.h               |  7 +-
 .../t/003_logical_replication_slots.pl        | 91 +++++++++++++++--
 5 files changed, 234 insertions(+), 13 deletions(-)

diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index d899986d79..a5956e0464 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -9,6 +9,7 @@
 
 #include "postgres_fe.h"
 
+#include "access/xlogdefs.h"
 #include "catalog/pg_authid_d.h"
 #include "catalog/pg_collation.h"
 #include "fe_utils/string_utils.h"
@@ -91,7 +92,7 @@ check_and_dump_old_cluster(bool live_check)
 	get_db_and_rel_infos(&old_cluster);
 
 	/* Extract a list of logical replication slots */
-	get_logical_slot_infos(&old_cluster);
+	get_logical_slot_infos(&old_cluster, live_check);
 
 	init_tablespaces();
 
@@ -1476,3 +1477,99 @@ check_for_logical_replication_slots(ClusterInfo *cluster)
 
 	check_ok();
 }
+
+/*
+ * Verify that all logical replication slots are usable.
+ */
+void
+check_for_lost_slots(ClusterInfo *cluster)
+{
+	int			i,
+				ntups,
+				i_slotname;
+	PGresult   *res;
+	DbInfo	   *active_db = &cluster->dbarr.dbs[0];
+	PGconn	   *conn;
+
+	/* logical slots can be migrated since PG17. */
+	if (GET_MAJOR_VERSION(cluster->major_version) < 1700)
+		return;
+
+	conn = connectToServer(cluster, active_db->db_name);
+
+	prep_status("Checking wal_status for logical replication slots");
+
+	/* Check there are no logical replication slots with a 'lost' state. */
+	res = executeQueryOrDie(conn,
+							"SELECT slot_name FROM pg_catalog.pg_replication_slots "
+							"WHERE wal_status = 'lost' AND "
+							"temporary IS FALSE;");
+
+	ntups = PQntuples(res);
+	i_slotname = PQfnumber(res, "slot_name");
+
+	for (i = 0; i < ntups; i++)
+	{
+		pg_log(PG_WARNING,
+			   "\nWARNING: logical replication slot \"%s\" is in 'lost' state.",
+			   PQgetvalue(res, i, i_slotname));
+	}
+
+	PQclear(res);
+	PQfinish(conn);
+
+	if (ntups)
+		pg_fatal("One or more logical replication slots with a state of 'lost' were detected.");
+
+	check_ok();
+}
+
+/*
+ * Verify that all logical replication slots have consumed all WAL, except for
+ * the final CHECKPOINT_SHUTDOWN record.
+ */
+void
+check_for_confirmed_flush_lsn(ClusterInfo *cluster)
+{
+	int			i,
+				ntups,
+				i_slotname;
+	PGresult   *res;
+	DbInfo	   *active_db = &cluster->dbarr.dbs[0];
+	PGconn	   *conn;
+
+	/* logical slots can be migrated since PG17. */
+	if (GET_MAJOR_VERSION(cluster->major_version) < 1700)
+		return;
+
+	conn = connectToServer(cluster, active_db->db_name);
+
+	prep_status("Checking confirmed_flush_lsn for logical replication slots");
+
+	/*
+	 * Check that all logical replication slots have reached the latest
+	 * checkpoint position (SHUTDOWN_CHECKPOINT record).
+	 */
+	res = executeQueryOrDie(conn,
+							"SELECT slot_name FROM pg_catalog.pg_replication_slots "
+							"WHERE confirmed_flush_lsn != '%X/%X' AND temporary IS FALSE;",
+							LSN_FORMAT_ARGS(old_cluster.controldata.chkpnt_latest));
+
+	ntups = PQntuples(res);
+	i_slotname = PQfnumber(res, "slot_name");
+
+	for (i = 0; i < ntups; i++)
+	{
+		pg_log(PG_WARNING,
+				"\nWARNING: logical replication slot \"%s\" has not consumed WALs yet",
+				PQgetvalue(res, i, i_slotname));
+	}
+
+	PQclear(res);
+	PQfinish(conn);
+
+	if (ntups)
+		pg_fatal("One or more logical replication slots still have unconsumed WAL records.");
+
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c
index 4beb65ab22..248d5dbc03 100644
--- a/src/bin/pg_upgrade/controldata.c
+++ b/src/bin/pg_upgrade/controldata.c
@@ -169,6 +169,43 @@ get_control_data(ClusterInfo *cluster, bool live_check)
 				}
 				got_cluster_state = true;
 			}
+
+			else if ((p = strstr(bufin, "Latest checkpoint location:")) != NULL)
+			{
+				/*
+				 * Gather the latest checkpoint location if the cluster is PG17
+				 * or later. This is used for upgrading logical replication
+				 * slots.
+				 */
+				if (GET_MAJOR_VERSION(cluster->major_version) >= 1700)
+				{
+					char *slash = NULL;
+					uint64 upper_lsn, lower_lsn;
+
+					p = strchr(p, ':');
+
+					if (p == NULL || strlen(p) <= 1)
+						pg_fatal("%d: controldata retrieval problem", __LINE__);
+
+					p++;			/* remove ':' char */
+
+					p = strpbrk(p, "01234567890ABCDEF");
+
+					if (p == NULL || strlen(p) <= 1)
+						pg_fatal("%d: controldata retrieval problem", __LINE__);
+
+					/*
+					 * Upper and lower part of LSN must be read separately
+					 * because it is reported as %X/%X format.
+					 */
+					upper_lsn = strtoul(p, &slash, 16);
+					lower_lsn = strtoul(++slash, NULL, 16);
+
+					/* And combine them */
+					cluster->controldata.chkpnt_latest =
+										(upper_lsn << 32) | lower_lsn;
+				}
+			}
 		}
 
 		rc = pclose(output);
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index 6c2478ccbd..d1f5a6a09a 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -661,9 +661,10 @@ get_logical_slot_infos_per_db(ClusterInfo *cluster, DbInfo *dbinfo)
  * Higher level routine to generate LogicalSlotInfoArr for all databases.
  */
 void
-get_logical_slot_infos(ClusterInfo *cluster)
+get_logical_slot_infos(ClusterInfo *cluster, bool live_check)
 {
 	int			dbnum;
+	int			slot_count = 0;
 
 	if (cluster == &old_cluster)
 		pg_log(PG_VERBOSE, "\nsource databases:");
@@ -675,6 +676,7 @@ get_logical_slot_infos(ClusterInfo *cluster)
 		DbInfo	   *pDbInfo = &cluster->dbarr.dbs[dbnum];
 
 		get_logical_slot_infos_per_db(cluster, pDbInfo);
+		slot_count += pDbInfo->slot_arr.nslots;
 
 		if (log_opts.verbose)
 		{
@@ -682,6 +684,15 @@ get_logical_slot_infos(ClusterInfo *cluster)
 			print_slot_infos(&pDbInfo->slot_arr);
 		}
 	}
+
+	/* Do additional checks if slots are found */
+	if (slot_count)
+	{
+		check_for_lost_slots(cluster);
+
+		if (!live_check)
+			check_for_confirmed_flush_lsn(cluster);
+	}
 }
 
 /*
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 2dac266537..6df948ac73 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -10,6 +10,7 @@
 #include <sys/stat.h>
 #include <sys/time.h>
 
+#include "access/xlogdefs.h"
 #include "common/relpath.h"
 #include "libpq-fe.h"
 
@@ -242,6 +243,8 @@ typedef struct
 	bool		date_is_int;
 	bool		float8_pass_by_value;
 	uint32		data_checksum_version;
+
+	XLogRecPtr	chkpnt_latest;
 } ControlData;
 
 /*
@@ -366,6 +369,8 @@ void		output_completion_banner(char *deletion_script_file_name);
 void		check_cluster_versions(void);
 void		check_cluster_compatibility(bool live_check);
 void		create_script_for_old_cluster_deletion(char **deletion_script_file_name);
+void		check_for_lost_slots(ClusterInfo *cluster);
+void		check_for_confirmed_flush_lsn(ClusterInfo *cluster);
 
 
 /* controldata.c */
@@ -417,7 +422,7 @@ FileNameMap *gen_db_file_maps(DbInfo *old_db,
 							  DbInfo *new_db, int *nmaps, const char *old_pgdata,
 							  const char *new_pgdata);
 void		get_db_and_rel_infos(ClusterInfo *cluster);
-void		get_logical_slot_infos(ClusterInfo *cluster);
+void		get_logical_slot_infos(ClusterInfo *cluster, bool live_check);
 int			count_logical_slots(ClusterInfo *cluster);
 
 /* option.c */
diff --git a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
index 4df7edd594..bde801fd2b 100644
--- a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
+++ b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
@@ -17,15 +17,16 @@ my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
 # Initialize old cluster
 my $old_publisher = PostgreSQL::Test::Cluster->new('old_publisher');
 $old_publisher->init(allows_streaming => 'logical');
-$old_publisher->start;
 
 # Initialize new cluster
 my $new_publisher = PostgreSQL::Test::Cluster->new('new_publisher');
 $new_publisher->init(allows_streaming => 'replica');
 
-my $bindir = $new_publisher->config_data('--bindir');
+# Initialize subscriber cluster
+my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
+$subscriber->init(allows_streaming => 'logical');
 
-$old_publisher->stop;
+my $bindir = $new_publisher->config_data('--bindir');
 
 # ------------------------------
 # TEST: Confirm pg_upgrade fails when new cluster wal_level is not 'logical'
@@ -67,13 +68,18 @@ $old_publisher->start;
 $old_publisher->safe_psql('postgres',
 	"SELECT pg_create_logical_replication_slot('test_slot2', 'test_decoding', false, true);"
 );
+
+# 2. Consume WAL records
+$old_publisher->safe_psql('postgres',
+	"SELECT count(*) FROM pg_logical_slot_get_changes('test_slot1', NULL, NULL);"
+);
 $old_publisher->stop;
 
-# 2. max_replication_slots is set to smaller than the number of slots (2)
+# 3. max_replication_slots is set to smaller than the number of slots (2)
 #	 present on the old cluster
 $new_publisher->append_conf('postgresql.conf', "max_replication_slots = 1");
 
-# 3. New cluster wal_level is set correctly
+# 4. New cluster wal_level is set correctly
 $new_publisher->append_conf('postgresql.conf', "wal_level = 'logical'");
 
 # Cause a failure at the start of pg_upgrade because the new cluster has
@@ -98,7 +104,8 @@ ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
 rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
 
 # ------------------------------
-# TEST: Successful upgrade
+# TEST: Confirm pg_upgrade fails when the slot still has unconsumed WAL
+#		records
 
 # Preparations for the subsequent test.
 # 1. Remove an unnecessary slot
@@ -107,10 +114,58 @@ $old_publisher->safe_psql('postgres',
 	"SELECT * FROM pg_drop_replication_slot('test_slot2');"
 );
 
-# 2. Consume WAL records
+# 2. Generate extra WAL records
+$old_publisher->safe_psql('postgres',
+	"CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;"
+);
+$old_publisher->stop;
+
+# Cause a failure at the start of pg_upgrade because the slot still has
+# unconsumed WAL records
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade of old cluster with idle replication slots');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+
+# ------------------------------
+# TEST: Successful upgrade
+
+# Preparations for the subsequent test.
+# 1. Remove the remaining slot
+$old_publisher->start;
 $old_publisher->safe_psql('postgres',
-	"SELECT count(*) FROM pg_logical_slot_get_changes('test_slot1', NULL, NULL)"
+	"SELECT * FROM pg_drop_replication_slot('test_slot1');"
 );
+
+# 2. Setup logical replication
+my $old_connstr = $old_publisher->connstr . ' dbname=postgres';
+$old_publisher->safe_psql('postgres',
+	"CREATE PUBLICATION pub FOR ALL TABLES;"
+);
+$subscriber->start;
+$subscriber->safe_psql(
+	'postgres', qq[
+	CREATE TABLE tbl (a int);
+	CREATE SUBSCRIPTION sub CONNECTION '$old_connstr' PUBLICATION pub WITH (two_phase = 'true')
+]);
+$subscriber->wait_for_subscription_sync($old_publisher, 'sub');
+
+# 3. Disable the subscription once
+$subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION sub DISABLE");
 $old_publisher->stop;
 
 # Actual run, successful upgrade is expected
@@ -133,7 +188,23 @@ ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
 $new_publisher->start;
 my $result = $new_publisher->safe_psql('postgres',
 	"SELECT slot_name, two_phase FROM pg_replication_slots");
-is($result, qq(test_slot1|t), 'check the slot exists on new cluster');
-$new_publisher->stop;
+is($result, qq(sub|t), 'check the slot exists on new cluster');
+
+# Update the connection
+my $new_connstr = $new_publisher->connstr . ' dbname=postgres';
+$subscriber->safe_psql('postgres',
+	"ALTER SUBSCRIPTION sub CONNECTION '$new_connstr'");
+$subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION sub ENABLE");
+
+# Check whether changes on the new publisher get replicated to the subscriber
+$new_publisher->safe_psql('postgres',
+	"INSERT INTO tbl VALUES (generate_series(11, 20))");
+$new_publisher->wait_for_catchup('sub');
+$result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
+is($result, qq(20), 'check changes are shipped to subscriber');
+
+# Clean up
+$subscriber->stop();
+$new_publisher->stop();
 
 done_testing();
-- 
2.27.0

#139Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Masahiko Sawada (#133)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Sawada-san,

Thank you for reviewing! The new patch set is available in [1]/messages/by-id/TYCPR01MB5870B5C0FE0C61CD04CBD719F51EA@TYCPR01MB5870.jpnprd01.prod.outlook.com.

0001:

Do we need regression tests to make sure that the slot's
confirmed_flush_lsn matches the LSN of the latest shutdown_checkpoint
record?

Added. I wondered about where to put the test, but placed it in
test_decoding/t/002_always_persist.pl.

0002:

+   <step>
+    <title>Prepare for publisher upgrades</title>
+

Should this step be done before "8. Stop both servers" as it might
require to disable subscriptions and to drop 'lost' replication slots?

Right, moved.

Why is there no explanation about the slots' confirmed_flush_lsn check
as prerequisites?

Added.

[1]: /messages/by-id/TYCPR01MB5870B5C0FE0C61CD04CBD719F51EA@TYCPR01MB5870.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#140Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Peter Smith (#135)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Peter,

Thank you for reviewing! The updated patch set is available in [1]/messages/by-id/TYCPR01MB5870B5C0FE0C61CD04CBD719F51EA@TYCPR01MB5870.jpnprd01.prod.outlook.com.

Commit Message

1.
This commit allows nodes with logical replication slots to be upgraded. While
reading information from the old cluster, a list of logical replication slots is
newly extracted. At the later part of upgrading, pg_upgrade revisits the list
and restores slots by using the pg_create_logical_replication_slots() on the new
clushter.

~

1a
/is newly extracted/is fetched/

Fixed.

1b.
/using the pg_create_logical_replication_slots()/executing
pg_create_logical_replication_slots()/

Fixed.

1c.
/clushter/cluster/

Fixed.

2.
Note that it must be done after the final pg_resetwal command during the upgrade
because pg_resetwal will remove WALs that are required by the slots. Due to the
restriction, the timing of restoring replication slots is different from other
objects.

~

2a.
/it must/slot restoration/

You meant to say s/it must/slot restoration must/, right? Fixed.

2b.
/the restriction/this restriction/

======
doc/src/sgml/ref/pgupgrade.sgml

3.
+    <para>
+     <application>pg_upgrade</application> attempts to migrate logical
+     replication slots. This helps avoid the need for manually defining the
+     same replication slot on the new publisher.
+    </para>

/same replication slot/same replication slots/

Fixed.

4.
+    <para>
+     Before you start upgrading the publisher cluster, ensure that the
+     subscription is temporarily disabled, by executing
+     <link linkend="sql-altersubscription"><command>ALTER
SUBSCRIPTION ... DISABLE</command></link>.
+     After the upgrade is complete, execute the
+     <command>ALTER SUBSCRIPTION ... CONNECTION</command>
command to update the
+     connection string, and then re-enable the subscription.
+    </para>

On the rendered page, it looks a bit strange that DISABLE has a link
but CONNECTION does not have a link.

Added.

5.
+    <para>
+     There are some prerequisites for
<application>pg_upgrade</application> to
+     be able to upgrade the replication slots. If these are not met an error
+     will be reported.
+    </para>
+
+    <itemizedlist>

+1 to use all the itemizedlist changes that Amit suggested [1] in his
attachment.

Yeah, I agree it is nicer. Applied.

src/bin/pg_upgrade/check.c

6.
+static void check_for_logical_replication_slots(ClusterInfo *new_cluster);

IMO the arg name should not shadow a global with the same name. See
other review comment for this function signature.

OK, fixed.

7.
+ /* Extract a list of logical replication slots */
+ get_logical_slot_infos(&old_cluster, live_check);

But 'live_check' is never used?

It is needed for 0003, moved.

8. check_for_logical_replication_slots
+
+/*
+ * Verify the parameter settings necessary for creating logical replication
+ * slots.
+ */
+static void
+check_for_logical_replication_slots(ClusterInfo *new_cluster)

IMO the arg name should not shadow a global with the same name. If
this is never going to be called with any param other than
&new_cluster then probably it is better not even to pass have that
argument at all. Just refer to the global new_cluster inside the
function.

You can't say that 'check_for_new_tablespace_dir' does it already so
it must be OK -- I think that the existing function has the same issue
and it also ought to be fixed to avoid shadowing!

Fixed.

9. check_for_logical_replication_slots

+ /* logical replication slots can be migrated since PG17. */
+ if (GET_MAJOR_VERSION(new_cluster->major_version) <= 1600)
+ return;

IMO the code matches the comment better if you say < 1700 instead of <= 1600.

Changed.

src/bin/pg_upgrade/function.c

10. get_loadable_libraries
/*
- * Fetch all libraries containing non-built-in C functions in this DB.
+ * Fetch all libraries containing non-built-in C functions or referred
+ * by logical replication slots in this DB.
*/
ress[dbnum] = executeQueryOrDie(conn,
~

/referred by/referred to by/

Fixed.

src/bin/pg_upgrade/info.c

11.
+/*
+ * get_logical_slot_infos()
+ *
+ * Higher level routine to generate LogicalSlotInfoArr for all databases.
+ */
+void
+get_logical_slot_infos(ClusterInfo *cluster, bool live_check)
+{
+ int dbnum;
+ int slot_count = 0;
+
+ if (cluster == &old_cluster)
+ pg_log(PG_VERBOSE, "\nsource databases:");
+ else
+ pg_log(PG_VERBOSE, "\ntarget databases:");
+
+ for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
+ {
+ DbInfo    *pDbInfo = &cluster->dbarr.dbs[dbnum];
+
+ get_logical_slot_infos_per_db(cluster, pDbInfo);
+ slot_count += pDbInfo->slot_arr.nslots;
+
+ if (log_opts.verbose)
+ {
+ pg_log(PG_VERBOSE, "Database: \"%s\"", pDbInfo->db_name);
+ print_slot_infos(&pDbInfo->slot_arr);
+ }
+ }
+}
+

11a.
Now the variable 'slot_count' is no longer being returned so it seems redundant.

~

11b.
What is the 'live_check' parameter for? Nobody is using it.

These are needed for 0003, moved.

12. count_logical_slots

+int
+count_logical_slots(ClusterInfo *cluster)
+{
+ int dbnum;
+ int slotnum = 0;
+
+ for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
+ slotnum += cluster->dbarr.dbs[dbnum].slot_arr.nslots;
+
+ return slotnum;
+}

IMO this variable should be called something like 'slot_count'. This
is the same review comment also made in a previous review. (See [2]
comment#12).

Changed.

13. print_slot_infos

+
+static void
+print_slot_infos(LogicalSlotInfoArr *slot_arr)
+{
+ int slotnum;
+
+ for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+ pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\", two_phase: %d",
+    slot_arr->slots[slotnum].slotname,
+    slot_arr->slots[slotnum].plugin,
+    slot_arr->slots[slotnum].two_phase);
+}

It might be nicer to introduce a variable, instead of all those array
dereferences:

LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];

Changed.

14.
+ for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+ {
+ /*
+ * Constructs query for creating logical replication slots.
+ *
+ * XXX: For simplification, pg_create_logical_replication_slot() is
+ * used. Is it sufficient?
+ */
+ appendPQExpBuffer(query, "SELECT
pg_catalog.pg_create_logical_replication_slot(");
+ appendStringLiteralConn(query, slot_arr->slots[slotnum].slotname,
+ conn);
+ appendPQExpBuffer(query, ", ");
+ appendStringLiteralConn(query, slot_arr->slots[slotnum].plugin,
+ conn);
+ appendPQExpBuffer(query, ", false, %s);",
+   slot_arr->slots[slotnum].two_phase ? "true" : "false");
+
+ PQclear(executeQueryOrDie(conn, "%s", query->data));
+
+ resetPQExpBuffer(query);
+ }
+
+ PQfinish(conn);
+
+ destroyPQExpBuffer(query);
+ }
+
+ end_progress_output();
+ check_ok();

14a
Similar to the previous comment (#13). It might be nicer to introduce
a variable, instead of all those array dereferences:

LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];

Changed.

14b.
It was not clear to me why this command is not being built using
executeQueryOrDie directly instead of using the query buffer. Is there
some reason?

I wanted to take care of encoding and escaping; that was the reason I used the
PQExpBuffer functions, especially appendStringLiteralConn(). IIUC,
executeQueryOrDie() cannot handle that by itself.
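
To illustrate, here is a rough sketch of the construction (the name
create_one_slot() is mine, only for illustration; executeQueryOrDie() is
pg_upgrade's query wrapper, and appendStringLiteralConn() quotes the literal
according to the connection's encoding and standard_conforming_strings
setting):

static void
create_one_slot(PGconn *conn, const char *slotname, const char *plugin,
				bool two_phase)
{
	PQExpBuffer query = createPQExpBuffer();

	appendPQExpBufferStr(query,
						 "SELECT pg_catalog.pg_create_logical_replication_slot(");
	appendStringLiteralConn(query, slotname, conn);	/* connection-aware quoting */
	appendPQExpBufferStr(query, ", ");
	appendStringLiteralConn(query, plugin, conn);
	appendPQExpBuffer(query, ", false, %s);", two_phase ? "true" : "false");

	PQclear(executeQueryOrDie(conn, "%s", query->data));
	destroyPQExpBuffer(query);
}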

14c.
I think it would be cleaner to have a separate res variable like you
used elsewhere:
res = executeQueryOrDie(...)

instead of doing PQclear(executeQueryOrDie(conn, "%s", query->data));
in one line

Hmm, there are existing uses of the PQclear(executeQueryOrDie(...)) style, e.g.,
set_locale_and_encoding() and set_frozenxids(). I do not think that style is an
improvement when the result of the query is not used. Please tell me if you find
a case where res = executeQueryOrDie(...) is used but the result is not checked.

src/bin/pg_upgrade/pg_upgrade.

15.
+void get_logical_slot_infos(ClusterInfo *cluster, bool live_check);

I didn't see a reason for that 'live_check' parameter.

It was needed for 0003, moved.

.../pg_upgrade/t/003_logical_replication_slots.pl

16.
IMO this would be much easier to read if there were BIG comments
between the actual TEST parts

For example

# ------------------------------
# TEST: Confirm pg_upgrade fails if new node wal_level is not 'logical'
<preparation>
<test>
<cleanup>

# ------------------------------
# TEST: Confirm pg_upgrade fails max_replication_slots on new node is too low
<preparation>
<test>
<cleanup>

# ------------------------------
# TEST: Successful upgrade
<preparation>
<test>
<cleanup>

Added. 0003 also followed the style.

17.
+# Cause a failure at the start of pg_upgrade because wal_level is replica
+command_fails(
+ [
+ 'pg_upgrade', '--no-sync',
+ '-d',         $old_publisher->data_dir,
+ '-D',         $new_publisher->data_dir,
+ '-b',         $bindir,
+ '-B',         $bindir,
+ '-s',         $new_publisher->host,
+ '-p',         $old_publisher->port,
+ '-P',         $new_publisher->port,
+ $mode,
+ ],
+ 'run of pg_upgrade of old node with wrong wal_level');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+ "pg_upgrade_output.d/ not removed after pg_upgrade failure");

The message is ambiguous

BEFORE
'run of pg_upgrade of old node with wrong wal_level'

SUGGESTION
'run of pg_upgrade where the new node has the wrong wal_level'

Changed.

18.
+# Create an unnecessary slot on old node
+$old_publisher->start;
+$old_publisher->safe_psql(
+ 'postgres', qq[
+ SELECT pg_create_logical_replication_slot('test_slot2',
'test_decoding', false, true);
+]);
+
+$old_publisher->stop;
+
+# Preparations for the subsequent test. max_replication_slots is set to
+# smaller than existing slots on old node
+$new_publisher->append_conf('postgresql.conf', "wal_level = 'logical'");
+$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 1");

IMO the comment is misleading. It is not an "unnecessary slot", it is
just a 2nd slot. And this is all part of the preparation for the next
test so it should be under the other comment.

For example SUGGESTION changes like this:

# Preparations for the subsequent test.
# 1. Create an unnecessary slot on the old node
$old_publisher->start;
$old_publisher->safe_psql(
'postgres', qq[
SELECT pg_create_logical_replication_slot('test_slot2',
'test_decoding', false, true);
]);
$old_publisher->stop;
# 2. max_replication_slots is set to smaller than the number of slots
(2) present on the old node
$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 1");
# 3. new node wal_level is set correctly
$new_publisher->append_conf('postgresql.conf', "wal_level = 'logical'");

Followed the style.

19.
+# Remove an unnecessary slot and consume WAL records
+$old_publisher->start;
+$old_publisher->safe_psql(
+ 'postgres', qq[
+ SELECT pg_drop_replication_slot('test_slot2');
+ SELECT count(*) FROM pg_logical_slot_get_changes('test_slot1', NULL,
NULL)
+]);
+$old_publisher->stop;
+

This comment should say more like:

# Preparations for the subsequent test.

Followed above style.

20.
+# Actual run, pg_upgrade_output.d is removed at the end

This comment should mention that "successful upgrade is expected"
because all the other prerequisites are now satisfied.

The suggestion was added to the comment.

21.
+$new_publisher->start;
+my $result = $new_publisher->safe_psql('postgres',
+ "SELECT slot_name, two_phase FROM pg_replication_slots");
+is($result, qq(test_slot1|t), 'check the slot exists on new node');

Should there be a matching new_pulisher->stop;?

Not sure it is really needed, but added.
Also, the word "node" was replaced with "cluster" because the latter word is
used in the docs.

[1]: /messages/by-id/TYCPR01MB5870B5C0FE0C61CD04CBD719F51EA@TYCPR01MB5870.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#141Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Zhijie Hou (Fujitsu) (#137)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Hou,

Thank you for reviewing! The updated patch set is available in [1]/messages/by-id/TYCPR01MB5870B5C0FE0C61CD04CBD719F51EA@TYCPR01MB5870.jpnprd01.prod.outlook.com.

1.

+check_for_lost_slots(ClusterInfo *cluster)
+{
+	int			i,
+				ntups,
+				i_slotname;
+	PGresult   *res;
+	DbInfo	   *active_db = &cluster->dbarr.dbs[0];
+	PGconn	   *conn = connectToServer(cluster, active_db->db_name);
+
+	/* logical slots can be migrated since PG17. */
+	if (GET_MAJOR_VERSION(cluster->major_version) <= 1600)
+		return;

I think we should build connection after this check, otherwise the connection
may be left open after returning.

Fixed.

2.
+check_for_confirmed_flush_lsn(ClusterInfo *cluster)
+{
+	int			i,
+				ntups,
+				i_slotname;
+	PGresult   *res;
+	DbInfo	   *active_db = &cluster->dbarr.dbs[0];
+	PGconn	   *conn = connectToServer(cluster, active_db->db_name);
+
+	/* logical slots can be migrated since PG17. */
+	if (GET_MAJOR_VERSION(cluster->major_version) <= 1600)
+		return;

Same as above.

Fixed.

3.
+ if
(GET_MAJOR_VERSION(cluster->major_version) >= 17)
+ {

I think you mean 1700 here.

Right, fixed.

4.
+					p = strpbrk(p,
"01234567890ABCDEF");
+
+					/*
+					 * Upper and lower part of LSN must
be read separately
+					 * because it is reported as %X/%X
format.
+					 */
+					upper_lsn = strtoul(p, &slash, 16);
+					lower_lsn = strtoul(++slash, NULL,
16);

Maybe we'd better add a sanity check after strpbrk like "if (p == NULL ||
strlen(p) <= 1)" to be consistent with other similar code.

Added.

[1]: /messages/by-id/TYCPR01MB5870B5C0FE0C61CD04CBD719F51EA@TYCPR01MB5870.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#142Peter Smith
smithpb2250@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#130)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

Hi Kuroda-san,

Here are some review comments for v22-0003.

(FYI, I was already mid-way through this review before you posted new v23*
patches, so I am posting it anyway in case some comments still apply.)

======
src/bin/pg_upgrade/check.c

1. check_for_lost_slots

+ /* logical slots can be migrated since PG17. */
+ if (GET_MAJOR_VERSION(cluster->major_version) <= 1600)
+ return;

1a
Maybe the comment should start uppercase for consistency with others.

~

1b.
IMO if you check < 1700 instead of <= 1600 it will be a better match with
the comment.

~~~

2. check_for_lost_slots
+ for (i = 0; i < ntups; i++)
+ {
+ pg_log(PG_WARNING,
+   "\nWARNING: logical replication slot \"%s\" is in 'lost' state.",
+   PQgetvalue(res, i, i_slotname));
+ }
+
+

The braces {} are not needed anymore

~~~

3. check_for_confirmed_flush_lsn

+ /* logical slots can be migrated since PG17. */
+ if (GET_MAJOR_VERSION(cluster->major_version) <= 1600)
+ return;

3a.
Maybe the comment should start uppercase for consistency with others.

~

3b.
IMO if you check < 1700 instead of <= 1600 it will be a better match with
the comment.

~~~

4. check_for_confirmed_flush_lsn
+ for (i = 0; i < ntups; i++)
+ {
+ pg_log(PG_WARNING,
+ "\nWARNING: logical replication slot \"%s\" has not consumed WALs yet",
+ PQgetvalue(res, i, i_slotname));
+ }
+

The braces {} are not needed anymore

======
src/bin/pg_upgrade/controldata.c

5. get_control_data
+ /*
+ * Gather latest checkpoint location if the cluster is newer or
+ * equal to 17. This is used for upgrading logical replication
+ * slots.
+ */
+ if (GET_MAJOR_VERSION(cluster->major_version) >= 17)

5a.
/newer or equal to 17/PG17 or later/

~~~

5b.

>= 17 should be >= 1700

~~~

6. get_control_data
+ {
+ char *slash = NULL;
+ uint64 upper_lsn, lower_lsn;
+
+ p = strchr(p, ':');
+
+ if (p == NULL || strlen(p) <= 1)
+ pg_fatal("%d: controldata retrieval problem", __LINE__);
+
+ p++; /* remove ':' char */
+
+ p = strpbrk(p, "01234567890ABCDEF");
+
+ /*
+ * Upper and lower part of LSN must be read separately
+ * because it is reported as %X/%X format.
+ */
+ upper_lsn = strtoul(p, &slash, 16);
+ lower_lsn = strtoul(++slash, NULL, 16);
+
+ /* And combine them */
+ cluster->controldata.chkpnt_latest =
+ (upper_lsn << 32) | lower_lsn;
+ }

Should 'upper_lsn' and 'lower_lsn' be declared as uint32? That seems a
better mirror for LSN_FORMAT_ARGS.
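
(Side note: the %X/%X text form is just the upper and lower 32-bit halves of
the 64-bit LSN, i.e. the value being reconstructed is (upper << 32) | lower.
With a made-up LSN, the same arithmetic can be sanity-checked from SQL:)

```
-- '16/B374D848' means upper = 0x16, lower = 0xB374D848; the 64-bit value is
-- (0x16 << 32) | 0xB374D848, which pg_wal_lsn_diff() against '0/0' reproduces.
SELECT pg_wal_lsn_diff('16/B374D848'::pg_lsn, '0/0'::pg_lsn) AS lsn_as_number;
```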

======
src/bin/pg_upgrade/info.c

7. get_logical_slot_infos
+
+ /*
+ * Do additional checks if slots are found on the old node. If something is
+ * found on the new node, a subsequent function
+ * check_new_cluster_is_empty() would report the name of slots and raise a
+ * fatal error.
+ */
+ if (cluster == &old_cluster && slot_count)
+ {
+ check_for_lost_slots(cluster);
+
+ if (!live_check)
+ check_for_confirmed_flush_lsn(cluster);
+ }

It somehow doesn't feel right for these extra checks to be jammed into this
function, just because you conveniently have the slot_count available.

On the NEW cluster side, there was extra checking in the
check_new_cluster() function.

For consistency, I think this OLD cluster checking should be done in the
check_and_dump_old_cluster() function -- see the "Check for various failure
cases" comment -- IMO this new fragment belongs there with the other checks.

======
src/bin/pg_upgrade/pg_upgrade.h

8.
bool date_is_int;
bool float8_pass_by_value;
uint32 data_checksum_version;
+
+ XLogRecPtr chkpnt_latest;
} ControlData;

I don't think the new field is particularly different from all the others
that it needs a blank line separator.

======
.../t/003_logical_replication_slots.pl

9.
# Initialize old node
my $old_publisher = PostgreSQL::Test::Cluster->new('old_publisher');
$old_publisher->init(allows_streaming => 'logical');
-$old_publisher->start;

# Initialize new node
my $new_publisher = PostgreSQL::Test::Cluster->new('new_publisher');
$new_publisher->init(allows_streaming => 'replica');

-my $bindir = $new_publisher->config_data('--bindir');
+# Initialize subscriber node
+my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
+$subscriber->init(allows_streaming => 'logical');
-$old_publisher->stop;
+my $bindir = $new_publisher->config_data('--bindir');

~

Are those removal of the old_publisher start/stop changes that actually
should be done in the 0002 patch?

~~~

10.
$old_publisher->safe_psql(
'postgres', qq[
SELECT pg_create_logical_replication_slot('test_slot2', 'test_decoding', false, true);
+ SELECT count(*) FROM pg_logical_slot_get_changes('test_slot1', NULL, NULL);
]);

~

What is the purpose of the added SELECT? It doesn't seem covered by the
comment.

~~~

11.
# Remove an unnecessary slot and generate WALs. These records would not be
# consumed before doing pg_upgrade, so that the upcoming test would fail.
$old_publisher->start;
$old_publisher->safe_psql(
'postgres', qq[
SELECT pg_drop_replication_slot('test_slot2');
CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;
]);
$old_publisher->stop;

Minor rewording of comment sentence.

SUGGESTION
Because these WAL records do not get consumed it will cause the upcoming
pg_upgrade test to fail.

~~~

12.
# Cause a failure at the start of pg_upgrade because the slot still have
# unconsumed WAL records

~

/still have/still has/

------
Kind Regards,
Peter Smith.
Fujitsu Australia

#143Peter Smith
smithpb2250@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#138)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

Here are some review comments for v23-0001

======
1. GENERAL -- git apply

The patch fails to apply cleanly. There are whitespace warnings.

[postgres@CentOS7-x64 oss_postgres_misc]$ git apply
../patches_misc/v23-0001-Always-persist-to-disk-logical-slots-during-a-sh.patch
../patches_misc/v23-0001-Always-persist-to-disk-logical-slots-during-a-sh.patch:102:
trailing whitespace.
# SHUTDOWN_CHECKPOINT record.
warning: 1 line adds whitespace errors.

~~~

2. GENERAL -- which patch is the real one and which is the copy?

IMO this patch has become muddled.

Amit recently created a new thread [1] "persist logical slots to disk
during shutdown checkpoint", which I thought was dedicated to the
discussion/implementation of this 0001 patch. Therefore, I expected any
0001 patch changes would be made only in that new thread from now on
(and maybe you would mirror them here in this thread).

But now I see there are v23-0001 patch changes here again. So, now the same
patch is in 2 places and they are different. It is no longer clear to me
which 0001 ("Always persist...") patch is the definitive one, and which one
is the copy.

??

======
contrib/test_decoding/t/002_always_persist.pl

3.
+
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+# Test logical replication slots are always persist to disk during a
shutdown
+# checkpoint.
+
+use strict;
+use warnings;
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;

/always persist/always persisted/

~~~

4.
+
+# Test set-up
+my $node = PostgreSQL::Test::Cluster->new('test');
+$node->init(allows_streaming => 'logical');
+$node->append_conf('postgresql.conf', q{
+autovacuum = off
+checkpoint_timeout = 1h
+});
+
+$node->start;
+
+# Create table
+$node->safe_psql('postgres', "CREATE TABLE test (id int)");

Maybe it is better to call the table something different instead of the
same name as the cluster. e.g. 'test_tbl' would be better.

~~~

5.
+# Shutdown the node once to do shutdown checkpoint
+$node->stop();
+

SUGGESTION
# Stop the node to cause a shutdown checkpoint

~~~

6.
+# Fetch checkPoint from the control file itself
+my ($stdout, $stderr) = run_command([ 'pg_controldata', $node->data_dir ]);
+my @control_data = split("\n", $stdout);
+my $latest_checkpoint = undef;
+foreach (@control_data)
+{
+ if ($_ =~ /^Latest checkpoint location:\s*(.*)$/mg)
+ {
+ $latest_checkpoint = $1;
+ last;
+ }
+}
+die "No checkPoint in control file found\n"
+  unless defined($latest_checkpoint);
+

6a.
/checkPoint/checkpoint/ (2x)

~

6b.
+die "No checkPoint in control file found\n"

SUGGESTION
"No checkpoint found in control file\n"

------
[1]: /messages/by-id/CAA4eK1JzJagMmb_E8D4au=GYQkxox0AfNBm1FbP7sy7t4YWXPQ@mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

#144Amit Kapila
amit.kapila16@gmail.com
In reply to: Peter Smith (#143)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Tue, Aug 22, 2023 at 7:19 AM Peter Smith <smithpb2250@gmail.com> wrote:

Here are some review comments for v23-0001

======
1. GENERAL -- git apply

The patch fails to apply cleanly. There are whitespace warnings.

[postgres@CentOS7-x64 oss_postgres_misc]$ git apply ../patches_misc/v23-0001-Always-persist-to-disk-logical-slots-during-a-sh.patch
../patches_misc/v23-0001-Always-persist-to-disk-logical-slots-during-a-sh.patch:102: trailing whitespace.
# SHUTDOWN_CHECKPOINT record.
warning: 1 line adds whitespace errors.

~~~

2. GENERAL -- which patch is the real one and which is the copy?

IMO this patch has become muddled.

Amit recently created a new thread [1] "persist logical slots to disk during shutdown checkpoint", which I thought was dedicated to the discussion/implementation of this 0001 patch.

Right, I feel it would be good to discuss 0001 on the new thread.
Here, we can just include it for the sake of completeness and testing
purposes.

--
With Regards,
Amit Kapila.

#145Amit Kapila
amit.kapila16@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#140)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Mon, Aug 21, 2023 at 6:35 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

9. check_for_logical_replication_slots

+ /* logical replication slots can be migrated since PG17. */
+ if (GET_MAJOR_VERSION(new_cluster->major_version) <= 1600)
+ return;

IMO the code matches the comment better if you say < 1700 instead of <= 1600.

Changed.

I think it is better to be consistent with the existing code. There
are a few other checks in pg_upgrade.c that use <=, so it is better
to use it in the same way here.

Another minor comment:
Note that
+     if the new cluser uses different port number from old one,
+     <link linkend="sql-altersubscription"><command>ALTER
SUBSCRIPTION ... CONNECTION</command></link>
+     command must be also executed on subscriber.

I think this is true in general as well and not specific to
pg_upgrade. So, we can avoid adding anything about connection change
here.

--
With Regards,
Amit Kapila.

#146Peter Smith
smithpb2250@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#138)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

Hi Kuroda-san,

Here are some review comments for patch v23-0002

======
1. GENERAL

Please try to run a spell/grammar check on all the text like commit message
and docs changes before posting (e.g. cut/paste the rendered text into some
tool like MSWord or Grammarly or ChatGPT or whatever tool you like and
cross-check). There are lots of small typos etc but one up-front check
could avoid long cycles of
reviewing/reporting/fixing/re-posting/confirming...

======
Commit message

2.
Note that slot restoration must be done after the final pg_resetwal command
during the upgrade because pg_resetwal will remove WALs that are required by
the slots. Due to ths restriction, the timing of restoring replication
slots is
different from other objects.

~

/ths/this/

======
doc/src/sgml/ref/pgupgrade.sgml

3.
+    <para>
+     Before you start upgrading the publisher cluster, ensure that the
+     subscription is temporarily disabled, by executing
+     <link linkend="sql-altersubscription"><command>ALTER SUBSCRIPTION ...
DISABLE</command></link>.
+     After the upgrade is complete, then re-enable the subscription. Note
that
+     if the new cluser uses different port number from old one,
+     <link linkend="sql-altersubscription"><command>ALTER SUBSCRIPTION ...
CONNECTION</command></link>
+     command must be also executed on subscriber.
+    </para>

3a.
BEFORE
After the upgrade is complete, then re-enable the subscription.

SUGGESTION
Re-enable the subscription after the upgrade.

~

3b.
/cluser/cluster/

~

3c.
Note that
+     if the new cluser uses different port number from old one,
+     <link linkend="sql-altersubscription"><command>ALTER SUBSCRIPTION ...
CONNECTION</command></link>
+     command must be also executed on subscriber.

SUGGESTION
Note that if the new cluster uses a different port number ALTER
SUBSCRIPTION ... CONNECTION command must be also executed on the subscriber.
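
Just to spell out what that sequence amounts to on the subscriber side (the
subscription name and connection string below are placeholders only):

```
-- Before upgrading the publisher:
ALTER SUBSCRIPTION mysub DISABLE;

-- After the upgrade, only if the publisher's host/port changed:
ALTER SUBSCRIPTION mysub CONNECTION 'host=localhost port=5433 dbname=postgres';

-- Then resume replication:
ALTER SUBSCRIPTION mysub ENABLE;
```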

~~~

4.
+     <listitem>
+      <para>
+       <structfield>confirmed_flush_lsn</structfield> (see <xref
linkend="view-pg-replication-slots"/>)
+       of all slots on old cluster must be same as latest checkpoint
location.
+      </para>
+     </listitem>

4a.
/on old cluster/on the old cluster/

~

4b.
/as latest/as the latest/
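
(As an aside, this prerequisite is something a user can verify up front on the
old cluster, by comparing the result of a query like the one below against the
"Latest checkpoint location" line of pg_controldata output. The query is only
an illustration.)

```
SELECT slot_name, confirmed_flush_lsn
FROM pg_catalog.pg_replication_slots
WHERE slot_type = 'logical' AND temporary IS FALSE;
```
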
~~

5.
+     <listitem>
+      <para>
+       The output plugins referenced by the slots on the old cluster must
be
+       installed on the new PostgreSQL executable directory.
+      </para>
+     </listitem>

/installed on/installed in/ ??
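
(A purely illustrative way to list which plugins that refers to, run on the
old cluster:)

```
SELECT DISTINCT plugin
FROM pg_catalog.pg_replication_slots
WHERE slot_type = 'logical' AND temporary IS FALSE;
```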

~~

6.
+     <listitem>
+      <para>
+       The new cluster must have
+       <link
linkend="guc-max-replication-slots"><varname>max_replication_slots</varname></link>
+       configured to value larger than the existing slots on the old
cluster.
+      </para>
+     </listitem>

BEFORE
...to value larger than the existing slots on the old cluster.

SUGGESTION
...to a value greater than or equal to the number of slots present on the
old cluster.
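
(To illustrate the condition: run the first query on the old cluster, and the
reported count must not exceed the value shown by the second statement on the
new cluster. Sketch only.)

```
-- Old cluster: number of logical slots that will be migrated
SELECT count(*)
FROM pg_catalog.pg_replication_slots
WHERE slot_type = 'logical' AND temporary IS FALSE;

-- New cluster: configured limit
SHOW max_replication_slots;
```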

======
src/bin/pg_upgrade/check.c

7. GENERAL - check_for_logical_replication_slots

AFAICT this function is called *only* for the new_cluster, yet there is no
Assert and no checking inside this function to ensure that is the case or
not. It seems strange that the *cluster is passed as an argument but then
the whole function body and messages assume it can only be a new cluster
anyway.

IMO it would be better to rename this function to something like
check_new_cluster_logical_replication_slots() and DO NOT pass any parameter
but just use the global new_cluster within the function body.

~~~

8. check_for_logical_replication_slots

+ /* logical replication slots can be migrated since PG17. */
+ if (GET_MAJOR_VERSION(cluster->major_version) < 1700)
+ return;

Start comment with uppercase for consistency.

~~~

9. check_for_logical_replication_slots

+ res = executeQueryOrDie(conn, "SELECT slot_name "
+  "FROM pg_catalog.pg_replication_slots "
+  "WHERE slot_type = 'logical' AND "
+  "temporary IS FALSE;");
+
+ if (PQntuples(res))
+ pg_fatal("New cluster must not have logical replication slot, but found \"%s\"",
+ PQgetvalue(res, 0, 0));

/replication slot/replication slots/

~

10. check_for_logical_replication_slots

+ /*
+ * Do additional checks when the logical replication slots have on the old
+ * cluster.
+ */
+ if (nslots)

SUGGESTION
Do additional checks when there are logical replication slots on the old
cluster.

~~~

11.
+ if (nslots > max_replication_slots)
+ pg_fatal("max_replication_slots must be greater than or equal to existing
logical "
+ "replication slots on old cluster.");

11a.
SUGGESTION
max_replication_slots (%d) must be greater than or equal to the number of
logical replication slots (%d) on the old cluster.

~

11b.
I think it would be helpful for the current values to be displayed in the
fatal message so the user will know more about what value to set. Notice
that my above suggestion has some substitution markers.

======
src/bin/pg_upgrade/info.c

12.
+static void
+print_slot_infos(LogicalSlotInfoArr *slot_arr)
+{
+ int slotnum;
+
+ for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+ {
+ LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+ pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\", two_phase: %d",
+   slot_info->slotname,
+   slot_info->plugin,
+   slot_info->two_phase);
+ }
+}

Better to have a blank line after the 'slot_info' declaration.

======
.../pg_upgrade/t/003_logical_replication_slots.pl

13.
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when new cluster wal_level is not
'logical'
+
+# Create a slot on old cluster
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+ "SELECT pg_create_logical_replication_slot('test_slot1', 'test_decoding',
false, true);"
+);
+$old_publisher->stop;

13a.
It would be nicer if all the test parts have identical formats. So here it
should also say

# Preparations for the subsequent test:
# 1. Create a slot on the old cluster

~

13b.
Notice the colon (:) at the end of that comment "Preparations for the
subsequent test:". All the other preparation comments in this file should
also have a colon.

~

14.
+# Cause a failure at the start of pg_upgrade because wal_level is replica

SUGGESTION
# pg_upgrade will fail because the new cluster wal_level is 'replica'

~~~

15.
+# 1. Create an unnecessary slot on the old cluster

(but it is not unnecessary -- it is necessary for this test!)

SUGGESTION
+# 1. Create a second slot on the old cluster

~~~

16.
+# Cause a failure at the start of pg_upgrade because the new cluster has
+# insufficient max_replication_slots

SUGGESTION
# pg_upgrade will fail because the new cluster has insufficient
max_replication_slots

~~~

17.
+# Preparations for the subsequent test.
+# 1. Remove an unnecessary slot

SUGGESTION
+# 1. Remove the slot 'test_slot2', leaving only 1 slot remaining on the
old cluster, so the new cluster config max_replication_slots=1 will now be
enough.

~~~

18.
+$new_publisher->start;
+my $result = $new_publisher->safe_psql('postgres',
+ "SELECT slot_name, two_phase FROM pg_replication_slots");
+is($result, qq(test_slot1|t), 'check the slot exists on new cluster');
+$new_publisher->stop;
+
+done_testing();

Maybe should be some added comments like:
# Check that the slot 'test_slot1' has migrated to the new cluster.

------
Kind Regards,
Peter Smith.
Fujitsu Australia

#147Amit Kapila
amit.kapila16@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#138)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Mon, Aug 21, 2023 at 6:32 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

2.
+ /*
+ * Checking for logical slots must be done before
+ * check_new_cluster_is_empty() because the slot_arr attribute of the
+ * new_cluster will be checked in that function.
+ */
+ if (count_logical_slots(&old_cluster))
+ {
+ get_logical_slot_infos(&new_cluster, false);
+ check_for_logical_replication_slots(&new_cluster);
+ }
+
check_new_cluster_is_empty();

Can't we simplify this checking by simply querying
pg_replication_slots for any usable slot something similar to what we
are doing in check_for_prepared_transactions()? We can add this check
in the function check_for_logical_replication_slots().

Some checks were included to check_for_logical_replication_slots(), and
get_logical_slot_infos() for new_cluster was removed as you said.

+ res = executeQueryOrDie(conn, "SELECT slot_name "
+   "FROM pg_catalog.pg_replication_slots "
+   "WHERE slot_type = 'logical' AND "
+   "temporary IS FALSE;");
+
+ if (PQntuples(res))
+ pg_fatal("New cluster must not have logical replication slot, but
found \"%s\"",
+ PQgetvalue(res, 0, 0));
+
+ PQclear(res);
+
+ nslots = count_logical_slots(&old_cluster);
+
+ /*
+ * Do additional checks when the logical replication slots have on the old
+ * cluster.
+ */
+ if (nslots)

Shouldn't these checks be reversed? I mean it would be better to test
the presence of slots on the new cluster if there is any slot present
on the old cluster.

--
With Regards,
Amit Kapila.

#148Peter Smith
smithpb2250@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#138)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

Hi Kuroda-san.

I already posted a review for v22-0003 earlier today, but v23-0003 was
already posted so those are not yet addressed.

Here are a few more review comments I noticed when looking at the latest
v23-0003.

======
src/bin/pg_upgrade/check.c

1.
+#include "access/xlogdefs.h"
#include "catalog/pg_authid_d.h"

Was this #include needed here? I noticed you've already included the same
in the "pg_upgrade.h".

~~~

2. check_for_lost_slots

+ /* Check there are no logical replication slots with a 'lost' state. */
+ res = executeQueryOrDie(conn,
+ "SELECT slot_name FROM pg_catalog.pg_replication_slots "
+ "WHERE wal_status = 'lost' AND "
+ "temporary IS FALSE;");

I can't quite describe my doubts about this, but something seems a bit
strange. Didn't we already iterate every single slot in all DBs in the
earlier function get_logical_slot_infos_per_db()? There we were only
looking for wal_status <> 'lost', but we could have got *every* wal_status
and also detected these 'lost' ones at the same time up-front, instead of
having this extra function with more SQL to do pretty much the same SELECT.

Perhaps coding it the current way gives a clear separation of the fetching
code and the checking code, and that might be the best approach, but it
somehow seems a shame/waste to fetch almost the same slot data with almost
the same SQL twice, so I wondered if there is a better way to arrange this.
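
In other words, a single query along these lines (just a sketch) could also
return wal_status, so the 'lost' check could reuse the same result set instead
of issuing a second, nearly identical SELECT:

```
SELECT slot_name, plugin, two_phase, wal_status
FROM pg_catalog.pg_replication_slots
WHERE slot_type = 'logical' AND temporary IS FALSE
  AND database = current_database();
```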

======
src/bin/pg_upgrade/info.c

3. get_logical_slot_infos

+
+ /* Do additional checks if slots are found */
+ if (slot_count)
+ {
+ check_for_lost_slots(cluster);
+
+ if (!live_check)
+ check_for_confirmed_flush_lsn(cluster);
+ }

Aren't these checks only intended for checking the 'old_cluster'? But
AFAICT they are not guarded here so they will be executed by both sides.
Previously (in my review of v22-0003) I suggested these calls maybe
belonged in the calling function check_and_dump_old_cluster(). I still think that.

------
Kind Regards,
Peter Smith.
Fujitsu Australia

#149Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Peter Smith (#142)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Peter,

Thanks for the comments! A new version will be available
in the upcoming post.

1. check_for_lost_slots

+ /* logical slots can be migrated since PG17. */
+ if (GET_MAJOR_VERSION(cluster->major_version) <= 1600)
+ return;

1a
Maybe the comment should start uppercase for consistency with others.

Seems right, but I revisited check_and_dump_old_cluster() again and found that
some version-specific checks are done outside the checking functions.
So I followed that style and moved the part to
check_and_dump_old_cluster(). Also, the version checking for the new cluster was
moved to check_new_cluster(). Is that OK with you?

1b.
IMO if you check < 1700 instead of <= 1600 it will be a better match with the comment.

Per suggestion from Amit, I used < 1700. Some other changes in 0002 were reverted.

2. check_for_lost_slots
+ for (i = 0; i < ntups; i++)
+ {
+ pg_log(PG_WARNING,
+   "\nWARNING: logical replication slot \"%s\" is in 'lost' state.",
+   PQgetvalue(res, i, i_slotname));
+ }
+
+

The braces {} are not needed anymore

Fixed.

3. check_for_confirmed_flush_lsn

+ /* logical slots can be migrated since PG17. */
+ if (GET_MAJOR_VERSION(cluster->major_version) <= 1600)
+ return;

3a.
Maybe the comment should start uppercase for consistency with others.

As per the reply to comment 1, this part is no longer needed.

3b.
IMO if you check < 1700 instead of <= 1600 it will be a better match with the comment.

Per suggestion from Amit, I used < 1700.

4. check_for_confirmed_flush_lsn
+ for (i = 0; i < ntups; i++)
+ {
+ pg_log(PG_WARNING,
+ "\nWARNING: logical replication slot \"%s\" has not consumed WALs yet",
+ PQgetvalue(res, i, i_slotname));
+ }
+

The braces {} are not needed anymore

Fixed.

5. get_control_data
+ /*
+ * Gather latest checkpoint location if the cluster is newer or
+ * equal to 17. This is used for upgrading logical replication
+ * slots.
+ */
+ if (GET_MAJOR_VERSION(cluster->major_version) >= 17)

5a.
/newer or equal to 17/PG17 or later/

Fixed.

5b.

>= 17 should be >= 1700

Per suggestion from Amit, I used < 1700.

6. get_control_data
+ {
+ char *slash = NULL;
+ uint64 upper_lsn, lower_lsn;
+
+ p = strchr(p, ':');
+
+ if (p == NULL || strlen(p) <= 1)
+ pg_fatal("%d: controldata retrieval problem", __LINE__);
+
+ p++; /* remove ':' char */
+
+ p = strpbrk(p, "01234567890ABCDEF");
+
+ /*
+ * Upper and lower part of LSN must be read separately
+ * because it is reported as %X/%X format.
+ */
+ upper_lsn = strtoul(p, &slash, 16);
+ lower_lsn = strtoul(++slash, NULL, 16);
+
+ /* And combine them */
+ cluster->controldata.chkpnt_latest =
+ (upper_lsn << 32) | lower_lsn;
+ }

Should 'upper_lsn' and 'lower_lsn' be declared as uint32? That seems a better mirror for LSN_FORMAT_ARGS.

Changed the definition to uint32, and a cast was added.

7. get_logical_slot_infos
+
+ /*
+ * Do additional checks if slots are found on the old node. If something is
+ * found on the new node, a subsequent function
+ * check_new_cluster_is_empty() would report the name of slots and raise a
+ * fatal error.
+ */
+ if (cluster == &old_cluster && slot_count)
+ {
+ check_for_lost_slots(cluster);
+
+ if (!live_check)
+ check_for_confirmed_flush_lsn(cluster);
+ }

It somehow doesn't feel right for these extra checks to be jammed into this function, just because you conveniently have the slot_count available.

On the NEW cluster side, there was extra checking in the check_new_cluster() function.

For consistency, I think this OLD cluster checking should be done in the check_and_dump_old_cluster() function -- see the "Check for various failure cases" comment -- IMO this new fragment belongs there with the other checks.

All the checks were moved to check_and_dump_old_cluster(), and a check for its major version was added.

8.
bool date_is_int;
bool float8_pass_by_value;
uint32 data_checksum_version;
+
+ XLogRecPtr chkpnt_latest;
} ControlData;

I don't think the new field is particularly different from all the others that it needs a blank line separator.

I removed the blank line. Actually, I wondered where the attribute should go, but kept it there in the end.

9.
# Initialize old node
my $old_publisher = PostgreSQL::Test::Cluster->new('old_publisher');
$old_publisher->init(allows_streaming => 'logical');
-$old_publisher->start;

# Initialize new node
my $new_publisher = PostgreSQL::Test::Cluster->new('new_publisher');
$new_publisher->init(allows_streaming => 'replica');

-my $bindir = $new_publisher->config_data('--bindir');
+# Initialize subscriber node
+my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
+$subscriber->init(allows_streaming => 'logical');
-$old_publisher->stop;
+my $bindir = $new_publisher->config_data('--bindir');

~

Are those removal of the old_publisher start/stop changes that actually should be done in the 0002 patch?

Yes, it should be removed from 0002.

10.
$old_publisher->safe_psql(
'postgres', qq[
SELECT pg_create_logical_replication_slot('test_slot2', 'test_decoding', false, true);
+ SELECT count(*) FROM pg_logical_slot_get_changes('test_slot1', NULL, NULL);
]);

~

What is the purpose of the added SELECT? It doesn't seem covered by the comment.

The SELECT statement is needed to trigger the failure caused by the insufficient
max_replication_slots. Checking of the new cluster starts after the old cluster is
verified, so if this step is omitted, a different error is reported:

```
Checking confirmed_flush_lsn for logical replication slots
WARNING: logical replication slot "test_slot1" has not consumed WALs yet

One or more logical replication slots still have unconsumed WAL records.
```

I added a comment about it.

11.
# Remove an unnecessary slot and generate WALs. These records would not be
# consumed before doing pg_upgrade, so that the upcoming test would fail.
$old_publisher->start;
$old_publisher->safe_psql(
'postgres', qq[
SELECT pg_drop_replication_slot('test_slot2');
CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;
]);
$old_publisher->stop;

Minor rewording of comment sentence.

SUGGESTION
Because these WAL records do not get consumed it will cause the upcoming pg_upgrade test to fail.

Added.

12.
# Cause a failure at the start of pg_upgrade because the slot still have
# unconsumed WAL records

~

/still have/still has/

Fixed.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#150Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Peter Smith (#143)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Peter,

Here are some review comments for v23-0001

Thanks for the comments! But I did not update the 0001 patch in this thread.
It will be managed in the forked one...

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#151Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Amit Kapila (#145)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Amit,

Thanks for the comments! The next version will be available in an upcoming post.

+ /* logical replication slots can be migrated since PG17. */
+ if (GET_MAJOR_VERSION(new_cluster->major_version) <= 1600)
+ return;

IMO the code matches the comment better if you say < 1700 instead of <= 1600.

Changed.

I think it is better to be consistent with the existing code. There
are a few other checks in pg_upgrade.c that uses <=, so it is better
to use it in the same way here.

OK, reverted.

Another minor comment:
Note that
+     if the new cluser uses different port number from old one,
+     <link linkend="sql-altersubscription"><command>ALTER
SUBSCRIPTION ... CONNECTION</command></link>
+     command must be also executed on subscriber.

I think this is true in general as well and not specific to
pg_upgrade. So, we can avoid adding anything about connection change
here.

Removed.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#152Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Amit Kapila (#147)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Amit,

Thanks for the comments. A new version will be available in the upcoming post.

+ res = executeQueryOrDie(conn, "SELECT slot_name "
+   "FROM pg_catalog.pg_replication_slots "
+   "WHERE slot_type = 'logical' AND "
+   "temporary IS FALSE;");
+
+ if (PQntuples(res))
+ pg_fatal("New cluster must not have logical replication slot, but found \"%s\"",
+ PQgetvalue(res, 0, 0));
+
+ PQclear(res);
+
+ nslots = count_logical_slots(&old_cluster);
+
+ /*
+ * Do additional checks when the logical replication slots have on the old
+ * cluster.
+ */
+ if (nslots)

Shouldn't these checks be reversed? I mean it would be better to test
the presence of slots on the new cluster if there is any slot present
on the old cluster.

Hmm, I think the latter part is meaningful only when the old cluster has logical
slots. To sum up, any checking should be done only when
count_logical_slots(&old_cluster) > 0, right? Fixed like that.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#153Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Peter Smith (#146)
3 attachment(s)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Peter,

Thanks for the comments! PSA the new version.

======
1. GENERAL

Please try to run a spell/grammar check on all the text like commit message and docs changes before posting (e.g. cut/paste the rendered text into some tool like MSWord or Grammarly or ChatGPT or whatever tool you like and cross-check). There are lots of small typos etc but one up-front check could avoid long cycles of reviewing/reporting/fixing/re-posting/confirming...

I checked all of the sentences with Grammarly. Sorry for the poor English.

======
Commit message

2.
Note that slot restoration must be done after the final pg_resetwal command
during the upgrade because pg_resetwal will remove WALs that are required by
the slots. Due to ths restriction, the timing of restoring replication slots is
different from other objects.

~

/ths/this/

Fixed.

doc/src/sgml/ref/pgupgrade.sgml

3.
+    <para>
+     Before you start upgrading the publisher cluster, ensure that the
+     subscription is temporarily disabled, by executing
+     <link linkend="sql-altersubscription"><command>ALTER SUBSCRIPTION ... DISABLE</command></link>.
+     After the upgrade is complete, then re-enable the subscription. Note that
+     if the new cluser uses different port number from old one,
+     <link linkend="sql-altersubscription"><command>ALTER SUBSCRIPTION ... CONNECTION</command></link>
+     command must be also executed on subscriber.
+    </para>

3a.
BEFORE
After the upgrade is complete, then re-enable the subscription.

SUGGESTION
Re-enable the subscription after the upgrade.

Fixed.

3b.
/cluser/cluster/

~

3c.
Note that
+     if the new cluser uses different port number from old one,
+     <link linkend="sql-altersubscription"><command>ALTER SUBSCRIPTION ... CONNECTION</command></link>
+     command must be also executed on subscriber.

SUGGESTION
Note that if the new cluster uses a different port number ALTER SUBSCRIPTION ... CONNECTION command must be also executed on the subscriber.

The part was removed.

4.
+     <listitem>
+      <para>
+       <structfield>confirmed_flush_lsn</structfield> (see <xref linkend="view-pg-replication-slots"/>)
+       of all slots on old cluster must be same as latest checkpoint location.
+      </para>
+     </listitem>

4a.
/on old cluster/on the old cluster/

Fixed.

4b.
/as latest/as the latest/

Fixed.

5.
+     <listitem>
+      <para>
+       The output plugins referenced by the slots on the old cluster must be
+       installed on the new PostgreSQL executable directory.
+      </para>
+     </listitem>

/installed on/installed in/ ??

"installed in" is better, fixed.

6.
+     <listitem>
+      <para>
+       The new cluster must have
+       <link linkend="guc-max-replication-slots"><varname>max_replication_slots</varname></link>
+       configured to value larger than the existing slots on the old cluster.
+      </para>
+     </listitem>

BEFORE
...to value larger than the existing slots on the old cluster.

SUGGESTION
...to a value greater than or equal to the number of slots present on the old cluster.

Fixed.

src/bin/pg_upgrade/check.c

7. GENERAL - check_for_logical_replication_slots

AFAICT this function is called *only* for the new_cluster, yet there is no Assert and no checking inside this function to ensure that is the case or not. It seems strange that the *cluster is passed as an argument but then the whole function body and messages assume it can only be a new cluster anyway.

IMO it would be better to rename this function to something like check_new_cluster_logical_replication_slots() and DO NOT pass any parameter but just use the global new_cluster within the function body.

Hmm, I followed other functions; e.g., check_for_composite_data_type_usage() is
called only for the old cluster but it has a *cluster argument. What is the difference
between them? Moreover, how about check_for_lost_slots() and
check_for_confirmed_flush_lsn()? Fixed for the moment.

8. check_for_logical_replication_slots

+ /* logical replication slots can be migrated since PG17. */
+ if (GET_MAJOR_VERSION(cluster->major_version) < 1700)
+ return;

Start comment with uppercase for consistency.

The part was removed.

9. check_for_logical_replication_slots

+ res = executeQueryOrDie(conn, "SELECT slot_name "
+  "FROM pg_catalog.pg_replication_slots "
+  "WHERE slot_type = 'logical' AND "
+  "temporary IS FALSE;");
+
+ if (PQntuples(res))
+ pg_fatal("New cluster must not have logical replication slot, but found \"%s\"",
+ PQgetvalue(res, 0, 0));

/replication slot/replication slots/

Fixed.

10. check_for_logical_replication_slots

+ /*
+ * Do additional checks when the logical replication slots have on the old
+ * cluster.
+ */
+ if (nslots)

SUGGESTION
Do additional checks when there are logical replication slots on the old cluster.

Per suggestion from Amit, the part was removed.

11.
+ if (nslots > max_replication_slots)
+ pg_fatal("max_replication_slots must be greater than or equal to existing logical "
+ "replication slots on old cluster.");

11a.
SUGGESTION
max_replication_slots (%d) must be greater than or equal to the number of logical replication slots (%d) on the old cluster.

11b.
I think it would be helpful for the current values to be displayed in the fatal message so the user will know more about what value to set. Notice that my above suggestion has some substitution markers.

Changed.

src/bin/pg_upgrade/info.c

12.
+static void
+print_slot_infos(LogicalSlotInfoArr *slot_arr)
+{
+ int slotnum;
+
+ for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+ {
+ LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+ pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\", two_phase: %d",
+   slot_info->slotname,
+   slot_info->plugin,
+   slot_info->two_phase);
+ }
+}

Better to have a blank line after the 'slot_info' declaration.

Added.

.../pg_upgrade/t/003_logical_replication_slots.pl

13.
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when new cluster wal_level is not 'logical'
+
+# Create a slot on old cluster
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+ "SELECT pg_create_logical_replication_slot('test_slot1', 'test_decoding', false, true);"
+);
+$old_publisher->stop;

13a.
It would be nicer if all the test parts have identical formats. So here it should also say

# Preparations for the subsequent test:
# 1. Create a slot on the old cluster

I did not use it because there was only one step, but I followed the style.

13b.
Notice the colon (:) at the end of that comment "Preparations for the subsequent test:". All the other preparation comments in this file should also have a colon.

Added.

14.
+# Cause a failure at the start of pg_upgrade because wal_level is replica

SUGGESTION
# pg_upgrade will fail because the new cluster wal_level is 'replica'

Fixed.

15.
+# 1. Create an unnecessary slot on the old cluster

(but it is not unnecessary -- it is necessary for this test!)

SUGGESTION
+# 1. Create a second slot on the old cluster

Fixed.

16.
+# Cause a failure at the start of pg_upgrade because the new cluster has
+# insufficient max_replication_slots

SUGGESTION
# pg_upgrade will fail because the new cluster has insufficient max_replication_slots

Fixed.

17.
+# Preparations for the subsequent test.
+# 1. Remove an unnecessary slot

SUGGESTION
+# 1. Remove the slot 'test_slot2', leaving only 1 slot remaining on the old cluster, so the new cluster config max_replication_slots=1 will now be enough.

Fixed.

18.
+$new_publisher->start;
+my $result = $new_publisher->safe_psql('postgres',
+ "SELECT slot_name, two_phase FROM pg_replication_slots");
+is($result, qq(test_slot1|t), 'check the slot exists on new cluster');
+$new_publisher->stop;
+
+done_testing();

Maybe should be some added comments like:
# Check that the slot 'test_slot1' has migrated to the new cluster.

Added.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:

v24-0001-Always-persist-to-disk-logical-slots-during-a-sh.patchapplication/octet-stream; name=v24-0001-Always-persist-to-disk-logical-slots-during-a-sh.patchDownload
From 11631560eb9591044c27dc68507cc111a306d12d Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Fri, 14 Apr 2023 13:49:09 +0800
Subject: [PATCH v24 1/3] Always persist to disk logical slots during a
 shutdown checkpoint.

It's entirely possible for a logical slot to have a confirmed_flush_lsn higher
than the last value saved on disk while not being marked as dirty.  It's
currently not a problem to lose that value during a clean shutdown / restart
cycle, but a later patch adding support for pg_upgrade of publications and
logical slots will rely on that value being properly persisted to disk.

Author: Julien Rouhaud
Reviewed-by: Wang Wei, Peter Smith, Masahiko Sawada
---
 contrib/test_decoding/meson.build             |  1 +
 contrib/test_decoding/t/002_always_persist.pl | 74 +++++++++++++++++++
 src/backend/access/transam/xlog.c             |  2 +-
 src/backend/replication/slot.c                | 25 ++++---
 src/include/replication/slot.h                |  2 +-
 5 files changed, 92 insertions(+), 12 deletions(-)
 create mode 100644 contrib/test_decoding/t/002_always_persist.pl

diff --git a/contrib/test_decoding/meson.build b/contrib/test_decoding/meson.build
index 7b05cc25a3..12afb9ea8c 100644
--- a/contrib/test_decoding/meson.build
+++ b/contrib/test_decoding/meson.build
@@ -72,6 +72,7 @@ tests += {
   'tap': {
     'tests': [
       't/001_repl_stats.pl',
+      't/002_always_persist.pl',
     ],
   },
 }
diff --git a/contrib/test_decoding/t/002_always_persist.pl b/contrib/test_decoding/t/002_always_persist.pl
new file mode 100644
index 0000000000..cf78953eef
--- /dev/null
+++ b/contrib/test_decoding/t/002_always_persist.pl
@@ -0,0 +1,74 @@
+
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+# Test logical replication slots are always persist to disk during a shutdown
+# checkpoint.
+
+use strict;
+use warnings;
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Test set-up
+my $node = PostgreSQL::Test::Cluster->new('test');
+$node->init(allows_streaming => 'logical');
+$node->append_conf('postgresql.conf', q{
+autovacuum = off
+checkpoint_timeout = 1h
+});
+
+$node->start;
+
+# Create table
+$node->safe_psql('postgres', "CREATE TABLE test (id int)");
+
+# Create replication slot
+$node->safe_psql('postgres',
+	"SELECT pg_create_logical_replication_slot('regression_slot1', 'test_decoding');"
+);
+
+# Insert some data
+$node->safe_psql('postgres',
+	"INSERT INTO test VALUES (generate_series(1, 5));");
+
+# Consume WAL records
+$node->safe_psql('postgres',
+    "SELECT count(*) FROM pg_logical_slot_get_changes('regression_slot1', NULL, NULL);"
+);
+
+# Shutdown the node once to do shutdown checkpoint
+$node->stop();
+
+# Fetch checkPoint from the control file itself
+my ($stdout, $stderr) = run_command([ 'pg_controldata', $node->data_dir ]);
+my @control_data = split("\n", $stdout);
+my $latest_checkpoint = undef;
+foreach (@control_data)
+{
+	if ($_ =~ /^Latest checkpoint location:\s*(.*)$/mg)
+	{
+		$latest_checkpoint = $1;
+		last;
+	}
+}
+die "No checkPoint in control file found\n"
+  unless defined($latest_checkpoint);
+
+# Boot the node again and check confirmed_flush_lsn. If the slot has persisted,
+# the LSN becomes same as the latest checkpoint location, which means the
+# SHUTDOWN_CHECKPOINT record. 
+$node->start();
+my $confirmed_flush = $node->safe_psql('postgres',
+	"SELECT confirmed_flush_lsn FROM pg_replication_slots;"
+);
+
+# Compare confirmed_flush_lsn and checkPoint
+ok($confirmed_flush eq $latest_checkpoint,
+	"Check confirmed_flush is same as latest checkpoint location");
+
+# Shutdown
+$node->stop;
+
+done_testing();
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 60c0b7ec3a..6dced61cf4 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7026,7 +7026,7 @@ static void
 CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
 {
 	CheckPointRelationMap();
-	CheckPointReplicationSlots();
+	CheckPointReplicationSlots(flags & CHECKPOINT_IS_SHUTDOWN);
 	CheckPointSnapBuild();
 	CheckPointLogicalRewriteHeap();
 	CheckPointReplicationOrigin();
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 1dc27264f6..4d1e2d193e 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -109,7 +109,8 @@ static void ReplicationSlotDropPtr(ReplicationSlot *slot);
 /* internal persistency functions */
 static void RestoreSlotFromDisk(const char *name);
 static void CreateSlotOnDisk(ReplicationSlot *slot);
-static void SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel);
+static void SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel,
+						   bool is_shutdown);
 
 /*
  * Report shared-memory space needed by ReplicationSlotsShmemInit.
@@ -783,7 +784,7 @@ ReplicationSlotSave(void)
 	Assert(MyReplicationSlot != NULL);
 
 	sprintf(path, "pg_replslot/%s", NameStr(MyReplicationSlot->data.name));
-	SaveSlotToPath(MyReplicationSlot, path, ERROR);
+	SaveSlotToPath(MyReplicationSlot, path, ERROR, false);
 }
 
 /*
@@ -1565,11 +1566,10 @@ restart:
 /*
  * Flush all replication slots to disk.
  *
- * This needn't actually be part of a checkpoint, but it's a convenient
- * location.
+ * is_shutdown is true in case of a shutdown checkpoint.
  */
 void
-CheckPointReplicationSlots(void)
+CheckPointReplicationSlots(bool is_shutdown)
 {
 	int			i;
 
@@ -1594,7 +1594,7 @@ CheckPointReplicationSlots(void)
 
 		/* save the slot to disk, locking is handled in SaveSlotToPath() */
 		sprintf(path, "pg_replslot/%s", NameStr(s->data.name));
-		SaveSlotToPath(s, path, LOG);
+		SaveSlotToPath(s, path, LOG, is_shutdown);
 	}
 	LWLockRelease(ReplicationSlotAllocationLock);
 }
@@ -1700,7 +1700,7 @@ CreateSlotOnDisk(ReplicationSlot *slot)
 
 	/* Write the actual state file. */
 	slot->dirty = true;			/* signal that we really need to write */
-	SaveSlotToPath(slot, tmppath, ERROR);
+	SaveSlotToPath(slot, tmppath, ERROR, false);
 
 	/* Rename the directory into place. */
 	if (rename(tmppath, path) != 0)
@@ -1726,7 +1726,8 @@ CreateSlotOnDisk(ReplicationSlot *slot)
  * Shared functionality between saving and creating a replication slot.
  */
 static void
-SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel)
+SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel,
+			   bool is_shutdown)
 {
 	char		tmppath[MAXPGPATH];
 	char		path[MAXPGPATH];
@@ -1740,8 +1741,12 @@ SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel)
 	slot->just_dirtied = false;
 	SpinLockRelease(&slot->mutex);
 
-	/* and don't do anything if there's nothing to write */
-	if (!was_dirty)
+	/*
+	 * Don't do anything if there's nothing to write, unless this is called for
+	 * a logical slot during a shutdown checkpoint, as we want to persist the
+	 * confirmed_flush LSN in that case, even if that's the only modification.
+	 */
+	if (!was_dirty && !(SlotIsLogical(slot) && is_shutdown))
 		return;
 
 	LWLockAcquire(&slot->io_in_progress_lock, LW_EXCLUSIVE);
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index a8a89dc784..7ca37c9f70 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -241,7 +241,7 @@ extern void ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslo
 extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok);
 
 extern void StartupReplicationSlots(void);
-extern void CheckPointReplicationSlots(void);
+extern void CheckPointReplicationSlots(bool is_shutdown);
 
 extern void CheckSlotRequirements(void);
 extern void CheckSlotPermissions(void);
-- 
2.27.0

v24-0002-pg_upgrade-Allow-to-replicate-logical-replicatio.patchapplication/octet-stream; name=v24-0002-pg_upgrade-Allow-to-replicate-logical-replicatio.patchDownload
From 99b89173587a8ce35545bbd02e9eecebf029fd13 Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Tue, 4 Apr 2023 05:49:34 +0000
Subject: [PATCH v24 2/3] pg_upgrade: Allow to replicate logical replication
 slots to new node

This commit allows nodes with logical replication slots to be upgraded. While
reading information from the old cluster, a list of logical replication slots is
fetched. At the later part of upgrading, pg_upgrade revisits the list and
restores slots by executing pg_create_logical_replication_slots() on the new
cluster.

Note that slot restoration must be done after the final pg_resetwal command
during the upgrade because pg_resetwal will remove WALs that are required by
the slots. Due to this restriction, the timing of restoring replication slots is
different from other objects.

The significant advantage of this commit is that it makes it easy to continue
logical replication even after upgrading the publisher node. Previously,
pg_upgrade allowed copying publications to a new node. With this new commit,
adjusting the connection string to the new publisher will cause the apply
worker on the subscriber to connect to the new publisher automatically. This
enables seamless continuation of logical replication, even after an upgrade.

Author: Hayato Kuroda
Co-authored-by: Hou Zhijie
Reviewed-by: Peter Smith, Julien Rouhaud, Vignesh C, Wang Wei, Masahiko Sawada
---
 doc/src/sgml/ref/pgupgrade.sgml               |  62 ++++++++
 src/bin/pg_upgrade/Makefile                   |   3 +
 src/bin/pg_upgrade/check.c                    |  65 ++++++++
 src/bin/pg_upgrade/function.c                 |  18 ++-
 src/bin/pg_upgrade/info.c                     | 132 ++++++++++++++++-
 src/bin/pg_upgrade/meson.build                |   1 +
 src/bin/pg_upgrade/pg_upgrade.c               |  78 ++++++++++
 src/bin/pg_upgrade/pg_upgrade.h               |  19 +++
 .../t/003_logical_replication_slots.pl        | 139 ++++++++++++++++++
 src/tools/pgindent/typedefs.list              |   3 +
 10 files changed, 516 insertions(+), 4 deletions(-)
 create mode 100644 src/bin/pg_upgrade/t/003_logical_replication_slots.pl

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 7816b4c685..95d1d3ced8 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -360,6 +360,68 @@ make prefix=/usr/local/pgsql.new install
     </para>
    </step>
 
+   <step>
+    <title>Prepare for publisher upgrades</title>
+
+    <para>
+     <application>pg_upgrade</application> attempts to migrate logical
+     replication slots. This helps avoid the need for manually defining the
+     same replication slots on the new publisher.
+    </para>
+
+    <para>
+     Before you start upgrading the publisher cluster, ensure that the
+     subscription is temporarily disabled, by executing
+     <link linkend="sql-altersubscription"><command>ALTER SUBSCRIPTION ... DISABLE</command></link>.
+     Re-enable the subscription after the upgrade.
+    </para>
+
+    <para>
+     There are some prerequisites for <application>pg_upgrade</application> to
+     be able to upgrade the replication slots. If these are not met an error
+     will be reported.
+    </para>
+
+    <itemizedlist>
+     <listitem>
+      <para>
+       All slots on the old cluster must be usable, i.e., there are no slots
+       whose <structfield>wal_status</structfield> is <literal>lost</literal> (see
+       <xref linkend="view-pg-replication-slots"/>).
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       <structfield>confirmed_flush_lsn</structfield> (see <xref linkend="view-pg-replication-slots"/>)
+       of all slots on the old cluster must be the same as the latest
+       checkpoint location.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The output plugins referenced by the slots on the old cluster must be
+       installed in the new PostgreSQL executable directory.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must have
+       <link linkend="guc-max-replication-slots"><varname>max_replication_slots</varname></link>
+       configured to a value greater than or equal to the number of slots
+       present in the old cluster.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must have
+       <link linkend="guc-wal-level"><varname>wal_level</varname></link> as
+       <literal>logical</literal>.
+      </para>
+     </listitem>
+    </itemizedlist>
+
+   </step>
+
    <step>
     <title>Stop both servers</title>
 
diff --git a/src/bin/pg_upgrade/Makefile b/src/bin/pg_upgrade/Makefile
index 5834513add..815d1a7ca1 100644
--- a/src/bin/pg_upgrade/Makefile
+++ b/src/bin/pg_upgrade/Makefile
@@ -3,6 +3,9 @@
 PGFILEDESC = "pg_upgrade - an in-place binary upgrade utility"
 PGAPPICON = win32
 
+# required for 003_logical_replication_slots.pl
+EXTRA_INSTALL=contrib/test_decoding
+
 subdir = src/bin/pg_upgrade
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 64024e3b9e..61a81c5011 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -30,6 +30,7 @@ static void check_for_jsonb_9_4_usage(ClusterInfo *cluster);
 static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(ClusterInfo *new_cluster);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
+static void check_new_cluster_logical_replication_slots(void);
 
 
 /*
@@ -89,6 +90,9 @@ check_and_dump_old_cluster(bool live_check)
 	/* Extract a list of databases and tables from the old cluster */
 	get_db_and_rel_infos(&old_cluster);
 
+	/* Extract a list of logical replication slots */
+	get_logical_slot_infos(&old_cluster);
+
 	init_tablespaces();
 
 	get_loadable_libraries();
@@ -189,6 +193,10 @@ check_new_cluster(void)
 {
 	get_db_and_rel_infos(&new_cluster);
 
+	/* Logical replication slots can be migrated since PG17. */
+	if (GET_MAJOR_VERSION(new_cluster.major_version) >= 1700)
+		check_new_cluster_logical_replication_slots();
+
 	check_new_cluster_is_empty();
 
 	check_loadable_libraries();
@@ -1402,3 +1410,60 @@ check_for_user_defined_encoding_conversions(ClusterInfo *cluster)
 	else
 		check_ok();
 }
+
+/*
+ * check_new_cluster_logical_replication_slots()
+ *
+ * Make sure there are no logical replication slots on the new cluster and that
+ * the parameter settings necessary for creating slots are sufficient.
+ */
+static void
+check_new_cluster_logical_replication_slots(void)
+{
+	PGresult   *res;
+	PGconn	   *conn;
+	int			nslots = count_logical_slots(&old_cluster);
+	int			max_replication_slots;
+	char	   *wal_level;
+
+	/* Quick exit if there are no logical slots on the old cluster */
+	if (nslots == 0)
+		return;
+
+	conn = connectToServer(&new_cluster, "template1");
+
+	prep_status("Checking for logical replication slots");
+
+	res = executeQueryOrDie(conn, "SELECT slot_name "
+								  "FROM pg_catalog.pg_replication_slots "
+								  "WHERE slot_type = 'logical' AND "
+								  "temporary IS FALSE;");
+
+	if (PQntuples(res))
+		pg_fatal("New cluster must not have logical replication slots but found \"%s\"",
+				 PQgetvalue(res, 0, 0));
+
+	PQclear(res);
+
+	res = executeQueryOrDie(conn, "SHOW max_replication_slots;");
+	max_replication_slots = atoi(PQgetvalue(res, 0, 0));
+
+	if (nslots > max_replication_slots)
+		pg_fatal("max_replication_slots (%d) must be greater than or equal to the number of "
+				 "logical replication slots (%d) on the old cluster.",
+				 max_replication_slots, nslots);
+
+	PQclear(res);
+
+	res = executeQueryOrDie(conn, "SHOW wal_level;");
+	wal_level = PQgetvalue(res, 0, 0);
+
+	if (strcmp(wal_level, "logical") != 0)
+		pg_fatal("wal_level must be \"logical\", but is set to \"%s\"",
+				wal_level);
+
+	PQclear(res);
+	PQfinish(conn);
+
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/function.c b/src/bin/pg_upgrade/function.c
index dc8800c7cd..2813d2ff20 100644
--- a/src/bin/pg_upgrade/function.c
+++ b/src/bin/pg_upgrade/function.c
@@ -46,7 +46,12 @@ library_name_compare(const void *p1, const void *p2)
 /*
  * get_loadable_libraries()
  *
- *	Fetch the names of all old libraries containing C-language functions.
+ *	Fetch the names of all old libraries:
+ *	1. Name of library files containing C-language functions (for non-built-in
+ *	   functions), and
+ *	2. Shared object (library) names containing the logical replication output
+ *	   plugins
+ *
  *	We will later check that they all exist in the new installation.
  */
 void
@@ -66,14 +71,21 @@ get_loadable_libraries(void)
 		PGconn	   *conn = connectToServer(&old_cluster, active_db->db_name);
 
 		/*
-		 * Fetch all libraries containing non-built-in C functions in this DB.
+		 * Fetch all libraries containing non-built-in C functions, or referred
+		 * to by logical replication slots in this DB.
 		 */
 		ress[dbnum] = executeQueryOrDie(conn,
 										"SELECT DISTINCT probin "
 										"FROM pg_catalog.pg_proc "
 										"WHERE prolang = %u AND "
 										"probin IS NOT NULL AND "
-										"oid >= %u;",
+										"oid >= %u "
+										"UNION "
+										"SELECT DISTINCT plugin "
+										"FROM pg_catalog.pg_replication_slots "
+										"WHERE wal_status <> 'lost' AND "
+										"database = current_database() AND "
+										"temporary IS FALSE;",
 										ClanguageId,
 										FirstNormalObjectId);
 		totaltups += PQntuples(ress[dbnum]);
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index aa5faca4d6..59ccc01b57 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -26,6 +26,7 @@ static void get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo);
 static void free_rel_infos(RelInfoArr *rel_arr);
 static void print_db_infos(DbInfoArr *db_arr);
 static void print_rel_infos(RelInfoArr *rel_arr);
+static void print_slot_infos(LogicalSlotInfoArr *slot_arr);
 
 
 /*
@@ -394,7 +395,7 @@ get_db_infos(ClusterInfo *cluster)
 	i_spclocation = PQfnumber(res, "spclocation");
 
 	ntups = PQntuples(res);
-	dbinfos = (DbInfo *) pg_malloc(sizeof(DbInfo) * ntups);
+	dbinfos = (DbInfo *) pg_malloc0(sizeof(DbInfo) * ntups);
 
 	for (tupnum = 0; tupnum < ntups; tupnum++)
 	{
@@ -600,6 +601,113 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 	dbinfo->rel_arr.nrels = num_rels;
 }
 
+/*
+ * get_logical_slot_infos_per_db()
+ *
+ * gets the LogicalSlotInfos for all the logical replication slots of the database
+ * referred to by "dbinfo".
+ */
+static void
+get_logical_slot_infos_per_db(ClusterInfo *cluster, DbInfo *dbinfo)
+{
+	PGconn	   *conn = connectToServer(cluster,
+									   dbinfo->db_name);
+	PGresult   *res;
+	LogicalSlotInfo *slotinfos = NULL;
+
+	int			num_slots;
+
+	res = executeQueryOrDie(conn, "SELECT slot_name, plugin, two_phase "
+							"FROM pg_catalog.pg_replication_slots "
+							"WHERE wal_status <> 'lost' AND "
+							"database = current_database() AND "
+							"temporary IS FALSE;");
+
+	num_slots = PQntuples(res);
+
+	if (num_slots)
+	{
+		int			slotnum;
+		int			i_slotname;
+		int			i_plugin;
+		int			i_twophase;
+
+		slotinfos = (LogicalSlotInfo *) pg_malloc(sizeof(LogicalSlotInfo) * num_slots);
+
+		i_slotname = PQfnumber(res, "slot_name");
+		i_plugin = PQfnumber(res, "plugin");
+		i_twophase = PQfnumber(res, "two_phase");
+
+		for (slotnum = 0; slotnum < num_slots; slotnum++)
+		{
+			LogicalSlotInfo *curr = &slotinfos[slotnum];
+
+			curr->slotname = pg_strdup(PQgetvalue(res, slotnum, i_slotname));
+			curr->plugin = pg_strdup(PQgetvalue(res, slotnum, i_plugin));
+			curr->two_phase = (strcmp(PQgetvalue(res, slotnum, i_twophase), "t") == 0);
+		}
+	}
+
+	PQclear(res);
+	PQfinish(conn);
+
+	dbinfo->slot_arr.slots = slotinfos;
+	dbinfo->slot_arr.nslots = num_slots;
+}
+
+/*
+ * get_logical_slot_infos()
+ *
+ * Higher level routine to generate LogicalSlotInfoArr for all databases.
+ */
+void
+get_logical_slot_infos(ClusterInfo *cluster)
+{
+	int			dbnum;
+
+	/* Logical slots can be migrated since PG17. */
+	if (GET_MAJOR_VERSION(cluster->major_version) <= 1600)
+		return;
+
+	if (cluster == &old_cluster)
+		pg_log(PG_VERBOSE, "\nsource databases:");
+	else
+		pg_log(PG_VERBOSE, "\ntarget databases:");
+
+	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
+	{
+		DbInfo	   *pDbInfo = &cluster->dbarr.dbs[dbnum];
+
+		get_logical_slot_infos_per_db(cluster, pDbInfo);
+
+		if (log_opts.verbose)
+		{
+			pg_log(PG_VERBOSE, "Database: \"%s\"", pDbInfo->db_name);
+			print_slot_infos(&pDbInfo->slot_arr);
+		}
+	}
+}
+
+/*
+ * count_logical_slots()
+ *
+ * Sum up and return the number of logical replication slots for all databases.
+ */
+int
+count_logical_slots(ClusterInfo *cluster)
+{
+	int			dbnum;
+	int			slot_count = 0;
+
+	/* Quick exit if the version is prior to PG17. */
+	if (GET_MAJOR_VERSION(cluster->major_version) <= 1600)
+		return 0;
+
+	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
+		slot_count += cluster->dbarr.dbs[dbnum].slot_arr.nslots;
+
+	return slot_count;
+}
 
 static void
 free_db_and_rel_infos(DbInfoArr *db_arr)
@@ -610,6 +718,12 @@ free_db_and_rel_infos(DbInfoArr *db_arr)
 	{
 		free_rel_infos(&db_arr->dbs[dbnum].rel_arr);
 		pg_free(db_arr->dbs[dbnum].db_name);
+
+		/*
+		 * Logical replication slots must not exist on the new cluster before
+		 * create_logical_replication_slots().
+		 */
+		Assert(db_arr->dbs[dbnum].slot_arr.nslots == 0);
 	}
 	pg_free(db_arr->dbs);
 	db_arr->dbs = NULL;
@@ -660,3 +774,19 @@ print_rel_infos(RelInfoArr *rel_arr)
 			   rel_arr->rels[relnum].reloid,
 			   rel_arr->rels[relnum].tablespace);
 }
+
+static void
+print_slot_infos(LogicalSlotInfoArr *slot_arr)
+{
+	int			slotnum;
+
+	for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+	{
+		LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+
+		pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\", two_phase: %d",
+			   slot_info->slotname,
+			   slot_info->plugin,
+			   slot_info->two_phase);
+	}
+}
diff --git a/src/bin/pg_upgrade/meson.build b/src/bin/pg_upgrade/meson.build
index 12a97f84e2..228f29b688 100644
--- a/src/bin/pg_upgrade/meson.build
+++ b/src/bin/pg_upgrade/meson.build
@@ -42,6 +42,7 @@ tests += {
     'tests': [
       't/001_basic.pl',
       't/002_pg_upgrade.pl',
+      't/003_logical_replication_slots.pl',
     ],
     'test_kwargs': {'priority': 40}, # pg_upgrade tests are slow
   },
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 4562dafcff..f3d3991bef 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -59,6 +59,7 @@ static void copy_xact_xlog_xid(void);
 static void set_frozenxids(bool minmxid_only);
 static void make_outputdirs(char *pgdata);
 static void setup(char *argv0, bool *live_check);
+static void create_logical_replication_slots(void);
 
 ClusterInfo old_cluster,
 			new_cluster;
@@ -188,6 +189,19 @@ main(int argc, char **argv)
 			  new_cluster.pgdata);
 	check_ok();
 
+	/*
+	 * Create logical replication slots.
+	 *
+	 * Note: This must be done after doing the pg_resetwal command because
+	 * pg_resetwal would remove required WALs.
+	 */
+	if (count_logical_slots(&old_cluster))
+	{
+		start_postmaster(&new_cluster, true);
+		create_logical_replication_slots();
+		stop_postmaster(false);
+	}
+
 	if (user_opts.do_sync)
 	{
 		prep_status("Sync data directory to disk");
@@ -860,3 +874,67 @@ set_frozenxids(bool minmxid_only)
 
 	check_ok();
 }
+
+/*
+ * create_logical_replication_slots()
+ *
+ * Similar to create_new_objects() but only restores logical replication slots.
+ */
+static void
+create_logical_replication_slots(void)
+{
+	int			dbnum;
+
+	prep_status_progress("Restoring logical replication slots in the new cluster");
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		DbInfo	   *old_db = &old_cluster.dbarr.dbs[dbnum];
+		LogicalSlotInfoArr *slot_arr = &old_db->slot_arr;
+		PGconn     *conn;
+		PQExpBuffer query;
+		int			slotnum;
+		char		log_file_name[MAXPGPATH];
+
+		/* Skip this database if there are no slots */
+		if (slot_arr->nslots == 0)
+			continue;
+
+		conn = connectToServer(&new_cluster, old_db->db_name);
+		query = createPQExpBuffer();
+
+		snprintf(log_file_name, sizeof(log_file_name),
+				 DB_DUMP_LOG_FILE_MASK, old_db->db_oid);
+
+		pg_log(PG_STATUS, "%s", old_db->db_name);
+
+		for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		{
+			LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+
+			/*
+			 * Constructs a query for creating logical replication slots.
+			 *
+			 * XXX: For simplification, pg_create_logical_replication_slot() is
+			 * used. Is it sufficient?
+			 */
+			appendPQExpBuffer(query, "SELECT pg_catalog.pg_create_logical_replication_slot(");
+			appendStringLiteralConn(query, slot_info->slotname, conn);
+			appendPQExpBuffer(query, ", ");
+			appendStringLiteralConn(query, slot_info->plugin, conn);
+			appendPQExpBuffer(query, ", false, %s);",
+							  slot_info->two_phase ? "true" : "false");
+
+			PQclear(executeQueryOrDie(conn, "%s", query->data));
+
+			resetPQExpBuffer(query);
+		}
+
+		PQfinish(conn);
+
+		destroyPQExpBuffer(query);
+	}
+
+	end_progress_output();
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 3eea0139c7..2dac266537 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -150,6 +150,22 @@ typedef struct
 	int			nrels;
 } RelInfoArr;
 
+/*
+ * Structure to store logical replication slot information
+ */
+typedef struct
+{
+	char	   *slotname;		/* slot name */
+	char	   *plugin;			/* plugin */
+	bool		two_phase;		/* can the slot decode 2PC? */
+} LogicalSlotInfo;
+
+typedef struct
+{
+	int			nslots;			/* number of logical slot infos */
+	LogicalSlotInfo *slots;		/* array of logical slot infos */
+} LogicalSlotInfoArr;
+
 /*
  * The following structure represents a relation mapping.
  */
@@ -176,6 +192,7 @@ typedef struct
 	char		db_tablespace[MAXPGPATH];	/* database default tablespace
 											 * path */
 	RelInfoArr	rel_arr;		/* array of all user relinfos */
+	LogicalSlotInfoArr slot_arr;	/* array of all LogicalSlotInfo */
 } DbInfo;
 
 /*
@@ -400,6 +417,8 @@ FileNameMap *gen_db_file_maps(DbInfo *old_db,
 							  DbInfo *new_db, int *nmaps, const char *old_pgdata,
 							  const char *new_pgdata);
 void		get_db_and_rel_infos(ClusterInfo *cluster);
+void		get_logical_slot_infos(ClusterInfo *cluster);
+int			count_logical_slots(ClusterInfo *cluster);
 
 /* option.c */
 
diff --git a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
new file mode 100644
index 0000000000..ae87c33708
--- /dev/null
+++ b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
@@ -0,0 +1,139 @@
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+# Tests for upgrading replication slots
+
+use strict;
+use warnings;
+
+use File::Path qw(rmtree);
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Can be changed to test the other modes
+my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
+
+# Initialize old cluster
+my $old_publisher = PostgreSQL::Test::Cluster->new('old_publisher');
+$old_publisher->init(allows_streaming => 'logical');
+
+# Initialize new cluster
+my $new_publisher = PostgreSQL::Test::Cluster->new('new_publisher');
+$new_publisher->init(allows_streaming => 'replica');
+
+my $bindir = $new_publisher->config_data('--bindir');
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when new cluster wal_level is not 'logical'
+
+# Preparations for the subsequent test:
+# 1. Create a slot on the old cluster
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT pg_create_logical_replication_slot('test_slot1', 'test_decoding', false, true);"
+);
+$old_publisher->stop;
+
+# pg_upgrade will fail because the new cluster wal_level is 'replica'
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade where the new cluster has the wrong wal_level');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when max_replication_slots on a new cluster is
+#		too low
+
+# Preparations for the subsequent test:
+# 1. Create a second slot on the old cluster
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT pg_create_logical_replication_slot('test_slot2', 'test_decoding', false, true);"
+);
+$old_publisher->stop;
+
+# 2. max_replication_slots is set to smaller than the number of slots (2)
+#	 present on the old cluster
+$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 1");
+
+# 3. wal_level is set correctly on the new cluster
+$new_publisher->append_conf('postgresql.conf', "wal_level = 'logical'");
+
+# pg_upgrade will fail because the new cluster has insufficient max_replication_slots
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade where the new cluster has insufficient max_replication_slots');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+
+# ------------------------------
+# TEST: Successful upgrade
+
+# Preparations for the subsequent test:
+# 1. Remove the slot 'test_slot2', leaving only 1 slot remaining on the old
+#	 cluster, so the new cluster config  max_replication_slots=1 will now be
+#	 enough.
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT * FROM pg_drop_replication_slot('test_slot2');"
+);
+
+# 2. Consume WAL records
+$old_publisher->safe_psql('postgres',
+	"SELECT count(*) FROM pg_logical_slot_get_changes('test_slot1', NULL, NULL)"
+);
+$old_publisher->stop;
+
+# Actual run, successful upgrade is expected
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade of old cluster');
+ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ removed after pg_upgrade success");
+
+# Check that the slot 'test_slot1' has migrated to the new cluster
+$new_publisher->start;
+my $result = $new_publisher->safe_psql('postgres',
+	"SELECT slot_name, two_phase FROM pg_replication_slots");
+is($result, qq(test_slot1|t), 'check the slot exists on new cluster');
+$new_publisher->stop;
+
+done_testing();
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 51b7951ad8..0071efef1c 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1501,7 +1501,10 @@ LogicalRepTupleData
 LogicalRepTyp
 LogicalRepWorker
 LogicalRepWorkerType
+LogicalReplicationSlotInfo
 LogicalRewriteMappingData
+LogicalSlotInfo
+LogicalSlotInfoArr
 LogicalTape
 LogicalTapeSet
 LsnReadQueue
-- 
2.27.0

v24-0003-pg_upgrade-Add-check-function-for-logical-replic.patchapplication/octet-stream; name=v24-0003-pg_upgrade-Add-check-function-for-logical-replic.patchDownload
From 49c17070ce789f09105ef53f47626cc9c2c1be14 Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Fri, 18 Aug 2023 11:57:37 +0000
Subject: [PATCH v24 3/3] pg_upgrade: Add check function for logical
 replication slots

To prevent data loss, pg_upgrade will fail if the old node has slots with the
status 'lost', or with unconsumed WAL records.

Author: Hayato Kuroda
Reviewed-by: Wang Wei, Vignesh C, Peter Smith, Hou Zhijie
---
 src/bin/pg_upgrade/check.c                    | 109 ++++++++++++++++++
 src/bin/pg_upgrade/controldata.c              |  37 ++++++
 src/bin/pg_upgrade/pg_upgrade.h               |   3 +
 .../t/003_logical_replication_slots.pl        |  88 ++++++++++++--
 4 files changed, 230 insertions(+), 7 deletions(-)

diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 61a81c5011..e9cdbcbebf 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -9,6 +9,7 @@
 
 #include "postgres_fe.h"
 
+#include "access/xlogdefs.h"
 #include "catalog/pg_authid_d.h"
 #include "catalog/pg_collation.h"
 #include "fe_utils/string_utils.h"
@@ -31,6 +32,7 @@ static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(ClusterInfo *new_cluster);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
 static void check_new_cluster_logical_replication_slots(void);
+static void check_for_confirmed_flush_lsn(ClusterInfo *cluster);
 
 
 /*
@@ -108,6 +110,21 @@ check_and_dump_old_cluster(bool live_check)
 	check_for_reg_data_type_usage(&old_cluster);
 	check_for_isn_and_int8_passing_mismatch(&old_cluster);
 
+	/* Logical replication slots can be migrated since PG17. */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
+	{
+		check_for_lost_slots(&old_cluster);
+
+		/*
+		 * Do additional checks if a live check is not required. This requires
+		 * that confirmed_flush_lsn of all the slots is the same as the latest
+		 * checkpoint location, but it would be satisfied only when the server
+		 * has been shut down.
+		 */
+		if (!live_check)
+			check_for_confirmed_flush_lsn(&old_cluster);
+	}
+
 	/*
 	 * PG 16 increased the size of the 'aclitem' type, which breaks the
 	 * on-disk format for existing data.
@@ -1467,3 +1484,95 @@ check_new_cluster_logical_replication_slots(void)
 
 	check_ok();
 }
+
+/*
+ * Verify that all logical replication slots are usable.
+ */
+void
+check_for_lost_slots(ClusterInfo *cluster)
+{
+	int			i,
+				ntups,
+				i_slotname;
+	PGresult   *res;
+	DbInfo	   *active_db = &cluster->dbarr.dbs[0];
+	PGconn	   *conn;
+
+	/* Quick exit if the cluster does not have logical slots. */
+	if (count_logical_slots(cluster) == 0)
+		return;
+
+	conn = connectToServer(cluster, active_db->db_name);
+
+	prep_status("Checking wal_status for logical replication slots");
+
+	/* Check there are no logical replication slots with a 'lost' state. */
+	res = executeQueryOrDie(conn,
+							"SELECT slot_name FROM pg_catalog.pg_replication_slots "
+							"WHERE wal_status = 'lost' AND "
+							"temporary IS FALSE;");
+
+	ntups = PQntuples(res);
+	i_slotname = PQfnumber(res, "slot_name");
+
+	for (i = 0; i < ntups; i++)
+		pg_log(PG_WARNING,
+			   "\nWARNING: logical replication slot \"%s\" is in 'lost' state.",
+			   PQgetvalue(res, i, i_slotname));
+
+	PQclear(res);
+	PQfinish(conn);
+
+	if (ntups)
+		pg_fatal("One or more logical replication slots with a state of 'lost' were detected.");
+
+	check_ok();
+}
+
+/*
+ * Verify that all logical replication slots consumed all WALs, except a
+ * CHECKPOINT_SHUTDOWN record.
+ */
+static void
+check_for_confirmed_flush_lsn(ClusterInfo *cluster)
+{
+	int			i,
+				ntups,
+				i_slotname;
+	PGresult   *res;
+	DbInfo	   *active_db = &cluster->dbarr.dbs[0];
+	PGconn	   *conn;
+
+	/* Quick exit if the cluster does not have logical slots. */
+	if (count_logical_slots(cluster) == 0)
+		return;
+
+	conn = connectToServer(cluster, active_db->db_name);
+
+	prep_status("Checking confirmed_flush_lsn for logical replication slots");
+
+	/*
+	 * Check that all logical replication slots have reached the latest
+	 * checkpoint position (SHUTDOWN_CHECKPOINT record).
+	 */
+	res = executeQueryOrDie(conn,
+							"SELECT slot_name FROM pg_catalog.pg_replication_slots "
+							"WHERE confirmed_flush_lsn != '%X/%X' AND temporary IS FALSE;",
+							LSN_FORMAT_ARGS(old_cluster.controldata.chkpnt_latest));
+
+	ntups = PQntuples(res);
+	i_slotname = PQfnumber(res, "slot_name");
+
+	for (i = 0; i < ntups; i++)
+		pg_log(PG_WARNING,
+				"\nWARNING: logical replication slot \"%s\" has not consumed WALs yet",
+				PQgetvalue(res, i, i_slotname));
+
+	PQclear(res);
+	PQfinish(conn);
+
+	if (ntups)
+		pg_fatal("One or more logical replication slots still have unconsumed WAL records.");
+
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c
index 4beb65ab22..51eb0df99c 100644
--- a/src/bin/pg_upgrade/controldata.c
+++ b/src/bin/pg_upgrade/controldata.c
@@ -169,6 +169,43 @@ get_control_data(ClusterInfo *cluster, bool live_check)
 				}
 				got_cluster_state = true;
 			}
+
+			else if ((p = strstr(bufin, "Latest checkpoint location:")) != NULL)
+			{
+				/*
+				 * Gather the latest checkpoint location if the cluster is PG17
+				 * or later. This is used for upgrading logical replication
+				 * slots.
+				 */
+				if (GET_MAJOR_VERSION(cluster->major_version) >= 1700)
+				{
+					char *slash = NULL;
+					uint32 upper_lsn, lower_lsn;
+
+					p = strchr(p, ':');
+
+					if (p == NULL || strlen(p) <= 1)
+						pg_fatal("%d: controldata retrieval problem", __LINE__);
+
+					p++;			/* remove ':' char */
+
+					p = strpbrk(p, "01234567890ABCDEF");
+
+					if (p == NULL || strlen(p) <= 1)
+						pg_fatal("%d: controldata retrieval problem", __LINE__);
+
+					/*
+					 * The upper and lower part of LSN must be read separately
+					 * because it is reported in %X/%X format.
+					 */
+					upper_lsn = strtoul(p, &slash, 16);
+					lower_lsn = strtoul(++slash, NULL, 16);
+
+					/* And combine them */
+					cluster->controldata.chkpnt_latest =
+										((uint64) upper_lsn << 32) | lower_lsn;
+				}
+			}
 		}
 
 		rc = pclose(output);
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 2dac266537..9e0069a31d 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -10,6 +10,7 @@
 #include <sys/stat.h>
 #include <sys/time.h>
 
+#include "access/xlogdefs.h"
 #include "common/relpath.h"
 #include "libpq-fe.h"
 
@@ -242,6 +243,7 @@ typedef struct
 	bool		date_is_int;
 	bool		float8_pass_by_value;
 	uint32		data_checksum_version;
+	XLogRecPtr	chkpnt_latest;
 } ControlData;
 
 /*
@@ -366,6 +368,7 @@ void		output_completion_banner(char *deletion_script_file_name);
 void		check_cluster_versions(void);
 void		check_cluster_compatibility(bool live_check);
 void		create_script_for_old_cluster_deletion(char **deletion_script_file_name);
+void		check_for_lost_slots(ClusterInfo *cluster);
 
 
 /* controldata.c */
diff --git a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
index ae87c33708..812ed55adf 100644
--- a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
+++ b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
@@ -22,6 +22,10 @@ $old_publisher->init(allows_streaming => 'logical');
 my $new_publisher = PostgreSQL::Test::Cluster->new('new_publisher');
 $new_publisher->init(allows_streaming => 'replica');
 
+# Initialize subscriber cluster
+my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
+$subscriber->init(allows_streaming => 'logical');
+
 my $bindir = $new_publisher->config_data('--bindir');
 
 # ------------------------------
@@ -65,13 +69,19 @@ $old_publisher->start;
 $old_publisher->safe_psql('postgres',
 	"SELECT pg_create_logical_replication_slot('test_slot2', 'test_decoding', false, true);"
 );
+
+# 2. Consume WAL records to avoid another type of upgrade failure. It will be
+#	 tested in subsequent cases.
+$old_publisher->safe_psql('postgres',
+	"SELECT count(*) FROM pg_logical_slot_get_changes('test_slot1', NULL, NULL);"
+);
 $old_publisher->stop;
 
-# 2. max_replication_slots is set to smaller than the number of slots (2)
+# 3. max_replication_slots is set to smaller than the number of slots (2)
 #	 present on the old cluster
 $new_publisher->append_conf('postgresql.conf', "max_replication_slots = 1");
 
-# 3. wal_level is set correctly on the new cluster
+# 4. wal_level is set correctly on the new cluster
 $new_publisher->append_conf('postgresql.conf', "wal_level = 'logical'");
 
 # pg_upgrade will fail because the new cluster has insufficient max_replication_slots
@@ -95,7 +105,7 @@ ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
 rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
 
 # ------------------------------
-# TEST: Successful upgrade
+# TEST: Confirm pg_upgrade fails when the slot still has unconsumed WAL records
 
 # Preparations for the subsequent test:
 # 1. Remove the slot 'test_slot2', leaving only 1 slot remaining on the old
@@ -106,12 +116,60 @@ $old_publisher->safe_psql('postgres',
 	"SELECT * FROM pg_drop_replication_slot('test_slot2');"
 );
 
-# 2. Consume WAL records
+# 2. Generate extra WAL records. Because these WAL records do not get consumed
+#	 it will cause the upcoming pg_upgrade test to fail.
 $old_publisher->safe_psql('postgres',
-	"SELECT count(*) FROM pg_logical_slot_get_changes('test_slot1', NULL, NULL)"
+	"CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;"
 );
 $old_publisher->stop;
 
+# pg_upgrade will fail because the slot still has unconsumed WAL records
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade of old cluster with idle replication slots');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+
+# ------------------------------
+# TEST: Successful upgrade
+
+# Preparations for the subsequent test:
+# 1. Remove the remained slot
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT * FROM pg_drop_replication_slot('test_slot1');"
+);
+
+# 2. Setup logical replication
+my $old_connstr = $old_publisher->connstr . ' dbname=postgres';
+$old_publisher->safe_psql('postgres',
+	"CREATE PUBLICATION pub FOR ALL TABLES;"
+);
+$subscriber->start;
+$subscriber->safe_psql(
+	'postgres', qq[
+	CREATE TABLE tbl (a int);
+	CREATE SUBSCRIPTION sub CONNECTION '$old_connstr' PUBLICATION pub WITH (two_phase = 'true')
+]);
+$subscriber->wait_for_subscription_sync($old_publisher, 'sub');
+
+# 3. Disable the subscription once
+$subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION sub DISABLE");
+$old_publisher->stop;
+
 # Actual run, successful upgrade is expected
 command_ok(
 	[
@@ -133,7 +191,23 @@ ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
 $new_publisher->start;
 my $result = $new_publisher->safe_psql('postgres',
 	"SELECT slot_name, two_phase FROM pg_replication_slots");
-is($result, qq(test_slot1|t), 'check the slot exists on new cluster');
-$new_publisher->stop;
+is($result, qq(sub|t), 'check the slot exists on new cluster');
+
+# Update the connection
+my $new_connstr = $new_publisher->connstr . ' dbname=postgres';
+$subscriber->safe_psql('postgres',
+	"ALTER SUBSCRIPTION sub CONNECTION '$new_connstr'");
+$subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION sub ENABLE");
+
+# Check whether changes on the new publisher get replicated to the subscriber
+$new_publisher->safe_psql('postgres',
+	"INSERT INTO tbl VALUES (generate_series(11, 20))");
+$new_publisher->wait_for_catchup('sub');
+$result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
+is($result, qq(20), 'check changes are shipped to the subscriber');
+
+# Clean up
+$subscriber->stop();
+$new_publisher->stop();
 
 done_testing();
-- 
2.27.0

#154Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Peter Smith (#148)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Peter,

Thanks for the comments! A new version is available in [1]/messages/by-id/TYAPR01MB5866DD3348B5224E0A1BFC3EF51CA@TYAPR01MB5866.jpnprd01.prod.outlook.com.

1.
+#include "access/xlogdefs.h"
#include "catalog/pg_authid_d.h"

Was this #include needed here? I noticed you've already included the same in the "pg_upgrade.h".

It was needed because the macro LSN_FORMAT_ARGS() is used in this file.
I prefer that every file directly includes all the headers it needs, even if one of
them is already pulled in via another header, so the #include was added here.

2. check_for_lost_slots

+ /* Check there are no logical replication slots with a 'lost' state. */
+ res = executeQueryOrDie(conn,
+ "SELECT slot_name FROM pg_catalog.pg_replication_slots "
+ "WHERE wal_status = 'lost' AND "
+ "temporary IS FALSE;");

I can't quite describe my doubts about this, but something seems a bit strange. Didn't we already iterate every single slot in all DBs in the earlier function get_logical_slot_infos_per_db()? There we were only looking for wal_status <> 'lost', but we could have got *every* wal_status and also detected these 'lost' ones at the same time up-front, instead of having this extra function with more SQL to do pretty much the same SELECT.

Perhaps coding the current way there is a clear separation of the fetching code and the checking code, and that might be the best approach, but it somehow seems a shame/waste to be executing almost the same slots data with the same SQL 2x, so I wondered if there is a better way to arrange this.
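For illustration (this is only a sketch of the suggestion, not what the
patch currently does), the single up-front query would presumably look
something like:

SELECT slot_name, plugin, two_phase, wal_status
FROM pg_catalog.pg_replication_slots
WHERE database = current_database() AND temporary IS FALSE;

i.e. fetch every slot regardless of wal_status and flag the 'lost' ones
while building the slot info array, instead of issuing a second, nearly
identical SELECT later.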

Hmm, but you did not want additional checks done inside get_logical_slot_infos(),
right? The two cannot go together. In the case of check_new_cluster(), relation
information is extracted in get_db_and_rel_infos() and then checked for emptiness
in check_new_cluster_is_empty(), so the fetching and checking phases are separated
there as well.

src/bin/pg_upgrade/info.c

3. get_logical_slot_infos

+
+ /* Do additional checks if slots are found */
+ if (slot_count)
+ {
+ check_for_lost_slots(cluster);
+
+ if (!live_check)
+ check_for_confirmed_flush_lsn(cluster);
+ }

Aren't these checks only intended for checking the 'old_cluster'? But AFAICT they are not guarded here so they will be executed by both sides. Previously (in my review of v22-0003) I suggested these calls maybe belonged in the calling function check_and_dump_old_cluster(). I still think that.

Moved to check_and_dump_old_cluster().

[1]: /messages/by-id/TYAPR01MB5866DD3348B5224E0A1BFC3EF51CA@TYAPR01MB5866.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#155Peter Smith
smithpb2250@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#153)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

Thanks for the updated patches.

Here are some review comments for the patch v24-0002

======
doc/src/sgml/ref/pgupgrade.sgml

1.
+     <listitem>
+      <para>
+       All slots on the old cluster must be usable, i.e., there are no
slots
+       whose <structfield>wal_status</structfield> is
<literal>lost</literal> (see
+       <xref linkend="view-pg-replication-slots"/>).
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       <structfield>confirmed_flush_lsn</structfield> (see <xref
linkend="view-pg-replication-slots"/>)
+       of all slots on the old cluster must be the same as the latest
+       checkpoint location.
+      </para>
+     </listitem>

It might be more tidy to change the way those links (e.g. "See section
54.19") are presented:

1a.
SUGGESTION
All slots on the old cluster must be usable, i.e., there are no slots whose
<link
linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>wal_status</structfield>
is <literal>lost</literal>.

~

1b.
SUGGESTION
<link
linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>confirmed_flush_lsn</structfield>
of all slots on the old cluster must be the same as the latest checkpoint
location.

======
src/bin/pg_upgrade/check.c

2.
+ /* Logical replication slots can be migrated since PG17. */
+ if (GET_MAJOR_VERSION(new_cluster.major_version) >= 1700)
+ check_new_cluster_logical_replication_slots();
+

Does it even make sense to check the new_cluster version? IIUC pg_upgrade
*always* updates to the current PG version, which must be 1700 by
definition, because this patch is only for PG17, right?

For example, see check_cluster_versions() function where it does this check:

/* Only current PG version is supported as a target */
if (GET_MAJOR_VERSION(new_cluster.major_version) !=
GET_MAJOR_VERSION(PG_VERSION_NUM))
pg_fatal("This utility can only upgrade to PostgreSQL version %s.",
PG_MAJORVERSION);

======
src/bin/pg_upgrade/function.c

3.
os_info.libraries = (LibraryInfo *) pg_malloc(totaltups *
sizeof(LibraryInfo));
totaltups = 0;

for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
{
PGresult *res = ress[dbnum];
int ntups;
int rowno;

ntups = PQntuples(res);
for (rowno = 0; rowno < ntups; rowno++)
{
char *lib = PQgetvalue(res, rowno, 0);

os_info.libraries[totaltups].name = pg_strdup(lib);
os_info.libraries[totaltups].dbnum = dbnum;

totaltups++;
}
PQclear(res);
}

~

Although this was not introduced by your patch, I do not understand why the
'totaltups' variable gets reset to zero and then re-incremented in these
loops.

In other words, how is it possible for the end result of 'totaltups' to be
any different from what was already calculated earlier in this function?

IMO totaltups = 0; and totaltups++; is just redundant code.

======
src/bin/pg_upgrade/info.c

4. get_logical_slot_infos

+/*
+ * get_logical_slot_infos()
+ *
+ * Higher level routine to generate LogicalSlotInfoArr for all databases.
+ */
+void
+get_logical_slot_infos(ClusterInfo *cluster)
+{
+ int dbnum;
+
+ /* Logical slots can be migrated since PG17. */
+ if (GET_MAJOR_VERSION(cluster->major_version) <= 1600)
+ return;

It is no longer clear to me what is the purpose of these version checks.

As mentioned in comment #2 above, I don't think we need to check the
new_cluster >= 1700, because this patch is for PG17 by definition.

OTOH, I also don't recognise the reason why there has to be a PG17
restriction on the 'old_cluster' version. Such a restriction seems to
cripple the usefulness of this patch (eg. cannot even upgrade slots from
PG16 to PG17), and there is no explanation given for it. If there is some
valid incompatibility reason why only PG17 old_cluster slots can be
upgraded then it ought to be described in detail and probably also
mentioned in the PG DOCS.

~~~

5. count_logical_slots

+/*
+ * count_logical_slots()
+ *
+ * Sum up and return the number of logical replication slots for all
databases.
+ */
+int
+count_logical_slots(ClusterInfo *cluster)
+{
+ int dbnum;
+ int slot_count = 0;
+
+ /* Quick exit if the version is prior to PG17. */
+ if (GET_MAJOR_VERSION(cluster->major_version) <= 1600)
+ return 0;
+
+ for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
+ slot_count += cluster->dbarr.dbs[dbnum].slot_arr.nslots;
+
+ return slot_count;
+}

Same as the previous comment #4. I had doubts about the intent/need for
this cluster version checking.

------
Kind Regards,
Peter Smith.
Fujitsu Australia

#156Amit Kapila
amit.kapila16@gmail.com
In reply to: Peter Smith (#155)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Thu, Aug 24, 2023 at 7:55 AM Peter Smith <smithpb2250@gmail.com> wrote:

======
src/bin/pg_upgrade/info.c

4. get_logical_slot_infos

+/*
+ * get_logical_slot_infos()
+ *
+ * Higher level routine to generate LogicalSlotInfoArr for all databases.
+ */
+void
+get_logical_slot_infos(ClusterInfo *cluster)
+{
+ int dbnum;
+
+ /* Logical slots can be migrated since PG17. */
+ if (GET_MAJOR_VERSION(cluster->major_version) <= 1600)
+ return;

It is no longer clear to me what is the purpose of these version checks.

As mentioned in comment #2 above, I don't think we need to check the new_cluster >= 1700, because this patch is for PG17 by definition.

OTOH, I also don't recognise the reason why there has to be a PG17 restriction on the 'old_cluster' version. Such a restriction seems to cripple the usefulness of this patch (eg. cannot even upgrade slots from PG16 to PG17), and there is no explanation given for it. If there is some valid incompatibility reason why only PG17 old_cluster slots can be upgraded then it ought to be described in detail and probably also mentioned in the PG DOCS.

One of the main reasons is that slots prior to v17 do not persist
confirmed_flush_lsn, as discussed in the email thread [1]/messages/by-id/CAA4eK1JzJagMmb_E8D4au=GYQkxox0AfNBm1FbP7sy7t4YWXPQ@mail.gmail.com, which means the
check will always fail even if we allow upgrading from versions prior to
v17. Now, there is an argument that we could backpatch what is being
discussed in [1]/messages/by-id/CAA4eK1JzJagMmb_E8D4au=GYQkxox0AfNBm1FbP7sy7t4YWXPQ@mail.gmail.com and then we would be able to upgrade slots from
prior versions. Normally, we don't backpatch new enhancements, so even if
we wanted to do that in this case, a separate argument would have to be
made for it. We have already discussed this point in this thread. We can
probably add a comment in the patch where we do the version checks so that
it will be a bit easier to understand the reason.
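To make the condition concrete: after a clean shutdown, the
confirmed_flush_lsn of every logical slot is expected to match the latest
(shutdown) checkpoint location. Purely as an illustration (the patch
itself compares against the value parsed from pg_controldata output
rather than running this query), the check is roughly:

SELECT slot_name
FROM pg_catalog.pg_replication_slots
WHERE slot_type = 'logical' AND temporary IS FALSE
  AND confirmed_flush_lsn <> (SELECT checkpoint_lsn FROM pg_control_checkpoint());

Any slot returned by such a query has a confirmed_flush_lsn that does not
match the shutdown checkpoint, which is exactly the case the upgrade check
rejects; prior to v17 that can happen even after a clean shutdown because
the advanced confirmed_flush_lsn may never have been written back to disk.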

[1]: /messages/by-id/CAA4eK1JzJagMmb_E8D4au=GYQkxox0AfNBm1FbP7sy7t4YWXPQ@mail.gmail.com

--
With Regards,
Amit Kapila.

#157Peter Smith
smithpb2250@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#153)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

Hi Kuroda-san

FYI, the v24-0003 tests for pg_upgrade did not work for me:

~~~

# +++ tap check in src/bin/pg_upgrade +++
t/001_basic.pl ...................... ok
t/002_pg_upgrade.pl ................. ok
t/003_logical_replication_slots.pl .. 7/?
# Failed test 'run of pg_upgrade of old cluster'
# at t/003_logical_replication_slots.pl line 174.
# Failed test 'pg_upgrade_output.d/ removed after pg_upgrade success'
# at t/003_logical_replication_slots.pl line 187.
# Failed test 'check the slot exists on new cluster'
# at t/003_logical_replication_slots.pl line 194.
# got: ''
# expected: 'sub|t'
# Tests were run but no plan was declared and done_testing() was not seen.
t/003_logical_replication_slots.pl .. Dubious, test returned 29 (wstat 7424, 0x1d00)
Failed 3/9 subtests

Test Summary Report
-------------------
t/003_logical_replication_slots.pl (Wstat: 7424 Tests: 9 Failed: 3)
Failed tests: 7-9
Non-zero exit status: 29
Parse errors: No plan found in TAP output
Files=3, Tests=35, 116 wallclock secs ( 0.06 usr 0.01 sys + 18.02 cusr 6.40 csys = 24.49 CPU)
Result: FAIL
make: *** [check] Error 1

~~~

I can provide the log files with more details about the errors if you
cannot reproduce this.

------
Kind Regards,
Peter Smith.
Fujitsu Australia

#158Peter Smith
smithpb2250@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#153)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

Notwithstanding the test errors I am getting for v24-0003, here are
some code review comments for this patch anyway.

======
src/bin/pg_upgrade/check.c

1. check_for_lost_slots

+
+/*
+ * Verify that all logical replication slots are usable.
+ */
+void
+check_for_lost_slots(ClusterInfo *cluster)

1a.
AFAIK we don't ever need to call this also for 'new_cluster'. So the
function should have no parameter and just access 'old_cluster'
directly.

~

1b.
Can't this be a static function now?

~

2.
+ for (i = 0; i < ntups; i++)
+ pg_log(PG_WARNING,
+    "\nWARNING: logical replication slot \"%s\" is in 'lost' state.",
+    PQgetvalue(res, i, i_slotname));

Is it correct that this message also includes the word "WARNING"?
Other PG_WARNING messages don't do that.

~~~

3. check_for_confirmed_flush_lsn

+/*
+ * Verify that all logical replication slots consumed all WALs, except a
+ * CHECKPOINT_SHUTDOWN record.
+ */
+static void
+check_for_confirmed_flush_lsn(ClusterInfo *cluster)

AFAIK we don't ever need to call this also for 'new_cluster'. So the
function should have no parameter and just access 'old_cluster'
directly.

~

4.
+ for (i = 0; i < ntups; i++)
+ pg_log(PG_WARNING,
+ "\nWARNING: logical replication slot \"%s\" has not consumed WALs yet",
+ PQgetvalue(res, i, i_slotname));

Is it correct that this message also includes the word "WARNING"?
Other PG_WARNING messages don't do that.

======
src/bin/pg_upgrade/controldata.c

5. get_control_data

+ else if ((p = strstr(bufin, "Latest checkpoint location:")) != NULL)
+ {
+ /*
+ * Gather the latest checkpoint location if the cluster is PG17
+ * or later. This is used for upgrading logical replication
+ * slots.
+ */
+ if (GET_MAJOR_VERSION(cluster->major_version) >= 1700)

But we are not "gathering" anything. It's just one LSN. I think this
ought to just say "Read the latest..."

~

6.
+ /*
+ * The upper and lower part of LSN must be read separately
+ * because it is reported in %X/%X format.
+ */

/reported/stored as/

======
src/bin/pg_upgrade/pg_upgrade.h

7.
+void check_for_lost_slots(ClusterInfo *cluster);

Why is this needed here? Can't this be a static function?

======
.../t/003_logical_replication_slots.pl

8.
+# 2. Consume WAL records to avoid another type of upgrade failure. It will be
+# tested in subsequent cases.
+$old_publisher->safe_psql('postgres',
+ "SELECT count(*) FROM pg_logical_slot_get_changes('test_slot1', NULL, NULL);"
+);

I wondered if that step is really needed. Why will there be WAL records to consume?

IIUC we haven't published anything yet.

~~~

9.
+# ------------------------------
+# TEST: Successful upgrade
+
+# Preparations for the subsequent test:
+# 1. Remove the remained slot
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+ "SELECT * FROM pg_drop_replication_slot('test_slot1');"
+);

Should removal of the slot be done as part of the cleanup of the
previous test, instead of preparing for this one?

~~~

10.
# 3. Disable the subscription once
$subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION sub DISABLE");
$old_publisher->stop;

10a.
What do you mean by "once"?

~

10b.
That old_publisher->stop; seems strangely placed. Why is it here?

~~~

11.
# Check that the slot 'test_slot1' has migrated to the new cluster
$new_publisher->start;
my $result = $new_publisher->safe_psql('postgres',
"SELECT slot_name, two_phase FROM pg_replication_slots");
is($result, qq(sub|t), 'check the slot exists on new cluster');

~

That comment now seems wrong. That slot was previously removed, right?

~~~

12.
# Update the connection
my $new_connstr = $new_publisher->connstr . ' dbname=postgres';
$subscriber->safe_psql('postgres',
"ALTER SUBSCRIPTION sub CONNECTION '$new_connstr'");
$subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION sub ENABLE");

~

Maybe it would be better to combine both SQL statements.
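For example, something like this (just a sketch combining the same two
statements into one call):

$subscriber->safe_psql(
	'postgres', qq[
	ALTER SUBSCRIPTION sub CONNECTION '$new_connstr';
	ALTER SUBSCRIPTION sub ENABLE;
]);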

------
Kind Regards,
Peter Smith.
Fujitsu Australia

#159Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Peter Smith (#157)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Peter,

FYI, the v24-0003 tests for pg_upgrade did not work for me:

Hmm, I ran the tests for more than an hour but could not reproduce the failure.
cfbot has also reported OK multiple times...

Could you please check the source code again and send the log files
if the problem persists?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#160Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Peter Smith (#155)
3 attachment(s)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Peter,

Thanks for reviewing! PSA a new version of the patch set.
Note again that the 0001 patch was replaced with a new one [1]/messages/by-id/CALDaNm0VrAt24e2FxbOX6eJQ-G_tZ0gVpsFBjzQM99NxG0hZfg@mail.gmail.com, but you do not have to
discuss it here - that should be done in the forked thread.

1.
+     <listitem>
+      <para>
+       All slots on the old cluster must be usable, i.e., there are no slots
+       whose <structfield>wal_status</structfield> is <literal>lost</literal> (see
+       <xref linkend="view-pg-replication-slots"/>).
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       <structfield>confirmed_flush_lsn</structfield> (see <xref linkend="view-pg-replication-slots"/>)
+       of all slots on the old cluster must be the same as the latest
+       checkpoint location.
+      </para>
+     </listitem>

It might be more tidy to change the way those links (e.g. "See section 54.19") are presented:

1a.
SUGGESTION
All slots on the old cluster must be usable, i.e., there are no slots whose <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>wal_status</structfield> is <literal>lost</literal>.

Fixed.

1b.
SUGGESTION
<link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>confirmed_flush_lsn</structfield> of all slots on the old cluster must be the same as the latest checkpoint location.

Fixed.

2.
+ /* Logical replication slots can be migrated since PG17. */
+ if (GET_MAJOR_VERSION(new_cluster.major_version) >= 1700)
+ check_new_cluster_logical_replication_slots();
+

Does it even make sense to check the new_cluster version? IIUC pg_upgrade *always* updates to the current PG version, which must be 1700 by definition, because this patch is only for PG17, right?

For example, see check_cluster_versions() function where it does this check:

/* Only current PG version is supported as a target */
if (GET_MAJOR_VERSION(new_cluster.major_version) != GET_MAJOR_VERSION(PG_VERSION_NUM))
pg_fatal("This utility can only upgrade to PostgreSQL version %s.",
PG_MAJORVERSION);

You are right, the new_cluster always has the same version as pg_upgrade.
Removed.

os_info.libraries = (LibraryInfo *) pg_malloc(totaltups * sizeof(LibraryInfo));
totaltups = 0;

for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
{
PGresult *res = ress[dbnum];
int ntups;
int rowno;

ntups = PQntuples(res);
for (rowno = 0; rowno < ntups; rowno++)
{
char *lib = PQgetvalue(res, rowno, 0);

os_info.libraries[totaltups].name = pg_strdup(lib);
os_info.libraries[totaltups].dbnum = dbnum;

totaltups++;
}
PQclear(res);
}

~

Although this was not introduced by your patch, I do not understand why the 'totaltups' variable gets reset to zero and then re-incremented in these loops.

In other words, how is it possible for the end result of 'totaltups' to be any different from what was already calculated earlier in this function?

IMO totaltups = 0; and totaltups++; is just redundant code.

First of all, I will not fix that in this thread; it should be done elsewhere,
and I do not want to expand the scope of this thread any further. Personally, it
looked to me as if totaltups was simply being reused as an index into the array.

4. get_logical_slot_infos

+/*
+ * get_logical_slot_infos()
+ *
+ * Higher level routine to generate LogicalSlotInfoArr for all databases.
+ */
+void
+get_logical_slot_infos(ClusterInfo *cluster)
+{
+ int dbnum;
+
+ /* Logical slots can be migrated since PG17. */
+ if (GET_MAJOR_VERSION(cluster->major_version) <= 1600)
+ return;

It is no longer clear to me what is the purpose of these version checks.

As mentioned in comment #2 above, I don't think we need to check the new_cluster >= 1700, because this patch is for PG17 by definition.

OTOH, I also don't recognise the reason why there has to be a PG17 restriction on the 'old_cluster' version. Such a restriction seems to cripple the usefulness of this patch (eg. cannot even upgrade slots from PG16 to PG17), and there is no explanation given for it. If there is some valid incompatibility reason why only PG17 old_cluster slots can be upgraded then it ought to be described in detail and probably also mentioned in the PG DOCS.

Upgrading logical slots with these verifications requires that the slots are
reliably saved to disk during shutdown (the 0001 patch). Currently there is no
plan to backpatch that, so I think the version check is needed. Instead, I added
descriptions to the docs and the code comments.

5. count_logical_slots

+/*
+ * count_logical_slots()
+ *
+ * Sum up and return the number of logical replication slots for all databases.
+ */
+int
+count_logical_slots(ClusterInfo *cluster)
+{
+ int dbnum;
+ int slot_count = 0;
+
+ /* Quick exit if the version is prior to PG17. */
+ if (GET_MAJOR_VERSION(cluster->major_version) <= 1600)
+ return 0;
+
+ for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
+ slot_count += cluster->dbarr.dbs[dbnum].slot_arr.nslots;
+
+ return slot_count;
+}

Same as the previous comment #4. I had doubts about the intent/need for this cluster version checking.

As I said above, this is needed.

[1]: /messages/by-id/CALDaNm0VrAt24e2FxbOX6eJQ-G_tZ0gVpsFBjzQM99NxG0hZfg@mail.gmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:

v25-0001-Persist-to-disk-logical-slots-during-a-shutdown-.patchapplication/octet-stream; name=v25-0001-Persist-to-disk-logical-slots-during-a-shutdown-.patchDownload
From 6bdbdc840bf3247812cb178b2d58c1667e3d8894 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Fri, 14 Apr 2023 13:49:09 +0800
Subject: [PATCH v25 1/3] Persist to disk logical slots during a shutdown
 checkpoint if the updated confirmed_flush_lsn has not yet been persisted.

It's entirely possible for a logical slot to have a confirmed_flush_lsn higher
than the last value saved on disk while not being marked as dirty.  It's
currently not a problem to lose that value during a clean shutdown / restart
cycle, but a later patch adding support for pg_upgrade of publications and
logical slots will rely on that value being properly persisted to disk.

Author: Julien Rouhaud
Reviewed-by: Wang Wei, Peter Smith, Masahiko Sawada
---
 src/backend/access/transam/xlog.c             |   2 +-
 src/backend/replication/slot.c                |  33 ++++--
 src/include/replication/slot.h                |   5 +-
 src/test/subscription/meson.build             |   1 +
 src/test/subscription/t/034_always_persist.pl | 106 ++++++++++++++++++
 5 files changed, 135 insertions(+), 12 deletions(-)
 create mode 100644 src/test/subscription/t/034_always_persist.pl

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 60c0b7ec3a..6dced61cf4 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7026,7 +7026,7 @@ static void
 CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
 {
 	CheckPointRelationMap();
-	CheckPointReplicationSlots();
+	CheckPointReplicationSlots(flags & CHECKPOINT_IS_SHUTDOWN);
 	CheckPointSnapBuild();
 	CheckPointLogicalRewriteHeap();
 	CheckPointReplicationOrigin();
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index bb09c4010f..1c6db2a99a 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -109,7 +109,8 @@ static void ReplicationSlotDropPtr(ReplicationSlot *slot);
 /* internal persistency functions */
 static void RestoreSlotFromDisk(const char *name);
 static void CreateSlotOnDisk(ReplicationSlot *slot);
-static void SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel);
+static void SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel,
+						   bool is_shutdown);
 
 /*
  * Report shared-memory space needed by ReplicationSlotsShmemInit.
@@ -321,6 +322,7 @@ ReplicationSlotCreate(const char *name, bool db_specific,
 	slot->candidate_xmin_lsn = InvalidXLogRecPtr;
 	slot->candidate_restart_valid = InvalidXLogRecPtr;
 	slot->candidate_restart_lsn = InvalidXLogRecPtr;
+	slot->last_persisted_confirmed_flush = InvalidXLogRecPtr;
 
 	/*
 	 * Create the slot on disk.  We haven't actually marked the slot allocated
@@ -783,7 +785,7 @@ ReplicationSlotSave(void)
 	Assert(MyReplicationSlot != NULL);
 
 	sprintf(path, "pg_replslot/%s", NameStr(MyReplicationSlot->data.name));
-	SaveSlotToPath(MyReplicationSlot, path, ERROR);
+	SaveSlotToPath(MyReplicationSlot, path, ERROR, false);
 }
 
 /*
@@ -1572,11 +1574,10 @@ restart:
 /*
  * Flush all replication slots to disk.
  *
- * This needn't actually be part of a checkpoint, but it's a convenient
- * location.
+ * is_shutdown is true in case of a shutdown checkpoint.
  */
 void
-CheckPointReplicationSlots(void)
+CheckPointReplicationSlots(bool is_shutdown)
 {
 	int			i;
 
@@ -1601,7 +1602,7 @@ CheckPointReplicationSlots(void)
 
 		/* save the slot to disk, locking is handled in SaveSlotToPath() */
 		sprintf(path, "pg_replslot/%s", NameStr(s->data.name));
-		SaveSlotToPath(s, path, LOG);
+		SaveSlotToPath(s, path, LOG, is_shutdown);
 	}
 	LWLockRelease(ReplicationSlotAllocationLock);
 }
@@ -1707,7 +1708,7 @@ CreateSlotOnDisk(ReplicationSlot *slot)
 
 	/* Write the actual state file. */
 	slot->dirty = true;			/* signal that we really need to write */
-	SaveSlotToPath(slot, tmppath, ERROR);
+	SaveSlotToPath(slot, tmppath, ERROR, false);
 
 	/* Rename the directory into place. */
 	if (rename(tmppath, path) != 0)
@@ -1733,7 +1734,8 @@ CreateSlotOnDisk(ReplicationSlot *slot)
  * Shared functionality between saving and creating a replication slot.
  */
 static void
-SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel)
+SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel,
+			   bool is_shutdown)
 {
 	char		tmppath[MAXPGPATH];
 	char		path[MAXPGPATH];
@@ -1747,8 +1749,16 @@ SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel)
 	slot->just_dirtied = false;
 	SpinLockRelease(&slot->mutex);
 
-	/* and don't do anything if there's nothing to write */
-	if (!was_dirty)
+	/*
+	 * Don't do anything if there's nothing to write, unless this is called for
+	 * a logical slot during a shutdown checkpoint and if the updated
+	 * confirmed_flush LSN has not yet been persisted, as we want to persist
+	 * the updated confirmed_flush LSN in that case, even if that's the only
+	 * modification.
+	 */
+	if (!was_dirty &&
+		!(SlotIsLogical(slot) && is_shutdown &&
+		  (slot->data.confirmed_flush != slot->last_persisted_confirmed_flush)))
 		return;
 
 	LWLockAcquire(&slot->io_in_progress_lock, LW_EXCLUSIVE);
@@ -1878,6 +1888,8 @@ SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel)
 	SpinLockAcquire(&slot->mutex);
 	if (!slot->just_dirtied)
 		slot->dirty = false;
+
+	slot->last_persisted_confirmed_flush = slot->data.confirmed_flush;
 	SpinLockRelease(&slot->mutex);
 
 	LWLockRelease(&slot->io_in_progress_lock);
@@ -2074,6 +2086,7 @@ RestoreSlotFromDisk(const char *name)
 		/* initialize in memory state */
 		slot->effective_xmin = cp.slotdata.xmin;
 		slot->effective_catalog_xmin = cp.slotdata.catalog_xmin;
+		slot->last_persisted_confirmed_flush =  cp.slotdata.confirmed_flush;
 
 		slot->candidate_catalog_xmin = InvalidTransactionId;
 		slot->candidate_xmin_lsn = InvalidXLogRecPtr;
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index a8a89dc784..b519f7af5f 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -178,6 +178,9 @@ typedef struct ReplicationSlot
 	XLogRecPtr	candidate_xmin_lsn;
 	XLogRecPtr	candidate_restart_valid;
 	XLogRecPtr	candidate_restart_lsn;
+
+	/* The last persisted confirmed flush lsn */
+	XLogRecPtr	last_persisted_confirmed_flush;
 } ReplicationSlot;
 
 #define SlotIsPhysical(slot) ((slot)->data.database == InvalidOid)
@@ -241,7 +244,7 @@ extern void ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslo
 extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok);
 
 extern void StartupReplicationSlots(void);
-extern void CheckPointReplicationSlots(void);
+extern void CheckPointReplicationSlots(bool is_shutdown);
 
 extern void CheckSlotRequirements(void);
 extern void CheckSlotPermissions(void);
diff --git a/src/test/subscription/meson.build b/src/test/subscription/meson.build
index bd673a9d68..cdd2f8ba47 100644
--- a/src/test/subscription/meson.build
+++ b/src/test/subscription/meson.build
@@ -40,6 +40,7 @@ tests += {
       't/031_column_list.pl',
       't/032_subscribe_use_index.pl',
       't/033_run_as_table_owner.pl',
+      't/034_always_persist.pl',
       't/100_bugs.pl',
     ],
   },
diff --git a/src/test/subscription/t/034_always_persist.pl b/src/test/subscription/t/034_always_persist.pl
new file mode 100644
index 0000000000..9973476fff
--- /dev/null
+++ b/src/test/subscription/t/034_always_persist.pl
@@ -0,0 +1,106 @@
+
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+# Test logical replication slots are always persisted to disk during a shutdown
+# checkpoint.
+
+use strict;
+use warnings;
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+sub compare_confirmed_flush
+{
+	my ($node, $confirmed_flush_from_log) = @_;
+
+	# Fetch Latest checkpoint location from the control file itself
+	my ($stdout, $stderr) = run_command([ 'pg_controldata', $node->data_dir ]);
+	my @control_data = split("\n", $stdout);
+	my $latest_checkpoint = undef;
+	foreach (@control_data)
+	{
+		if ($_ =~ /^Latest checkpoint location:\s*(.*)$/mg)
+		{
+			$latest_checkpoint = $1;
+			last;
+		}
+	}
+	die "Latest checkpoint location not found in control file found\n"
+	  unless defined($latest_checkpoint);
+
+	# Is it same as the value read from log?
+	ok($latest_checkpoint eq $confirmed_flush_from_log,
+		"Check the decoding starts from the confirmed_flush which is the same as the latest_checkpoint");
+
+	return;
+}
+
+# Initialize publisher node
+my $node_publisher = PostgreSQL::Test::Cluster->new('pub');
+$node_publisher->init(allows_streaming => 'logical');
+$node_publisher->append_conf('postgresql.conf', q{
+autovacuum = off
+checkpoint_timeout = 1h
+});
+$node_publisher->start;
+
+# Create subscriber node
+my $node_subscriber = PostgreSQL::Test::Cluster->new('sub');
+$node_subscriber->init(allows_streaming => 'logical');
+$node_subscriber->start;
+
+# Create table
+$node_publisher->safe_psql('postgres', "CREATE TABLE test_tbl (id int)");
+$node_subscriber->safe_psql('postgres', "CREATE TABLE test_tbl (id int)");
+
+# Insert some data
+$node_publisher->safe_psql('postgres',
+	"INSERT INTO test_tbl VALUES (generate_series(1, 5));"
+);
+
+# Setup logical replication
+my $publisher_connstr = $node_publisher->connstr . ' dbname=postgres';
+$node_publisher->safe_psql('postgres', "CREATE PUBLICATION pub FOR ALL TABLES");
+$node_subscriber->safe_psql('postgres',
+	"CREATE SUBSCRIPTION sub CONNECTION '$publisher_connstr' PUBLICATION pub"
+);
+
+$node_subscriber->wait_for_subscription_sync($node_publisher, 'sub');
+
+my $result = $node_subscriber->safe_psql('postgres',
+	"SELECT count(*) FROM test_tbl"
+);
+
+is($result, qq(5), "check initial copy was done");
+
+# Set wal_receiver_status_interval to zero to suppress keepalive messages
+# between nodes.
+$node_subscriber->append_conf('postgresql.conf', q{
+wal_receiver_status_interval = 0
+});
+$node_subscriber->reload();
+
+my $offset = -s $node_publisher->logfile;
+
+# Restart publisher once. If the slot has persisted, the confirmed_flush_lsn
+# becomes the same as the latest checkpoint location, which means the
+# SHUTDOWN_CHECKPOINT record.
+$node_publisher->restart();
+
+# Wait until the walsender creates decoding context
+$node_publisher->wait_for_log(
+	qr/Streaming transactions committing after ([A-F0-9]+\/[A-F0-9]+), reading WAL from ([A-F0-9]+\/[A-F0-9]+)./,
+	$offset
+);
+
+# Extract confirmed_flush from the logfile
+my $log_contents = slurp_file($node_publisher->logfile, $offset);
+$log_contents =~
+	qr/Streaming transactions committing after ([A-F0-9]+\/[A-F0-9]+), reading WAL from ([A-F0-9]+\/[A-F0-9]+)./
+	or die "could not get confirmed_flush_lsn";
+
+compare_confirmed_flush($node_publisher, $1);
+
+done_testing();
-- 
2.27.0

v25-0002-pg_upgrade-Allow-to-replicate-logical-replicatio.patch (application/octet-stream)
From 5a23b37717c6374ea9ca22f91ee5dd5d49821151 Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Tue, 4 Apr 2023 05:49:34 +0000
Subject: [PATCH v25 2/3] pg_upgrade: Allow to replicate logical replication
 slots to new node

This commit allows nodes with logical replication slots to be upgraded. While
reading information from the old cluster, a list of logical replication slots is
fetched. At the later part of upgrading, pg_upgrade revisits the list and
restores slots by executing pg_create_logical_replication_slots() on the new
cluster.

Note that slot restoration must be done after the final pg_resetwal command
during the upgrade because pg_resetwal will remove WALs that are required by
the slots. Due to this restriction, the timing of restoring replication slots is
different from other objects.

The significant advantage of this commit is that it makes it easy to continue
logical replication even after upgrading the publisher node. Previously,
pg_upgrade allowed copying publications to a new node. With this new commit,
adjusting the connection string to the new publisher will cause the apply
worker on the subscriber to connect to the new publisher automatically. This
enables seamless continuation of logical replication, even after an upgrade.

Author: Hayato Kuroda
Co-authored-by: Hou Zhijie
Reviewed-by: Peter Smith, Julien Rouhaud, Vignesh C, Wang Wei, Masahiko Sawada
---
 doc/src/sgml/ref/pgupgrade.sgml               |  65 ++++++++
 src/bin/pg_upgrade/Makefile                   |   3 +
 src/bin/pg_upgrade/check.c                    |  63 ++++++++
 src/bin/pg_upgrade/function.c                 |  18 ++-
 src/bin/pg_upgrade/info.c                     | 135 ++++++++++++++++-
 src/bin/pg_upgrade/meson.build                |   1 +
 src/bin/pg_upgrade/pg_upgrade.c               |  78 ++++++++++
 src/bin/pg_upgrade/pg_upgrade.h               |  19 +++
 .../t/003_logical_replication_slots.pl        | 139 ++++++++++++++++++
 src/tools/pgindent/typedefs.list              |   3 +
 10 files changed, 520 insertions(+), 4 deletions(-)
 create mode 100644 src/bin/pg_upgrade/t/003_logical_replication_slots.pl

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 7816b4c685..eb34233fb4 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -360,6 +360,71 @@ make prefix=/usr/local/pgsql.new install
     </para>
    </step>
 
+   <step>
+    <title>Prepare for publisher upgrades</title>
+
+    <para>
+     <application>pg_upgrade</application> attempts to migrate logical
+     replication slots. This helps avoid the need for manually defining the
+     same replication slots on the new publisher. Currently,
+     <application>pg_upgrade</application> supports migrate logical replication
+     slots when the old cluster is 17.X and later.
+    </para>
+
+    <para>
+     Before you start upgrading the publisher cluster, ensure that the
+     subscription is temporarily disabled, by executing
+     <link linkend="sql-altersubscription"><command>ALTER SUBSCRIPTION ... DISABLE</command></link>.
+     Re-enable the subscription after the upgrade.
+    </para>
+
+    <para>
+     There are some prerequisites for <application>pg_upgrade</application> to
+     be able to upgrade the replication slots. If these are not met an error
+     will be reported.
+    </para>
+
+    <itemizedlist>
+     <listitem>
+      <para>
+       All slots on the old cluster must be usable, i.e., there are no slots
+       whose
+       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>wal_status</structfield>
+       is <literal>lost</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>confirmed_flush_lsn</structfield>
+       of all slots on the old cluster must be the same as the latest
+       checkpoint location.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The output plugins referenced by the slots on the old cluster must be
+       installed in the new PostgreSQL executable directory.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must have
+       <link linkend="guc-max-replication-slots"><varname>max_replication_slots</varname></link>
+       configured to a value greater than or equal to the number of slots
+       present in the old cluster.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must have
+       <link linkend="guc-wal-level"><varname>wal_level</varname></link> as
+       <literal>logical</literal>.
+      </para>
+     </listitem>
+    </itemizedlist>
+
+   </step>
+
    <step>
     <title>Stop both servers</title>
 
diff --git a/src/bin/pg_upgrade/Makefile b/src/bin/pg_upgrade/Makefile
index 5834513add..815d1a7ca1 100644
--- a/src/bin/pg_upgrade/Makefile
+++ b/src/bin/pg_upgrade/Makefile
@@ -3,6 +3,9 @@
 PGFILEDESC = "pg_upgrade - an in-place binary upgrade utility"
 PGAPPICON = win32
 
+# required for 003_logical_replication_slots.pl
+EXTRA_INSTALL=contrib/test_decoding
+
 subdir = src/bin/pg_upgrade
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 56e313f562..3445fe6e13 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -30,6 +30,7 @@ static void check_for_jsonb_9_4_usage(ClusterInfo *cluster);
 static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(void);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
+static void check_new_cluster_logical_replication_slots(void);
 
 
 /*
@@ -89,6 +90,9 @@ check_and_dump_old_cluster(bool live_check)
 	/* Extract a list of databases and tables from the old cluster */
 	get_db_and_rel_infos(&old_cluster);
 
+	/* Extract a list of logical replication slots */
+	get_logical_slot_infos();
+
 	init_tablespaces();
 
 	get_loadable_libraries();
@@ -189,6 +193,8 @@ check_new_cluster(void)
 {
 	get_db_and_rel_infos(&new_cluster);
 
+	check_new_cluster_logical_replication_slots();
+
 	check_new_cluster_is_empty();
 
 	check_loadable_libraries();
@@ -1402,3 +1408,60 @@ check_for_user_defined_encoding_conversions(ClusterInfo *cluster)
 	else
 		check_ok();
 }
+
+/*
+ * check_new_cluster_logical_replication_slots()
+ *
+ * Make sure there are no logical replication slots on the new cluster and that
+ * the parameter settings necessary for creating slots are sufficient.
+ */
+static void
+check_new_cluster_logical_replication_slots(void)
+{
+	PGresult   *res;
+	PGconn	   *conn;
+	int			nslots = count_logical_slots();
+	int			max_replication_slots;
+	char	   *wal_level;
+
+	/* Quick exit if there are no logical slots on the old cluster */
+	if (nslots == 0)
+		return;
+
+	conn = connectToServer(&new_cluster, "template1");
+
+	prep_status("Checking for logical replication slots");
+
+	res = executeQueryOrDie(conn, "SELECT slot_name "
+								  "FROM pg_catalog.pg_replication_slots "
+								  "WHERE slot_type = 'logical' AND "
+								  "temporary IS FALSE;");
+
+	if (PQntuples(res))
+		pg_fatal("New cluster must not have logical replication slots but found \"%s\"",
+				 PQgetvalue(res, 0, 0));
+
+	PQclear(res);
+
+	res = executeQueryOrDie(conn, "SHOW max_replication_slots;");
+	max_replication_slots = atoi(PQgetvalue(res, 0, 0));
+
+	if (nslots > max_replication_slots)
+		pg_fatal("max_replication_slots (%d) must be greater than or equal to the number of "
+				 "logical replication slots (%d) on the old cluster.",
+				 max_replication_slots, nslots);
+
+	PQclear(res);
+
+	res = executeQueryOrDie(conn, "SHOW wal_level;");
+	wal_level = PQgetvalue(res, 0, 0);
+
+	if (strcmp(wal_level, "logical") != 0)
+		pg_fatal("wal_level must be \"logical\", but is set to \"%s\"",
+				wal_level);
+
+	PQclear(res);
+	PQfinish(conn);
+
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/function.c b/src/bin/pg_upgrade/function.c
index dc8800c7cd..2813d2ff20 100644
--- a/src/bin/pg_upgrade/function.c
+++ b/src/bin/pg_upgrade/function.c
@@ -46,7 +46,12 @@ library_name_compare(const void *p1, const void *p2)
 /*
  * get_loadable_libraries()
  *
- *	Fetch the names of all old libraries containing C-language functions.
+ *	Fetch the names of all old libraries:
+ *	1. Name of library files containing C-language functions (for non-built-in
+ *	   functions), and
+ *	2. Shared object (library) names containing the logical replication output
+ *	   plugins
+ *
  *	We will later check that they all exist in the new installation.
  */
 void
@@ -66,14 +71,21 @@ get_loadable_libraries(void)
 		PGconn	   *conn = connectToServer(&old_cluster, active_db->db_name);
 
 		/*
-		 * Fetch all libraries containing non-built-in C functions in this DB.
+		 * Fetch all libraries containing non-built-in C functions, or referred
+		 * to by logical replication slots in this DB.
 		 */
 		ress[dbnum] = executeQueryOrDie(conn,
 										"SELECT DISTINCT probin "
 										"FROM pg_catalog.pg_proc "
 										"WHERE prolang = %u AND "
 										"probin IS NOT NULL AND "
-										"oid >= %u;",
+										"oid >= %u "
+										"UNION "
+										"SELECT DISTINCT plugin "
+										"FROM pg_catalog.pg_replication_slots "
+										"WHERE wal_status <> 'lost' AND "
+										"database = current_database() AND "
+										"temporary IS FALSE;",
 										ClanguageId,
 										FirstNormalObjectId);
 		totaltups += PQntuples(ress[dbnum]);
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index aa5faca4d6..7794d79086 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -26,6 +26,7 @@ static void get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo);
 static void free_rel_infos(RelInfoArr *rel_arr);
 static void print_db_infos(DbInfoArr *db_arr);
 static void print_rel_infos(RelInfoArr *rel_arr);
+static void print_slot_infos(LogicalSlotInfoArr *slot_arr);
 
 
 /*
@@ -394,7 +395,7 @@ get_db_infos(ClusterInfo *cluster)
 	i_spclocation = PQfnumber(res, "spclocation");
 
 	ntups = PQntuples(res);
-	dbinfos = (DbInfo *) pg_malloc(sizeof(DbInfo) * ntups);
+	dbinfos = (DbInfo *) pg_malloc0(sizeof(DbInfo) * ntups);
 
 	for (tupnum = 0; tupnum < ntups; tupnum++)
 	{
@@ -600,6 +601,116 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 	dbinfo->rel_arr.nrels = num_rels;
 }
 
+/*
+ * get_logical_slot_infos_per_db()
+ *
+ * gets the LogicalSlotInfos for all the logical replication slots of the database
+ * referred to by "dbinfo".
+ */
+static void
+get_logical_slot_infos_per_db(ClusterInfo *cluster, DbInfo *dbinfo)
+{
+	PGconn	   *conn = connectToServer(cluster,
+									   dbinfo->db_name);
+	PGresult   *res;
+	LogicalSlotInfo *slotinfos = NULL;
+
+	int			num_slots;
+
+	res = executeQueryOrDie(conn, "SELECT slot_name, plugin, two_phase "
+							"FROM pg_catalog.pg_replication_slots "
+							"WHERE wal_status <> 'lost' AND "
+							"database = current_database() AND "
+							"temporary IS FALSE;");
+
+	num_slots = PQntuples(res);
+
+	if (num_slots)
+	{
+		int			slotnum;
+		int			i_slotname;
+		int			i_plugin;
+		int			i_twophase;
+
+		slotinfos = (LogicalSlotInfo *) pg_malloc(sizeof(LogicalSlotInfo) * num_slots);
+
+		i_slotname = PQfnumber(res, "slot_name");
+		i_plugin = PQfnumber(res, "plugin");
+		i_twophase = PQfnumber(res, "two_phase");
+
+		for (slotnum = 0; slotnum < num_slots; slotnum++)
+		{
+			LogicalSlotInfo *curr = &slotinfos[slotnum];
+
+			curr->slotname = pg_strdup(PQgetvalue(res, slotnum, i_slotname));
+			curr->plugin = pg_strdup(PQgetvalue(res, slotnum, i_plugin));
+			curr->two_phase = (strcmp(PQgetvalue(res, slotnum, i_twophase), "t") == 0);
+		}
+	}
+
+	PQclear(res);
+	PQfinish(conn);
+
+	dbinfo->slot_arr.slots = slotinfos;
+	dbinfo->slot_arr.nslots = num_slots;
+}
+
+/*
+ * get_logical_slot_infos()
+ *
+ * Higher level routine to generate LogicalSlotInfoArr for all databases.
+ *
+ * Note: This function will not do anything if the old cluster is pre-PG 17.
+ * The logical slots are not saved at shutdown, and the confirmed_flush_lsn is
+ * always behind the SHUTDOWN_CHECKPOINT record. Subsequent checks done in
+ * check_for_confirmed_flush_lsn() would raise a FATAL error if such slots are
+ * included.
+ */
+void
+get_logical_slot_infos(void)
+{
+	int			dbnum;
+
+	/* Logical slots can be migrated since PG17. */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+		return;
+
+	pg_log(PG_VERBOSE, "\nsource databases:");
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		DbInfo	   *pDbInfo = &old_cluster.dbarr.dbs[dbnum];
+
+		get_logical_slot_infos_per_db(&old_cluster, pDbInfo);
+
+		if (log_opts.verbose)
+		{
+			pg_log(PG_VERBOSE, "Database: \"%s\"", pDbInfo->db_name);
+			print_slot_infos(&pDbInfo->slot_arr);
+		}
+	}
+}
+
+/*
+ * count_logical_slots()
+ *
+ * Sum up and return the number of logical replication slots for all databases.
+ */
+int
+count_logical_slots(void)
+{
+	int			dbnum;
+	int			slot_count = 0;
+
+	/* Quick exit if the version is prior to PG17. */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+		return 0;
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+		slot_count += old_cluster.dbarr.dbs[dbnum].slot_arr.nslots;
+
+	return slot_count;
+}
 
 static void
 free_db_and_rel_infos(DbInfoArr *db_arr)
@@ -610,6 +721,12 @@ free_db_and_rel_infos(DbInfoArr *db_arr)
 	{
 		free_rel_infos(&db_arr->dbs[dbnum].rel_arr);
 		pg_free(db_arr->dbs[dbnum].db_name);
+
+		/*
+		 * Logical replication slots must not exist on the new cluster before
+		 * create_logical_replication_slots().
+		 */
+		Assert(db_arr->dbs[dbnum].slot_arr.nslots == 0);
 	}
 	pg_free(db_arr->dbs);
 	db_arr->dbs = NULL;
@@ -660,3 +777,19 @@ print_rel_infos(RelInfoArr *rel_arr)
 			   rel_arr->rels[relnum].reloid,
 			   rel_arr->rels[relnum].tablespace);
 }
+
+static void
+print_slot_infos(LogicalSlotInfoArr *slot_arr)
+{
+	int			slotnum;
+
+	for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+	{
+		LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+
+		pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\", two_phase: %d",
+			   slot_info->slotname,
+			   slot_info->plugin,
+			   slot_info->two_phase);
+	}
+}
diff --git a/src/bin/pg_upgrade/meson.build b/src/bin/pg_upgrade/meson.build
index 12a97f84e2..228f29b688 100644
--- a/src/bin/pg_upgrade/meson.build
+++ b/src/bin/pg_upgrade/meson.build
@@ -42,6 +42,7 @@ tests += {
     'tests': [
       't/001_basic.pl',
       't/002_pg_upgrade.pl',
+      't/003_logical_replication_slots.pl',
     ],
     'test_kwargs': {'priority': 40}, # pg_upgrade tests are slow
   },
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 4562dafcff..945c3dc57c 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -59,6 +59,7 @@ static void copy_xact_xlog_xid(void);
 static void set_frozenxids(bool minmxid_only);
 static void make_outputdirs(char *pgdata);
 static void setup(char *argv0, bool *live_check);
+static void create_logical_replication_slots(void);
 
 ClusterInfo old_cluster,
 			new_cluster;
@@ -188,6 +189,19 @@ main(int argc, char **argv)
 			  new_cluster.pgdata);
 	check_ok();
 
+	/*
+	 * Create logical replication slots.
+	 *
+	 * Note: This must be done after doing the pg_resetwal command because
+	 * pg_resetwal would remove required WALs.
+	 */
+	if (count_logical_slots())
+	{
+		start_postmaster(&new_cluster, true);
+		create_logical_replication_slots();
+		stop_postmaster(false);
+	}
+
 	if (user_opts.do_sync)
 	{
 		prep_status("Sync data directory to disk");
@@ -860,3 +874,67 @@ set_frozenxids(bool minmxid_only)
 
 	check_ok();
 }
+
+/*
+ * create_logical_replication_slots()
+ *
+ * Similar to create_new_objects() but only restores logical replication slots.
+ */
+static void
+create_logical_replication_slots(void)
+{
+	int			dbnum;
+
+	prep_status_progress("Restoring logical replication slots in the new cluster");
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		DbInfo	   *old_db = &old_cluster.dbarr.dbs[dbnum];
+		LogicalSlotInfoArr *slot_arr = &old_db->slot_arr;
+		PGconn     *conn;
+		PQExpBuffer query;
+		int			slotnum;
+		char		log_file_name[MAXPGPATH];
+
+		/* Skip this database if there are no slots */
+		if (slot_arr->nslots == 0)
+			continue;
+
+		conn = connectToServer(&new_cluster, old_db->db_name);
+		query = createPQExpBuffer();
+
+		snprintf(log_file_name, sizeof(log_file_name),
+				 DB_DUMP_LOG_FILE_MASK, old_db->db_oid);
+
+		pg_log(PG_STATUS, "%s", old_db->db_name);
+
+		for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		{
+			LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+
+			/*
+			 * Constructs a query for creating logical replication slots.
+			 *
+			 * XXX: For simplification, pg_create_logical_replication_slot() is
+			 * used. Is it sufficient?
+			 */
+			appendPQExpBuffer(query, "SELECT pg_catalog.pg_create_logical_replication_slot(");
+			appendStringLiteralConn(query, slot_info->slotname, conn);
+			appendPQExpBuffer(query, ", ");
+			appendStringLiteralConn(query, slot_info->plugin, conn);
+			appendPQExpBuffer(query, ", false, %s);",
+							  slot_info->two_phase ? "true" : "false");
+
+			PQclear(executeQueryOrDie(conn, "%s", query->data));
+
+			resetPQExpBuffer(query);
+		}
+
+		PQfinish(conn);
+
+		destroyPQExpBuffer(query);
+	}
+
+	end_progress_output();
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 3eea0139c7..e37a671f8c 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -150,6 +150,22 @@ typedef struct
 	int			nrels;
 } RelInfoArr;
 
+/*
+ * Structure to store logical replication slot information
+ */
+typedef struct
+{
+	char	   *slotname;		/* slot name */
+	char	   *plugin;			/* plugin */
+	bool		two_phase;		/* can the slot decode 2PC? */
+} LogicalSlotInfo;
+
+typedef struct
+{
+	int			nslots;			/* number of logical slot infos */
+	LogicalSlotInfo *slots;		/* array of logical slot infos */
+} LogicalSlotInfoArr;
+
 /*
  * The following structure represents a relation mapping.
  */
@@ -176,6 +192,7 @@ typedef struct
 	char		db_tablespace[MAXPGPATH];	/* database default tablespace
 											 * path */
 	RelInfoArr	rel_arr;		/* array of all user relinfos */
+	LogicalSlotInfoArr slot_arr;	/* array of all LogicalSlotInfo */
 } DbInfo;
 
 /*
@@ -400,6 +417,8 @@ FileNameMap *gen_db_file_maps(DbInfo *old_db,
 							  DbInfo *new_db, int *nmaps, const char *old_pgdata,
 							  const char *new_pgdata);
 void		get_db_and_rel_infos(ClusterInfo *cluster);
+void		get_logical_slot_infos(void);
+int			count_logical_slots(void);
 
 /* option.c */
 
diff --git a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
new file mode 100644
index 0000000000..ae87c33708
--- /dev/null
+++ b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
@@ -0,0 +1,139 @@
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+# Tests for upgrading replication slots
+
+use strict;
+use warnings;
+
+use File::Path qw(rmtree);
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Can be changed to test the other modes
+my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
+
+# Initialize old cluster
+my $old_publisher = PostgreSQL::Test::Cluster->new('old_publisher');
+$old_publisher->init(allows_streaming => 'logical');
+
+# Initialize new cluster
+my $new_publisher = PostgreSQL::Test::Cluster->new('new_publisher');
+$new_publisher->init(allows_streaming => 'replica');
+
+my $bindir = $new_publisher->config_data('--bindir');
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when new cluster wal_level is not 'logical'
+
+# Preparations for the subsequent test:
+# 1. Create a slot on the old cluster
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT pg_create_logical_replication_slot('test_slot1', 'test_decoding', false, true);"
+);
+$old_publisher->stop;
+
+# pg_upgrade will fail because the new cluster wal_level is 'replica'
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade where the new cluster has the wrong wal_level');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when max_replication_slots on a new cluster is
+#		too low
+
+# Preparations for the subsequent test:
+# 1. Create a second slot on the old cluster
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT pg_create_logical_replication_slot('test_slot2', 'test_decoding', false, true);"
+);
+$old_publisher->stop;
+
+# 2. max_replication_slots is set to smaller than the number of slots (2)
+#	 present on the old cluster
+$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 1");
+
+# 3. wal_level is set correctly on the new cluster
+$new_publisher->append_conf('postgresql.conf', "wal_level = 'logical'");
+
+# pg_upgrade will fail because the new cluster has insufficient max_replication_slots
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade where the new cluster has insufficient max_replication_slots');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+
+# ------------------------------
+# TEST: Successful upgrade
+
+# Preparations for the subsequent test:
+# 1. Remove the slot 'test_slot2', leaving only 1 slot remaining on the old
+#	 cluster, so the new cluster config  max_replication_slots=1 will now be
+#	 enough.
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT * FROM pg_drop_replication_slot('test_slot2');"
+);
+
+# 2. Consume WAL records
+$old_publisher->safe_psql('postgres',
+	"SELECT count(*) FROM pg_logical_slot_get_changes('test_slot1', NULL, NULL)"
+);
+$old_publisher->stop;
+
+# Actual run, successful upgrade is expected
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade of old cluster');
+ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ removed after pg_upgrade success");
+
+# Check that the slot 'test_slot1' has migrated to the new cluster
+$new_publisher->start;
+my $result = $new_publisher->safe_psql('postgres',
+	"SELECT slot_name, two_phase FROM pg_replication_slots");
+is($result, qq(test_slot1|t), 'check the slot exists on new cluster');
+$new_publisher->stop;
+
+done_testing();
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 49a33c0387..310456e032 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1501,7 +1501,10 @@ LogicalRepTupleData
 LogicalRepTyp
 LogicalRepWorker
 LogicalRepWorkerType
+LogicalReplicationSlotInfo
 LogicalRewriteMappingData
+LogicalSlotInfo
+LogicalSlotInfoArr
 LogicalTape
 LogicalTapeSet
 LsnReadQueue
-- 
2.27.0

v25-0003-pg_upgrade-Add-check-function-for-logical-replic.patch (application/octet-stream)
From b2149c29249d7798d44ee096ac0e3b0ac7c2c940 Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Fri, 18 Aug 2023 11:57:37 +0000
Subject: [PATCH v25 3/3] pg_upgrade: Add check function for logical
 replication slots

To prevent data loss, pg_upgrade will fail if the old node has slots with the
status 'lost', or with unconsumed WAL records.

Author: Hayato Kuroda
Reviewed-by: Wang Wei, Vignesh C, Peter Smith, Hou Zhijie
---
 src/bin/pg_upgrade/check.c                    | 113 ++++++++++++++++++
 src/bin/pg_upgrade/controldata.c              |  37 ++++++
 src/bin/pg_upgrade/pg_upgrade.h               |   2 +
 .../t/003_logical_replication_slots.pl        |  91 ++++++++++++--
 4 files changed, 235 insertions(+), 8 deletions(-)

diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 3445fe6e13..ebf8a5d290 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -9,6 +9,7 @@
 
 #include "postgres_fe.h"
 
+#include "access/xlogdefs.h"
 #include "catalog/pg_authid_d.h"
 #include "catalog/pg_collation.h"
 #include "fe_utils/string_utils.h"
@@ -31,6 +32,8 @@ static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(void);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
 static void check_new_cluster_logical_replication_slots(void);
+static void check_for_confirmed_flush_lsn(void);
+static void check_for_lost_slots(void);
 
 
 /*
@@ -108,6 +111,24 @@ check_and_dump_old_cluster(bool live_check)
 	check_for_reg_data_type_usage(&old_cluster);
 	check_for_isn_and_int8_passing_mismatch(&old_cluster);
 
+	/*
+	 * Logical replication slots can be migrated since PG17. See comments atop
+	 * get_logical_slot_infos().
+	 */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
+	{
+		check_for_lost_slots();
+
+		/*
+		 * Do additional checks if a live check is not required. This requires
+		 * that confirmed_flush_lsn of all the slots is the same as the latest
+		 * checkpoint location, but it would be satisfied only when the server
+		 * has been shut down.
+		 */
+		if (!live_check)
+			check_for_confirmed_flush_lsn();
+	}
+
 	/*
 	 * PG 16 increased the size of the 'aclitem' type, which breaks the
 	 * on-disk format for existing data.
@@ -1465,3 +1486,95 @@ check_new_cluster_logical_replication_slots(void)
 
 	check_ok();
 }
+
+/*
+ * Verify that all logical replication slots are usable.
+ */
+void
+check_for_lost_slots(void)
+{
+	int			i,
+				ntups,
+				i_slotname;
+	PGresult   *res;
+	DbInfo	   *active_db = &old_cluster.dbarr.dbs[0];
+	PGconn	   *conn;
+
+	/* Quick exit if the cluster does not have logical slots. */
+	if (count_logical_slots() == 0)
+		return;
+
+	conn = connectToServer(&old_cluster, active_db->db_name);
+
+	prep_status("Checking wal_status for logical replication slots");
+
+	/* Check there are no logical replication slots with a 'lost' state. */
+	res = executeQueryOrDie(conn,
+							"SELECT slot_name FROM pg_catalog.pg_replication_slots "
+							"WHERE wal_status = 'lost' AND "
+							"temporary IS FALSE;");
+
+	ntups = PQntuples(res);
+	i_slotname = PQfnumber(res, "slot_name");
+
+	for (i = 0; i < ntups; i++)
+		pg_log(PG_WARNING,
+			   "\nWARNING: logical replication slot \"%s\" is in 'lost' state.",
+			   PQgetvalue(res, i, i_slotname));
+
+	PQclear(res);
+	PQfinish(conn);
+
+	if (ntups)
+		pg_fatal("One or more logical replication slots with a state of 'lost' were detected.");
+
+	check_ok();
+}
+
+/*
+ * Verify that all logical replication slots consumed all WALs, except a
+ * CHECKPOINT_SHUTDOWN record.
+ */
+static void
+check_for_confirmed_flush_lsn(void)
+{
+	int			i,
+				ntups,
+				i_slotname;
+	PGresult   *res;
+	DbInfo	   *active_db = &old_cluster.dbarr.dbs[0];
+	PGconn	   *conn;
+
+	/* Quick exit if the cluster does not have logical slots. */
+	if (count_logical_slots() == 0)
+		return;
+
+	conn = connectToServer(&old_cluster, active_db->db_name);
+
+	prep_status("Checking confirmed_flush_lsn for logical replication slots");
+
+	/*
+	 * Check that all logical replication slots have reached the latest
+	 * checkpoint position (SHUTDOWN_CHECKPOINT record).
+	 */
+	res = executeQueryOrDie(conn,
+							"SELECT slot_name FROM pg_catalog.pg_replication_slots "
+							"WHERE confirmed_flush_lsn != '%X/%X' AND temporary IS FALSE;",
+							LSN_FORMAT_ARGS(old_cluster.controldata.chkpnt_latest));
+
+	ntups = PQntuples(res);
+	i_slotname = PQfnumber(res, "slot_name");
+
+	for (i = 0; i < ntups; i++)
+		pg_log(PG_WARNING,
+				"\nWARNING: logical replication slot \"%s\" has not consumed WALs yet",
+				PQgetvalue(res, i, i_slotname));
+
+	PQclear(res);
+	PQfinish(conn);
+
+	if (ntups)
+		pg_fatal("One or more logical replication slots still have unconsumed WAL records.");
+
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c
index 4beb65ab22..ad9d0c2702 100644
--- a/src/bin/pg_upgrade/controldata.c
+++ b/src/bin/pg_upgrade/controldata.c
@@ -169,6 +169,43 @@ get_control_data(ClusterInfo *cluster, bool live_check)
 				}
 				got_cluster_state = true;
 			}
+
+			else if ((p = strstr(bufin, "Latest checkpoint location:")) != NULL)
+			{
+				/*
+				 * Read the latest checkpoint location if the cluster is PG17
+				 * or later. This is used for upgrading logical replication
+				 * slots.
+				 */
+				if (GET_MAJOR_VERSION(cluster->major_version) >= 1700)
+				{
+					char *slash = NULL;
+					uint32 upper_lsn, lower_lsn;
+
+					p = strchr(p, ':');
+
+					if (p == NULL || strlen(p) <= 1)
+						pg_fatal("%d: controldata retrieval problem", __LINE__);
+
+					p++;			/* remove ':' char */
+
+					p = strpbrk(p, "01234567890ABCDEF");
+
+					if (p == NULL || strlen(p) <= 1)
+						pg_fatal("%d: controldata retrieval problem", __LINE__);
+
+					/*
+					 * The upper and lower part of LSN must be read separately
+					 * because it is stored as in %X/%X format.
+					 */
+					upper_lsn = strtoul(p, &slash, 16);
+					lower_lsn = strtoul(++slash, NULL, 16);
+
+					/* And combine them */
+					cluster->controldata.chkpnt_latest =
+										((uint64) upper_lsn << 32) | lower_lsn;
+				}
+			}
 		}
 
 		rc = pclose(output);
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index e37a671f8c..654ac228da 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -10,6 +10,7 @@
 #include <sys/stat.h>
 #include <sys/time.h>
 
+#include "access/xlogdefs.h"
 #include "common/relpath.h"
 #include "libpq-fe.h"
 
@@ -242,6 +243,7 @@ typedef struct
 	bool		date_is_int;
 	bool		float8_pass_by_value;
 	uint32		data_checksum_version;
+	XLogRecPtr	chkpnt_latest;
 } ControlData;
 
 /*
diff --git a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
index ae87c33708..e277076075 100644
--- a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
+++ b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
@@ -22,6 +22,10 @@ $old_publisher->init(allows_streaming => 'logical');
 my $new_publisher = PostgreSQL::Test::Cluster->new('new_publisher');
 $new_publisher->init(allows_streaming => 'replica');
 
+# Initialize subscriber cluster
+my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
+$subscriber->init(allows_streaming => 'logical');
+
 my $bindir = $new_publisher->config_data('--bindir');
 
 # ------------------------------
@@ -65,13 +69,19 @@ $old_publisher->start;
 $old_publisher->safe_psql('postgres',
 	"SELECT pg_create_logical_replication_slot('test_slot2', 'test_decoding', false, true);"
 );
+
+# 2. Consume WAL records to avoid another type of upgrade failure. It will be
+#	 tested in subsequent cases.
+$old_publisher->safe_psql('postgres',
+	"SELECT count(*) FROM pg_logical_slot_get_changes('test_slot1', NULL, NULL);"
+);
 $old_publisher->stop;
 
-# 2. max_replication_slots is set to smaller than the number of slots (2)
+# 3. max_replication_slots is set to smaller than the number of slots (2)
 #	 present on the old cluster
 $new_publisher->append_conf('postgresql.conf', "max_replication_slots = 1");
 
-# 3. wal_level is set correctly on the new cluster
+# 4. wal_level is set correctly on the new cluster
 $new_publisher->append_conf('postgresql.conf', "wal_level = 'logical'");
 
 # pg_upgrade will fail because the new cluster has insufficient max_replication_slots
@@ -95,7 +105,7 @@ ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
 rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
 
 # ------------------------------
-# TEST: Successful upgrade
+# TEST: Confirm pg_upgrade fails when the slot still has unconsumed WAL records
 
 # Preparations for the subsequent test:
 # 1. Remove the slot 'test_slot2', leaving only 1 slot remaining on the old
@@ -106,10 +116,57 @@ $old_publisher->safe_psql('postgres',
 	"SELECT * FROM pg_drop_replication_slot('test_slot2');"
 );
 
-# 2. Consume WAL records
+# 2. Generate extra WAL records. Because these WAL records do not get consumed
+#	 it will cause the upcoming pg_upgrade test to fail.
+$old_publisher->safe_psql('postgres',
+	"CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;"
+);
+$old_publisher->stop;
+
+# pg_upgrade will fail because the slot still has unconsumed WAL records
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade of old cluster with idle replication slots');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+# Remove the remained slot
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT * FROM pg_drop_replication_slot('test_slot1');"
+);
+
+# ------------------------------
+# TEST: Successful upgrade
+
+# Preparations for the subsequent test:
+# 1. Setup logical replication
+my $old_connstr = $old_publisher->connstr . ' dbname=postgres';
 $old_publisher->safe_psql('postgres',
-	"SELECT count(*) FROM pg_logical_slot_get_changes('test_slot1', NULL, NULL)"
+	"CREATE PUBLICATION pub FOR ALL TABLES;"
 );
+$subscriber->start;
+$subscriber->safe_psql(
+	'postgres', qq[
+	CREATE TABLE tbl (a int);
+	CREATE SUBSCRIPTION sub CONNECTION '$old_connstr' PUBLICATION pub WITH (two_phase = 'true')
+]);
+$subscriber->wait_for_subscription_sync($old_publisher, 'sub');
+
+# 2. Temporarily disable the subscription
+$subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION sub DISABLE");
 $old_publisher->stop;
 
 # Actual run, successful upgrade is expected
@@ -129,11 +186,29 @@ command_ok(
 ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
 	"pg_upgrade_output.d/ removed after pg_upgrade success");
 
-# Check that the slot 'test_slot1' has migrated to the new cluster
+# Check that the slot 'sub' has migrated to the new cluster
 $new_publisher->start;
 my $result = $new_publisher->safe_psql('postgres',
 	"SELECT slot_name, two_phase FROM pg_replication_slots");
-is($result, qq(test_slot1|t), 'check the slot exists on new cluster');
-$new_publisher->stop;
+is($result, qq(sub|t), 'check the slot exists on new cluster');
+
+# Update the connection
+my $new_connstr = $new_publisher->connstr . ' dbname=postgres';
+$subscriber->safe_psql(
+	'postgres', qq[
+	ALTER SUBSCRIPTION sub CONNECTION '$new_connstr';
+	ALTER SUBSCRIPTION sub ENABLE;
+]);
+
+# Check whether changes on the new publisher get replicated to the subscriber
+$new_publisher->safe_psql('postgres',
+	"INSERT INTO tbl VALUES (generate_series(11, 20))");
+$new_publisher->wait_for_catchup('sub');
+$result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
+is($result, qq(20), 'check changes are shipped to the subscriber');
+
+# Clean up
+$subscriber->stop();
+$new_publisher->stop();
 
 done_testing();
-- 
2.27.0

#161Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Peter Smith (#158)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Peter,

Thanks for reviewing! The new patch is available in [1].

1. check_for_lost_slots

+
+/*
+ * Verify that all logical replication slots are usable.
+ */
+void
+check_for_lost_slots(ClusterInfo *cluster)

1a.
AFAIK we don't ever need to call this also for 'new_cluster'. So the
function should have no parameter and just access 'old_cluster'
directly.

Actually, I asked about this in a previous post, and I understood you prefer this style.
Fixed. Also, get_logical_slot_infos() and count_logical_slots() are called only
for old_cluster, so the argument was removed from them as well.

1b.
Can't this be a static function now?

Yeah, changed to static.

2.
+ for (i = 0; i < ntups; i++)
+ pg_log(PG_WARNING,
+    "\nWARNING: logical replication slot \"%s\" is in 'lost' state.",
+    PQgetvalue(res, i, i_slotname));

Is it correct that this message also includes the word "WARNING"?
Other PG_WARNING messages don't do that.

create_script_for_old_cluster_deletion() uses the word, and I followed that:

```
pg_log(PG_WARNING,
"\nWARNING: new data directory should not be inside the old data directory, i.e. %s", old_cluster_pgdata);
```

3. check_for_confirmed_flush_lsn

+/*
+ * Verify that all logical replication slots consumed all WALs, except a
+ * CHECKPOINT_SHUTDOWN record.
+ */
+static void
+check_for_confirmed_flush_lsn(ClusterInfo *cluster)

AFAIK we don't ever need to call this also for 'new_cluster'. So the
function should have no parameter and just access 'old_cluster'
directly.

Removed.

4.
+ for (i = 0; i < ntups; i++)
+ pg_log(PG_WARNING,
+ "\nWARNING: logical replication slot \"%s\" has not consumed WALs yet",
+ PQgetvalue(res, i, i_slotname));

Is it correct that this message also includes the word "WARNING"?
Other PG_WARNING messages don't do that.

See the reply above; create_script_for_old_cluster_deletion() uses it.

src/bin/pg_upgrade/controldata.c

5. get_control_data

+ else if ((p = strstr(bufin, "Latest checkpoint location:")) != NULL)
+ {
+ /*
+ * Gather the latest checkpoint location if the cluster is PG17
+ * or later. This is used for upgrading logical replication
+ * slots.
+ */
+ if (GET_MAJOR_VERSION(cluster->major_version) >= 1700)

But we are not "gathering" anything. It's just one LSN. I think this
ought to just say "Read the latest..."

Changed.

6.
+ /*
+ * The upper and lower part of LSN must be read separately
+ * because it is reported in %X/%X format.
+ */

/reported/stored as/

Changed.

src/bin/pg_upgrade/pg_upgrade.h

7.
+void check_for_lost_slots(ClusterInfo *cluster);\

Why is this needed here? Can't this be a static function?

Removed.

.../t/003_logical_replication_slots.pl

8.
+# 2. Consume WAL records to avoid another type of upgrade failure. It will be
+# tested in subsequent cases.
+$old_publisher->safe_psql('postgres',
+ "SELECT count(*) FROM pg_logical_slot_get_changes('test_slot1', NULL,
NULL);"
+);

I wondered if that step is really needed. Why will there be WAL records to consume?

IIUC we haven't published anything yet.

The primary reason was described in [2], in the reply to comment 10.
After creating 'test_slot1', another slot 'test_slot2' is also created, and that
function generates a RUNNING_XACTS record. The backtrace is as follows:

pg_create_logical_replication_slot
create_logical_replication_slot
CreateInitDecodingContext
ReplicationSlotReserveWal
LogStandbySnapshot
LogCurrentRunningXacts
XLogInsert(RM_STANDBY_ID, XLOG_RUNNING_XACTS);

check_for_confirmed_flush_lsn() detects the unconsumed record and raises a FATAL
error before the GUC checks on the new cluster are reached.
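
Not part of the patch, but here is a rough SQL illustration of the situation on a
running old cluster, assuming the test_decoding plugin is available and using the
same slot names as the TAP test:

```
-- Create the first slot; its confirmed_flush_lsn starts at the current WAL position.
SELECT pg_create_logical_replication_slot('test_slot1', 'test_decoding', false, true);

-- Creating a second slot logs a RUNNING_XACTS record that test_slot1 has not consumed.
SELECT pg_create_logical_replication_slot('test_slot2', 'test_decoding', false, true);

-- test_slot1's confirmed_flush_lsn now lags behind the insert position ...
SELECT slot_name, confirmed_flush_lsn, pg_current_wal_insert_lsn()
FROM pg_replication_slots WHERE slot_type = 'logical';

-- ... until its changes are consumed, which advances confirmed_flush_lsn again.
SELECT count(*) FROM pg_logical_slot_get_changes('test_slot1', NULL, NULL);
```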

9.
+# ------------------------------
+# TEST: Successful upgrade
+
+# Preparations for the subsequent test:
+# 1. Remove the remained slot
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+ "SELECT * FROM pg_drop_replication_slot('test_slot1');"
+);

Should removal of the slot be done as part of the cleanup of the
previous test, instead of preparing for this one?

Moved to cleanup part.

10.
# 3. Disable the subscription once
$subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION sub DISABLE");
$old_publisher->stop;

10a.
What do you mean by "once"?

I added the word because the subscription would be enabled again later.
But after considering it more, I think "Temporarily" reads better. Fixed.

10b.
That old_publisher->stop; seems strangely placed. Why is it here?

We must shut down the cluster before doing pg_upgrade. Isn't it the same as line 124?

```
# 2. Generate extra WAL records. Because these WAL records do not get consumed
# it will cause the upcoming pg_upgrade test to fail.
$old_publisher->safe_psql('postgres',
"CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;"
);
$old_publisher->stop;
```

11.
# Check that the slot 'test_slot1' has migrated to the new cluster
$new_publisher->start;
my $result = $new_publisher->safe_psql('postgres',
"SELECT slot_name, two_phase FROM pg_replication_slots");
is($result, qq(sub|t), 'check the slot exists on new cluster');

~

That comment now seems wrong. That slot was previously removed, right?

Yeah, it should be 'sub'. Changed.

12.
# Update the connection
my $new_connstr = $new_publisher->connstr . ' dbname=postgres';
$subscriber->safe_psql('postgres',
"ALTER SUBSCRIPTION sub CONNECTION '$new_connstr'");
$subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION sub ENABLE");

~

Maybe better to combine both SQL.

Combined.

[1]: /messages/by-id/TYAPR01MB5866D7677BAE6F66839570FCF5E3A@TYAPR01MB5866.jpnprd01.prod.outlook.com
[2]: /messages/by-id/TYAPR01MB58668021BB233D129B466122F51CA@TYAPR01MB5866.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#162Peter Smith
smithpb2250@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#159)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Fri, Aug 25, 2023 at 12:09 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

Dear Peter,

FYI, the v24-0003 tests for pg_upgrade did not work for me:

Hmm, I ran the tests for more than an hour but could not reproduce the failure.
cfbot also said OK multiple times...

Today I rebuilt everything clean from the ground up and applied all
v24*. But this time everything passes. (I have repeated the test 3x
and 3x it passes)

I don't know what is different, but I have a theory that perhaps
yesterday the v24-0001 patch did not apply correctly for me (due to
there being a pre-existing
contrib/test_decoding/t/002_always_persist.pl even after a make
clean), but that I did not notice the error (due to it being hidden
among the other whitespace warnings) when applying that first patch.

I think we can assume this was a problem with my environment. Of
course, if I ever see it happen again I will let you know.

Sorry for the false alarm.

------
Kind Regards,
Peter Smith.
Fujitsu Australia

#163Peter Smith
smithpb2250@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#160)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

Here are my review comments for patch v25-0002.

In general, I feel where possible the version checking is best done in
the "check.c" file (the filename is a hint). Most of the review
comments below are repeating this point.

======
Commit message.

1.
I felt this should mention the limitation that the slot upgrade
feature is only supported from PG17 slots upwards.

======
doc/src/sgml/ref/pgupgrade.sgml

2.
+    <para>
+     <application>pg_upgrade</application> attempts to migrate logical
+     replication slots. This helps avoid the need for manually defining the
+     same replication slots on the new publisher. Currently,
+     <application>pg_upgrade</application> supports migrate logical replication
+     slots when the old cluster is 17.X and later.
+    </para>

Currently, <application>pg_upgrade</application> supports migrate
logical replication slots when the old cluster is 17.X and later.

SUGGESTION
Migration of logical replication slots is only supported when the old
cluster is version 17.0 or later.

======
src/bin/pg_upgrade/check.c

3. GENERAL

IMO all version checking for this feature should only be done within
this "check.c" file as much as possible.

The detailed reason for this PG17 limitation can be in the file header
comment of "pg_upgrade.c", and then all the version checks can simply
say something like:
"Logical slot migration is only support for slots in PostgreSQL 17.0
and later. See atop file pg_upgrade.c for an explanation of this
limitation "

~~~

4. check_and_dump_old_cluster

+ /* Extract a list of logical replication slots */
+ get_logical_slot_infos();
+

IMO the version checking should only be done in the "checking"
functions, so it should be removed from the within
get_logical_slot_infos() and put here in the caller.

SUGGESTION

/* Logical slots can be migrated since PG17. */
if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
{
/* Extract a list of logical replication slots */
get_logical_slot_infos();
}

~~~

5. check_new_cluster_logical_replication_slots

+check_new_cluster_logical_replication_slots(void)
+{
+ PGresult   *res;
+ PGconn    *conn;
+ int nslots = count_logical_slots();
+ int max_replication_slots;
+ char    *wal_level;
+
+ /* Quick exit if there are no logical slots on the old cluster */
+ if (nslots == 0)
+ return;

IMO the version checking should only be done in the "checking"
functions, so it should be removed from the count_logical_slots() and
then this code should be written more like this:

SUGGESTION (notice the quick return comment change too)

int nslots = 0;

/* Logical slots can be migrated since PG17. */
if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
nslots = count_logical_slots();

/* Quick return if there are no logical slots to be migrated. */
if (nslots == 0)
return;

======
src/bin/pg_upgrade/info.c

6. GENERAL

For the sake of readability it might be better to make the function
names more explicit:

get_logical_slot_infos() -> get_old_cluster_logical_slot_infos()
count_logical_slots() -> count_old_cluster_logical_slots()

~~~

7. get_logical_slot_infos

+/*
+ * get_logical_slot_infos()
+ *
+ * Higher level routine to generate LogicalSlotInfoArr for all databases.
+ *
+ * Note: This function will not do anything if the old cluster is pre-PG 17.
+ * The logical slots are not saved at shutdown, and the confirmed_flush_lsn is
+ * always behind the SHUTDOWN_CHECKPOINT record. Subsequent checks done in
+ * check_for_confirmed_flush_lsn() would raise a FATAL error if such slots are
+ * included.
+ */
+void
+get_logical_slot_infos(void)

Move all this detailed explanation about the limitation to the
file-level comment in "pg_upgrade.c". See also review comment #3.

~~~

8. get_logical_slot_infos

+void
+get_logical_slot_infos(void)
+{
+ int dbnum;
+
+ /* Logical slots can be migrated since PG17. */
+ if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+ return;

IMO the version checking is best done in the "checking" functions. See
previous review comments about the caller of this. If you want to put
something here, then just have an Assert:

Assert(GET_MAJOR_VERSION(old_cluster.major_version) >= 1700);

~~~

9. count_logical_slots

+/*
+ * count_logical_slots()
+ *
+ * Sum up and return the number of logical replication slots for all databases.
+ */
+int
+count_logical_slots(void)
+{
+ int dbnum;
+ int slot_count = 0;
+
+ /* Quick exit if the version is prior to PG17. */
+ if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+ return 0;
+
+ for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+ slot_count += old_cluster.dbarr.dbs[dbnum].slot_arr.nslots;
+
+ return slot_count;
+}

IMO it is better to remove the version-checking side-effect here. Do
the version checks from the "check" functions where this is called
from. Also removing the check from here gives the ability to output
more useful messages -- e.g. review comment #11

======
src/bin/pg_upgrade/pg_upgrade.c

10. File-level comment

Add a detailed explanation about the limitation in the file-level
comment. See review comment #3 for details.

~~~

11.
+ /*
+ * Create logical replication slots.
+ *
+ * Note: This must be done after doing the pg_resetwal command because
+ * pg_resetwal would remove required WALs.
+ */
+ if (count_logical_slots())
+ {
+ start_postmaster(&new_cluster, true);
+ create_logical_replication_slots();
+ stop_postmaster(false);
+ }
+

IMO it is better to do the explicit version checking here, instead of
relying on a side-effect within the count_logical_slots() function.

SUGGESTION #1

/* Logical replication slot upgrade only supported for old_cluster >= PG17 */
if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
{
if (count_logical_slots())
{
start_postmaster(&new_cluster, true);
create_logical_replication_slots();
stop_postmaster(false);
}
}

AND...

By doing this, you will be able to provide more useful output here like this:

SUGGESTION #2 (my preferred)

if (count_logical_slots())
{
if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
{
pg_log(PG_WARNING,
"\nWARNING: This utility can only upgrade logical replication slots present in PostgreSQL version %s and later.",
"17.0");
}
else
{
start_postmaster(&new_cluster, true);
create_logical_replication_slots();
stop_postmaster(false);
}
}

------
Kind Regards,
Peter Smith.
Fujitsu Australia

#164Peter Smith
smithpb2250@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#160)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

Hi Kuroda-san,

Here are my review comments for patch v25-0003.

======
src/bin/pg_upgrade/check.c

1. GENERAL

+static void check_for_confirmed_flush_lsn(void);
+static void check_for_lost_slots(void);

For more clarity, I wonder if it is better to rename some functions:

check_for_confirmed_flush_lsn() -> check_old_cluster_for_confirmed_flush_lsn()
check_for_lost_slots() -> check_old_cluster_for_lost_slots()

~~~

2.
+ /*
+ * Logical replication slots can be migrated since PG17. See comments atop
+ * get_logical_slot_infos().
+ */
+ if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
+ {
+ check_for_lost_slots();
+
+ /*
+ * Do additional checks if a live check is not required. This requires
+ * that confirmed_flush_lsn of all the slots is the same as the latest
+ * checkpoint location, but it would be satisfied only when the server
+ * has been shut down.
+ */
+ if (!live_check)
+ check_for_confirmed_flush_lsn();
+ }
+

2a.
If my suggestions from v25-0002 [1] are adopted then this comment
needs to change to say something like "See atop file pg_upgrade.c..."

~

2b.
Hmm. If my suggestions from v25-0002 [1] are adopted then the version
checking and the slot counting would *already* be in this calling
function. In that case, why can't this whole fragment be put in the
same place? E.g. IIUC there is no reason to call these checks at all
when the old_cluster slot count is already known to be 0. Similarly,
there is no reason that both these functions need to independently
check count_logical_slots again since we have already done that
(again, assuming my suggestions from v25-0002 [1] are adopted).

~~~

3. check_for_lost_slots

+/*
+ * Verify that all logical replication slots are usable.
+ */
+void
+check_for_lost_slots(void)

This was forward-declared to be static, but the static function
modifier is absent here.

~

4. check_for_lost_slots

+ /* Quick exit if the cluster does not have logical slots. */
+ if (count_logical_slots() == 0)
+ return;
+

AFAICT this quick exit can be removed. See my comment #2b.

~~~

5. check_for_confirmed_flush_lsn

+check_for_confirmed_flush_lsn(void)
+{
+ int i,
+ ntups,
+ i_slotname;
+ PGresult   *res;
+ DbInfo    *active_db = &old_cluster.dbarr.dbs[0];
+ PGconn    *conn;
+
+ /* Quick exit if the cluster does not have logical slots. */
+ if (count_logical_slots() == 0)
+ return;

AFAICT this quick exit can be removed. See my comment #2b.

======
.../t/003_logical_replication_slots.pl

6.
+# 2. Temporarily disable the subscription
+$subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION sub DISABLE");
 $old_publisher->stop;

In my previous 0003 review ([2] #10b) I was not questioning the need
for the $old_publisher->stop; before the pg_upgrade. I was only asking
why it was done at this location (after the DISABLE) instead of
earlier.

~~~

7.
+# Check whether changes on the new publisher get replicated to the subscriber
+$new_publisher->safe_psql('postgres',
+ "INSERT INTO tbl VALUES (generate_series(11, 20))");
+$new_publisher->wait_for_catchup('sub');
+$result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
+is($result, qq(20), 'check changes are shipped to the subscriber');

/shipped/replicated/

------
[1]: My review of patch v25-0002 - /messages/by-id/CAHut+PtQcou3Bfm9A5SbhFuo2uKK-6u4_j_59so3skAi8Ns03A@mail.gmail.com
[2]: My review of v24-0003 - /messages/by-id/CAHut+Ps5=9q1CCyrrytyv-8oUBqE6rv-=YFSRuuQwVf+smC-Kw@mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

#165Amit Kapila
amit.kapila16@gmail.com
In reply to: Peter Smith (#163)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Fri, Aug 25, 2023 at 2:14 PM Peter Smith <smithpb2250@gmail.com> wrote:

Here are my review comments for patch v25-0002.

In general, I feel where possible the version checking is best done in
the "check.c" file (the filename is a hint). Most of the review
comments below are repeating this point.

======
Commit message.

1.
I felt this should mention the limitation that the slot upgrade
feature is only supported from PG17 slots upwards.

======
doc/src/sgml/ref/pgupgrade.sgml

2.
+    <para>
+     <application>pg_upgrade</application> attempts to migrate logical
+     replication slots. This helps avoid the need for manually defining the
+     same replication slots on the new publisher. Currently,
+     <application>pg_upgrade</application> supports migrate logical replication
+     slots when the old cluster is 17.X and later.
+    </para>

Currently, <application>pg_upgrade</application> supports migrate
logical replication slots when the old cluster is 17.X and later.

SUGGESTION
Migration of logical replication slots is only supported when the old
cluster is version 17.0 or later.

======
src/bin/pg_upgrade/check.c

3. GENERAL

IMO all version checking for this feature should only be done within
this "check.c" file as much as possible.

The detailed reason for this PG17 limitation can be in the file header
comment of "pg_upgrade.c", and then all the version checks can simply
say something like:
"Logical slot migration is only supported for slots in PostgreSQL 17.0
and later. See atop file pg_upgrade.c for an explanation of this
limitation."

I don't think it is a good idea to move these comments atop
pg_upgrade.c as they are specific to slots. To me, the current place
proposed by the patch appears reasonable.

~~~

4. check_and_dump_old_cluster

+ /* Extract a list of logical replication slots */
+ get_logical_slot_infos();
+

IMO the version checking should only be done in the "checking"
functions, so it should be removed from the within
get_logical_slot_infos() and put here in the caller.

I think we should do it where it makes more sense. As far as I can see
currently there is no such rule.

SUGGESTION

/* Logical slots can be migrated since PG17. */
if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
{
/* Extract a list of logical replication slots */
get_logical_slot_infos();
}

I find the current place better than this suggestion.

~~~

5. check_new_cluster_logical_replication_slots

+check_new_cluster_logical_replication_slots(void)
+{
+ PGresult   *res;
+ PGconn    *conn;
+ int nslots = count_logical_slots();
+ int max_replication_slots;
+ char    *wal_level;
+
+ /* Quick exit if there are no logical slots on the old cluster */
+ if (nslots == 0)
+ return;

IMO the version checking should only be done in the "checking"
functions, so it should be removed from the count_logical_slots() and
then this code should be written more like this:

SUGGESTION (notice the quick return comment change too)

int nslots = 0;

/* Logical slots can be migrated since PG17. */
if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
nslots = count_logical_slots();

/* Quick return if there are no logical slots to be migrated. */
if (nslots == 0)
return;

+1.

======
src/bin/pg_upgrade/info.c

6. GENERAL

For the sake of readability it might be better to make the function
names more explicit:

get_logical_slot_infos() -> get_old_cluster_logical_slot_infos()
count_logical_slots() -> count_old_cluster_logical_slots()

~~~

7. get_logical_slot_infos

+/*
+ * get_logical_slot_infos()
+ *
+ * Higher level routine to generate LogicalSlotInfoArr for all databases.
+ *
+ * Note: This function will not do anything if the old cluster is pre-PG 17.
+ * The logical slots are not saved at shutdown, and the confirmed_flush_lsn is
+ * always behind the SHUTDOWN_CHECKPOINT record. Subsequent checks done in
+ * check_for_confirmed_flush_lsn() would raise a FATAL error if such slots are
+ * included.
+ */
+void
+get_logical_slot_infos(void)

Move all this detailed explanation about the limitation to the
file-level comment in "pg_upgrade.c". See also review comment #3.

-1. This is not generic enough to be moved to pg_upgrade.c.

11.
+ /*
+ * Create logical replication slots.
+ *
+ * Note: This must be done after doing the pg_resetwal command because
+ * pg_resetwal would remove required WALs.
+ */
+ if (count_logical_slots())
+ {
+ start_postmaster(&new_cluster, true);
+ create_logical_replication_slots();
+ stop_postmaster(false);
+ }
+

IMO it is better to do the explicit version checking here, instead of
relying on a side-effect within the count_logical_slots() function.

SUGGESTION #1

/* Logical replication slot upgrade only supported for old_cluster >= PG17 */
if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
{
if (count_logical_slots())
{
start_postmaster(&new_cluster, true);
create_logical_replication_slots();
stop_postmaster(false);
}
}

AND...

By doing this, you will be able to provide more useful output here like this:

SUGGESTION #2 (my preferred)

if (count_logical_slots())
{
if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
{
pg_log(PG_WARNING,
"\nWARNING: This utility can only upgrade logical
replication slots present in PostgreSQL version %s and later.",
"17.0");
}
else
{
start_postmaster(&new_cluster, true);
create_logical_replication_slots();
stop_postmaster(false);
}
}

I don't like suggestion#2 much. I don't feel the need for such a WARNING.

--
With Regards,
Amit Kapila.

#166Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Peter Smith (#163)
3 attachment(s)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Peter,

Thank you for reviewing! PSA new version patch set.

======
Commit message.

1.
I felt this should mention the limitation that the slot upgrade
feature is only supported from PG17 slots upwards.

Added. The same sentence as in the doc was used.

doc/src/sgml/ref/pgupgrade.sgml

2.
+    <para>
+     <application>pg_upgrade</application> attempts to migrate logical
+     replication slots. This helps avoid the need for manually defining the
+     same replication slots on the new publisher. Currently,
+     <application>pg_upgrade</application> supports migrate logical
replication
+     slots when the old cluster is 17.X and later.
+    </para>

Currently, <application>pg_upgrade</application> supports migrate
logical replication slots when the old cluster is 17.X and later.

SUGGESTION
Migration of logical replication slots is only supported when the old
cluster is version 17.0 or later.

Fixed.

src/bin/pg_upgrade/check.c

3. GENERAL

IMO all version checking for this feature should only be done within
this "check.c" file as much as possible.

The detailed reason for this PG17 limitation can be in the file header
comment of "pg_upgrade.c", and then all the version checks can simply
say something like:
"Logical slot migration is only supported for slots in PostgreSQL 17.0
and later. See atop file pg_upgrade.c for an explanation of this
limitation."

Hmm, I'm not sure it should be and Amit disagreed [1].
I did not address this one.

4. check_and_dump_old_cluster

+ /* Extract a list of logical replication slots */
+ get_logical_slot_infos();
+

IMO the version checking should only be done in the "checking"
functions, so it should be removed from the within
get_logical_slot_infos() and put here in the caller.

SUGGESTION

/* Logical slots can be migrated since PG17. */
if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
{
/* Extract a list of logical replication slots */
get_logical_slot_infos();
}

Per discussion [1], I did not address the comment.

5. check_new_cluster_logical_replication_slots

+check_new_cluster_logical_replication_slots(void)
+{
+ PGresult   *res;
+ PGconn    *conn;
+ int nslots = count_logical_slots();
+ int max_replication_slots;
+ char    *wal_level;
+
+ /* Quick exit if there are no logical slots on the old cluster */
+ if (nslots == 0)
+ return;

IMO the version checking should only be done in the "checking"
functions, so it should be removed from the count_logical_slots() and
then this code should be written more like this:

SUGGESTION (notice the quick return comment change too)

int nslots = 0;

/* Logical slots can be migrated since PG17. */
if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
nslots = count_logical_slots();

/* Quick return if there are no logical slots to be migrated. */
if (nslots == 0)
return;

Fixed.

src/bin/pg_upgrade/info.c

6. GENERAL

For the sake of readability it might be better to make the function
names more explicit:

get_logical_slot_infos() -> get_old_cluster_logical_slot_infos()
count_logical_slots() -> count_old_cluster_logical_slots()

Fixed. Moreover, get_logical_slot_infos_per_db() was also renamed (to
get_old_cluster_logical_slot_infos_per_db()) to follow the same style.

7. get_logical_slot_infos

+/*
+ * get_logical_slot_infos()
+ *
+ * Higher level routine to generate LogicalSlotInfoArr for all databases.
+ *
+ * Note: This function will not do anything if the old cluster is pre-PG 17.
+ * The logical slots are not saved at shutdown, and the confirmed_flush_lsn is
+ * always behind the SHUTDOWN_CHECKPOINT record. Subsequent checks
done in
+ * check_for_confirmed_flush_lsn() would raise a FATAL error if such slots are
+ * included.
+ */
+void
+get_logical_slot_infos(void)

Move all this detailed explanation about the limitation to the
file-level comment in "pg_upgrade.c". See also review comment #3.

Per discussion [1], I did not address the comment.

8. get_logical_slot_infos

+void
+get_logical_slot_infos(void)
+{
+ int dbnum;
+
+ /* Logical slots can be migrated since PG17. */
+ if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+ return;

IMO the version checking is best done in the "checking" functions. See
previous review comments about the caller of this. If you want to put
something here, then just have an Assert:

Assert(GET_MAJOR_VERSION(old_cluster.major_version) >= 1700);

As I said above, check_and_dump_old_cluster() still does not check the major version
before calling get_old_cluster_logical_slot_infos(), so I kept the current style; see
the excerpt below.
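
For reference, the guard kept inside the function looks like this in the attached
v26 patch (excerpt, trimmed to the relevant part):

void
get_old_cluster_logical_slot_infos(void)
{
	int			dbnum;

	/* Logical slots can be migrated since PG17. */
	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
		return;

	/* ... gather the slot info for every database ... */
}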

9. count_logical_slots

+/*
+ * count_logical_slots()
+ *
+ * Sum up and return the number of logical replication slots for all databases.
+ */
+int
+count_logical_slots(void)
+{
+ int dbnum;
+ int slot_count = 0;
+
+ /* Quick exit if the version is prior to PG17. */
+ if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+ return 0;
+
+ for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+ slot_count += old_cluster.dbarr.dbs[dbnum].slot_arr.nslots;
+
+ return slot_count;
+}

IMO it is better to remove the version-checking side-effect here. Do
the version checks from the "check" functions where this is called
from. Also removing the check from here gives the ability to output
more useful messages -- e.g. review comment #11

Apart from this, count_old_cluster_logical_slots() is only called after the major
version has been checked, so an Assert() was added instead of the version check,
as shown below.
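
To make the result concrete, the counting function in the attached v26 patch now
looks as below; the explicit version check is left to its callers (in check.c and
pg_upgrade.c):

int
count_old_cluster_logical_slots(void)
{
	int			dbnum;
	int			slot_count = 0;

	/* Logical slots can be migrated since PG17. */
	Assert(GET_MAJOR_VERSION(old_cluster.major_version) >= 1700);

	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
		slot_count += old_cluster.dbarr.dbs[dbnum].slot_arr.nslots;

	return slot_count;
}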

src/bin/pg_upgrade/pg_upgrade.c

10. File-level comment

Add a detailed explanation about the limitation in the file-level
comment. See review comment #3 for details.

Per discussion [1], I did not address the comment.

11.
+ /*
+ * Create logical replication slots.
+ *
+ * Note: This must be done after doing the pg_resetwal command because
+ * pg_resetwal would remove required WALs.
+ */
+ if (count_logical_slots())
+ {
+ start_postmaster(&new_cluster, true);
+ create_logical_replication_slots();
+ stop_postmaster(false);
+ }
+

IMO it is better to do the explicit version checking here, instead of
relying on a side-effect within the count_logical_slots() function.

SUGGESTION #1

/* Logical replication slot upgrade only supported for old_cluster >= PG17 */

if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
{
if (count_logical_slots())
{
start_postmaster(&new_cluster, true);
create_logical_replication_slots();
stop_postmaster(false);
}
}

AND...

By doing this, you will be able to provide more useful output here like this:

SUGGESTION #2 (my preferred)

if (count_logical_slots())
{
if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
{
pg_log(PG_WARNING,
"\nWARNING: This utility can only upgrade logical
replication slots present in PostgreSQL version %s and later.",
"17.0");
}
else
{
start_postmaster(&new_cluster, true);
create_logical_replication_slots();
stop_postmaster(false);
}
}

Per discussion [1], SUGGESTION #1 was chosen.

[1]: /messages/by-id/CAA4eK1Jfk6eQSpasg+GoJVjtkQ3tFSihurbCFwnL3oV75BoUgQ@mail.gmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:

v26-0001-Persist-to-disk-logical-slots-during-a-shutdown-.patchapplication/octet-stream; name=v26-0001-Persist-to-disk-logical-slots-during-a-shutdown-.patchDownload
From 73b3be0e9eacfeffde32ad5871d49eb7551494a2 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Fri, 14 Apr 2023 13:49:09 +0800
Subject: [PATCH v26 1/3] Persist to disk logical slots during a shutdown
 checkpoint if the updated confirmed_flush_lsn has not yet been persisted.

It's entirely possible for a logical slot to have a confirmed_flush_lsn higher
than the last value saved on disk while not being marked as dirty.  It's
currently not a problem to lose that value during a clean shutdown / restart
cycle, but a later patch adding support for pg_upgrade of publications and
logical slots will rely on that value being properly persisted to disk.

Author: Julien Rouhaud
Reviewed-by: Wang Wei, Peter Smith, Masahiko Sawada
---
 src/backend/access/transam/xlog.c             |   2 +-
 src/backend/replication/slot.c                |  33 ++++--
 src/include/replication/slot.h                |   5 +-
 src/test/subscription/meson.build             |   1 +
 src/test/subscription/t/034_always_persist.pl | 106 ++++++++++++++++++
 5 files changed, 135 insertions(+), 12 deletions(-)
 create mode 100644 src/test/subscription/t/034_always_persist.pl

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 60c0b7ec3a..6dced61cf4 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7026,7 +7026,7 @@ static void
 CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
 {
 	CheckPointRelationMap();
-	CheckPointReplicationSlots();
+	CheckPointReplicationSlots(flags & CHECKPOINT_IS_SHUTDOWN);
 	CheckPointSnapBuild();
 	CheckPointLogicalRewriteHeap();
 	CheckPointReplicationOrigin();
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index bb09c4010f..1c6db2a99a 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -109,7 +109,8 @@ static void ReplicationSlotDropPtr(ReplicationSlot *slot);
 /* internal persistency functions */
 static void RestoreSlotFromDisk(const char *name);
 static void CreateSlotOnDisk(ReplicationSlot *slot);
-static void SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel);
+static void SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel,
+						   bool is_shutdown);
 
 /*
  * Report shared-memory space needed by ReplicationSlotsShmemInit.
@@ -321,6 +322,7 @@ ReplicationSlotCreate(const char *name, bool db_specific,
 	slot->candidate_xmin_lsn = InvalidXLogRecPtr;
 	slot->candidate_restart_valid = InvalidXLogRecPtr;
 	slot->candidate_restart_lsn = InvalidXLogRecPtr;
+	slot->last_persisted_confirmed_flush = InvalidXLogRecPtr;
 
 	/*
 	 * Create the slot on disk.  We haven't actually marked the slot allocated
@@ -783,7 +785,7 @@ ReplicationSlotSave(void)
 	Assert(MyReplicationSlot != NULL);
 
 	sprintf(path, "pg_replslot/%s", NameStr(MyReplicationSlot->data.name));
-	SaveSlotToPath(MyReplicationSlot, path, ERROR);
+	SaveSlotToPath(MyReplicationSlot, path, ERROR, false);
 }
 
 /*
@@ -1572,11 +1574,10 @@ restart:
 /*
  * Flush all replication slots to disk.
  *
- * This needn't actually be part of a checkpoint, but it's a convenient
- * location.
+ * is_shutdown is true in case of a shutdown checkpoint.
  */
 void
-CheckPointReplicationSlots(void)
+CheckPointReplicationSlots(bool is_shutdown)
 {
 	int			i;
 
@@ -1601,7 +1602,7 @@ CheckPointReplicationSlots(void)
 
 		/* save the slot to disk, locking is handled in SaveSlotToPath() */
 		sprintf(path, "pg_replslot/%s", NameStr(s->data.name));
-		SaveSlotToPath(s, path, LOG);
+		SaveSlotToPath(s, path, LOG, is_shutdown);
 	}
 	LWLockRelease(ReplicationSlotAllocationLock);
 }
@@ -1707,7 +1708,7 @@ CreateSlotOnDisk(ReplicationSlot *slot)
 
 	/* Write the actual state file. */
 	slot->dirty = true;			/* signal that we really need to write */
-	SaveSlotToPath(slot, tmppath, ERROR);
+	SaveSlotToPath(slot, tmppath, ERROR, false);
 
 	/* Rename the directory into place. */
 	if (rename(tmppath, path) != 0)
@@ -1733,7 +1734,8 @@ CreateSlotOnDisk(ReplicationSlot *slot)
  * Shared functionality between saving and creating a replication slot.
  */
 static void
-SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel)
+SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel,
+			   bool is_shutdown)
 {
 	char		tmppath[MAXPGPATH];
 	char		path[MAXPGPATH];
@@ -1747,8 +1749,16 @@ SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel)
 	slot->just_dirtied = false;
 	SpinLockRelease(&slot->mutex);
 
-	/* and don't do anything if there's nothing to write */
-	if (!was_dirty)
+	/*
+	 * Don't do anything if there's nothing to write, unless this is called for
+	 * a logical slot during a shutdown checkpoint and if the updated
+	 * confirmed_flush LSN has not yet been persisted, as we want to persist
+	 * the updated confirmed_flush LSN in that case, even if that's the only
+	 * modification.
+	 */
+	if (!was_dirty &&
+		!(SlotIsLogical(slot) && is_shutdown &&
+		  (slot->data.confirmed_flush != slot->last_persisted_confirmed_flush)))
 		return;
 
 	LWLockAcquire(&slot->io_in_progress_lock, LW_EXCLUSIVE);
@@ -1878,6 +1888,8 @@ SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel)
 	SpinLockAcquire(&slot->mutex);
 	if (!slot->just_dirtied)
 		slot->dirty = false;
+
+	slot->last_persisted_confirmed_flush = slot->data.confirmed_flush;
 	SpinLockRelease(&slot->mutex);
 
 	LWLockRelease(&slot->io_in_progress_lock);
@@ -2074,6 +2086,7 @@ RestoreSlotFromDisk(const char *name)
 		/* initialize in memory state */
 		slot->effective_xmin = cp.slotdata.xmin;
 		slot->effective_catalog_xmin = cp.slotdata.catalog_xmin;
+		slot->last_persisted_confirmed_flush =  cp.slotdata.confirmed_flush;
 
 		slot->candidate_catalog_xmin = InvalidTransactionId;
 		slot->candidate_xmin_lsn = InvalidXLogRecPtr;
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index a8a89dc784..b519f7af5f 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -178,6 +178,9 @@ typedef struct ReplicationSlot
 	XLogRecPtr	candidate_xmin_lsn;
 	XLogRecPtr	candidate_restart_valid;
 	XLogRecPtr	candidate_restart_lsn;
+
+	/* The last persisted confirmed flush lsn */
+	XLogRecPtr	last_persisted_confirmed_flush;
 } ReplicationSlot;
 
 #define SlotIsPhysical(slot) ((slot)->data.database == InvalidOid)
@@ -241,7 +244,7 @@ extern void ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslo
 extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok);
 
 extern void StartupReplicationSlots(void);
-extern void CheckPointReplicationSlots(void);
+extern void CheckPointReplicationSlots(bool is_shutdown);
 
 extern void CheckSlotRequirements(void);
 extern void CheckSlotPermissions(void);
diff --git a/src/test/subscription/meson.build b/src/test/subscription/meson.build
index bd673a9d68..cdd2f8ba47 100644
--- a/src/test/subscription/meson.build
+++ b/src/test/subscription/meson.build
@@ -40,6 +40,7 @@ tests += {
       't/031_column_list.pl',
       't/032_subscribe_use_index.pl',
       't/033_run_as_table_owner.pl',
+      't/034_always_persist.pl',
       't/100_bugs.pl',
     ],
   },
diff --git a/src/test/subscription/t/034_always_persist.pl b/src/test/subscription/t/034_always_persist.pl
new file mode 100644
index 0000000000..9973476fff
--- /dev/null
+++ b/src/test/subscription/t/034_always_persist.pl
@@ -0,0 +1,106 @@
+
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+# Test logical replication slots are always persisted to disk during a shutdown
+# checkpoint.
+
+use strict;
+use warnings;
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+sub compare_confirmed_flush
+{
+	my ($node, $confirmed_flush_from_log) = @_;
+
+	# Fetch Latest checkpoint location from the control file itself
+	my ($stdout, $stderr) = run_command([ 'pg_controldata', $node->data_dir ]);
+	my @control_data = split("\n", $stdout);
+	my $latest_checkpoint = undef;
+	foreach (@control_data)
+	{
+		if ($_ =~ /^Latest checkpoint location:\s*(.*)$/mg)
+		{
+			$latest_checkpoint = $1;
+			last;
+		}
+	}
+	die "Latest checkpoint location not found in control file\n"
+	  unless defined($latest_checkpoint);
+
+	# Is it same as the value read from log?
+	ok($latest_checkpoint eq $confirmed_flush_from_log,
+		"Check the decoding starts from the confirmed_flush which is the same as the latest_checkpoint");
+
+	return;
+}
+
+# Initialize publisher node
+my $node_publisher = PostgreSQL::Test::Cluster->new('pub');
+$node_publisher->init(allows_streaming => 'logical');
+$node_publisher->append_conf('postgresql.conf', q{
+autovacuum = off
+checkpoint_timeout = 1h
+});
+$node_publisher->start;
+
+# Create subscriber node
+my $node_subscriber = PostgreSQL::Test::Cluster->new('sub');
+$node_subscriber->init(allows_streaming => 'logical');
+$node_subscriber->start;
+
+# Create table
+$node_publisher->safe_psql('postgres', "CREATE TABLE test_tbl (id int)");
+$node_subscriber->safe_psql('postgres', "CREATE TABLE test_tbl (id int)");
+
+# Insert some data
+$node_publisher->safe_psql('postgres',
+	"INSERT INTO test_tbl VALUES (generate_series(1, 5));"
+);
+
+# Setup logical replication
+my $publisher_connstr = $node_publisher->connstr . ' dbname=postgres';
+$node_publisher->safe_psql('postgres', "CREATE PUBLICATION pub FOR ALL TABLES");
+$node_subscriber->safe_psql('postgres',
+	"CREATE SUBSCRIPTION sub CONNECTION '$publisher_connstr' PUBLICATION pub"
+);
+
+$node_subscriber->wait_for_subscription_sync($node_publisher, 'sub');
+
+my $result = $node_subscriber->safe_psql('postgres',
+	"SELECT count(*) FROM test_tbl"
+);
+
+is($result, qq(5), "check initial copy was done");
+
+# Set wal_receiver_status_interval to zero to suppress keepalive messages
+# between nodes.
+$node_subscriber->append_conf('postgresql.conf', q{
+wal_receiver_status_interval = 0
+});
+$node_subscriber->reload();
+
+my $offset = -s $node_publisher->logfile;
+
+# Restart publisher once. If the slot has persisted, the confirmed_flush_lsn
+# becomes the same as the latest checkpoint location, which means the
+# SHUTDOWN_CHECKPOINT record.
+$node_publisher->restart();
+
+# Wait until the walsender creates decoding context
+$node_publisher->wait_for_log(
+	qr/Streaming transactions committing after ([A-F0-9]+\/[A-F0-9]+), reading WAL from ([A-F0-9]+\/[A-F0-9]+)./,
+	$offset
+);
+
+# Extract confirmed_flush from the logfile
+my $log_contents = slurp_file($node_publisher->logfile, $offset);
+$log_contents =~
+	qr/Streaming transactions committing after ([A-F0-9]+\/[A-F0-9]+), reading WAL from ([A-F0-9]+\/[A-F0-9]+)./
+	or die "could not get confirmed_flush_lsn";
+
+compare_confirmed_flush($node_publisher, $1);
+
+done_testing();
-- 
2.27.0

v26-0002-pg_upgrade-Allow-to-replicate-logical-replicatio.patchapplication/octet-stream; name=v26-0002-pg_upgrade-Allow-to-replicate-logical-replicatio.patchDownload
From 4b264e15a6b5004ff47aa47c2cb67ef9e2624cb6 Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Tue, 4 Apr 2023 05:49:34 +0000
Subject: [PATCH v26 2/3] pg_upgrade: Allow to replicate logical replication
 slots to new node

This commit allows nodes with logical replication slots to be upgraded. While
reading information from the old cluster, a list of logical replication slots is
fetched. At the later part of upgrading, pg_upgrade revisits the list and
restores slots by executing pg_create_logical_replication_slots() on the new
cluster. Migration of logical replication slots is only supported when the old
cluster is version 17.0 or later.

Note that slot restoration must be done after the final pg_resetwal command
during the upgrade because pg_resetwal will remove WALs that are required by
the slots. Due to this restriction, the timing of restoring replication slots is
different from other objects.

The significant advantage of this commit is that it makes it easy to continue
logical replication even after upgrading the publisher node. Previously,
pg_upgrade allowed copying publications to a new node. With this new commit,
adjusting the connection string to the new publisher will cause the apply
worker on the subscriber to connect to the new publisher automatically. This
enables seamless continuation of logical replication, even after an upgrade.

Author: Hayato Kuroda
Co-authored-by: Hou Zhijie
Reviewed-by: Peter Smith, Julien Rouhaud, Vignesh C, Wang Wei, Masahiko Sawada
---
 doc/src/sgml/ref/pgupgrade.sgml               |  65 ++++++++
 src/bin/pg_upgrade/Makefile                   |   3 +
 src/bin/pg_upgrade/check.c                    |  67 +++++++++
 src/bin/pg_upgrade/function.c                 |  18 ++-
 src/bin/pg_upgrade/info.c                     | 134 ++++++++++++++++-
 src/bin/pg_upgrade/meson.build                |   1 +
 src/bin/pg_upgrade/pg_upgrade.c               |  76 ++++++++++
 src/bin/pg_upgrade/pg_upgrade.h               |  19 +++
 .../t/003_logical_replication_slots.pl        | 139 ++++++++++++++++++
 src/tools/pgindent/typedefs.list              |   3 +
 10 files changed, 521 insertions(+), 4 deletions(-)
 create mode 100644 src/bin/pg_upgrade/t/003_logical_replication_slots.pl

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 7816b4c685..29695c3784 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -360,6 +360,71 @@ make prefix=/usr/local/pgsql.new install
     </para>
    </step>
 
+   <step>
+    <title>Prepare for publisher upgrades</title>
+
+    <para>
+     <application>pg_upgrade</application> attempts to migrate logical
+     replication slots. This helps avoid the need for manually defining the
+     same replication slots on the new publisher. Migration of logical
+     replication slots is only supported when the old cluster is version 17.0
+     or later.
+    </para>
+
+    <para>
+     Before you start upgrading the publisher cluster, ensure that the
+     subscription is temporarily disabled, by executing
+     <link linkend="sql-altersubscription"><command>ALTER SUBSCRIPTION ... DISABLE</command></link>.
+     Re-enable the subscription after the upgrade.
+    </para>
+
+    <para>
+     There are some prerequisites for <application>pg_upgrade</application> to
+     be able to upgrade the replication slots. If these are not met an error
+     will be reported.
+    </para>
+
+    <itemizedlist>
+     <listitem>
+      <para>
+       All slots on the old cluster must be usable, i.e., there are no slots
+       whose
+       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>wal_status</structfield>
+       is <literal>lost</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>confirmed_flush_lsn</structfield>
+       of all slots on the old cluster must be the same as the latest
+       checkpoint location.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The output plugins referenced by the slots on the old cluster must be
+       installed in the new PostgreSQL executable directory.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must have
+       <link linkend="guc-max-replication-slots"><varname>max_replication_slots</varname></link>
+       configured to a value greater than or equal to the number of slots
+       present in the old cluster.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must have
+       <link linkend="guc-wal-level"><varname>wal_level</varname></link> as
+       <literal>logical</literal>.
+      </para>
+     </listitem>
+    </itemizedlist>
+
+   </step>
+
    <step>
     <title>Stop both servers</title>
 
diff --git a/src/bin/pg_upgrade/Makefile b/src/bin/pg_upgrade/Makefile
index 5834513add..815d1a7ca1 100644
--- a/src/bin/pg_upgrade/Makefile
+++ b/src/bin/pg_upgrade/Makefile
@@ -3,6 +3,9 @@
 PGFILEDESC = "pg_upgrade - an in-place binary upgrade utility"
 PGAPPICON = win32
 
+# required for 003_logical_replication_slots.pl
+EXTRA_INSTALL=contrib/test_decoding
+
 subdir = src/bin/pg_upgrade
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 56e313f562..e7c82f19b0 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -30,6 +30,7 @@ static void check_for_jsonb_9_4_usage(ClusterInfo *cluster);
 static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(void);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
+static void check_new_cluster_logical_replication_slots(void);
 
 
 /*
@@ -89,6 +90,9 @@ check_and_dump_old_cluster(bool live_check)
 	/* Extract a list of databases and tables from the old cluster */
 	get_db_and_rel_infos(&old_cluster);
 
+	/* Extract a list of logical replication slots */
+	get_old_cluster_logical_slot_infos();
+
 	init_tablespaces();
 
 	get_loadable_libraries();
@@ -189,6 +193,8 @@ check_new_cluster(void)
 {
 	get_db_and_rel_infos(&new_cluster);
 
+	check_new_cluster_logical_replication_slots();
+
 	check_new_cluster_is_empty();
 
 	check_loadable_libraries();
@@ -1402,3 +1408,64 @@ check_for_user_defined_encoding_conversions(ClusterInfo *cluster)
 	else
 		check_ok();
 }
+
+/*
+ * check_new_cluster_logical_replication_slots()
+ *
+ * Make sure there are no logical replication slots on the new cluster and that
+ * the parameter settings necessary for creating slots are sufficient.
+ */
+static void
+check_new_cluster_logical_replication_slots(void)
+{
+	PGresult   *res;
+	PGconn	   *conn;
+	int			nslots = 0;
+	int			max_replication_slots;
+	char	   *wal_level;
+
+	/* Logical slots can be migrated since PG17. */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
+		nslots = count_old_cluster_logical_slots();
+
+	/* Quick return if there are no logical slots to be migrated. */
+	if (nslots == 0)
+		return;
+
+	conn = connectToServer(&new_cluster, "template1");
+
+	prep_status("Checking for logical replication slots");
+
+	res = executeQueryOrDie(conn, "SELECT slot_name "
+								  "FROM pg_catalog.pg_replication_slots "
+								  "WHERE slot_type = 'logical' AND "
+								  "temporary IS FALSE;");
+
+	if (PQntuples(res))
+		pg_fatal("New cluster must not have logical replication slots but found \"%s\"",
+				 PQgetvalue(res, 0, 0));
+
+	PQclear(res);
+
+	res = executeQueryOrDie(conn, "SHOW max_replication_slots;");
+	max_replication_slots = atoi(PQgetvalue(res, 0, 0));
+
+	if (nslots > max_replication_slots)
+		pg_fatal("max_replication_slots (%d) must be greater than or equal to the number of "
+				 "logical replication slots (%d) on the old cluster.",
+				 max_replication_slots, nslots);
+
+	PQclear(res);
+
+	res = executeQueryOrDie(conn, "SHOW wal_level;");
+	wal_level = PQgetvalue(res, 0, 0);
+
+	if (strcmp(wal_level, "logical") != 0)
+		pg_fatal("wal_level must be \"logical\", but is set to \"%s\"",
+				wal_level);
+
+	PQclear(res);
+	PQfinish(conn);
+
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/function.c b/src/bin/pg_upgrade/function.c
index dc8800c7cd..2813d2ff20 100644
--- a/src/bin/pg_upgrade/function.c
+++ b/src/bin/pg_upgrade/function.c
@@ -46,7 +46,12 @@ library_name_compare(const void *p1, const void *p2)
 /*
  * get_loadable_libraries()
  *
- *	Fetch the names of all old libraries containing C-language functions.
+ *	Fetch the names of all old libraries:
+ *	1. Name of library files containing C-language functions (for non-built-in
+ *	   functions), and
+ *	2. Shared object (library) names containing the logical replication output
+ *	   plugins
+ *
  *	We will later check that they all exist in the new installation.
  */
 void
@@ -66,14 +71,21 @@ get_loadable_libraries(void)
 		PGconn	   *conn = connectToServer(&old_cluster, active_db->db_name);
 
 		/*
-		 * Fetch all libraries containing non-built-in C functions in this DB.
+		 * Fetch all libraries containing non-built-in C functions, or referred
+		 * to by logical replication slots in this DB.
 		 */
 		ress[dbnum] = executeQueryOrDie(conn,
 										"SELECT DISTINCT probin "
 										"FROM pg_catalog.pg_proc "
 										"WHERE prolang = %u AND "
 										"probin IS NOT NULL AND "
-										"oid >= %u;",
+										"oid >= %u "
+										"UNION "
+										"SELECT DISTINCT plugin "
+										"FROM pg_catalog.pg_replication_slots "
+										"WHERE wal_status <> 'lost' AND "
+										"database = current_database() AND "
+										"temporary IS FALSE;",
 										ClanguageId,
 										FirstNormalObjectId);
 		totaltups += PQntuples(ress[dbnum]);
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index aa5faca4d6..2c444aa094 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -26,6 +26,7 @@ static void get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo);
 static void free_rel_infos(RelInfoArr *rel_arr);
 static void print_db_infos(DbInfoArr *db_arr);
 static void print_rel_infos(RelInfoArr *rel_arr);
+static void print_slot_infos(LogicalSlotInfoArr *slot_arr);
 
 
 /*
@@ -394,7 +395,7 @@ get_db_infos(ClusterInfo *cluster)
 	i_spclocation = PQfnumber(res, "spclocation");
 
 	ntups = PQntuples(res);
-	dbinfos = (DbInfo *) pg_malloc(sizeof(DbInfo) * ntups);
+	dbinfos = (DbInfo *) pg_malloc0(sizeof(DbInfo) * ntups);
 
 	for (tupnum = 0; tupnum < ntups; tupnum++)
 	{
@@ -600,6 +601,115 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 	dbinfo->rel_arr.nrels = num_rels;
 }
 
+/*
+ * get_old_cluster_logical_slot_infos_per_db()
+ *
+ * gets the LogicalSlotInfos for all the logical replication slots of the database
+ * referred to by "dbinfo".
+ */
+static void
+get_old_cluster_logical_slot_infos_per_db(DbInfo *dbinfo)
+{
+	PGconn	   *conn = connectToServer(&old_cluster,
+									   dbinfo->db_name);
+	PGresult   *res;
+	LogicalSlotInfo *slotinfos = NULL;
+
+	int			num_slots;
+
+	res = executeQueryOrDie(conn, "SELECT slot_name, plugin, two_phase "
+							"FROM pg_catalog.pg_replication_slots "
+							"WHERE wal_status <> 'lost' AND "
+							"database = current_database() AND "
+							"temporary IS FALSE;");
+
+	num_slots = PQntuples(res);
+
+	if (num_slots)
+	{
+		int			slotnum;
+		int			i_slotname;
+		int			i_plugin;
+		int			i_twophase;
+
+		slotinfos = (LogicalSlotInfo *) pg_malloc(sizeof(LogicalSlotInfo) * num_slots);
+
+		i_slotname = PQfnumber(res, "slot_name");
+		i_plugin = PQfnumber(res, "plugin");
+		i_twophase = PQfnumber(res, "two_phase");
+
+		for (slotnum = 0; slotnum < num_slots; slotnum++)
+		{
+			LogicalSlotInfo *curr = &slotinfos[slotnum];
+
+			curr->slotname = pg_strdup(PQgetvalue(res, slotnum, i_slotname));
+			curr->plugin = pg_strdup(PQgetvalue(res, slotnum, i_plugin));
+			curr->two_phase = (strcmp(PQgetvalue(res, slotnum, i_twophase), "t") == 0);
+		}
+	}
+
+	PQclear(res);
+	PQfinish(conn);
+
+	dbinfo->slot_arr.slots = slotinfos;
+	dbinfo->slot_arr.nslots = num_slots;
+}
+
+/*
+ * get_old_cluster_logical_slot_infos()
+ *
+ * Higher level routine to generate LogicalSlotInfoArr for all databases.
+ *
+ * Note: This function will not do anything if the old cluster is pre-PG 17.
+ * The logical slots are not saved at shutdown, and the confirmed_flush_lsn is
+ * always behind the SHUTDOWN_CHECKPOINT record. Subsequent checks done in
+ * check_for_confirmed_flush_lsn() would raise a FATAL error if such slots are
+ * included.
+ */
+void
+get_old_cluster_logical_slot_infos(void)
+{
+	int			dbnum;
+
+	/* Logical slots can be migrated since PG17. */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+		return;
+
+	pg_log(PG_VERBOSE, "\nsource databases:");
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		DbInfo	   *pDbInfo = &old_cluster.dbarr.dbs[dbnum];
+
+		get_old_cluster_logical_slot_infos_per_db(pDbInfo);
+
+		if (log_opts.verbose)
+		{
+			pg_log(PG_VERBOSE, "Database: \"%s\"", pDbInfo->db_name);
+			print_slot_infos(&pDbInfo->slot_arr);
+		}
+	}
+}
+
+/*
+ * count_old_cluster_logical_slots()
+ *
+ * Sum up and return the number of logical replication slots for all databases.
+ */
+int
+count_old_cluster_logical_slots(void)
+{
+	int			dbnum;
+	int			slot_count = 0;
+
+	/* Logical slots can be migrated since PG17. */
+	Assert(GET_MAJOR_VERSION(old_cluster.major_version) >= 1700);
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+		slot_count += old_cluster.dbarr.dbs[dbnum].slot_arr.nslots;
+
+	return slot_count;
+}
 
 static void
 free_db_and_rel_infos(DbInfoArr *db_arr)
@@ -610,6 +720,12 @@ free_db_and_rel_infos(DbInfoArr *db_arr)
 	{
 		free_rel_infos(&db_arr->dbs[dbnum].rel_arr);
 		pg_free(db_arr->dbs[dbnum].db_name);
+
+		/*
+		 * Logical replication slots must not exist on the new cluster before
+		 * create_logical_replication_slots().
+		 */
+		Assert(db_arr->dbs[dbnum].slot_arr.nslots == 0);
 	}
 	pg_free(db_arr->dbs);
 	db_arr->dbs = NULL;
@@ -660,3 +776,19 @@ print_rel_infos(RelInfoArr *rel_arr)
 			   rel_arr->rels[relnum].reloid,
 			   rel_arr->rels[relnum].tablespace);
 }
+
+static void
+print_slot_infos(LogicalSlotInfoArr *slot_arr)
+{
+	int			slotnum;
+
+	for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+	{
+		LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+
+		pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\", two_phase: %d",
+			   slot_info->slotname,
+			   slot_info->plugin,
+			   slot_info->two_phase);
+	}
+}
diff --git a/src/bin/pg_upgrade/meson.build b/src/bin/pg_upgrade/meson.build
index 12a97f84e2..228f29b688 100644
--- a/src/bin/pg_upgrade/meson.build
+++ b/src/bin/pg_upgrade/meson.build
@@ -42,6 +42,7 @@ tests += {
     'tests': [
       't/001_basic.pl',
       't/002_pg_upgrade.pl',
+      't/003_logical_replication_slots.pl',
     ],
     'test_kwargs': {'priority': 40}, # pg_upgrade tests are slow
   },
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 4562dafcff..53442ed67c 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -59,6 +59,7 @@ static void copy_xact_xlog_xid(void);
 static void set_frozenxids(bool minmxid_only);
 static void make_outputdirs(char *pgdata);
 static void setup(char *argv0, bool *live_check);
+static void create_logical_replication_slots(void);
 
 ClusterInfo old_cluster,
 			new_cluster;
@@ -188,6 +189,22 @@ main(int argc, char **argv)
 			  new_cluster.pgdata);
 	check_ok();
 
+	/*
+	 * Logical replication slot upgrade only supported for old_cluster >= PG17.
+	 *
+	 * Note: This must be done after doing the pg_resetwal command because
+	 * pg_resetwal would remove required WALs.
+	 */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
+	{
+		if (count_old_cluster_logical_slots())
+		{
+			start_postmaster(&new_cluster, true);
+			create_logical_replication_slots();
+			stop_postmaster(false);
+		}
+	}
+
 	if (user_opts.do_sync)
 	{
 		prep_status("Sync data directory to disk");
@@ -860,3 +877,62 @@ set_frozenxids(bool minmxid_only)
 
 	check_ok();
 }
+
+/*
+ * create_logical_replication_slots()
+ *
+ * Similar to create_new_objects() but only restores logical replication slots.
+ */
+static void
+create_logical_replication_slots(void)
+{
+	int			dbnum;
+
+	prep_status_progress("Restoring logical replication slots in the new cluster");
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		DbInfo	   *old_db = &old_cluster.dbarr.dbs[dbnum];
+		LogicalSlotInfoArr *slot_arr = &old_db->slot_arr;
+		PGconn     *conn;
+		PQExpBuffer query;
+		int			slotnum;
+		char		log_file_name[MAXPGPATH];
+
+		/* Skip this database if there are no slots */
+		if (slot_arr->nslots == 0)
+			continue;
+
+		conn = connectToServer(&new_cluster, old_db->db_name);
+		query = createPQExpBuffer();
+
+		snprintf(log_file_name, sizeof(log_file_name),
+				 DB_DUMP_LOG_FILE_MASK, old_db->db_oid);
+
+		pg_log(PG_STATUS, "%s", old_db->db_name);
+
+		for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		{
+			LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+
+			/* Constructs a query for creating logical replication slots. */
+			appendPQExpBuffer(query, "SELECT pg_catalog.pg_create_logical_replication_slot(");
+			appendStringLiteralConn(query, slot_info->slotname, conn);
+			appendPQExpBuffer(query, ", ");
+			appendStringLiteralConn(query, slot_info->plugin, conn);
+			appendPQExpBuffer(query, ", false, %s);",
+							  slot_info->two_phase ? "true" : "false");
+
+			PQclear(executeQueryOrDie(conn, "%s", query->data));
+
+			resetPQExpBuffer(query);
+		}
+
+		PQfinish(conn);
+
+		destroyPQExpBuffer(query);
+	}
+
+	end_progress_output();
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 7afa96716e..dae92ef6c0 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -150,6 +150,22 @@ typedef struct
 	int			nrels;
 } RelInfoArr;
 
+/*
+ * Structure to store logical replication slot information
+ */
+typedef struct
+{
+	char	   *slotname;		/* slot name */
+	char	   *plugin;			/* plugin */
+	bool		two_phase;		/* can the slot decode 2PC? */
+} LogicalSlotInfo;
+
+typedef struct
+{
+	int			nslots;			/* number of logical slot infos */
+	LogicalSlotInfo *slots;		/* array of logical slot infos */
+} LogicalSlotInfoArr;
+
 /*
  * The following structure represents a relation mapping.
  */
@@ -176,6 +192,7 @@ typedef struct
 	char		db_tablespace[MAXPGPATH];	/* database default tablespace
 											 * path */
 	RelInfoArr	rel_arr;		/* array of all user relinfos */
+	LogicalSlotInfoArr slot_arr;	/* array of all LogicalSlotInfo */
 } DbInfo;
 
 /*
@@ -400,6 +417,8 @@ FileNameMap *gen_db_file_maps(DbInfo *old_db,
 							  DbInfo *new_db, int *nmaps, const char *old_pgdata,
 							  const char *new_pgdata);
 void		get_db_and_rel_infos(ClusterInfo *cluster);
+void		get_old_cluster_logical_slot_infos(void);
+int			count_old_cluster_logical_slots(void);
 
 /* option.c */
 
diff --git a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
new file mode 100644
index 0000000000..ae87c33708
--- /dev/null
+++ b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
@@ -0,0 +1,139 @@
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+# Tests for upgrading replication slots
+
+use strict;
+use warnings;
+
+use File::Path qw(rmtree);
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Can be changed to test the other modes
+my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
+
+# Initialize old cluster
+my $old_publisher = PostgreSQL::Test::Cluster->new('old_publisher');
+$old_publisher->init(allows_streaming => 'logical');
+
+# Initialize new cluster
+my $new_publisher = PostgreSQL::Test::Cluster->new('new_publisher');
+$new_publisher->init(allows_streaming => 'replica');
+
+my $bindir = $new_publisher->config_data('--bindir');
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when new cluster wal_level is not 'logical'
+
+# Preparations for the subsequent test:
+# 1. Create a slot on the old cluster
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT pg_create_logical_replication_slot('test_slot1', 'test_decoding', false, true);"
+);
+$old_publisher->stop;
+
+# pg_upgrade will fail because the new cluster wal_level is 'replica'
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade where the new cluster has the wrong wal_level');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when max_replication_slots on a new cluster is
+#		too low
+
+# Preparations for the subsequent test:
+# 1. Create a second slot on the old cluster
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT pg_create_logical_replication_slot('test_slot2', 'test_decoding', false, true);"
+);
+$old_publisher->stop;
+
+# 2. max_replication_slots is set to smaller than the number of slots (2)
+#	 present on the old cluster
+$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 1");
+
+# 3. wal_level is set correctly on the new cluster
+$new_publisher->append_conf('postgresql.conf', "wal_level = 'logical'");
+
+# pg_upgrade will fail because the new cluster has insufficient max_replication_slots
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade where the new cluster has insufficient max_replication_slots');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+
+# ------------------------------
+# TEST: Successful upgrade
+
+# Preparations for the subsequent test:
+# 1. Remove the slot 'test_slot2', leaving only 1 slot remaining on the old
+#	 cluster, so the new cluster config  max_replication_slots=1 will now be
+#	 enough.
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT * FROM pg_drop_replication_slot('test_slot2');"
+);
+
+# 2. Consume WAL records
+$old_publisher->safe_psql('postgres',
+	"SELECT count(*) FROM pg_logical_slot_get_changes('test_slot1', NULL, NULL)"
+);
+$old_publisher->stop;
+
+# Actual run, successful upgrade is expected
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade of old cluster');
+ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ removed after pg_upgrade success");
+
+# Check that the slot 'test_slot1' has migrated to the new cluster
+$new_publisher->start;
+my $result = $new_publisher->safe_psql('postgres',
+	"SELECT slot_name, two_phase FROM pg_replication_slots");
+is($result, qq(test_slot1|t), 'check the slot exists on new cluster');
+$new_publisher->stop;
+
+done_testing();
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 49a33c0387..310456e032 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1501,7 +1501,10 @@ LogicalRepTupleData
 LogicalRepTyp
 LogicalRepWorker
 LogicalRepWorkerType
+LogicalReplicationSlotInfo
 LogicalRewriteMappingData
+LogicalSlotInfo
+LogicalSlotInfoArr
 LogicalTape
 LogicalTapeSet
 LsnReadQueue
-- 
2.27.0

v26-0003-pg_upgrade-Add-check-function-for-logical-replic.patchapplication/octet-stream; name=v26-0003-pg_upgrade-Add-check-function-for-logical-replic.patchDownload
From c1dd682d3727e86e03b695d9130379cc13839207 Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Fri, 18 Aug 2023 11:57:37 +0000
Subject: [PATCH v26 3/3] pg_upgrade: Add check function for logical
 replication slots

To prevent data loss, pg_upgrade will fail if the old node has slots with the
status 'lost', or with unconsumed WAL records.

Author: Hayato Kuroda
Reviewed-by: Wang Wei, Vignesh C, Peter Smith, Hou Zhijie
---
 src/bin/pg_upgrade/check.c                    | 113 ++++++++++++++++++
 src/bin/pg_upgrade/controldata.c              |  37 ++++++
 src/bin/pg_upgrade/info.c                     |   4 +-
 src/bin/pg_upgrade/pg_upgrade.h               |   2 +
 .../t/003_logical_replication_slots.pl        |  91 ++++++++++++--
 5 files changed, 237 insertions(+), 10 deletions(-)

diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index e7c82f19b0..4c6a89a870 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -9,6 +9,7 @@
 
 #include "postgres_fe.h"
 
+#include "access/xlogdefs.h"
 #include "catalog/pg_authid_d.h"
 #include "catalog/pg_collation.h"
 #include "fe_utils/string_utils.h"
@@ -31,6 +32,8 @@ static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(void);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
 static void check_new_cluster_logical_replication_slots(void);
+static void check_old_cluster_for_confirmed_flush_lsn(void);
+static void check_old_cluster_for_lost_slots(void);
 
 
 /*
@@ -108,6 +111,24 @@ check_and_dump_old_cluster(bool live_check)
 	check_for_reg_data_type_usage(&old_cluster);
 	check_for_isn_and_int8_passing_mismatch(&old_cluster);
 
+	/*
+	 * Logical replication slots can be migrated since PG17. See comments atop
+	 * get_old_cluster_logical_slot_infos().
+	 */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
+	{
+		check_old_cluster_for_lost_slots();
+
+		/*
+		 * Do additional checks if a live check is not required. This requires
+		 * that confirmed_flush_lsn of all the slots is the same as the latest
+		 * checkpoint location, but it would be satisfied only when the server
+		 * has been shut down.
+		 */
+		if (!live_check)
+			check_old_cluster_for_confirmed_flush_lsn();
+	}
+
 	/*
 	 * PG 16 increased the size of the 'aclitem' type, which breaks the
 	 * on-disk format for existing data.
@@ -1469,3 +1490,95 @@ check_new_cluster_logical_replication_slots(void)
 
 	check_ok();
 }
+
+/*
+ * Verify that all logical replication slots are usable.
+ */
+static void
+check_old_cluster_for_lost_slots(void)
+{
+	int			i,
+				ntups,
+				i_slotname;
+	PGresult   *res;
+	DbInfo	   *active_db = &old_cluster.dbarr.dbs[0];
+	PGconn	   *conn;
+
+	/* Quick exit if the cluster does not have logical slots. */
+	if (count_old_cluster_logical_slots() == 0)
+		return;
+
+	conn = connectToServer(&old_cluster, active_db->db_name);
+
+	prep_status("Checking wal_status for logical replication slots");
+
+	/* Check there are no logical replication slots with a 'lost' state. */
+	res = executeQueryOrDie(conn,
+							"SELECT slot_name FROM pg_catalog.pg_replication_slots "
+							"WHERE wal_status = 'lost' AND "
+							"temporary IS FALSE;");
+
+	ntups = PQntuples(res);
+	i_slotname = PQfnumber(res, "slot_name");
+
+	for (i = 0; i < ntups; i++)
+		pg_log(PG_WARNING,
+			   "\nWARNING: logical replication slot \"%s\" is in 'lost' state.",
+			   PQgetvalue(res, i, i_slotname));
+
+	PQclear(res);
+	PQfinish(conn);
+
+	if (ntups)
+		pg_fatal("One or more logical replication slots with a state of 'lost' were detected.");
+
+	check_ok();
+}
+
+/*
+ * Verify that all logical replication slots consumed all WALs, except a
+ * CHECKPOINT_SHUTDOWN record.
+ */
+static void
+check_old_cluster_for_confirmed_flush_lsn(void)
+{
+	int			i,
+				ntups,
+				i_slotname;
+	PGresult   *res;
+	DbInfo	   *active_db = &old_cluster.dbarr.dbs[0];
+	PGconn	   *conn;
+
+	/* Quick exit if the cluster does not have logical slots. */
+	if (count_old_cluster_logical_slots() == 0)
+		return;
+
+	conn = connectToServer(&old_cluster, active_db->db_name);
+
+	prep_status("Checking confirmed_flush_lsn for logical replication slots");
+
+	/*
+	 * Check that all logical replication slots have reached the latest
+	 * checkpoint position (SHUTDOWN_CHECKPOINT record).
+	 */
+	res = executeQueryOrDie(conn,
+							"SELECT slot_name FROM pg_catalog.pg_replication_slots "
+							"WHERE confirmed_flush_lsn != '%X/%X' AND temporary IS FALSE;",
+							LSN_FORMAT_ARGS(old_cluster.controldata.chkpnt_latest));
+
+	ntups = PQntuples(res);
+	i_slotname = PQfnumber(res, "slot_name");
+
+	for (i = 0; i < ntups; i++)
+		pg_log(PG_WARNING,
+				"\nWARNING: logical replication slot \"%s\" has not consumed WALs yet",
+				PQgetvalue(res, i, i_slotname));
+
+	PQclear(res);
+	PQfinish(conn);
+
+	if (ntups)
+		pg_fatal("One or more logical replication slots still have unconsumed WAL records.");
+
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c
index 4beb65ab22..ad9d0c2702 100644
--- a/src/bin/pg_upgrade/controldata.c
+++ b/src/bin/pg_upgrade/controldata.c
@@ -169,6 +169,43 @@ get_control_data(ClusterInfo *cluster, bool live_check)
 				}
 				got_cluster_state = true;
 			}
+
+			else if ((p = strstr(bufin, "Latest checkpoint location:")) != NULL)
+			{
+				/*
+				 * Read the latest checkpoint location if the cluster is PG17
+				 * or later. This is used for upgrading logical replication
+				 * slots.
+				 */
+				if (GET_MAJOR_VERSION(cluster->major_version) >= 1700)
+				{
+					char *slash = NULL;
+					uint32 upper_lsn, lower_lsn;
+
+					p = strchr(p, ':');
+
+					if (p == NULL || strlen(p) <= 1)
+						pg_fatal("%d: controldata retrieval problem", __LINE__);
+
+					p++;			/* remove ':' char */
+
+					p = strpbrk(p, "01234567890ABCDEF");
+
+					if (p == NULL || strlen(p) <= 1)
+						pg_fatal("%d: controldata retrieval problem", __LINE__);
+
+					/*
+					 * The upper and lower part of LSN must be read separately
+					 * because it is stored as in %X/%X format.
+					 */
+					upper_lsn = strtoul(p, &slash, 16);
+					lower_lsn = strtoul(++slash, NULL, 16);
+
+					/* And combine them */
+					cluster->controldata.chkpnt_latest =
+										((uint64) upper_lsn << 32) | lower_lsn;
+				}
+			}
 		}
 
 		rc = pclose(output);
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index 2c444aa094..9c631ec043 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -663,8 +663,8 @@ get_old_cluster_logical_slot_infos_per_db(DbInfo *dbinfo)
  * Note: This function will not do anything if the old cluster is pre-PG 17.
  * The logical slots are not saved at shutdown, and the confirmed_flush_lsn is
  * always behind the SHUTDOWN_CHECKPOINT record. Subsequent checks done in
- * check_for_confirmed_flush_lsn() would raise a FATAL error if such slots are
- * included.
+ * check_old_cluster_for_confirmed_flush_lsn() would raise a FATAL error if
+ * such slots are included.
  */
 void
 get_old_cluster_logical_slot_infos(void)
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index dae92ef6c0..e72318f500 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -10,6 +10,7 @@
 #include <sys/stat.h>
 #include <sys/time.h>
 
+#include "access/xlogdefs.h"
 #include "common/relpath.h"
 #include "libpq-fe.h"
 
@@ -242,6 +243,7 @@ typedef struct
 	bool		date_is_int;
 	bool		float8_pass_by_value;
 	uint32		data_checksum_version;
+	XLogRecPtr	chkpnt_latest;
 } ControlData;
 
 /*
diff --git a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
index ae87c33708..3a6c7cf8bd 100644
--- a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
+++ b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
@@ -22,6 +22,10 @@ $old_publisher->init(allows_streaming => 'logical');
 my $new_publisher = PostgreSQL::Test::Cluster->new('new_publisher');
 $new_publisher->init(allows_streaming => 'replica');
 
+# Initialize subscriber cluster
+my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
+$subscriber->init(allows_streaming => 'logical');
+
 my $bindir = $new_publisher->config_data('--bindir');
 
 # ------------------------------
@@ -65,13 +69,19 @@ $old_publisher->start;
 $old_publisher->safe_psql('postgres',
 	"SELECT pg_create_logical_replication_slot('test_slot2', 'test_decoding', false, true);"
 );
+
+# 2. Consume WAL records to avoid another type of upgrade failure. It will be
+#	 tested in subsequent cases.
+$old_publisher->safe_psql('postgres',
+	"SELECT count(*) FROM pg_logical_slot_get_changes('test_slot1', NULL, NULL);"
+);
 $old_publisher->stop;
 
-# 2. max_replication_slots is set to smaller than the number of slots (2)
+# 3. max_replication_slots is set to smaller than the number of slots (2)
 #	 present on the old cluster
 $new_publisher->append_conf('postgresql.conf', "max_replication_slots = 1");
 
-# 3. wal_level is set correctly on the new cluster
+# 4. wal_level is set correctly on the new cluster
 $new_publisher->append_conf('postgresql.conf', "wal_level = 'logical'");
 
 # pg_upgrade will fail because the new cluster has insufficient max_replication_slots
@@ -95,7 +105,7 @@ ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
 rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
 
 # ------------------------------
-# TEST: Successful upgrade
+# TEST: Confirm pg_upgrade fails when the slot still has unconsumed WAL records
 
 # Preparations for the subsequent test:
 # 1. Remove the slot 'test_slot2', leaving only 1 slot remaining on the old
@@ -106,10 +116,57 @@ $old_publisher->safe_psql('postgres',
 	"SELECT * FROM pg_drop_replication_slot('test_slot2');"
 );
 
-# 2. Consume WAL records
+# 2. Generate extra WAL records. Because these WAL records do not get consumed
+#	 it will cause the upcoming pg_upgrade test to fail.
+$old_publisher->safe_psql('postgres',
+	"CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;"
+);
+$old_publisher->stop;
+
+# pg_upgrade will fail because the slot still has unconsumed WAL records
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade of old cluster with idle replication slots');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+# Remove the remained slot
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT * FROM pg_drop_replication_slot('test_slot1');"
+);
+
+# ------------------------------
+# TEST: Successful upgrade
+
+# Preparations for the subsequent test:
+# 1. Setup logical replication
+my $old_connstr = $old_publisher->connstr . ' dbname=postgres';
 $old_publisher->safe_psql('postgres',
-	"SELECT count(*) FROM pg_logical_slot_get_changes('test_slot1', NULL, NULL)"
+	"CREATE PUBLICATION pub FOR ALL TABLES;"
 );
+$subscriber->start;
+$subscriber->safe_psql(
+	'postgres', qq[
+	CREATE TABLE tbl (a int);
+	CREATE SUBSCRIPTION sub CONNECTION '$old_connstr' PUBLICATION pub WITH (two_phase = 'true')
+]);
+$subscriber->wait_for_subscription_sync($old_publisher, 'sub');
+
+# 2. Temporarily disable the subscription
+$subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION sub DISABLE");
 $old_publisher->stop;
 
 # Actual run, successful upgrade is expected
@@ -129,11 +186,29 @@ command_ok(
 ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
 	"pg_upgrade_output.d/ removed after pg_upgrade success");
 
-# Check that the slot 'test_slot1' has migrated to the new cluster
+# Check that the slot 'sub' has migrated to the new cluster
 $new_publisher->start;
 my $result = $new_publisher->safe_psql('postgres',
 	"SELECT slot_name, two_phase FROM pg_replication_slots");
-is($result, qq(test_slot1|t), 'check the slot exists on new cluster');
-$new_publisher->stop;
+is($result, qq(sub|t), 'check the slot exists on new cluster');
+
+# Update the connection
+my $new_connstr = $new_publisher->connstr . ' dbname=postgres';
+$subscriber->safe_psql(
+	'postgres', qq[
+	ALTER SUBSCRIPTION sub CONNECTION '$new_connstr';
+	ALTER SUBSCRIPTION sub ENABLE;
+]);
+
+# Check whether changes on the new publisher get shipped to the subscriber
+$new_publisher->safe_psql('postgres',
+	"INSERT INTO tbl VALUES (generate_series(11, 20))");
+$new_publisher->wait_for_catchup('sub');
+$result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
+is($result, qq(20), 'check changes are shipped to the subscriber');
+
+# Clean up
+$subscriber->stop();
+$new_publisher->stop();
 
 done_testing();
-- 
2.27.0

#167Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Peter Smith (#164)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Peter,

Thank you for reviewing! The updated patch set is available in [1].

Here are my review comments for patch v25-0003.

======
src/bin/pg_upgrade/check.c

1. GENERAL

+static void check_for_confirmed_flush_lsn(void);
+static void check_for_lost_slots(void);

For more clarity, I wonder if it is better to rename some functions:

check_for_confirmed_flush_lsn() -> check_old_cluster_for_confirmed_flush_lsn()
check_for_lost_slots() -> check_old_cluster_for_lost_slots()

Replaced.

2.
+ /*
+ * Logical replication slots can be migrated since PG17. See comments atop
+ * get_logical_slot_infos().
+ */
+ if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
+ {
+ check_for_lost_slots();
+
+ /*
+ * Do additional checks if a live check is not required. This requires
+ * that confirmed_flush_lsn of all the slots is the same as the latest
+ * checkpoint location, but it would be satisfied only when the server
+ * has been shut down.
+ */
+ if (!live_check)
+ check_for_confirmed_flush_lsn();
+ }
+

2a.
If my suggestions from v25-0002 [1] are adopted then this comment
needs to change to say like "See atop file pg_upgrade.c..."

2b.
Hmm. If my suggestions from v25-0002 [1] are adopted then the version
checking and the slot counting would *already* be in this calling
function. In that case, why can't this whole fragment be put in the
same place? E.g. IIUC there is no reason to call these at checks all
when the old_cluster slot count is already known to be 0. Similarly,
there is no reason that both these functions need to be independently
checking count_logical_slots again since we have already done that
(again, assuming my suggestions from v25-0002 [1] are adopted).

I have not adopted those suggestions for now, so these comments were left unaddressed.

3. check_for_lost_slots

+/*
+ * Verify that all logical replication slots are usable.
+ */
+void
+check_for_lost_slots(void)

This was forward-declared to be static, but the static function
modifier is absent here.

Fixed.

4. check_for_lost_slots

+ /* Quick exit if the cluster does not have logical slots. */
+ if (count_logical_slots() == 0)
+ return;
+

AFAICT this quick exit can be removed. See my comment #2b.

2b was skipped, so IIUC this is still needed.

5. check_for_confirmed_flush_lsn

+check_for_confirmed_flush_lsn(void)
+{
+ int i,
+ ntups,
+ i_slotname;
+ PGresult   *res;
+ DbInfo    *active_db = &old_cluster.dbarr.dbs[0];
+ PGconn    *conn;
+
+ /* Quick exit if the cluster does not have logical slots. */
+ if (count_logical_slots() == 0)
+ return;

AFAICT this quick exit can be removed. See my comment #2b.

I kept the current style here (the quick exit remains).

.../t/003_logical_replication_slots.pl

6.
+# 2. Temporarily disable the subscription
+$subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION sub DISABLE");
$old_publisher->stop;

In my previous 0003 review ([2] #10b) I was not questioning the need
for the $old_publisher->stop; before the pg_upgrade. I was only asking
why it was done at this location (after the DISABLE) instead of
earlier.

I see. The reason was to avoid an unnecessary error from the apply worker.

To be clear, the position of the shutdown (before or after the DISABLE) does
not affect the result. But if the shutdown is placed before the DISABLE, the apply worker
will exit with the error below, because the walsender exits earlier than the worker:

```
ERROR: could not send end-of-streaming message to primary: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
no COPY in progress
```

It is not problematic, but future readers may be confused if they find it in the log,
so I avoided that ordering; see the short sketch below.
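
To illustrate, here is a minimal sketch of that ordering (reusing the node names
from the attached test; it adds nothing beyond what the test already does):

```
# Disable the subscription first so that its apply worker stops cleanly ...
$subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION sub DISABLE");

# ... and only then shut down the old publisher, so the walsender does not
# disappear underneath a still-running apply worker.
$old_publisher->stop;
```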

7.
+# Check whether changes on the new publisher get replicated to the subscriber
+$new_publisher->safe_psql('postgres',
+ "INSERT INTO tbl VALUES (generate_series(11, 20))");
+$new_publisher->wait_for_catchup('sub');
+$result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
+is($result, qq(20), 'check changes are shipped to the subscriber');

/shipped/replicated/

You meant to say s/replicated/shipped/, right? Fixed.

[1]: /messages/by-id/TYAPR01MB5866C6DE11EBC96752CEB7DEF5E2A@TYAPR01MB5866.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#168Dilip Kumar
dilipbalaut@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#166)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Sat, Aug 26, 2023 at 9:54 AM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

Dear Peter,

Thank you for reviewing! PSA new version patch set.

I haven't read this thread in detail, but I have one high-level design
question. Is the upgrade of replication slots done by default, or is it
controlled by some GUC? If it is done by default, some users might
experience failures in certain cases, e.g. a) wal_level on the new
cluster is not logical, or b) the new check
check_old_cluster_for_confirmed_flush_lsn() fails because the confirmed
flush LSN is not at the latest shutdown checkpoint. I am not sure whether this
is a problem or whether it could simply be handled by documenting this behavior.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#169Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Dilip Kumar (#168)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Dilip,

Thank you for reading the thread!

I haven't read this thread in detail, but I have one high-level design
question. Is the upgrade of replication slots done by default, or is it
controlled by some GUC?

I designed it so that logical slots are upgraded by default.

If it is done by default, some users might
experience failures in certain cases, e.g. a) wal_level on the new
cluster is not logical, or b) the new check
check_old_cluster_for_confirmed_flush_lsn() fails because the confirmed
flush LSN is not at the latest shutdown checkpoint. I am not sure whether this
is a problem or whether it could simply be handled by documenting this behavior.

I think it should be done by default to avoid a WAL gap. If we do not upgrade
slots by default, users may forget to specify the option. Some time later they
would notice that the slots were not migrated and would create new slots at that point,
but this leads to data loss on the subscriber. Such inconsistency between nodes is really
bad. Other developers have also requested that this be enabled by default [1].

Moreover, the checks related to logical slots are skipped when no slots are defined
on the old cluster, so the feature does not affect users who do not use logical slots.
(A rough pre-flight query showing what those checks look at is sketched below.)
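
For reference, here is a rough pre-flight sketch of what those checks look at,
assuming the $old_publisher node used in the attached TAP test (the
authoritative checks are the ones in check.c of the patch; run this per
database):

```
# Hypothetical pre-flight check: list the non-temporary logical slots that
# pg_upgrade would try to migrate, together with the wal_status inspected by
# the 'lost slot' check and the confirmed_flush_lsn compared against the
# latest checkpoint location.
my $slots = $old_publisher->safe_psql('postgres', q{
    SELECT slot_name, plugin, wal_status, confirmed_flush_lsn
    FROM pg_catalog.pg_replication_slots
    WHERE slot_type = 'logical' AND temporary IS FALSE;
});
print "$slots\n";
```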

Also, we are considering introducing an option for excluding slots once the feature
has been committed [2].

[1]: /messages/by-id/ad83b9f2-ced3-c51c-342a-cc281ff562fc@postgresql.org
[2]: /messages/by-id/CAA4eK1KxP+gogYOsTHbZVPO7Pp38gcRjEWUxv+4X3dFept3z3A@mail.gmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#170Peter Smith
smithpb2250@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#166)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

Hi, here are my comments for patch v26-0002.

======
1. About the PG17 limitation

In my previous review of v25-0002, I suggested that the PG17
limitation should be documented atop one of the source files. See
[1]#3, [1]#7, [1]#10

I just wanted to explain the reason for that suggestion.

Currently, all the new version checks have a comment like "/* Logical
slots can be migrated since PG17. */". I felt that it would be better
if those comments said something more like "/* Logical slots can be
migrated since PG17. See XYZ for details. */". I don't really care
*where* the main explanation lives, but I thought since it is
referenced from multiple places it might be easier to find if it was
atop some file instead of just in a function comment. YMMV.

======
2. Do version checking in check_and_dump_old_cluster instead of inside
get_old_cluster_logical_slot_infos

check_and_dump_old_cluster - Should check version before calling
get_old_cluster_logical_slot_infos
get_old_cluster_logical_slot_infos - Keep a sanity check Assert if you
wish (or do nothing -- e.g. see #3 below)

Refer to [1]#4, [1]#8

Isn't it self-evident from the file/function names what kind of logic
they are intended to have in them? Sure, there may be some exceptions
but unless it is difficult to implement I think most people would
reasonably assume:

- checking code should be in file "check.c"
-- e.g. a function called 'check_and_dump_old_cluster' ought to be
*checking* stuff

- info fetching code should be in file "info.c"

~~

Another motivation for this suggestion becomes more obvious later with
patch 0003. By checking at the "higher" level (in check.c) it means
multiple related functions can all be called under one version check.
Less checking means less code and/or simpler code. For example,
multiple redundant calls to get_old_cluster_count_slots() can be
avoided in patch 0003 by writing *less* code, than v26* currently has.

======
3. count_old_cluster_logical_slots

I think there is nothing special in this logic that will crash if PG
version <= 1600. Keep the Assert for sanity checking if you wish, but
this is already guarded by the call in pg_upgrade.c so perhaps it is
overkill.

------
[1]: My review of v25-0002 - /messages/by-id/CAHut+PtQcou3Bfm9A5SbhFuo2uKK-6u4_j_59so3skAi8Ns03A@mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

#171Peter Smith
smithpb2250@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#166)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

Hi, here are my review comments for v26-0003

It seems I must defend some of my previous suggestions from v25* [1],
so here goes...

======
src/bin/pg_upgrade/check.c

1. check_and_dump_old_cluster

CURRENT CODE (with v26-0003 patch applied)

/* Extract a list of logical replication slots */
get_old_cluster_logical_slot_infos();

...

/*
* Logical replication slots can be migrated since PG17. See comments atop
* get_old_cluster_logical_slot_infos().
*/
if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
{
check_old_cluster_for_lost_slots();

/*
* Do additional checks if a live check is not required. This requires
* that confirmed_flush_lsn of all the slots is the same as the latest
* checkpoint location, but it would be satisfied only when the server
* has been shut down.
*/
if (!live_check)
check_old_cluster_for_confirmed_flush_lsn();
}

SUGGESTION

/*
* Logical replication slots can be migrated since PG17. See comments atop
* get_old_cluster_logical_slot_infos().
*/
if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700) // NOTE 1a.
{
/* Extract a list of logical replication slots */
get_old_cluster_logical_slot_infos();

if (count_old_cluster_slots()) // NOTE 1b.
{
check_old_cluster_for_lost_slots();

/*
* Do additional checks if a live check is not required. This requires
* that confirmed_flush_lsn of all the slots is the same as the latest
* checkpoint location, but it would be satisfied only when the server
* has been shut down.
*/
if (!live_check)
check_old_cluster_for_confirmed_flush_lsn();
}
}

~~

Benefits:

1a.
One version check instead of multiple.

~

1b.
Upfront slot counting means:
- count_old_cluster_slots() is called only once.
- unnecessary calls to the other check* functions are avoided.

~

1c.
get_old_cluster_logical_slot_infos
- No version check is needed.

check_old_cluster_for_lost_slots
- Call to count_old_cluster_slots is not needed
- Quick exit not needed.

check_old_cluster_for_confirmed_flush_lsn
- Call to count_old_cluster_slots is not needed
- Quick exit not needed.

~~~

2. check_old_cluster_for_lost_slots

+ /* Quick exit if the cluster does not have logical slots. */
+ if (count_old_cluster_logical_slots() == 0)
+ return;

Refer to [1] #4. This can be removed because of #1b above.

~~~

3. check_old_cluster_for_confirmed_flush_lsn

+ /* Quick exit if the cluster does not have logical slots. */
+ if (count_old_cluster_logical_slots() == 0)
+ return;

Refer to [1] #5. This can be removed because of #1b above.

~~~

4. .../t/003_logical_replication_slots.pl

/shipped/replicated/

Kuroda-san 26/8 wrote:
You meant to say s/replicated/shipped/, right? Fixed.

No, I meant what I wrote for [1] #7. I was referring to the word
"shipped" in the message 'check changes are shipped to the
subscriber'. Now there are 2 places to change instead of one.

------
[1]: my review of v25-0003 - /messages/by-id/CAHut+PsdkhcVG5GY4ZW0DMUF8FG=WvjaGN+NA4XFLrzxWSQXVA@mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

#172Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Peter Smith (#170)
3 attachment(s)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Peter,

Thank you for reviewing! PSA new version patch set.

======
1. About the PG17 limitation

In my previous review of v25-0002, I suggested that the PG17
limitation should be documented atop one of the source files. See
[1]#3, [1]#7, [1]#10

I just wanted to explain the reason for that suggestion.

Currently, all the new version checks have a comment like "/* Logical
slots can be migrated since PG17. */". I felt that it would be better
if those comments said something more like "/* Logical slots can be
migrated since PG17. See XYZ for details. */". I don't really care
*where* the main explanation lives, but I thought since it is
referenced from multiple places it might be easier to find if it was
atop some file instead of just in a function comment. YMMV.

======
2. Do version checking in check_and_dump_old_cluster instead of inside
get_old_cluster_logical_slot_infos

check_and_dump_old_cluster - Should check version before calling
get_old_cluster_logical_slot_infos
get_old_cluster_logical_slot_infos - Keep a sanity check Assert if you
wish (or do nothing -- e.g. see #3 below)

Refer to [1]#4, [1]#8

Isn't it self-evident from the file/function names what kind of logic
they are intended to have in them? Sure, there may be some exceptions
but unless it is difficult to implement I think most people would
reasonably assume:

- checking code should be in file "check.c"
-- e.g. a function called 'check_and_dump_old_cluster' ought to be
*checking* stuff

- info fetching code should be in file "info.c"

~~

Another motivation for this suggestion becomes more obvious later with
patch 0003. By checking at the "higher" level (in check.c) it means
multiple related functions can all be called under one version check.
Less checking means less code and/or simpler code. For example,
multiple redundant calls to get_old_cluster_count_slots() can be
avoided in patch 0003 by writing *less* code, than v26* currently has.

IIUC Amit disagreed with these points, so I will keep my code as-is until he posts
his opinion.

3. count_old_cluster_logical_slots

I think there is nothing special in this logic that will crash if PG
version <= 1600. Keep the Assert for sanity checking if you wish, but
this is already guarded by the call in pg_upgrade.c so perhaps it is
overkill.

Your point is right.
I checked some version-specific functions like check_for_aclitem_data_type_usage()
and check_for_user_defined_encoding_conversions(); they do not have an Assert(), so
I removed it here as well. As for free_db_and_rel_infos(), the Assert() there ensures
that the new cluster does not have logical slots, so I kept it.

Also, I found that get_loadable_libraries() always read pg_replication_slots,
even if the old cluster is older than PG17. This led to additional checks for logical
decoding output plugins. Moreover, old clusters on PG12 or earlier could not be
upgraded at all, because pg_replication_slots does not have the wal_status attribute there.

I think this check should be done only when the old cluster is PG17 or later, so I
fixed it that way; a rough sketch of the resulting query is below.
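
For illustration, this is roughly the query that get_loadable_libraries() now
builds for a PG 17 or later old cluster, shown as a sketch against the
$old_publisher node from the TAP tests (the numeric constants stand for
ClanguageId and FirstNormalObjectId; the real query construction is in the
function.c hunk of the attached v27-0002):

```
# Hypothetical sketch: C-function library names plus the output plugins of
# usable logical slots, mirroring the union query built in
# get_loadable_libraries().
my $libs = $old_publisher->safe_psql('postgres', q{
    SELECT DISTINCT probin
    FROM pg_catalog.pg_proc
    WHERE prolang = 13            -- ClanguageId
      AND probin IS NOT NULL
      AND oid >= 16384            -- FirstNormalObjectId
    UNION
    SELECT DISTINCT plugin
    FROM pg_catalog.pg_replication_slots
    WHERE wal_status <> 'lost'
      AND database = current_database()
      AND temporary IS FALSE;
});
print "$libs\n";
```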

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:

v27-0001-Persist-to-disk-logical-slots-during-a-shutdown-.patchapplication/octet-stream; name=v27-0001-Persist-to-disk-logical-slots-during-a-shutdown-.patchDownload
From fc5ffcb9d5ec5034f0a2b0ad4bbf0a1e0fb1d7e5 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Fri, 14 Apr 2023 13:49:09 +0800
Subject: [PATCH v27 1/3] Persist to disk logical slots during a shutdown
 checkpoint if the updated confirmed_flush_lsn has not yet been persisted.

It's entirely possible for a logical slot to have a confirmed_flush_lsn higher
than the last value saved on disk while not being marked as dirty.  It's
currently not a problem to lose that value during a clean shutdown / restart
cycle, but a later patch adding support for pg_upgrade of publications and
logical slots will rely on that value being properly persisted to disk.

Author: Julien Rouhaud
Reviewed-by: Wang Wei, Peter Smith, Masahiko Sawada
---
 src/backend/access/transam/xlog.c             |   2 +-
 src/backend/replication/slot.c                |  33 ++++--
 src/include/replication/slot.h                |   5 +-
 src/test/subscription/meson.build             |   1 +
 src/test/subscription/t/034_always_persist.pl | 106 ++++++++++++++++++
 5 files changed, 135 insertions(+), 12 deletions(-)
 create mode 100644 src/test/subscription/t/034_always_persist.pl

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 60c0b7ec3a..6dced61cf4 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7026,7 +7026,7 @@ static void
 CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
 {
 	CheckPointRelationMap();
-	CheckPointReplicationSlots();
+	CheckPointReplicationSlots(flags & CHECKPOINT_IS_SHUTDOWN);
 	CheckPointSnapBuild();
 	CheckPointLogicalRewriteHeap();
 	CheckPointReplicationOrigin();
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index bb09c4010f..1c6db2a99a 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -109,7 +109,8 @@ static void ReplicationSlotDropPtr(ReplicationSlot *slot);
 /* internal persistency functions */
 static void RestoreSlotFromDisk(const char *name);
 static void CreateSlotOnDisk(ReplicationSlot *slot);
-static void SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel);
+static void SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel,
+						   bool is_shutdown);
 
 /*
  * Report shared-memory space needed by ReplicationSlotsShmemInit.
@@ -321,6 +322,7 @@ ReplicationSlotCreate(const char *name, bool db_specific,
 	slot->candidate_xmin_lsn = InvalidXLogRecPtr;
 	slot->candidate_restart_valid = InvalidXLogRecPtr;
 	slot->candidate_restart_lsn = InvalidXLogRecPtr;
+	slot->last_persisted_confirmed_flush = InvalidXLogRecPtr;
 
 	/*
 	 * Create the slot on disk.  We haven't actually marked the slot allocated
@@ -783,7 +785,7 @@ ReplicationSlotSave(void)
 	Assert(MyReplicationSlot != NULL);
 
 	sprintf(path, "pg_replslot/%s", NameStr(MyReplicationSlot->data.name));
-	SaveSlotToPath(MyReplicationSlot, path, ERROR);
+	SaveSlotToPath(MyReplicationSlot, path, ERROR, false);
 }
 
 /*
@@ -1572,11 +1574,10 @@ restart:
 /*
  * Flush all replication slots to disk.
  *
- * This needn't actually be part of a checkpoint, but it's a convenient
- * location.
+ * is_shutdown is true in case of a shutdown checkpoint.
  */
 void
-CheckPointReplicationSlots(void)
+CheckPointReplicationSlots(bool is_shutdown)
 {
 	int			i;
 
@@ -1601,7 +1602,7 @@ CheckPointReplicationSlots(void)
 
 		/* save the slot to disk, locking is handled in SaveSlotToPath() */
 		sprintf(path, "pg_replslot/%s", NameStr(s->data.name));
-		SaveSlotToPath(s, path, LOG);
+		SaveSlotToPath(s, path, LOG, is_shutdown);
 	}
 	LWLockRelease(ReplicationSlotAllocationLock);
 }
@@ -1707,7 +1708,7 @@ CreateSlotOnDisk(ReplicationSlot *slot)
 
 	/* Write the actual state file. */
 	slot->dirty = true;			/* signal that we really need to write */
-	SaveSlotToPath(slot, tmppath, ERROR);
+	SaveSlotToPath(slot, tmppath, ERROR, false);
 
 	/* Rename the directory into place. */
 	if (rename(tmppath, path) != 0)
@@ -1733,7 +1734,8 @@ CreateSlotOnDisk(ReplicationSlot *slot)
  * Shared functionality between saving and creating a replication slot.
  */
 static void
-SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel)
+SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel,
+			   bool is_shutdown)
 {
 	char		tmppath[MAXPGPATH];
 	char		path[MAXPGPATH];
@@ -1747,8 +1749,16 @@ SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel)
 	slot->just_dirtied = false;
 	SpinLockRelease(&slot->mutex);
 
-	/* and don't do anything if there's nothing to write */
-	if (!was_dirty)
+	/*
+	 * Don't do anything if there's nothing to write, unless this is called for
+	 * a logical slot during a shutdown checkpoint and if the updated
+	 * confirmed_flush LSN has not yet been persisted, as we want to persist
+	 * the updated confirmed_flush LSN in that case, even if that's the only
+	 * modification.
+	 */
+	if (!was_dirty &&
+		!(SlotIsLogical(slot) && is_shutdown &&
+		  (slot->data.confirmed_flush != slot->last_persisted_confirmed_flush)))
 		return;
 
 	LWLockAcquire(&slot->io_in_progress_lock, LW_EXCLUSIVE);
@@ -1878,6 +1888,8 @@ SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel)
 	SpinLockAcquire(&slot->mutex);
 	if (!slot->just_dirtied)
 		slot->dirty = false;
+
+	slot->last_persisted_confirmed_flush = slot->data.confirmed_flush;
 	SpinLockRelease(&slot->mutex);
 
 	LWLockRelease(&slot->io_in_progress_lock);
@@ -2074,6 +2086,7 @@ RestoreSlotFromDisk(const char *name)
 		/* initialize in memory state */
 		slot->effective_xmin = cp.slotdata.xmin;
 		slot->effective_catalog_xmin = cp.slotdata.catalog_xmin;
+		slot->last_persisted_confirmed_flush =  cp.slotdata.confirmed_flush;
 
 		slot->candidate_catalog_xmin = InvalidTransactionId;
 		slot->candidate_xmin_lsn = InvalidXLogRecPtr;
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index a8a89dc784..b519f7af5f 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -178,6 +178,9 @@ typedef struct ReplicationSlot
 	XLogRecPtr	candidate_xmin_lsn;
 	XLogRecPtr	candidate_restart_valid;
 	XLogRecPtr	candidate_restart_lsn;
+
+	/* The last persisted confirmed flush lsn */
+	XLogRecPtr	last_persisted_confirmed_flush;
 } ReplicationSlot;
 
 #define SlotIsPhysical(slot) ((slot)->data.database == InvalidOid)
@@ -241,7 +244,7 @@ extern void ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslo
 extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok);
 
 extern void StartupReplicationSlots(void);
-extern void CheckPointReplicationSlots(void);
+extern void CheckPointReplicationSlots(bool is_shutdown);
 
 extern void CheckSlotRequirements(void);
 extern void CheckSlotPermissions(void);
diff --git a/src/test/subscription/meson.build b/src/test/subscription/meson.build
index bd673a9d68..cdd2f8ba47 100644
--- a/src/test/subscription/meson.build
+++ b/src/test/subscription/meson.build
@@ -40,6 +40,7 @@ tests += {
       't/031_column_list.pl',
       't/032_subscribe_use_index.pl',
       't/033_run_as_table_owner.pl',
+      't/034_always_persist.pl',
       't/100_bugs.pl',
     ],
   },
diff --git a/src/test/subscription/t/034_always_persist.pl b/src/test/subscription/t/034_always_persist.pl
new file mode 100644
index 0000000000..9973476fff
--- /dev/null
+++ b/src/test/subscription/t/034_always_persist.pl
@@ -0,0 +1,106 @@
+
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+# Test logical replication slots are always persisted to disk during a shutdown
+# checkpoint.
+
+use strict;
+use warnings;
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+sub compare_confirmed_flush
+{
+	my ($node, $confirmed_flush_from_log) = @_;
+
+	# Fetch Latest checkpoint location from the control file itself
+	my ($stdout, $stderr) = run_command([ 'pg_controldata', $node->data_dir ]);
+	my @control_data = split("\n", $stdout);
+	my $latest_checkpoint = undef;
+	foreach (@control_data)
+	{
+		if ($_ =~ /^Latest checkpoint location:\s*(.*)$/mg)
+		{
+			$latest_checkpoint = $1;
+			last;
+		}
+	}
+	die "Latest checkpoint location not found in control file found\n"
+	  unless defined($latest_checkpoint);
+
+	# Is it same as the value read from log?
+	ok($latest_checkpoint eq $confirmed_flush_from_log,
+		"Check the decoding starts from the confirmed_flush which is the same as the latest_checkpoint");
+
+	return;
+}
+
+# Initialize publisher node
+my $node_publisher = PostgreSQL::Test::Cluster->new('pub');
+$node_publisher->init(allows_streaming => 'logical');
+$node_publisher->append_conf('postgresql.conf', q{
+autovacuum = off
+checkpoint_timeout = 1h
+});
+$node_publisher->start;
+
+# Create subscriber node
+my $node_subscriber = PostgreSQL::Test::Cluster->new('sub');
+$node_subscriber->init(allows_streaming => 'logical');
+$node_subscriber->start;
+
+# Create table
+$node_publisher->safe_psql('postgres', "CREATE TABLE test_tbl (id int)");
+$node_subscriber->safe_psql('postgres', "CREATE TABLE test_tbl (id int)");
+
+# Insert some data
+$node_publisher->safe_psql('postgres',
+	"INSERT INTO test_tbl VALUES (generate_series(1, 5));"
+);
+
+# Setup logical replication
+my $publisher_connstr = $node_publisher->connstr . ' dbname=postgres';
+$node_publisher->safe_psql('postgres', "CREATE PUBLICATION pub FOR ALL TABLES");
+$node_subscriber->safe_psql('postgres',
+	"CREATE SUBSCRIPTION sub CONNECTION '$publisher_connstr' PUBLICATION pub"
+);
+
+$node_subscriber->wait_for_subscription_sync($node_publisher, 'sub');
+
+my $result = $node_subscriber->safe_psql('postgres',
+	"SELECT count(*) FROM test_tbl"
+);
+
+is($result, qq(5), "check initial copy was done");
+
+# Set wal_receiver_status_interval to zero to suppress keepalive messages
+# between nodes.
+$node_subscriber->append_conf('postgresql.conf', q{
+wal_receiver_status_interval = 0
+});
+$node_subscriber->reload();
+
+my $offset = -s $node_publisher->logfile;
+
+# Restart publisher once. If the slot has persisted, the confirmed_flush_lsn
+# becomes the same as the latest checkpoint location, which means the
+# SHUTDOWN_CHECKPOINT record.
+$node_publisher->restart();
+
+# Wait until the walsender creates decoding context
+$node_publisher->wait_for_log(
+	qr/Streaming transactions committing after ([A-F0-9]+\/[A-F0-9]+), reading WAL from ([A-F0-9]+\/[A-F0-9]+)./,
+	$offset
+);
+
+# Extract confirmed_flush from the logfile
+my $log_contents = slurp_file($node_publisher->logfile, $offset);
+$log_contents =~
+	qr/Streaming transactions committing after ([A-F0-9]+\/[A-F0-9]+), reading WAL from ([A-F0-9]+\/[A-F0-9]+)./
+	or die "could not get confirmed_flush_lsn";
+
+compare_confirmed_flush($node_publisher, $1);
+
+done_testing();
-- 
2.27.0

v27-0002-pg_upgrade-Allow-to-replicate-logical-replicatio.patchapplication/octet-stream; name=v27-0002-pg_upgrade-Allow-to-replicate-logical-replicatio.patchDownload
From 3d511ccb75d5976848c0474957dd367690265036 Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Tue, 4 Apr 2023 05:49:34 +0000
Subject: [PATCH v27 2/3] pg_upgrade: Allow to replicate logical replication
 slots to new node

This commit allows nodes with logical replication slots to be upgraded. While
reading information from the old cluster, a list of logical replication slots is
fetched. In a later part of the upgrade, pg_upgrade revisits the list and
restores the slots by executing pg_create_logical_replication_slot() on the new
cluster. Migration of logical replication slots is only supported when the old
cluster is version 17.0 or later.

Note that slot restoration must be done after the final pg_resetwal command
during the upgrade because pg_resetwal will remove WALs that are required by
the slots. Due to this restriction, the timing of restoring replication slots is
different from other objects.

The significant advantage of this commit is that it makes it easy to continue
logical replication even after upgrading the publisher node. Previously,
pg_upgrade allowed copying publications to a new node. With this new commit,
adjusting the connection string to the new publisher will cause the apply
worker on the subscriber to connect to the new publisher automatically. This
enables seamless continuation of logical replication, even after an upgrade.

Author: Hayato Kuroda
Co-authored-by: Hou Zhijie
Reviewed-by: Peter Smith, Julien Rouhaud, Vignesh C, Wang Wei, Masahiko Sawada
---
 doc/src/sgml/ref/pgupgrade.sgml               |  65 ++++++++
 src/bin/pg_upgrade/Makefile                   |   3 +
 src/bin/pg_upgrade/check.c                    |  67 +++++++++
 src/bin/pg_upgrade/function.c                 |  44 ++++--
 src/bin/pg_upgrade/info.c                     | 131 ++++++++++++++++-
 src/bin/pg_upgrade/meson.build                |   1 +
 src/bin/pg_upgrade/pg_upgrade.c               |  76 ++++++++++
 src/bin/pg_upgrade/pg_upgrade.h               |  19 +++
 .../t/003_logical_replication_slots.pl        | 139 ++++++++++++++++++
 src/tools/pgindent/typedefs.list              |   3 +
 10 files changed, 537 insertions(+), 11 deletions(-)
 create mode 100644 src/bin/pg_upgrade/t/003_logical_replication_slots.pl

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 7816b4c685..29695c3784 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -360,6 +360,71 @@ make prefix=/usr/local/pgsql.new install
     </para>
    </step>
 
+   <step>
+    <title>Prepare for publisher upgrades</title>
+
+    <para>
+     <application>pg_upgrade</application> attempts to migrate logical
+     replication slots. This helps avoid the need for manually defining the
+     same replication slots on the new publisher. Migration of logical
+     replication slots is only supported when the old cluster is version 17.0
+     or later.
+    </para>
+
+    <para>
+     Before you start upgrading the publisher cluster, ensure that the
+     subscription is temporarily disabled, by executing
+     <link linkend="sql-altersubscription"><command>ALTER SUBSCRIPTION ... DISABLE</command></link>.
+     Re-enable the subscription after the upgrade.
+    </para>
+
+    <para>
+     There are some prerequisites for <application>pg_upgrade</application> to
+     be able to upgrade the replication slots. If these are not met an error
+     will be reported.
+    </para>
+
+    <itemizedlist>
+     <listitem>
+      <para>
+       All slots on the old cluster must be usable, i.e., there are no slots
+       whose
+       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>wal_status</structfield>
+       is <literal>lost</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>confirmed_flush_lsn</structfield>
+       of all slots on the old cluster must be the same as the latest
+       checkpoint location.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The output plugins referenced by the slots on the old cluster must be
+       installed in the new PostgreSQL executable directory.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must have
+       <link linkend="guc-max-replication-slots"><varname>max_replication_slots</varname></link>
+       configured to a value greater than or equal to the number of slots
+       present in the old cluster.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must have
+       <link linkend="guc-wal-level"><varname>wal_level</varname></link> as
+       <literal>logical</literal>.
+      </para>
+     </listitem>
+    </itemizedlist>
+
+   </step>
+
    <step>
     <title>Stop both servers</title>
 
diff --git a/src/bin/pg_upgrade/Makefile b/src/bin/pg_upgrade/Makefile
index 5834513add..815d1a7ca1 100644
--- a/src/bin/pg_upgrade/Makefile
+++ b/src/bin/pg_upgrade/Makefile
@@ -3,6 +3,9 @@
 PGFILEDESC = "pg_upgrade - an in-place binary upgrade utility"
 PGAPPICON = win32
 
+# required for 003_logical_replication_slots.pl
+EXTRA_INSTALL=contrib/test_decoding
+
 subdir = src/bin/pg_upgrade
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 56e313f562..e7c82f19b0 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -30,6 +30,7 @@ static void check_for_jsonb_9_4_usage(ClusterInfo *cluster);
 static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(void);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
+static void check_new_cluster_logical_replication_slots(void);
 
 
 /*
@@ -89,6 +90,9 @@ check_and_dump_old_cluster(bool live_check)
 	/* Extract a list of databases and tables from the old cluster */
 	get_db_and_rel_infos(&old_cluster);
 
+	/* Extract a list of logical replication slots */
+	get_old_cluster_logical_slot_infos();
+
 	init_tablespaces();
 
 	get_loadable_libraries();
@@ -189,6 +193,8 @@ check_new_cluster(void)
 {
 	get_db_and_rel_infos(&new_cluster);
 
+	check_new_cluster_logical_replication_slots();
+
 	check_new_cluster_is_empty();
 
 	check_loadable_libraries();
@@ -1402,3 +1408,64 @@ check_for_user_defined_encoding_conversions(ClusterInfo *cluster)
 	else
 		check_ok();
 }
+
+/*
+ * check_new_cluster_logical_replication_slots()
+ *
+ * Make sure there are no logical replication slots on the new cluster and that
+ * the parameter settings necessary for creating slots are sufficient.
+ */
+static void
+check_new_cluster_logical_replication_slots(void)
+{
+	PGresult   *res;
+	PGconn	   *conn;
+	int			nslots = 0;
+	int			max_replication_slots;
+	char	   *wal_level;
+
+	/* Logical slots can be migrated since PG17. */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
+		nslots = count_old_cluster_logical_slots();
+
+	/* Quick return if there are no logical slots to be migrated. */
+	if (nslots == 0)
+		return;
+
+	conn = connectToServer(&new_cluster, "template1");
+
+	prep_status("Checking for logical replication slots");
+
+	res = executeQueryOrDie(conn, "SELECT slot_name "
+								  "FROM pg_catalog.pg_replication_slots "
+								  "WHERE slot_type = 'logical' AND "
+								  "temporary IS FALSE;");
+
+	if (PQntuples(res))
+		pg_fatal("New cluster must not have logical replication slots but found \"%s\"",
+				 PQgetvalue(res, 0, 0));
+
+	PQclear(res);
+
+	res = executeQueryOrDie(conn, "SHOW max_replication_slots;");
+	max_replication_slots = atoi(PQgetvalue(res, 0, 0));
+
+	if (nslots > max_replication_slots)
+		pg_fatal("max_replication_slots (%d) must be greater than or equal to the number of "
+				 "logical replication slots (%d) on the old cluster.",
+				 max_replication_slots, nslots);
+
+	PQclear(res);
+
+	res = executeQueryOrDie(conn, "SHOW wal_level;");
+	wal_level = PQgetvalue(res, 0, 0);
+
+	if (strcmp(wal_level, "logical") != 0)
+		pg_fatal("wal_level must be \"logical\", but is set to \"%s\"",
+				wal_level);
+
+	PQclear(res);
+	PQfinish(conn);
+
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/function.c b/src/bin/pg_upgrade/function.c
index dc8800c7cd..2da8ca404a 100644
--- a/src/bin/pg_upgrade/function.c
+++ b/src/bin/pg_upgrade/function.c
@@ -11,6 +11,7 @@
 
 #include "access/transam.h"
 #include "catalog/pg_language_d.h"
+#include "fe_utils/string_utils.h"
 #include "pg_upgrade.h"
 
 /*
@@ -46,7 +47,12 @@ library_name_compare(const void *p1, const void *p2)
 /*
  * get_loadable_libraries()
  *
- *	Fetch the names of all old libraries containing C-language functions.
+ *	Fetch the names of all old libraries:
+ *	1. Name of library files containing C-language functions (for non-built-in
+ *	   functions), and
+ *	2. Shared object (library) names containing the logical replication output
+ *	   plugins
+ *
  *	We will later check that they all exist in the new installation.
  */
 void
@@ -55,10 +61,32 @@ get_loadable_libraries(void)
 	PGresult  **ress;
 	int			totaltups;
 	int			dbnum;
+	PQExpBuffer	query = createPQExpBuffer();
 
 	ress = (PGresult **) pg_malloc(old_cluster.dbarr.ndbs * sizeof(PGresult *));
 	totaltups = 0;
 
+	/* Construct a query string */
+	appendPQExpBuffer(query, "SELECT DISTINCT probin "
+								"FROM pg_catalog.pg_proc "
+								"WHERE prolang = %u AND "
+								"probin IS NOT NULL AND "
+								"oid >= %u",
+								ClanguageId,
+								FirstNormalObjectId);
+
+	/*
+	 * If old_cluster is PG 17 or later, logical decoding output plugins must
+	 * also be included.
+	 */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
+		appendPQExpBufferStr(query, " UNION "
+									"SELECT DISTINCT plugin "
+									"FROM pg_catalog.pg_replication_slots "
+									"WHERE wal_status <> 'lost' AND "
+									"database = current_database() AND "
+									"temporary IS FALSE;");
+
 	/* Fetch all library names, removing duplicates within each DB */
 	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
 	{
@@ -66,21 +94,17 @@ get_loadable_libraries(void)
 		PGconn	   *conn = connectToServer(&old_cluster, active_db->db_name);
 
 		/*
-		 * Fetch all libraries containing non-built-in C functions in this DB.
+		 * Fetch all libraries containing non-built-in C functions, or referred
+		 * to by logical replication slots in this DB.
 		 */
-		ress[dbnum] = executeQueryOrDie(conn,
-										"SELECT DISTINCT probin "
-										"FROM pg_catalog.pg_proc "
-										"WHERE prolang = %u AND "
-										"probin IS NOT NULL AND "
-										"oid >= %u;",
-										ClanguageId,
-										FirstNormalObjectId);
+		ress[dbnum] = executeQueryOrDie(conn, "%s", query->data);
 		totaltups += PQntuples(ress[dbnum]);
 
 		PQfinish(conn);
 	}
 
+	destroyPQExpBuffer(query);
+
 	os_info.libraries = (LibraryInfo *) pg_malloc(totaltups * sizeof(LibraryInfo));
 	totaltups = 0;
 
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index aa5faca4d6..a79c834fb5 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -26,6 +26,7 @@ static void get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo);
 static void free_rel_infos(RelInfoArr *rel_arr);
 static void print_db_infos(DbInfoArr *db_arr);
 static void print_rel_infos(RelInfoArr *rel_arr);
+static void print_slot_infos(LogicalSlotInfoArr *slot_arr);
 
 
 /*
@@ -394,7 +395,7 @@ get_db_infos(ClusterInfo *cluster)
 	i_spclocation = PQfnumber(res, "spclocation");
 
 	ntups = PQntuples(res);
-	dbinfos = (DbInfo *) pg_malloc(sizeof(DbInfo) * ntups);
+	dbinfos = (DbInfo *) pg_malloc0(sizeof(DbInfo) * ntups);
 
 	for (tupnum = 0; tupnum < ntups; tupnum++)
 	{
@@ -600,6 +601,112 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 	dbinfo->rel_arr.nrels = num_rels;
 }
 
+/*
+ * get_old_cluster_logical_slot_infos_per_db()
+ *
+ * gets the LogicalSlotInfos for all the logical replication slots of the database
+ * referred to by "dbinfo".
+ */
+static void
+get_old_cluster_logical_slot_infos_per_db(DbInfo *dbinfo)
+{
+	PGconn	   *conn = connectToServer(&old_cluster,
+									   dbinfo->db_name);
+	PGresult   *res;
+	LogicalSlotInfo *slotinfos = NULL;
+
+	int			num_slots;
+
+	res = executeQueryOrDie(conn, "SELECT slot_name, plugin, two_phase "
+							"FROM pg_catalog.pg_replication_slots "
+							"WHERE wal_status <> 'lost' AND "
+							"database = current_database() AND "
+							"temporary IS FALSE;");
+
+	num_slots = PQntuples(res);
+
+	if (num_slots)
+	{
+		int			slotnum;
+		int			i_slotname;
+		int			i_plugin;
+		int			i_twophase;
+
+		slotinfos = (LogicalSlotInfo *) pg_malloc(sizeof(LogicalSlotInfo) * num_slots);
+
+		i_slotname = PQfnumber(res, "slot_name");
+		i_plugin = PQfnumber(res, "plugin");
+		i_twophase = PQfnumber(res, "two_phase");
+
+		for (slotnum = 0; slotnum < num_slots; slotnum++)
+		{
+			LogicalSlotInfo *curr = &slotinfos[slotnum];
+
+			curr->slotname = pg_strdup(PQgetvalue(res, slotnum, i_slotname));
+			curr->plugin = pg_strdup(PQgetvalue(res, slotnum, i_plugin));
+			curr->two_phase = (strcmp(PQgetvalue(res, slotnum, i_twophase), "t") == 0);
+		}
+	}
+
+	PQclear(res);
+	PQfinish(conn);
+
+	dbinfo->slot_arr.slots = slotinfos;
+	dbinfo->slot_arr.nslots = num_slots;
+}
+
+/*
+ * get_old_cluster_logical_slot_infos()
+ *
+ * Higher level routine to generate LogicalSlotInfoArr for all databases.
+ *
+ * Note: This function will not do anything if the old cluster is pre-PG 17.
+ * The logical slots are not saved at shutdown, and the confirmed_flush_lsn is
+ * always behind the SHUTDOWN_CHECKPOINT record. Subsequent checks done in
+ * check_for_confirmed_flush_lsn() would raise a FATAL error if such slots are
+ * included.
+ */
+void
+get_old_cluster_logical_slot_infos(void)
+{
+	int			dbnum;
+
+	/* Logical slots can be migrated since PG17. */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+		return;
+
+	pg_log(PG_VERBOSE, "\nsource databases:");
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		DbInfo	   *pDbInfo = &old_cluster.dbarr.dbs[dbnum];
+
+		get_old_cluster_logical_slot_infos_per_db(pDbInfo);
+
+		if (log_opts.verbose)
+		{
+			pg_log(PG_VERBOSE, "Database: \"%s\"", pDbInfo->db_name);
+			print_slot_infos(&pDbInfo->slot_arr);
+		}
+	}
+}
+
+/*
+ * count_old_cluster_logical_slots()
+ *
+ * Sum up and return the number of logical replication slots for all databases.
+ */
+int
+count_old_cluster_logical_slots(void)
+{
+	int			dbnum;
+	int			slot_count = 0;
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+		slot_count += old_cluster.dbarr.dbs[dbnum].slot_arr.nslots;
+
+	return slot_count;
+}
 
 static void
 free_db_and_rel_infos(DbInfoArr *db_arr)
@@ -610,6 +717,12 @@ free_db_and_rel_infos(DbInfoArr *db_arr)
 	{
 		free_rel_infos(&db_arr->dbs[dbnum].rel_arr);
 		pg_free(db_arr->dbs[dbnum].db_name);
+
+		/*
+		 * Logical replication slots must not exist on the new cluster before
+		 * create_logical_replication_slots().
+		 */
+		Assert(db_arr->dbs[dbnum].slot_arr.nslots == 0);
 	}
 	pg_free(db_arr->dbs);
 	db_arr->dbs = NULL;
@@ -660,3 +773,19 @@ print_rel_infos(RelInfoArr *rel_arr)
 			   rel_arr->rels[relnum].reloid,
 			   rel_arr->rels[relnum].tablespace);
 }
+
+static void
+print_slot_infos(LogicalSlotInfoArr *slot_arr)
+{
+	int			slotnum;
+
+	for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+	{
+		LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+
+		pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\", two_phase: %d",
+			   slot_info->slotname,
+			   slot_info->plugin,
+			   slot_info->two_phase);
+	}
+}
diff --git a/src/bin/pg_upgrade/meson.build b/src/bin/pg_upgrade/meson.build
index 12a97f84e2..228f29b688 100644
--- a/src/bin/pg_upgrade/meson.build
+++ b/src/bin/pg_upgrade/meson.build
@@ -42,6 +42,7 @@ tests += {
     'tests': [
       't/001_basic.pl',
       't/002_pg_upgrade.pl',
+      't/003_logical_replication_slots.pl',
     ],
     'test_kwargs': {'priority': 40}, # pg_upgrade tests are slow
   },
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 4562dafcff..53442ed67c 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -59,6 +59,7 @@ static void copy_xact_xlog_xid(void);
 static void set_frozenxids(bool minmxid_only);
 static void make_outputdirs(char *pgdata);
 static void setup(char *argv0, bool *live_check);
+static void create_logical_replication_slots(void);
 
 ClusterInfo old_cluster,
 			new_cluster;
@@ -188,6 +189,22 @@ main(int argc, char **argv)
 			  new_cluster.pgdata);
 	check_ok();
 
+	/*
+	 * Logical replication slot upgrade only supported for old_cluster >= PG17.
+	 *
+	 * Note: This must be done after doing the pg_resetwal command because
+	 * pg_resetwal would remove required WALs.
+	 */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
+	{
+		if (count_old_cluster_logical_slots())
+		{
+			start_postmaster(&new_cluster, true);
+			create_logical_replication_slots();
+			stop_postmaster(false);
+		}
+	}
+
 	if (user_opts.do_sync)
 	{
 		prep_status("Sync data directory to disk");
@@ -860,3 +877,62 @@ set_frozenxids(bool minmxid_only)
 
 	check_ok();
 }
+
+/*
+ * create_logical_replication_slots()
+ *
+ * Similar to create_new_objects() but only restores logical replication slots.
+ */
+static void
+create_logical_replication_slots(void)
+{
+	int			dbnum;
+
+	prep_status_progress("Restoring logical replication slots in the new cluster");
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		DbInfo	   *old_db = &old_cluster.dbarr.dbs[dbnum];
+		LogicalSlotInfoArr *slot_arr = &old_db->slot_arr;
+		PGconn     *conn;
+		PQExpBuffer query;
+		int			slotnum;
+		char		log_file_name[MAXPGPATH];
+
+		/* Skip this database if there are no slots */
+		if (slot_arr->nslots == 0)
+			continue;
+
+		conn = connectToServer(&new_cluster, old_db->db_name);
+		query = createPQExpBuffer();
+
+		snprintf(log_file_name, sizeof(log_file_name),
+				 DB_DUMP_LOG_FILE_MASK, old_db->db_oid);
+
+		pg_log(PG_STATUS, "%s", old_db->db_name);
+
+		for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		{
+			LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+
+			/* Constructs a query for creating logical replication slots. */
+			appendPQExpBuffer(query, "SELECT pg_catalog.pg_create_logical_replication_slot(");
+			appendStringLiteralConn(query, slot_info->slotname, conn);
+			appendPQExpBuffer(query, ", ");
+			appendStringLiteralConn(query, slot_info->plugin, conn);
+			appendPQExpBuffer(query, ", false, %s);",
+							  slot_info->two_phase ? "true" : "false");
+
+			PQclear(executeQueryOrDie(conn, "%s", query->data));
+
+			resetPQExpBuffer(query);
+		}
+
+		PQfinish(conn);
+
+		destroyPQExpBuffer(query);
+	}
+
+	end_progress_output();
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 7afa96716e..dae92ef6c0 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -150,6 +150,22 @@ typedef struct
 	int			nrels;
 } RelInfoArr;
 
+/*
+ * Structure to store logical replication slot information
+ */
+typedef struct
+{
+	char	   *slotname;		/* slot name */
+	char	   *plugin;			/* plugin */
+	bool		two_phase;		/* can the slot decode 2PC? */
+} LogicalSlotInfo;
+
+typedef struct
+{
+	int			nslots;			/* number of logical slot infos */
+	LogicalSlotInfo *slots;		/* array of logical slot infos */
+} LogicalSlotInfoArr;
+
 /*
  * The following structure represents a relation mapping.
  */
@@ -176,6 +192,7 @@ typedef struct
 	char		db_tablespace[MAXPGPATH];	/* database default tablespace
 											 * path */
 	RelInfoArr	rel_arr;		/* array of all user relinfos */
+	LogicalSlotInfoArr slot_arr;	/* array of all LogicalSlotInfo */
 } DbInfo;
 
 /*
@@ -400,6 +417,8 @@ FileNameMap *gen_db_file_maps(DbInfo *old_db,
 							  DbInfo *new_db, int *nmaps, const char *old_pgdata,
 							  const char *new_pgdata);
 void		get_db_and_rel_infos(ClusterInfo *cluster);
+void		get_old_cluster_logical_slot_infos(void);
+int			count_old_cluster_logical_slots(void);
 
 /* option.c */
 
diff --git a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
new file mode 100644
index 0000000000..ae87c33708
--- /dev/null
+++ b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
@@ -0,0 +1,139 @@
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+# Tests for upgrading replication slots
+
+use strict;
+use warnings;
+
+use File::Path qw(rmtree);
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Can be changed to test the other modes
+my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
+
+# Initialize old cluster
+my $old_publisher = PostgreSQL::Test::Cluster->new('old_publisher');
+$old_publisher->init(allows_streaming => 'logical');
+
+# Initialize new cluster
+my $new_publisher = PostgreSQL::Test::Cluster->new('new_publisher');
+$new_publisher->init(allows_streaming => 'replica');
+
+my $bindir = $new_publisher->config_data('--bindir');
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when new cluster wal_level is not 'logical'
+
+# Preparations for the subsequent test:
+# 1. Create a slot on the old cluster
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT pg_create_logical_replication_slot('test_slot1', 'test_decoding', false, true);"
+);
+$old_publisher->stop;
+
+# pg_upgrade will fail because the new cluster wal_level is 'replica'
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade where the new cluster has the wrong wal_level');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when max_replication_slots on a new cluster is
+#		too low
+
+# Preparations for the subsequent test:
+# 1. Create a second slot on the old cluster
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT pg_create_logical_replication_slot('test_slot2', 'test_decoding', false, true);"
+);
+$old_publisher->stop;
+
+# 2. max_replication_slots is set to smaller than the number of slots (2)
+#	 present on the old cluster
+$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 1");
+
+# 3. wal_level is set correctly on the new cluster
+$new_publisher->append_conf('postgresql.conf', "wal_level = 'logical'");
+
+# pg_upgrade will fail because the new cluster has insufficient max_replication_slots
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade where the new cluster has insufficient max_replication_slots');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+
+# ------------------------------
+# TEST: Successful upgrade
+
+# Preparations for the subsequent test:
+# 1. Remove the slot 'test_slot2', leaving only 1 slot remaining on the old
+#	 cluster, so the new cluster config  max_replication_slots=1 will now be
+#	 enough.
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT * FROM pg_drop_replication_slot('test_slot2');"
+);
+
+# 2. Consume WAL records
+$old_publisher->safe_psql('postgres',
+	"SELECT count(*) FROM pg_logical_slot_get_changes('test_slot1', NULL, NULL)"
+);
+$old_publisher->stop;
+
+# Actual run, successful upgrade is expected
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade of old cluster');
+ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ removed after pg_upgrade success");
+
+# Check that the slot 'test_slot1' has migrated to the new cluster
+$new_publisher->start;
+my $result = $new_publisher->safe_psql('postgres',
+	"SELECT slot_name, two_phase FROM pg_replication_slots");
+is($result, qq(test_slot1|t), 'check the slot exists on new cluster');
+$new_publisher->stop;
+
+done_testing();
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 49a33c0387..310456e032 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1501,7 +1501,10 @@ LogicalRepTupleData
 LogicalRepTyp
 LogicalRepWorker
 LogicalRepWorkerType
+LogicalReplicationSlotInfo
 LogicalRewriteMappingData
+LogicalSlotInfo
+LogicalSlotInfoArr
 LogicalTape
 LogicalTapeSet
 LsnReadQueue
-- 
2.27.0

v27-0003-pg_upgrade-Add-check-function-for-logical-replic.patch (application/octet-stream)
From 0fc7f1db28d47ebf025980d8edfd0f6eb00eeaff Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Fri, 18 Aug 2023 11:57:37 +0000
Subject: [PATCH v27 3/3] pg_upgrade: Add check function for logical
 replication slots

To prevent data loss, pg_upgrade will fail if the old node has slots with the
status 'lost', or with unconsumed WAL records.

Author: Hayato Kuroda
Reviewed-by: Wang Wei, Vignesh C, Peter Smith, Hou Zhijie
---
 src/bin/pg_upgrade/check.c                    | 113 ++++++++++++++++++
 src/bin/pg_upgrade/controldata.c              |  37 ++++++
 src/bin/pg_upgrade/info.c                     |   4 +-
 src/bin/pg_upgrade/pg_upgrade.h               |   2 +
 .../t/003_logical_replication_slots.pl        |  91 ++++++++++++--
 5 files changed, 237 insertions(+), 10 deletions(-)

diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index e7c82f19b0..4c6a89a870 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -9,6 +9,7 @@
 
 #include "postgres_fe.h"
 
+#include "access/xlogdefs.h"
 #include "catalog/pg_authid_d.h"
 #include "catalog/pg_collation.h"
 #include "fe_utils/string_utils.h"
@@ -31,6 +32,8 @@ static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(void);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
 static void check_new_cluster_logical_replication_slots(void);
+static void check_old_cluster_for_confirmed_flush_lsn(void);
+static void check_old_cluster_for_lost_slots(void);
 
 
 /*
@@ -108,6 +111,24 @@ check_and_dump_old_cluster(bool live_check)
 	check_for_reg_data_type_usage(&old_cluster);
 	check_for_isn_and_int8_passing_mismatch(&old_cluster);
 
+	/*
+	 * Logical replication slots can be migrated since PG17. See comments atop
+	 * get_old_cluster_logical_slot_infos().
+	 */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
+	{
+		check_old_cluster_for_lost_slots();
+
+		/*
+		 * Do additional checks if a live check is not required. This requires
+		 * that confirmed_flush_lsn of all the slots is the same as the latest
+		 * checkpoint location, but it would be satisfied only when the server
+		 * has been shut down.
+		 */
+		if (!live_check)
+			check_old_cluster_for_confirmed_flush_lsn();
+	}
+
 	/*
 	 * PG 16 increased the size of the 'aclitem' type, which breaks the
 	 * on-disk format for existing data.
@@ -1469,3 +1490,95 @@ check_new_cluster_logical_replication_slots(void)
 
 	check_ok();
 }
+
+/*
+ * Verify that all logical replication slots are usable.
+ */
+static void
+check_old_cluster_for_lost_slots(void)
+{
+	int			i,
+				ntups,
+				i_slotname;
+	PGresult   *res;
+	DbInfo	   *active_db = &old_cluster.dbarr.dbs[0];
+	PGconn	   *conn;
+
+	/* Quick exit if the cluster does not have logical slots. */
+	if (count_old_cluster_logical_slots() == 0)
+		return;
+
+	conn = connectToServer(&old_cluster, active_db->db_name);
+
+	prep_status("Checking wal_status for logical replication slots");
+
+	/* Check there are no logical replication slots with a 'lost' state. */
+	res = executeQueryOrDie(conn,
+							"SELECT slot_name FROM pg_catalog.pg_replication_slots "
+							"WHERE wal_status = 'lost' AND "
+							"temporary IS FALSE;");
+
+	ntups = PQntuples(res);
+	i_slotname = PQfnumber(res, "slot_name");
+
+	for (i = 0; i < ntups; i++)
+		pg_log(PG_WARNING,
+			   "\nWARNING: logical replication slot \"%s\" is in 'lost' state.",
+			   PQgetvalue(res, i, i_slotname));
+
+	PQclear(res);
+	PQfinish(conn);
+
+	if (ntups)
+		pg_fatal("One or more logical replication slots with a state of 'lost' were detected.");
+
+	check_ok();
+}
+
+/*
+ * Verify that all logical replication slots consumed all WALs, except a
+ * CHECKPOINT_SHUTDOWN record.
+ */
+static void
+check_old_cluster_for_confirmed_flush_lsn(void)
+{
+	int			i,
+				ntups,
+				i_slotname;
+	PGresult   *res;
+	DbInfo	   *active_db = &old_cluster.dbarr.dbs[0];
+	PGconn	   *conn;
+
+	/* Quick exit if the cluster does not have logical slots. */
+	if (count_old_cluster_logical_slots() == 0)
+		return;
+
+	conn = connectToServer(&old_cluster, active_db->db_name);
+
+	prep_status("Checking confirmed_flush_lsn for logical replication slots");
+
+	/*
+	 * Check that all logical replication slots have reached the latest
+	 * checkpoint position (SHUTDOWN_CHECKPOINT record).
+	 */
+	res = executeQueryOrDie(conn,
+							"SELECT slot_name FROM pg_catalog.pg_replication_slots "
+							"WHERE confirmed_flush_lsn != '%X/%X' AND temporary IS FALSE;",
+							LSN_FORMAT_ARGS(old_cluster.controldata.chkpnt_latest));
+
+	ntups = PQntuples(res);
+	i_slotname = PQfnumber(res, "slot_name");
+
+	for (i = 0; i < ntups; i++)
+		pg_log(PG_WARNING,
+				"\nWARNING: logical replication slot \"%s\" has not consumed WALs yet",
+				PQgetvalue(res, i, i_slotname));
+
+	PQclear(res);
+	PQfinish(conn);
+
+	if (ntups)
+		pg_fatal("One or more logical replication slots still have unconsumed WAL records.");
+
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c
index 4beb65ab22..ad9d0c2702 100644
--- a/src/bin/pg_upgrade/controldata.c
+++ b/src/bin/pg_upgrade/controldata.c
@@ -169,6 +169,43 @@ get_control_data(ClusterInfo *cluster, bool live_check)
 				}
 				got_cluster_state = true;
 			}
+
+			else if ((p = strstr(bufin, "Latest checkpoint location:")) != NULL)
+			{
+				/*
+				 * Read the latest checkpoint location if the cluster is PG17
+				 * or later. This is used for upgrading logical replication
+				 * slots.
+				 */
+				if (GET_MAJOR_VERSION(cluster->major_version) >= 1700)
+				{
+					char *slash = NULL;
+					uint32 upper_lsn, lower_lsn;
+
+					p = strchr(p, ':');
+
+					if (p == NULL || strlen(p) <= 1)
+						pg_fatal("%d: controldata retrieval problem", __LINE__);
+
+					p++;			/* remove ':' char */
+
+					p = strpbrk(p, "01234567890ABCDEF");
+
+					if (p == NULL || strlen(p) <= 1)
+						pg_fatal("%d: controldata retrieval problem", __LINE__);
+
+					/*
+					 * The upper and lower part of LSN must be read separately
+					 * because it is stored in %X/%X format.
+					 */
+					upper_lsn = strtoul(p, &slash, 16);
+					lower_lsn = strtoul(++slash, NULL, 16);
+
+					/* And combine them */
+					cluster->controldata.chkpnt_latest =
+										((uint64) upper_lsn << 32) | lower_lsn;
+				}
+			}
 		}
 
 		rc = pclose(output);
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index a79c834fb5..65fcb396c3 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -663,8 +663,8 @@ get_old_cluster_logical_slot_infos_per_db(DbInfo *dbinfo)
  * Note: This function will not do anything if the old cluster is pre-PG 17.
  * The logical slots are not saved at shutdown, and the confirmed_flush_lsn is
  * always behind the SHUTDOWN_CHECKPOINT record. Subsequent checks done in
- * check_for_confirmed_flush_lsn() would raise a FATAL error if such slots are
- * included.
+ * check_old_cluster_for_confirmed_flush_lsn() would raise a FATAL error if
+ * such slots are included.
  */
 void
 get_old_cluster_logical_slot_infos(void)
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index dae92ef6c0..e72318f500 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -10,6 +10,7 @@
 #include <sys/stat.h>
 #include <sys/time.h>
 
+#include "access/xlogdefs.h"
 #include "common/relpath.h"
 #include "libpq-fe.h"
 
@@ -242,6 +243,7 @@ typedef struct
 	bool		date_is_int;
 	bool		float8_pass_by_value;
 	uint32		data_checksum_version;
+	XLogRecPtr	chkpnt_latest;
 } ControlData;
 
 /*
diff --git a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
index ae87c33708..640964c4e1 100644
--- a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
+++ b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
@@ -22,6 +22,10 @@ $old_publisher->init(allows_streaming => 'logical');
 my $new_publisher = PostgreSQL::Test::Cluster->new('new_publisher');
 $new_publisher->init(allows_streaming => 'replica');
 
+# Initialize subscriber cluster
+my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
+$subscriber->init(allows_streaming => 'logical');
+
 my $bindir = $new_publisher->config_data('--bindir');
 
 # ------------------------------
@@ -65,13 +69,19 @@ $old_publisher->start;
 $old_publisher->safe_psql('postgres',
 	"SELECT pg_create_logical_replication_slot('test_slot2', 'test_decoding', false, true);"
 );
+
+# 2. Consume WAL records to avoid another type of upgrade failure. It will be
+#	 tested in subsequent cases.
+$old_publisher->safe_psql('postgres',
+	"SELECT count(*) FROM pg_logical_slot_get_changes('test_slot1', NULL, NULL);"
+);
 $old_publisher->stop;
 
-# 2. max_replication_slots is set to smaller than the number of slots (2)
+# 3. max_replication_slots is set to smaller than the number of slots (2)
 #	 present on the old cluster
 $new_publisher->append_conf('postgresql.conf', "max_replication_slots = 1");
 
-# 3. wal_level is set correctly on the new cluster
+# 4. wal_level is set correctly on the new cluster
 $new_publisher->append_conf('postgresql.conf', "wal_level = 'logical'");
 
 # pg_upgrade will fail because the new cluster has insufficient max_replication_slots
@@ -95,7 +105,7 @@ ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
 rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
 
 # ------------------------------
-# TEST: Successful upgrade
+# TEST: Confirm pg_upgrade fails when the slot still has unconsumed WAL records
 
 # Preparations for the subsequent test:
 # 1. Remove the slot 'test_slot2', leaving only 1 slot remaining on the old
@@ -106,10 +116,57 @@ $old_publisher->safe_psql('postgres',
 	"SELECT * FROM pg_drop_replication_slot('test_slot2');"
 );
 
-# 2. Consume WAL records
+# 2. Generate extra WAL records. Because these WAL records are not consumed,
+#	 the upcoming pg_upgrade test is expected to fail.
+$old_publisher->safe_psql('postgres',
+	"CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;"
+);
+$old_publisher->stop;
+
+# pg_upgrade will fail because the slot still has unconsumed WAL records
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade of old cluster with idle replication slots');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+# Remove the remaining slot
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT * FROM pg_drop_replication_slot('test_slot1');"
+);
+
+# ------------------------------
+# TEST: Successful upgrade
+
+# Preparations for the subsequent test:
+# 1. Setup logical replication
+my $old_connstr = $old_publisher->connstr . ' dbname=postgres';
 $old_publisher->safe_psql('postgres',
-	"SELECT count(*) FROM pg_logical_slot_get_changes('test_slot1', NULL, NULL)"
+	"CREATE PUBLICATION pub FOR ALL TABLES;"
 );
+$subscriber->start;
+$subscriber->safe_psql(
+	'postgres', qq[
+	CREATE TABLE tbl (a int);
+	CREATE SUBSCRIPTION sub CONNECTION '$old_connstr' PUBLICATION pub WITH (two_phase = 'true')
+]);
+$subscriber->wait_for_subscription_sync($old_publisher, 'sub');
+
+# 2. Temporarily disable the subscription
+$subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION sub DISABLE");
 $old_publisher->stop;
 
 # Actual run, successful upgrade is expected
@@ -129,11 +186,29 @@ command_ok(
 ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
 	"pg_upgrade_output.d/ removed after pg_upgrade success");
 
-# Check that the slot 'test_slot1' has migrated to the new cluster
+# Check that the slot 'sub' has migrated to the new cluster
 $new_publisher->start;
 my $result = $new_publisher->safe_psql('postgres',
 	"SELECT slot_name, two_phase FROM pg_replication_slots");
-is($result, qq(test_slot1|t), 'check the slot exists on new cluster');
-$new_publisher->stop;
+is($result, qq(sub|t), 'check the slot exists on new cluster');
+
+# Update the connection
+my $new_connstr = $new_publisher->connstr . ' dbname=postgres';
+$subscriber->safe_psql(
+	'postgres', qq[
+	ALTER SUBSCRIPTION sub CONNECTION '$new_connstr';
+	ALTER SUBSCRIPTION sub ENABLE;
+]);
+
+# Check whether changes on the new publisher get replicated to the subscriber
+$new_publisher->safe_psql('postgres',
+	"INSERT INTO tbl VALUES (generate_series(11, 20))");
+$new_publisher->wait_for_catchup('sub');
+$result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
+is($result, qq(20), 'check changes are replicated to the subscriber');
+
+# Clean up
+$subscriber->stop();
+$new_publisher->stop();
 
 done_testing();
-- 
2.27.0

#173Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Peter Smith (#171)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Peter,

Thank you for reviewing!

1. check_and_dump_old_cluster

CURRENT CODE (with v26-0003 patch applied)

/* Extract a list of logical replication slots */
get_old_cluster_logical_slot_infos();

...

/*
* Logical replication slots can be migrated since PG17. See comments atop
* get_old_cluster_logical_slot_infos().
*/
if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
{
check_old_cluster_for_lost_slots();

/*
* Do additional checks if a live check is not required. This requires
* that confirmed_flush_lsn of all the slots is the same as the latest
* checkpoint location, but it would be satisfied only when the server
* has been shut down.
*/
if (!live_check)
check_old_cluster_for_confirmed_flush_lsn();
}

SUGGESTION

/*
* Logical replication slots can be migrated since PG17. See comments atop
* get_old_cluster_logical_slot_infos().
*/
if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700) // NOTE 1a.
{
/* Extract a list of logical replication slots */
get_old_cluster_logical_slot_infos();

if (count_old_cluster_slots()) // NOTE 1b.
{
check_old_cluster_for_lost_slots();

/*
* Do additional checks if a live check is not required. This requires
* that confirmed_flush_lsn of all the slots is the same as the latest
* checkpoint location, but it would be satisfied only when the server
* has been shut down.
*/
if (!live_check)
check_old_cluster_for_confirmed_flush_lsn();
}
}

~~

Benefits:

1a.
One version check instead of multiple.

~

1b.
Upfront slot counting means
- only call 1 time to count_old_cluster_slots().
- unnecessary calls to other check* functions are avoided

~

1c.
get_old_cluster_logical_slot_infos
- No version check is needed.

check_old_cluster_for_lost_slots
- Call to count_old_cluster_slots is not needed
- Quick exit not needed.

check_old_cluster_for_confirmed_flush_lsn
- Call to count_old_cluster_slots is not needed
- Quick exit not needed.

~~~

2. check_old_cluster_for_lost_slots

+ /* Quick exit if the cluster does not have logical slots. */
+ if (count_old_cluster_logical_slots() == 0)
+ return;

Refer [1]#4. Can remove this because #1b above.

3. check_old_cluster_for_confirmed_flush_lsn

+ /* Quick exit if the cluster does not have logical slots. */
+ if (count_old_cluster_logical_slots() == 0)
+ return;

Refer [1]#5. Can remove this because #1b above.

IIUC Amit disagreed with these points, so I will keep my code as-is until he
posts his opinion.

4. .../t/003_logical_replication_slots.pl

/shipped/replicated/

Kuroda-san 26/8 wrote:
You meant to say s/replicated/shipped/, right? Fixed.

No, I meant what I wrote for [1]#7. I was referring to the word
"shipped" in the message 'check changes are shipped to the
subscriber'. Now there are 2 places to change instead of one.

Oh, sorry for that. Both places were fixed.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#174Amit Kapila
amit.kapila16@gmail.com
In reply to: Peter Smith (#171)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Mon, Aug 28, 2023 at 1:01 PM Peter Smith <smithpb2250@gmail.com> wrote:

Hi, here are my review comments for v26-0003

It seems I must defend some of my previous suggestions from v25* [1],
so here goes...

======
src/bin/pg_upgrade/check.c

1. check_and_dump_old_cluster

CURRENT CODE (with v26-0003 patch applied)

/* Extract a list of logical replication slots */
get_old_cluster_logical_slot_infos();

...

/*
* Logical replication slots can be migrated since PG17. See comments atop
* get_old_cluster_logical_slot_infos().
*/
if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
{
check_old_cluster_for_lost_slots();

/*
* Do additional checks if a live check is not required. This requires
* that confirmed_flush_lsn of all the slots is the same as the latest
* checkpoint location, but it would be satisfied only when the server
* has been shut down.
*/
if (!live_check)
check_old_cluster_for_confirmed_flush_lsn();
}

SUGGESTION

/*
* Logical replication slots can be migrated since PG17. See comments atop
* get_old_cluster_logical_slot_infos().
*/
if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700) // NOTE 1a.
{
/* Extract a list of logical replication slots */
get_old_cluster_logical_slot_infos();

if (count_old_cluster_slots()) // NOTE 1b.
{
check_old_cluster_for_lost_slots();

/*
* Do additional checks if a live check is not required. This requires
* that confirmed_flush_lsn of all the slots is the same as the latest
* checkpoint location, but it would be satisfied only when the server
* has been shut down.
*/
if (!live_check)
check_old_cluster_for_confirmed_flush_lsn();
}
}

I think a slightly better way to achieve this is to combine the code
from check_old_cluster_for_lost_slots() and
check_old_cluster_for_confirmed_flush_lsn() into
check_old_cluster_for_valid_slots(). That will even save us a new
connection for the second check.
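
For illustration, the merged check might look roughly like the following
(sketch only; the function name matches the suggestion above, but the
live_check parameter and the message wording are assumptions, so the actual
implementation may differ):

static void
check_old_cluster_for_valid_slots(bool live_check)
{
	int			i,
				ntups,
				i_slotname;
	PGresult   *res;
	DbInfo	   *active_db = &old_cluster.dbarr.dbs[0];
	PGconn	   *conn;

	/* Quick exit if the cluster does not have logical slots. */
	if (count_old_cluster_logical_slots() == 0)
		return;

	conn = connectToServer(&old_cluster, active_db->db_name);

	prep_status("Checking for valid logical replication slots");

	/* Check there are no logical replication slots with a 'lost' state. */
	res = executeQueryOrDie(conn,
							"SELECT slot_name FROM pg_catalog.pg_replication_slots "
							"WHERE wal_status = 'lost' AND "
							"temporary IS FALSE;");

	ntups = PQntuples(res);
	i_slotname = PQfnumber(res, "slot_name");

	for (i = 0; i < ntups; i++)
		pg_log(PG_WARNING,
			   "\nWARNING: logical replication slot \"%s\" is in 'lost' state.",
			   PQgetvalue(res, i, i_slotname));

	PQclear(res);

	if (ntups)
		pg_fatal("One or more logical replication slots with a state of 'lost' were detected.");

	/*
	 * Reuse the same connection for the confirmed_flush_lsn check. That
	 * check is only meaningful after a clean shutdown, so it is skipped
	 * for a live check.
	 */
	if (!live_check)
	{
		res = executeQueryOrDie(conn,
								"SELECT slot_name FROM pg_catalog.pg_replication_slots "
								"WHERE confirmed_flush_lsn != '%X/%X' AND temporary IS FALSE;",
								LSN_FORMAT_ARGS(old_cluster.controldata.chkpnt_latest));

		ntups = PQntuples(res);
		i_slotname = PQfnumber(res, "slot_name");

		for (i = 0; i < ntups; i++)
			pg_log(PG_WARNING,
				   "\nWARNING: logical replication slot \"%s\" has unconsumed WAL records.",
				   PQgetvalue(res, i, i_slotname));

		PQclear(res);

		if (ntups)
			pg_fatal("One or more logical replication slots still have unconsumed WAL records.");
	}

	PQfinish(conn);

	check_ok();
}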

Also, I think we can simplify another check in the patch:
@@ -1446,8 +1446,10 @@ check_new_cluster_logical_replication_slots(void)
char *wal_level;

        /* Logical slots can be migrated since PG17. */
-       if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
-               nslots = count_old_cluster_logical_slots();
+       if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+               return;
+
+       nslots = count_old_cluster_logical_slots();

--
With Regards,
Amit Kapila.

#175Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Amit Kapila (#174)
3 attachment(s)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Amit,

Thank you for your comments! PSA the new version. I ran pgindent.

1. check_and_dump_old_cluster

CURRENT CODE (with v26-0003 patch applied)

/* Extract a list of logical replication slots */
get_old_cluster_logical_slot_infos();

...

/*
* Logical replication slots can be migrated since PG17. See comments atop
* get_old_cluster_logical_slot_infos().
*/
if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
{
check_old_cluster_for_lost_slots();

/*
* Do additional checks if a live check is not required. This requires
* that confirmed_flush_lsn of all the slots is the same as the latest
* checkpoint location, but it would be satisfied only when the server
* has been shut down.
*/
if (!live_check)
check_old_cluster_for_confirmed_flush_lsn();
}

SUGGESTION

/*
* Logical replication slots can be migrated since PG17. See comments atop
* get_old_cluster_logical_slot_infos().
*/
if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700) // NOTE 1a.
{
/* Extract a list of logical replication slots */
get_old_cluster_logical_slot_infos();

if (count_old_cluster_slots()) // NOTE 1b.
{
check_old_cluster_for_lost_slots();

/*
* Do additional checks if a live check is not required. This requires
* that confirmed_flush_lsn of all the slots is the same as the latest
* checkpoint location, but it would be satisfied only when the server
* has been shut down.
*/
if (!live_check)
check_old_cluster_for_confirmed_flush_lsn();
}
}

I think a slightly better way to achieve this is to combine the code
from check_old_cluster_for_lost_slots() and
check_old_cluster_for_confirmed_flush_lsn() into
check_old_cluster_for_valid_slots(). That will even save us a new
connection for the second check.

They are combined into one function.

Also, I think we can simplify another check in the patch:
@@ -1446,8 +1446,10 @@ check_new_cluster_logical_replication_slots(void)
char *wal_level;

/* Logical slots can be migrated since PG17. */
-       if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
-               nslots = count_old_cluster_logical_slots();
+       if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+               return;
+
+       nslots = count_old_cluster_logical_slots();

Fixed.

Also, I have tested how this patch interacts with a physical standby.

1. Logical slots defined on the old physical standby *cannot be upgraded*.
2. Logical slots defined on the physical primary *are migrated* to the new physical standby.

The main reason is that pg_upgrade cannot be used for a physical standby. If
users want to upgrade a standby, rsync is used instead. That command rebuilds
the standby cluster from the new primary, so the slots recreated on the new
primary are also carried over to the new standby. In contrast, the old standby
cluster is essentially ignored, so its slots are not upgraded. I updated the
doc accordingly.
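
For reference, the documented procedure synchronizes the standby with an rsync
invocation along these lines (the directory names are placeholders, not the
exact paths from the documentation):

    rsync --archive --delete --hard-links --size-only --no-inc-recursive \
        /path/to/old_pgdata /path/to/new_pgdata standby.example.com:/path/to/parent_dir

The key point is that the new standby's contents come from the new primary,
not from the old standby.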

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:

v28-0001-Persist-logical-slots-to-disk-during-a-shutdown-.patch (application/octet-stream)
From b48ccc25062984173deb25699546d688971f4f0d Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Fri, 14 Apr 2023 13:49:09 +0800
Subject: [PATCH v28 1/3] Persist logical slots to disk during a shutdown
 checkpoint if required.

It's entirely possible for a logical slot to have a confirmed_flush LSN
higher than the last value saved on disk while not being marked as dirty.
Currently, it is not a major problem but a later patch adding support for
the upgrade of slots relies on that value being properly persisted to disk.

It can also help with avoiding processing the same transactions again in
some boundary cases after the clean shutdown and restart. Say, we process
some transactions for which we didn't send anything downstream (the
changes got filtered) but the confirm_flush LSN is updated due to
keepalives. As we don't flush the latest value of confirm_flush LSN,
it may lead to processing the same changes again.

Author: Julien Rouhaud, Vignesh C, Kuroda Hayato based on suggestions by
Ashutosh Bapat
Reviewed-by: Amit Kapila, Peter Smith
Discussion: http://postgr.es/m/CAA4eK1JzJagMmb_E8D4au=GYQkxox0AfNBm1FbP7sy7t4YWXPQ@mail.gmail.com
Discussion: http://postgr.es/m/TYAPR01MB58664C81887B3AF2EB6B16E3F5939@TYAPR01MB5866.jpnprd01.prod.outlook.com
---
 src/backend/access/transam/xlog.c             |   2 +-
 src/backend/replication/slot.c                |  29 +++--
 src/include/replication/slot.h                |  13 ++-
 src/test/recovery/meson.build                 |   1 +
 .../t/038_save_logical_slots_shutdown.pl      | 101 ++++++++++++++++++
 5 files changed, 133 insertions(+), 13 deletions(-)
 create mode 100644 src/test/recovery/t/038_save_logical_slots_shutdown.pl

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index f6f8adc72a..f26c8d18a6 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7039,7 +7039,7 @@ static void
 CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
 {
 	CheckPointRelationMap();
-	CheckPointReplicationSlots();
+	CheckPointReplicationSlots(flags & CHECKPOINT_IS_SHUTDOWN);
 	CheckPointSnapBuild();
 	CheckPointLogicalRewriteHeap();
 	CheckPointReplicationOrigin();
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index bb09c4010f..c075f76317 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -109,7 +109,8 @@ static void ReplicationSlotDropPtr(ReplicationSlot *slot);
 /* internal persistency functions */
 static void RestoreSlotFromDisk(const char *name);
 static void CreateSlotOnDisk(ReplicationSlot *slot);
-static void SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel);
+static void SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel,
+						   bool is_shutdown);
 
 /*
  * Report shared-memory space needed by ReplicationSlotsShmemInit.
@@ -321,6 +322,7 @@ ReplicationSlotCreate(const char *name, bool db_specific,
 	slot->candidate_xmin_lsn = InvalidXLogRecPtr;
 	slot->candidate_restart_valid = InvalidXLogRecPtr;
 	slot->candidate_restart_lsn = InvalidXLogRecPtr;
+	slot->last_saved_confirmed_flush = InvalidXLogRecPtr;
 
 	/*
 	 * Create the slot on disk.  We haven't actually marked the slot allocated
@@ -783,7 +785,7 @@ ReplicationSlotSave(void)
 	Assert(MyReplicationSlot != NULL);
 
 	sprintf(path, "pg_replslot/%s", NameStr(MyReplicationSlot->data.name));
-	SaveSlotToPath(MyReplicationSlot, path, ERROR);
+	SaveSlotToPath(MyReplicationSlot, path, ERROR, false);
 }
 
 /*
@@ -1572,11 +1574,10 @@ restart:
 /*
  * Flush all replication slots to disk.
  *
- * This needn't actually be part of a checkpoint, but it's a convenient
- * location.
+ * is_shutdown is true in case of a shutdown checkpoint.
  */
 void
-CheckPointReplicationSlots(void)
+CheckPointReplicationSlots(bool is_shutdown)
 {
 	int			i;
 
@@ -1601,7 +1602,7 @@ CheckPointReplicationSlots(void)
 
 		/* save the slot to disk, locking is handled in SaveSlotToPath() */
 		sprintf(path, "pg_replslot/%s", NameStr(s->data.name));
-		SaveSlotToPath(s, path, LOG);
+		SaveSlotToPath(s, path, LOG, is_shutdown);
 	}
 	LWLockRelease(ReplicationSlotAllocationLock);
 }
@@ -1707,7 +1708,7 @@ CreateSlotOnDisk(ReplicationSlot *slot)
 
 	/* Write the actual state file. */
 	slot->dirty = true;			/* signal that we really need to write */
-	SaveSlotToPath(slot, tmppath, ERROR);
+	SaveSlotToPath(slot, tmppath, ERROR, false);
 
 	/* Rename the directory into place. */
 	if (rename(tmppath, path) != 0)
@@ -1733,22 +1734,26 @@ CreateSlotOnDisk(ReplicationSlot *slot)
  * Shared functionality between saving and creating a replication slot.
  */
 static void
-SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel)
+SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel,
+			   bool is_shutdown)
 {
 	char		tmppath[MAXPGPATH];
 	char		path[MAXPGPATH];
 	int			fd;
 	ReplicationSlotOnDisk cp;
 	bool		was_dirty;
+	bool		confirmed_flush_has_changed;
 
 	/* first check whether there's something to write out */
 	SpinLockAcquire(&slot->mutex);
 	was_dirty = slot->dirty;
 	slot->just_dirtied = false;
+	confirmed_flush_has_changed = (slot->data.confirmed_flush != slot->last_saved_confirmed_flush);
 	SpinLockRelease(&slot->mutex);
 
-	/* and don't do anything if there's nothing to write */
-	if (!was_dirty)
+	/* Don't do anything if there's nothing to write. See ReplicationSlot. */
+	if (!was_dirty &&
+		!(is_shutdown && SlotIsLogical(slot) && confirmed_flush_has_changed))
 		return;
 
 	LWLockAcquire(&slot->io_in_progress_lock, LW_EXCLUSIVE);
@@ -1873,11 +1878,12 @@ SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel)
 
 	/*
 	 * Successfully wrote, unset dirty bit, unless somebody dirtied again
-	 * already.
+	 * already and remember the confirmed_flush LSN value.
 	 */
 	SpinLockAcquire(&slot->mutex);
 	if (!slot->just_dirtied)
 		slot->dirty = false;
+	slot->last_saved_confirmed_flush = slot->data.confirmed_flush;
 	SpinLockRelease(&slot->mutex);
 
 	LWLockRelease(&slot->io_in_progress_lock);
@@ -2074,6 +2080,7 @@ RestoreSlotFromDisk(const char *name)
 		/* initialize in memory state */
 		slot->effective_xmin = cp.slotdata.xmin;
 		slot->effective_catalog_xmin = cp.slotdata.catalog_xmin;
+		slot->last_saved_confirmed_flush = cp.slotdata.confirmed_flush;
 
 		slot->candidate_catalog_xmin = InvalidTransactionId;
 		slot->candidate_xmin_lsn = InvalidXLogRecPtr;
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index a8a89dc784..448fb8cf51 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -178,6 +178,17 @@ typedef struct ReplicationSlot
 	XLogRecPtr	candidate_xmin_lsn;
 	XLogRecPtr	candidate_restart_valid;
 	XLogRecPtr	candidate_restart_lsn;
+
+	/*
+	 * We won't ensure that the slot is persisted after the confirmed_flush
+	 * LSN is updated as that could lead to frequent writes.  However, we need
+	 * to ensure that we do persist the slots at the time of shutdown whose
+	 * confirmed_flush LSN is changed since we last saved the slot to disk.
+	 * This will help in avoiding retreat of the confirmed_flush LSN after
+	 * restart.  This variable is used to track the last saved confirmed_flush
+	 * LSN value.
+	 */
+	XLogRecPtr	last_saved_confirmed_flush;
 } ReplicationSlot;
 
 #define SlotIsPhysical(slot) ((slot)->data.database == InvalidOid)
@@ -241,7 +252,7 @@ extern void ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslo
 extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok);
 
 extern void StartupReplicationSlots(void);
-extern void CheckPointReplicationSlots(void);
+extern void CheckPointReplicationSlots(bool is_shutdown);
 
 extern void CheckSlotRequirements(void);
 extern void CheckSlotPermissions(void);
diff --git a/src/test/recovery/meson.build b/src/test/recovery/meson.build
index e7328e4894..646d6ffde4 100644
--- a/src/test/recovery/meson.build
+++ b/src/test/recovery/meson.build
@@ -43,6 +43,7 @@ tests += {
       't/035_standby_logical_decoding.pl',
       't/036_truncated_dropped.pl',
       't/037_invalid_database.pl',
+      't/038_save_logical_slots_shutdown.pl',
     ],
   },
 }
diff --git a/src/test/recovery/t/038_save_logical_slots_shutdown.pl b/src/test/recovery/t/038_save_logical_slots_shutdown.pl
new file mode 100644
index 0000000000..6e114e9b29
--- /dev/null
+++ b/src/test/recovery/t/038_save_logical_slots_shutdown.pl
@@ -0,0 +1,101 @@
+
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+# Test logical replication slots are always persisted to disk during a shutdown
+# checkpoint.
+
+use strict;
+use warnings;
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+sub compare_confirmed_flush
+{
+	my ($node, $confirmed_flush_from_log) = @_;
+
+	# Fetch Latest checkpoint location from the control file
+	my ($stdout, $stderr) =
+	  run_command([ 'pg_controldata', $node->data_dir ]);
+	my @control_data      = split("\n", $stdout);
+	my $latest_checkpoint = undef;
+	foreach (@control_data)
+	{
+		if ($_ =~ /^Latest checkpoint location:\s*(.*)$/mg)
+		{
+			$latest_checkpoint = $1;
+			last;
+		}
+	}
+	die "Latest checkpoint location not found in control file\n"
+	  unless defined($latest_checkpoint);
+
+	# Is it same as the value read from log?
+	ok( $latest_checkpoint eq $confirmed_flush_from_log,
+		"Check that the slot's confirmed_flush LSN is the same as the latest_checkpoint location"
+	);
+
+	return;
+}
+
+# Initialize publisher node
+my $node_publisher = PostgreSQL::Test::Cluster->new('pub');
+$node_publisher->init(allows_streaming => 'logical');
+# Avoid checkpoint during the test, otherwise, the latest checkpoint location
+# will change.
+$node_publisher->append_conf(
+	'postgresql.conf', q{
+checkpoint_timeout = 1h
+});
+$node_publisher->start;
+
+# Create subscriber node
+my $node_subscriber = PostgreSQL::Test::Cluster->new('sub');
+$node_subscriber->init(allows_streaming => 'logical');
+$node_subscriber->start;
+
+# Create tables
+$node_publisher->safe_psql('postgres', "CREATE TABLE test_tbl (id int)");
+$node_subscriber->safe_psql('postgres', "CREATE TABLE test_tbl (id int)");
+
+# Insert some data
+$node_publisher->safe_psql('postgres',
+	"INSERT INTO test_tbl VALUES (generate_series(1, 5));");
+
+# Setup logical replication
+my $publisher_connstr = $node_publisher->connstr . ' dbname=postgres';
+$node_publisher->safe_psql('postgres',
+	"CREATE PUBLICATION pub FOR ALL TABLES");
+$node_subscriber->safe_psql('postgres',
+	"CREATE SUBSCRIPTION sub CONNECTION '$publisher_connstr' PUBLICATION pub"
+);
+
+$node_subscriber->wait_for_subscription_sync($node_publisher, 'sub');
+
+my $result =
+  $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM test_tbl");
+
+is($result, qq(5), "check initial copy was done");
+
+my $offset = -s $node_publisher->logfile;
+
+# Restart the publisher to ensure that the slot will be persisted if required
+$node_publisher->restart();
+
+# Wait until the walsender creates decoding context
+$node_publisher->wait_for_log(
+	qr/Streaming transactions committing after ([A-F0-9]+\/[A-F0-9]+), reading WAL from ([A-F0-9]+\/[A-F0-9]+)./,
+	$offset);
+
+# Extract confirmed_flush from the logfile
+my $log_contents = slurp_file($node_publisher->logfile, $offset);
+$log_contents =~
+  qr/Streaming transactions committing after ([A-F0-9]+\/[A-F0-9]+), reading WAL from ([A-F0-9]+\/[A-F0-9]+)./
+  or die "could not get confirmed_flush_lsn";
+
+# Ensure that the slot's confirmed_flush LSN is the same as the
+# latest_checkpoint location.
+compare_confirmed_flush($node_publisher, $1);
+
+done_testing();
-- 
2.27.0

v28-0002-pg_upgrade-Allow-to-replicate-logical-replicatio.patch (application/octet-stream)
From f814b8e45e6aa39942de8bc169872e5b97b5e92c Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Tue, 4 Apr 2023 05:49:34 +0000
Subject: [PATCH v28 2/3] pg_upgrade: Allow to replicate logical replication
 slots to new node

This commit allows nodes with logical replication slots to be upgraded. While
reading information from the old cluster, a list of logical replication slots is
fetched. Later in the upgrade, pg_upgrade revisits the list and restores the
slots by executing pg_create_logical_replication_slot() on the new
cluster. Migration of logical replication slots is only supported when the old
cluster is version 17.0 or later.

Note that slot restoration must be done after the final pg_resetwal command
during the upgrade because pg_resetwal will remove WALs that are required by
the slots. Due to this restriction, the timing of restoring replication slots is
different from other objects.

The significant advantage of this commit is that it makes it easy to continue
logical replication even after upgrading the publisher node. Previously,
pg_upgrade allowed copying publications to a new node. With this new commit,
adjusting the connection string to the new publisher will cause the apply
worker on the subscriber to connect to the new publisher automatically. This
enables seamless continuation of logical replication, even after an upgrade.

Author: Hayato Kuroda
Co-authored-by: Hou Zhijie
Reviewed-by: Peter Smith, Julien Rouhaud, Vignesh C, Wang Wei, Masahiko Sawada
---
 doc/src/sgml/ref/pgupgrade.sgml               |  70 ++++++++-
 src/bin/pg_upgrade/Makefile                   |   3 +
 src/bin/pg_upgrade/check.c                    |  69 +++++++++
 src/bin/pg_upgrade/function.c                 |  46 ++++--
 src/bin/pg_upgrade/info.c                     | 135 ++++++++++++++++-
 src/bin/pg_upgrade/meson.build                |   1 +
 src/bin/pg_upgrade/pg_upgrade.c               |  74 ++++++++++
 src/bin/pg_upgrade/pg_upgrade.h               |  19 +++
 .../t/003_logical_replication_slots.pl        | 139 ++++++++++++++++++
 src/tools/pgindent/typedefs.list              |   3 +
 10 files changed, 544 insertions(+), 15 deletions(-)
 create mode 100644 src/bin/pg_upgrade/t/003_logical_replication_slots.pl

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 7816b4c685..07bb46f89c 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -360,6 +360,71 @@ make prefix=/usr/local/pgsql.new install
     </para>
    </step>
 
+   <step>
+    <title>Prepare for publisher upgrades</title>
+
+    <para>
+     <application>pg_upgrade</application> attempts to migrate logical
+     replication slots. This helps avoid the need for manually defining the
+     same replication slots on the new publisher. Migration of logical
+     replication slots is only supported when the old cluster is version 17.0
+     or later.
+    </para>
+
+    <para>
+     Before you start upgrading the publisher cluster, ensure that the
+     subscription is temporarily disabled, by executing
+     <link linkend="sql-altersubscription"><command>ALTER SUBSCRIPTION ... DISABLE</command></link>.
+     Re-enable the subscription after the upgrade.
+    </para>
+
+    <para>
+     There are some prerequisites for <application>pg_upgrade</application> to
+     be able to upgrade the replication slots. If these are not met an error
+     will be reported.
+    </para>
+
+    <itemizedlist>
+     <listitem>
+      <para>
+       All slots on the old cluster must be usable, i.e., there are no slots
+       whose
+       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>wal_status</structfield>
+       is <literal>lost</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>confirmed_flush_lsn</structfield>
+       of all slots on the old cluster must be the same as the latest
+       checkpoint location.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The output plugins referenced by the slots on the old cluster must be
+       installed in the new PostgreSQL executable directory.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must have
+       <link linkend="guc-max-replication-slots"><varname>max_replication_slots</varname></link>
+       configured to a value greater than or equal to the number of slots
+       present in the old cluster.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must have
+       <link linkend="guc-wal-level"><varname>wal_level</varname></link> as
+       <literal>logical</literal>.
+      </para>
+     </listitem>
+    </itemizedlist>
+
+   </step>
+
    <step>
     <title>Stop both servers</title>
 
@@ -629,8 +694,9 @@ rsync --archive --delete --hard-links --size-only --no-inc-recursive /vol1/pg_tb
        Configure the servers for log shipping.  (You do not need to run
        <function>pg_backup_start()</function> and <function>pg_backup_stop()</function>
        or take a file system backup as the standbys are still synchronized
-       with the primary.)  Replication slots are not copied and must
-       be recreated.
+       with the primary.)  Replication slots on the old standby are not copied.
+       Only logical slots on the primary are migrated to the new standby,
+       and other slots must be recreated.
       </para>
      </step>
 
diff --git a/src/bin/pg_upgrade/Makefile b/src/bin/pg_upgrade/Makefile
index 5834513add..815d1a7ca1 100644
--- a/src/bin/pg_upgrade/Makefile
+++ b/src/bin/pg_upgrade/Makefile
@@ -3,6 +3,9 @@
 PGFILEDESC = "pg_upgrade - an in-place binary upgrade utility"
 PGAPPICON = win32
 
+# required for 003_logical_replication_slots.pl
+EXTRA_INSTALL=contrib/test_decoding
+
 subdir = src/bin/pg_upgrade
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 56e313f562..4056b7a7a9 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -30,6 +30,7 @@ static void check_for_jsonb_9_4_usage(ClusterInfo *cluster);
 static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(void);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
+static void check_new_cluster_logical_replication_slots(void);
 
 
 /*
@@ -89,6 +90,9 @@ check_and_dump_old_cluster(bool live_check)
 	/* Extract a list of databases and tables from the old cluster */
 	get_db_and_rel_infos(&old_cluster);
 
+	/* Extract a list of logical replication slots */
+	get_old_cluster_logical_slot_infos();
+
 	init_tablespaces();
 
 	get_loadable_libraries();
@@ -189,6 +193,8 @@ check_new_cluster(void)
 {
 	get_db_and_rel_infos(&new_cluster);
 
+	check_new_cluster_logical_replication_slots();
+
 	check_new_cluster_is_empty();
 
 	check_loadable_libraries();
@@ -1402,3 +1408,66 @@ check_for_user_defined_encoding_conversions(ClusterInfo *cluster)
 	else
 		check_ok();
 }
+
+/*
+ * check_new_cluster_logical_replication_slots()
+ *
+ * Make sure there are no logical replication slots on the new cluster and that
+ * the parameter settings necessary for creating slots are sufficient.
+ */
+static void
+check_new_cluster_logical_replication_slots(void)
+{
+	PGresult   *res;
+	PGconn	   *conn;
+	int			nslots = 0;
+	int			max_replication_slots;
+	char	   *wal_level;
+
+	/* Logical slots can be migrated since PG17. */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+		return;
+
+	nslots = count_old_cluster_logical_slots();
+
+	/* Quick return if there are no logical slots to be migrated. */
+	if (nslots == 0)
+		return;
+
+	conn = connectToServer(&new_cluster, "template1");
+
+	prep_status("Checking for logical replication slots");
+
+	res = executeQueryOrDie(conn, "SELECT slot_name "
+							"FROM pg_catalog.pg_replication_slots "
+							"WHERE slot_type = 'logical' AND "
+							"temporary IS FALSE;");
+
+	if (PQntuples(res))
+		pg_fatal("New cluster must not have logical replication slots but found \"%s\"",
+				 PQgetvalue(res, 0, 0));
+
+	PQclear(res);
+
+	res = executeQueryOrDie(conn, "SHOW max_replication_slots;");
+	max_replication_slots = atoi(PQgetvalue(res, 0, 0));
+
+	if (nslots > max_replication_slots)
+		pg_fatal("max_replication_slots (%d) must be greater than or equal to the number of "
+				 "logical replication slots (%d) on the old cluster.",
+				 max_replication_slots, nslots);
+
+	PQclear(res);
+
+	res = executeQueryOrDie(conn, "SHOW wal_level;");
+	wal_level = PQgetvalue(res, 0, 0);
+
+	if (strcmp(wal_level, "logical") != 0)
+		pg_fatal("wal_level must be \"logical\", but is set to \"%s\"",
+				 wal_level);
+
+	PQclear(res);
+	PQfinish(conn);
+
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/function.c b/src/bin/pg_upgrade/function.c
index dc8800c7cd..5b4b414a91 100644
--- a/src/bin/pg_upgrade/function.c
+++ b/src/bin/pg_upgrade/function.c
@@ -11,6 +11,7 @@
 
 #include "access/transam.h"
 #include "catalog/pg_language_d.h"
+#include "fe_utils/string_utils.h"
 #include "pg_upgrade.h"
 
 /*
@@ -46,7 +47,12 @@ library_name_compare(const void *p1, const void *p2)
 /*
  * get_loadable_libraries()
  *
- *	Fetch the names of all old libraries containing C-language functions.
+ *	Fetch the names of all old libraries:
+ *	1. Name of library files containing C-language functions (for non-built-in
+ *	   functions), and
+ *	2. Shared object (library) names containing the logical replication output
+ *	   plugins
+ *
  *	We will later check that they all exist in the new installation.
  */
 void
@@ -55,32 +61,48 @@ get_loadable_libraries(void)
 	PGresult  **ress;
 	int			totaltups;
 	int			dbnum;
+	PQExpBuffer query = createPQExpBuffer();
 
 	ress = (PGresult **) pg_malloc(old_cluster.dbarr.ndbs * sizeof(PGresult *));
 	totaltups = 0;
 
+	/* Construct a query string for fetching non-built-in C functions */
+	appendPQExpBuffer(query, "SELECT DISTINCT probin "
+					  "FROM pg_catalog.pg_proc "
+					  "WHERE prolang = %u AND "
+					  "probin IS NOT NULL AND "
+					  "oid >= %u",
+					  ClanguageId,
+					  FirstNormalObjectId);
+
+	/*
+	 * If old_cluster is PG 17 or later, logical decoding output plugins must
+	 * also be included.
+	 */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
+		appendPQExpBufferStr(query, " UNION "
+							 "SELECT DISTINCT plugin "
+							 "FROM pg_catalog.pg_replication_slots "
+							 "WHERE wal_status <> 'lost' AND "
+							 "database = current_database() AND "
+							 "temporary IS FALSE;");
+
 	/* Fetch all library names, removing duplicates within each DB */
 	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
 	{
 		DbInfo	   *active_db = &old_cluster.dbarr.dbs[dbnum];
 		PGconn	   *conn = connectToServer(&old_cluster, active_db->db_name);
 
-		/*
-		 * Fetch all libraries containing non-built-in C functions in this DB.
-		 */
-		ress[dbnum] = executeQueryOrDie(conn,
-										"SELECT DISTINCT probin "
-										"FROM pg_catalog.pg_proc "
-										"WHERE prolang = %u AND "
-										"probin IS NOT NULL AND "
-										"oid >= %u;",
-										ClanguageId,
-										FirstNormalObjectId);
+		/* Extract a list of libraries */
+		ress[dbnum] = executeQueryOrDie(conn, "%s", query->data);
+
 		totaltups += PQntuples(ress[dbnum]);
 
 		PQfinish(conn);
 	}
 
+	destroyPQExpBuffer(query);
+
 	os_info.libraries = (LibraryInfo *) pg_malloc(totaltups * sizeof(LibraryInfo));
 	totaltups = 0;
 
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index aa5faca4d6..c905e02d45 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -26,6 +26,7 @@ static void get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo);
 static void free_rel_infos(RelInfoArr *rel_arr);
 static void print_db_infos(DbInfoArr *db_arr);
 static void print_rel_infos(RelInfoArr *rel_arr);
+static void print_slot_infos(LogicalSlotInfoArr *slot_arr);
 
 
 /*
@@ -394,7 +395,7 @@ get_db_infos(ClusterInfo *cluster)
 	i_spclocation = PQfnumber(res, "spclocation");
 
 	ntups = PQntuples(res);
-	dbinfos = (DbInfo *) pg_malloc(sizeof(DbInfo) * ntups);
+	dbinfos = (DbInfo *) pg_malloc0(sizeof(DbInfo) * ntups);
 
 	for (tupnum = 0; tupnum < ntups; tupnum++)
 	{
@@ -600,6 +601,116 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 	dbinfo->rel_arr.nrels = num_rels;
 }
 
+/*
+ * get_old_cluster_logical_slot_infos_per_db()
+ *
+ * gets the LogicalSlotInfos for all the logical replication slots of the database
+ * referred to by "dbinfo".
+ */
+static void
+get_old_cluster_logical_slot_infos_per_db(DbInfo *dbinfo)
+{
+	PGconn	   *conn = connectToServer(&old_cluster,
+									   dbinfo->db_name);
+	PGresult   *res;
+	LogicalSlotInfo *slotinfos = NULL;
+
+	int			num_slots;
+
+	res = executeQueryOrDie(conn, "SELECT slot_name, plugin, two_phase "
+							"FROM pg_catalog.pg_replication_slots "
+							"WHERE wal_status <> 'lost' AND "
+							"database = current_database() AND "
+							"temporary IS FALSE;");
+
+	num_slots = PQntuples(res);
+
+	if (num_slots)
+	{
+		int			slotnum;
+		int			i_slotname;
+		int			i_plugin;
+		int			i_twophase;
+
+		slotinfos = (LogicalSlotInfo *) pg_malloc(sizeof(LogicalSlotInfo) * num_slots);
+
+		i_slotname = PQfnumber(res, "slot_name");
+		i_plugin = PQfnumber(res, "plugin");
+		i_twophase = PQfnumber(res, "two_phase");
+
+		for (slotnum = 0; slotnum < num_slots; slotnum++)
+		{
+			LogicalSlotInfo *curr = &slotinfos[slotnum];
+
+			curr->slotname = pg_strdup(PQgetvalue(res, slotnum, i_slotname));
+			curr->plugin = pg_strdup(PQgetvalue(res, slotnum, i_plugin));
+			curr->two_phase = (strcmp(PQgetvalue(res, slotnum, i_twophase), "t") == 0);
+		}
+	}
+
+	PQclear(res);
+	PQfinish(conn);
+
+	dbinfo->slot_arr.slots = slotinfos;
+	dbinfo->slot_arr.nslots = num_slots;
+}
+
+/*
+ * get_old_cluster_logical_slot_infos()
+ *
+ * Higher level routine to generate LogicalSlotInfoArr for all databases.
+ *
+ * Note: This function will not do anything if the old cluster is pre-PG17.
+ * The logical slots are not saved at shutdown, and the confirmed_flush_lsn is
+ * always behind the SHUTDOWN_CHECKPOINT record. Subsequent checks done in
+ * check_old_cluster_for_valid_slots() would raise a FATAL error if such slots
+ * are included.
+ */
+void
+get_old_cluster_logical_slot_infos(void)
+{
+	int			dbnum;
+
+	/* Logical slots can be migrated since PG17. */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+		return;
+
+	pg_log(PG_VERBOSE, "\nsource databases:");
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		DbInfo	   *pDbInfo = &old_cluster.dbarr.dbs[dbnum];
+
+		get_old_cluster_logical_slot_infos_per_db(pDbInfo);
+
+		if (log_opts.verbose)
+		{
+			pg_log(PG_VERBOSE, "Database: \"%s\"", pDbInfo->db_name);
+			print_slot_infos(&pDbInfo->slot_arr);
+		}
+	}
+}
+
+/*
+ * count_old_cluster_logical_slots()
+ *
+ * Sum up and return the number of logical replication slots for all databases.
+ *
+ * Note: this function always returns 0 if the old_cluster is PG16 and prior
+ * because old_cluster.dbarr.dbs[dbnum].slot_arr is set only for PG17 and
+ * later.
+ */
+int
+count_old_cluster_logical_slots(void)
+{
+	int			dbnum;
+	int			slot_count = 0;
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+		slot_count += old_cluster.dbarr.dbs[dbnum].slot_arr.nslots;
+
+	return slot_count;
+}
 
 static void
 free_db_and_rel_infos(DbInfoArr *db_arr)
@@ -610,6 +721,12 @@ free_db_and_rel_infos(DbInfoArr *db_arr)
 	{
 		free_rel_infos(&db_arr->dbs[dbnum].rel_arr);
 		pg_free(db_arr->dbs[dbnum].db_name);
+
+		/*
+		 * Logical replication slots must not exist on the new cluster before
+		 * create_logical_replication_slots().
+		 */
+		Assert(db_arr->dbs[dbnum].slot_arr.nslots == 0);
 	}
 	pg_free(db_arr->dbs);
 	db_arr->dbs = NULL;
@@ -660,3 +777,19 @@ print_rel_infos(RelInfoArr *rel_arr)
 			   rel_arr->rels[relnum].reloid,
 			   rel_arr->rels[relnum].tablespace);
 }
+
+static void
+print_slot_infos(LogicalSlotInfoArr *slot_arr)
+{
+	int			slotnum;
+
+	for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+	{
+		LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+
+		pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\", two_phase: %d",
+			   slot_info->slotname,
+			   slot_info->plugin,
+			   slot_info->two_phase);
+	}
+}
diff --git a/src/bin/pg_upgrade/meson.build b/src/bin/pg_upgrade/meson.build
index 12a97f84e2..228f29b688 100644
--- a/src/bin/pg_upgrade/meson.build
+++ b/src/bin/pg_upgrade/meson.build
@@ -42,6 +42,7 @@ tests += {
     'tests': [
       't/001_basic.pl',
       't/002_pg_upgrade.pl',
+      't/003_logical_replication_slots.pl',
     ],
     'test_kwargs': {'priority': 40}, # pg_upgrade tests are slow
   },
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 4562dafcff..c4bf12fd6b 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -59,6 +59,7 @@ static void copy_xact_xlog_xid(void);
 static void set_frozenxids(bool minmxid_only);
 static void make_outputdirs(char *pgdata);
 static void setup(char *argv0, bool *live_check);
+static void create_logical_replication_slots(void);
 
 ClusterInfo old_cluster,
 			new_cluster;
@@ -188,6 +189,20 @@ main(int argc, char **argv)
 			  new_cluster.pgdata);
 	check_ok();
 
+	/*
+	 * Logical replication slot upgrade only supported for old_cluster >=
+	 * PG17.
+	 *
+	 * Note: This must be done after doing the pg_resetwal command because
+	 * pg_resetwal would remove required WALs.
+	 */
+	if (count_old_cluster_logical_slots())
+	{
+		start_postmaster(&new_cluster, true);
+		create_logical_replication_slots();
+		stop_postmaster(false);
+	}
+
 	if (user_opts.do_sync)
 	{
 		prep_status("Sync data directory to disk");
@@ -860,3 +875,62 @@ set_frozenxids(bool minmxid_only)
 
 	check_ok();
 }
+
+/*
+ * create_logical_replication_slots()
+ *
+ * Similar to create_new_objects() but only restores logical replication slots.
+ */
+static void
+create_logical_replication_slots(void)
+{
+	int			dbnum;
+
+	prep_status_progress("Restoring logical replication slots in the new cluster");
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		DbInfo	   *old_db = &old_cluster.dbarr.dbs[dbnum];
+		LogicalSlotInfoArr *slot_arr = &old_db->slot_arr;
+		PGconn	   *conn;
+		PQExpBuffer query;
+		int			slotnum;
+		char		log_file_name[MAXPGPATH];
+
+		/* Skip this database if there are no slots */
+		if (slot_arr->nslots == 0)
+			continue;
+
+		conn = connectToServer(&new_cluster, old_db->db_name);
+		query = createPQExpBuffer();
+
+		snprintf(log_file_name, sizeof(log_file_name),
+				 DB_DUMP_LOG_FILE_MASK, old_db->db_oid);
+
+		pg_log(PG_STATUS, "%s", old_db->db_name);
+
+		for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		{
+			LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+
+			/* Constructs a query for creating logical replication slots. */
+			appendPQExpBuffer(query, "SELECT pg_catalog.pg_create_logical_replication_slot(");
+			appendStringLiteralConn(query, slot_info->slotname, conn);
+			appendPQExpBuffer(query, ", ");
+			appendStringLiteralConn(query, slot_info->plugin, conn);
+			appendPQExpBuffer(query, ", false, %s);",
+							  slot_info->two_phase ? "true" : "false");
+
+			PQclear(executeQueryOrDie(conn, "%s", query->data));
+
+			resetPQExpBuffer(query);
+		}
+
+		PQfinish(conn);
+
+		destroyPQExpBuffer(query);
+	}
+
+	end_progress_output();
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 7afa96716e..dae92ef6c0 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -150,6 +150,22 @@ typedef struct
 	int			nrels;
 } RelInfoArr;
 
+/*
+ * Structure to store logical replication slot information
+ */
+typedef struct
+{
+	char	   *slotname;		/* slot name */
+	char	   *plugin;			/* plugin */
+	bool		two_phase;		/* can the slot decode 2PC? */
+} LogicalSlotInfo;
+
+typedef struct
+{
+	int			nslots;			/* number of logical slot infos */
+	LogicalSlotInfo *slots;		/* array of logical slot infos */
+} LogicalSlotInfoArr;
+
 /*
  * The following structure represents a relation mapping.
  */
@@ -176,6 +192,7 @@ typedef struct
 	char		db_tablespace[MAXPGPATH];	/* database default tablespace
 											 * path */
 	RelInfoArr	rel_arr;		/* array of all user relinfos */
+	LogicalSlotInfoArr slot_arr;	/* array of all LogicalSlotInfo */
 } DbInfo;
 
 /*
@@ -400,6 +417,8 @@ FileNameMap *gen_db_file_maps(DbInfo *old_db,
 							  DbInfo *new_db, int *nmaps, const char *old_pgdata,
 							  const char *new_pgdata);
 void		get_db_and_rel_infos(ClusterInfo *cluster);
+void		get_old_cluster_logical_slot_infos(void);
+int			count_old_cluster_logical_slots(void);
 
 /* option.c */
 
diff --git a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
new file mode 100644
index 0000000000..ae87c33708
--- /dev/null
+++ b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
@@ -0,0 +1,139 @@
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+# Tests for upgrading replication slots
+
+use strict;
+use warnings;
+
+use File::Path qw(rmtree);
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Can be changed to test the other modes
+my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
+
+# Initialize old cluster
+my $old_publisher = PostgreSQL::Test::Cluster->new('old_publisher');
+$old_publisher->init(allows_streaming => 'logical');
+
+# Initialize new cluster
+my $new_publisher = PostgreSQL::Test::Cluster->new('new_publisher');
+$new_publisher->init(allows_streaming => 'replica');
+
+my $bindir = $new_publisher->config_data('--bindir');
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when new cluster wal_level is not 'logical'
+
+# Preparations for the subsequent test:
+# 1. Create a slot on the old cluster
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT pg_create_logical_replication_slot('test_slot1', 'test_decoding', false, true);"
+);
+$old_publisher->stop;
+
+# pg_upgrade will fail because the new cluster wal_level is 'replica'
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade where the new cluster has the wrong wal_level');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when max_replication_slots on a new cluster is
+#		too low
+
+# Preparations for the subsequent test:
+# 1. Create a second slot on the old cluster
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT pg_create_logical_replication_slot('test_slot2', 'test_decoding', false, true);"
+);
+$old_publisher->stop;
+
+# 2. max_replication_slots is set to smaller than the number of slots (2)
+#	 present on the old cluster
+$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 1");
+
+# 3. wal_level is set correctly on the new cluster
+$new_publisher->append_conf('postgresql.conf', "wal_level = 'logical'");
+
+# pg_upgrade will fail because the new cluster has insufficient max_replication_slots
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade where the new cluster has insufficient max_replication_slots');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+
+# ------------------------------
+# TEST: Successful upgrade
+
+# Preparations for the subsequent test:
+# 1. Remove the slot 'test_slot2', leaving only 1 slot remaining on the old
+#	 cluster, so the new cluster config  max_replication_slots=1 will now be
+#	 enough.
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT * FROM pg_drop_replication_slot('test_slot2');"
+);
+
+# 2. Consume WAL records
+$old_publisher->safe_psql('postgres',
+	"SELECT count(*) FROM pg_logical_slot_get_changes('test_slot1', NULL, NULL)"
+);
+$old_publisher->stop;
+
+# Actual run, successful upgrade is expected
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade of old cluster');
+ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ removed after pg_upgrade success");
+
+# Check that the slot 'test_slot1' has migrated to the new cluster
+$new_publisher->start;
+my $result = $new_publisher->safe_psql('postgres',
+	"SELECT slot_name, two_phase FROM pg_replication_slots");
+is($result, qq(test_slot1|t), 'check the slot exists on new cluster');
+$new_publisher->stop;
+
+done_testing();
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 49a33c0387..310456e032 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1501,7 +1501,10 @@ LogicalRepTupleData
 LogicalRepTyp
 LogicalRepWorker
 LogicalRepWorkerType
+LogicalReplicationSlotInfo
 LogicalRewriteMappingData
+LogicalSlotInfo
+LogicalSlotInfoArr
 LogicalTape
 LogicalTapeSet
 LsnReadQueue
-- 
2.27.0

v28-0003-pg_upgrade-Add-check-function-for-logical-replic.patchapplication/octet-stream; name=v28-0003-pg_upgrade-Add-check-function-for-logical-replic.patchDownload
From c957627ebbeb640fc39bbdd7f1f79f729ea7cddc Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Fri, 18 Aug 2023 11:57:37 +0000
Subject: [PATCH v28 3/3] pg_upgrade: Add check function for logical
 replication slots

To prevent data loss, pg_upgrade will fail if the old node has slots with the
status 'lost', or with unconsumed WAL records.

Author: Hayato Kuroda
Reviewed-by: Wang Wei, Vignesh C, Peter Smith, Hou Zhijie
---
 src/bin/pg_upgrade/check.c                    | 87 ++++++++++++++++++
 src/bin/pg_upgrade/controldata.c              | 38 ++++++++
 src/bin/pg_upgrade/pg_upgrade.h               |  2 +
 .../t/003_logical_replication_slots.pl        | 91 +++++++++++++++++--
 4 files changed, 210 insertions(+), 8 deletions(-)

diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 4056b7a7a9..a013366280 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -9,6 +9,7 @@
 
 #include "postgres_fe.h"
 
+#include "access/xlogdefs.h"
 #include "catalog/pg_authid_d.h"
 #include "catalog/pg_collation.h"
 #include "fe_utils/string_utils.h"
@@ -31,6 +32,7 @@ static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(void);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
 static void check_new_cluster_logical_replication_slots(void);
+static void check_old_cluster_for_valid_slots(bool live_check);
 
 
 /*
@@ -108,6 +110,13 @@ check_and_dump_old_cluster(bool live_check)
 	check_for_reg_data_type_usage(&old_cluster);
 	check_for_isn_and_int8_passing_mismatch(&old_cluster);
 
+	/*
+	 * Logical replication slots can be migrated since PG17. See comments atop
+	 * get_old_cluster_logical_slot_infos().
+	 */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
+		check_old_cluster_for_valid_slots(live_check);
+
 	/*
 	 * PG 16 increased the size of the 'aclitem' type, which breaks the
 	 * on-disk format for existing data.
@@ -1471,3 +1480,81 @@ check_new_cluster_logical_replication_slots(void)
 
 	check_ok();
 }
+
+/*
+ * check_old_cluster_for_valid_slots()
+ *
+ * Make sure logical replication slots can be migrated to new cluster.
+ * Following points are checked:
+ *
+ *	- All logical replication slots are usable.
+ *	- All logical replication slots consumed all WALs, except a
+ *	  CHECKPOINT_SHUTDOWN record.
+ */
+static void
+check_old_cluster_for_valid_slots(bool live_check)
+{
+	int			i,
+				ntups,
+				i_slotname;
+	PGresult   *res;
+	DbInfo	   *active_db = &old_cluster.dbarr.dbs[0];
+	PGconn	   *conn;
+
+	/* Quick exit if the cluster does not have logical slots. */
+	if (count_old_cluster_logical_slots() == 0)
+		return;
+
+	conn = connectToServer(&old_cluster, active_db->db_name);
+
+	prep_status("Checking for logical replication slots");
+
+	/* Check there are no logical replication slots with a 'lost' state. */
+	res = executeQueryOrDie(conn,
+							"SELECT slot_name FROM pg_catalog.pg_replication_slots "
+							"WHERE wal_status = 'lost' AND "
+							"temporary IS FALSE;");
+
+	ntups = PQntuples(res);
+	i_slotname = PQfnumber(res, "slot_name");
+
+	for (i = 0; i < ntups; i++)
+		pg_log(PG_WARNING,
+			   "\nWARNING: logical replication slot \"%s\" is in 'lost' state.",
+			   PQgetvalue(res, i, i_slotname));
+
+	PQclear(res);
+
+	if (ntups)
+		pg_fatal("One or more logical replication slots with a state of 'lost' were detected.");
+
+	/*
+	 * Do additional checks if a live check is not required. This requires
+	 * that confirmed_flush_lsn of all the slots is the same as the latest
+	 * checkpoint location, but it would be satisfied only when the server has
+	 * been shut down.
+	 */
+	if (!live_check)
+	{
+		res = executeQueryOrDie(conn,
+								"SELECT slot_name FROM pg_catalog.pg_replication_slots "
+								"WHERE confirmed_flush_lsn != '%X/%X' AND temporary IS FALSE;",
+								LSN_FORMAT_ARGS(old_cluster.controldata.chkpnt_latest));
+
+		ntups = PQntuples(res);
+		i_slotname = PQfnumber(res, "slot_name");
+
+		for (i = 0; i < ntups; i++)
+			pg_log(PG_WARNING,
+				   "\nWARNING: logical replication slot \"%s\" has not consumed WALs yet",
+				   PQgetvalue(res, i, i_slotname));
+
+		PQclear(res);
+
+		if (ntups)
+			pg_fatal("One or more logical replication slots still have unconsumed WAL records.");
+	}
+
+	PQfinish(conn);
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c
index 4beb65ab22..808156ec09 100644
--- a/src/bin/pg_upgrade/controldata.c
+++ b/src/bin/pg_upgrade/controldata.c
@@ -169,6 +169,44 @@ get_control_data(ClusterInfo *cluster, bool live_check)
 				}
 				got_cluster_state = true;
 			}
+
+			else if ((p = strstr(bufin, "Latest checkpoint location:")) != NULL)
+			{
+				/*
+				 * Read the latest checkpoint location if the cluster is PG17
+				 * or later. This is used for upgrading logical replication
+				 * slots.
+				 */
+				if (GET_MAJOR_VERSION(cluster->major_version) >= 1700)
+				{
+					char	   *slash = NULL;
+					uint32		upper_lsn,
+								lower_lsn;
+
+					p = strchr(p, ':');
+
+					if (p == NULL || strlen(p) <= 1)
+						pg_fatal("%d: controldata retrieval problem", __LINE__);
+
+					p++;		/* remove ':' char */
+
+					p = strpbrk(p, "01234567890ABCDEF");
+
+					if (p == NULL || strlen(p) <= 1)
+						pg_fatal("%d: controldata retrieval problem", __LINE__);
+
+					/*
+					 * The upper and lower parts of the LSN must be read separately
+					 * because it is stored in %X/%X format.
+					 */
+					upper_lsn = strtoul(p, &slash, 16);
+					lower_lsn = strtoul(++slash, NULL, 16);
+
+					/* And combine them */
+					cluster->controldata.chkpnt_latest =
+						((uint64) upper_lsn << 32) | lower_lsn;
+				}
+			}
 		}
 
 		rc = pclose(output);
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index dae92ef6c0..e72318f500 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -10,6 +10,7 @@
 #include <sys/stat.h>
 #include <sys/time.h>
 
+#include "access/xlogdefs.h"
 #include "common/relpath.h"
 #include "libpq-fe.h"
 
@@ -242,6 +243,7 @@ typedef struct
 	bool		date_is_int;
 	bool		float8_pass_by_value;
 	uint32		data_checksum_version;
+	XLogRecPtr	chkpnt_latest;
 } ControlData;
 
 /*
diff --git a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
index ae87c33708..640964c4e1 100644
--- a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
+++ b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
@@ -22,6 +22,10 @@ $old_publisher->init(allows_streaming => 'logical');
 my $new_publisher = PostgreSQL::Test::Cluster->new('new_publisher');
 $new_publisher->init(allows_streaming => 'replica');
 
+# Initialize subscriber cluster
+my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
+$subscriber->init(allows_streaming => 'logical');
+
 my $bindir = $new_publisher->config_data('--bindir');
 
 # ------------------------------
@@ -65,13 +69,19 @@ $old_publisher->start;
 $old_publisher->safe_psql('postgres',
 	"SELECT pg_create_logical_replication_slot('test_slot2', 'test_decoding', false, true);"
 );
+
+# 2. Consume WAL records to avoid another type of upgrade failure. It will be
+#	 tested in subsequent cases.
+$old_publisher->safe_psql('postgres',
+	"SELECT count(*) FROM pg_logical_slot_get_changes('test_slot1', NULL, NULL);"
+);
 $old_publisher->stop;
 
-# 2. max_replication_slots is set to smaller than the number of slots (2)
+# 3. max_replication_slots is set to smaller than the number of slots (2)
 #	 present on the old cluster
 $new_publisher->append_conf('postgresql.conf', "max_replication_slots = 1");
 
-# 3. wal_level is set correctly on the new cluster
+# 4. wal_level is set correctly on the new cluster
 $new_publisher->append_conf('postgresql.conf', "wal_level = 'logical'");
 
 # pg_upgrade will fail because the new cluster has insufficient max_replication_slots
@@ -95,7 +105,7 @@ ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
 rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
 
 # ------------------------------
-# TEST: Successful upgrade
+# TEST: Confirm pg_upgrade fails when the slot still has unconsumed WAL records
 
 # Preparations for the subsequent test:
 # 1. Remove the slot 'test_slot2', leaving only 1 slot remaining on the old
@@ -106,10 +116,57 @@ $old_publisher->safe_psql('postgres',
 	"SELECT * FROM pg_drop_replication_slot('test_slot2');"
 );
 
-# 2. Consume WAL records
+# 2. Generate extra WAL records. Because these WAL records do not get consumed
+#	 it will cause the upcoming pg_upgrade test to fail.
+$old_publisher->safe_psql('postgres',
+	"CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;"
+);
+$old_publisher->stop;
+
+# pg_upgrade will fail because the slot still has unconsumed WAL records
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade of old cluster with idle replication slots');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+# Remove the remaining slot
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT * FROM pg_drop_replication_slot('test_slot1');"
+);
+
+# ------------------------------
+# TEST: Successful upgrade
+
+# Preparations for the subsequent test:
+# 1. Setup logical replication
+my $old_connstr = $old_publisher->connstr . ' dbname=postgres';
 $old_publisher->safe_psql('postgres',
-	"SELECT count(*) FROM pg_logical_slot_get_changes('test_slot1', NULL, NULL)"
+	"CREATE PUBLICATION pub FOR ALL TABLES;"
 );
+$subscriber->start;
+$subscriber->safe_psql(
+	'postgres', qq[
+	CREATE TABLE tbl (a int);
+	CREATE SUBSCRIPTION sub CONNECTION '$old_connstr' PUBLICATION pub WITH (two_phase = 'true')
+]);
+$subscriber->wait_for_subscription_sync($old_publisher, 'sub');
+
+# 2. Temporarily disable the subscription
+$subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION sub DISABLE");
 $old_publisher->stop;
 
 # Actual run, successful upgrade is expected
@@ -129,11 +186,29 @@ command_ok(
 ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
 	"pg_upgrade_output.d/ removed after pg_upgrade success");
 
-# Check that the slot 'test_slot1' has migrated to the new cluster
+# Check that the slot 'sub' has migrated to the new cluster
 $new_publisher->start;
 my $result = $new_publisher->safe_psql('postgres',
 	"SELECT slot_name, two_phase FROM pg_replication_slots");
-is($result, qq(test_slot1|t), 'check the slot exists on new cluster');
-$new_publisher->stop;
+is($result, qq(sub|t), 'check the slot exists on new cluster');
+
+# Update the connection
+my $new_connstr = $new_publisher->connstr . ' dbname=postgres';
+$subscriber->safe_psql(
+	'postgres', qq[
+	ALTER SUBSCRIPTION sub CONNECTION '$new_connstr';
+	ALTER SUBSCRIPTION sub ENABLE;
+]);
+
+# Check whether changes on the new publisher get replicated to the subscriber
+$new_publisher->safe_psql('postgres',
+	"INSERT INTO tbl VALUES (generate_series(11, 20))");
+$new_publisher->wait_for_catchup('sub');
+$result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
+is($result, qq(20), 'check changes are replicated to the subscriber');
+
+# Clean up
+$subscriber->stop();
+$new_publisher->stop();
 
 done_testing();
-- 
2.27.0

#176Peter Smith
smithpb2250@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#175)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

Here are some minor review comments for patch v28-0002

======
src/sgml/ref/pgupgrade.sgml

1.
-       with the primary.)  Replication slots are not copied and must
-       be recreated.
+       with the primary.)  Replication slots on old standby are not copied.
+       Only logical slots on the primary are migrated to the new standby,
+       and other slots must be recreated.
       </para>

/on old standby/on the old standby/

======
src/bin/pg_upgrade/info.c

2. get_old_cluster_logical_slot_infos

+void
+get_old_cluster_logical_slot_infos(void)
+{
+ int dbnum;
+
+ /* Logical slots can be migrated since PG17. */
+ if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+ return;
+
+ pg_log(PG_VERBOSE, "\nsource databases:");
+
+ for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+ {
+ DbInfo    *pDbInfo = &old_cluster.dbarr.dbs[dbnum];
+
+ get_old_cluster_logical_slot_infos_per_db(pDbInfo);
+
+ if (log_opts.verbose)
+ {
+ pg_log(PG_VERBOSE, "Database: \"%s\"", pDbInfo->db_name);
+ print_slot_infos(&pDbInfo->slot_arr);
+ }
+ }
+}

It might be worth putting an Assert before calling the
get_old_cluster_logical_slot_infos_per_db(...) just as a sanity check:
Assert(pDbInfo->slot_arr.nslots == 0);

This also helps to better document the "Note" of the
count_old_cluster_logical_slots() function comment.

~~~

3. count_old_cluster_logical_slots

+/*
+ * count_old_cluster_logical_slots()
+ *
+ * Sum up and return the number of logical replication slots for all databases.
+ *
+ * Note: this function always returns 0 if the old_cluster is PG16 and prior
+ * because old_cluster.dbarr.dbs[dbnum].slot_arr is set only for PG17 and
+ * later.
+ */
+int
+count_old_cluster_logical_slots(void)

Maybe that "Note" should be expanded a bit to say who does this:

SUGGESTION

Note: This function always returns 0 if the old_cluster is PG16 and
prior because old_cluster.dbarr.dbs[dbnum].slot_arr is set only for
PG17 and later. See where get_old_cluster_logical_slot_infos_per_db()
is called.

======
src/bin/pg_upgrade/pg_upgrade.c

4.
+ /*
+ * Logical replication slot upgrade only supported for old_cluster >=
+ * PG17.
+ *
+ * Note: This must be done after doing the pg_resetwal command because
+ * pg_resetwal would remove required WALs.
+ */
+ if (count_old_cluster_logical_slots())
+ {
+ start_postmaster(&new_cluster, true);
+ create_logical_replication_slots();
+ stop_postmaster(false);
+ }
+

4a.
I felt this comment needs a bit more detail otherwise you can't tell
how the >= PG17 version check works.

4b.
/slot upgrade only supported/slot upgrade is only supported/

~

SUGGESTION

Logical replication slot upgrade is only supported for old_cluster >=
PG17. An explicit version check is not necessary here because function
count_old_cluster_logical_slots() will always return 0 for old_cluster
<= PG16.

------
Kind Regards,
Peter Smith.
Fujitsu Australia

#177Peter Smith
smithpb2250@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#175)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

Hi Kuroda-san.

Here are some review comments for v28-0003.

======
src/bin/pg_upgrade/check.c

1. check_and_dump_old_cluster
+ /*
+ * Logical replication slots can be migrated since PG17. See comments atop
+ * get_old_cluster_logical_slot_infos().
+ */
+ if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
+ check_old_cluster_for_valid_slots(live_check);
+

IIUC we are preferring to use the <= 1600 style of version check
instead of >= 1700 where possible.

So this comment and version check ought to be removed from here, and
done inside check_old_cluster_for_valid_slots() instead.

~~~

2. check_old_cluster_for_valid_slots

+/*
+ * check_old_cluster_for_valid_slots()
+ *
+ * Make sure logical replication slots can be migrated to new cluster.
+ * Following points are checked:
+ *
+ * - All logical replication slots are usable.
+ * - All logical replication slots consumed all WALs, except a
+ *   CHECKPOINT_SHUTDOWN record.
+ */
+static void
+check_old_cluster_for_valid_slots(bool live_check)

I suggested in the previous comment above (#1) that the version check
should be moved into this function.

Therefore, this function comment should now also mention that slot upgrade
is only allowed for >= PG17.

~~~

3.
+static void
+check_old_cluster_for_valid_slots(bool live_check)
+{
+ int i,
+ ntups,
+ i_slotname;
+ PGresult   *res;
+ DbInfo    *active_db = &old_cluster.dbarr.dbs[0];
+ PGconn    *conn;
+
+ /* Quick exit if the cluster does not have logical slots. */
+ if (count_old_cluster_logical_slots() == 0)
+ return;

3a.
See comment #1. At the top of this function body there should be a
version check like:

if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
return;

~

3b.
/Quick exit/Quick return/

~

4.
+ prep_status("Checking for logical replication slots");

I felt it should add the word "valid", like:
"Checking for valid logical replication slots"

~~~

5.
+ /* Check there are no logical replication slots with a 'lost' state. */
+ res = executeQueryOrDie(conn,
+ "SELECT slot_name FROM pg_catalog.pg_replication_slots "
+ "WHERE wal_status = 'lost' AND "
+ "temporary IS FALSE;");

Since the SQL is checking if there *are* lost slots I felt it would be
more natural to reverse that comment.

SUGGESTION
/* Check and reject if there are any logical replication slots with a
'lost' state. */

~~~

6.
+ /*
+ * Do additional checks if a live check is not required. This requires
+ * that confirmed_flush_lsn of all the slots is the same as the latest
+ * checkpoint location, but it would be satisfied only when the server has
+ * been shut down.
+ */
+ if (!live_check)

I think the comment can be rearranged slightly:

SUGGESTION
Do additional checks to ensure that 'confirmed_flush_lsn' of all the
slots is the same as the latest checkpoint location.
Note: This can be satisfied only when the old_cluster has been shut
down, so we skip this for "live" checks.

======
src/bin/pg_upgrade/controldata.c

7.
+ /*
+ * Read the latest checkpoint location if the cluster is PG17
+ * or later. This is used for upgrading logical replication
+ * slots.
+ */
+ if (GET_MAJOR_VERSION(cluster->major_version) >= 1700)
+ {

Fetching this "Latest checkpoint location:" value is only needed for
the check_old_cluster_for_valid_slots validation check, isn't it? But
AFAICT this code is common for both old_cluster and new_cluster.

I am not sure what is best to do:
- Do only the minimal logic needed?
- Read the value redundantly even for new_cluster just to keep code simpler?

Either way, maybe the comment should say something about this.

======
.../t/003_logical_replication_slots.pl

8. Consider adding one more test

Maybe there should also be some "live check" test performed (e.g.
using --check, and a running old_cluster).

This would demonstrate pg_upgrade working successfully even when the
WAL records are not consumed (because LSN checks would be skipped in
check_old_cluster_for_valid_slots function).

------
Kind Regards,
Peter Smith.
Fujitsu Australia

#178Amit Kapila
amit.kapila16@gmail.com
In reply to: Peter Smith (#176)
1 attachment(s)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Wed, Aug 30, 2023 at 7:55 AM Peter Smith <smithpb2250@gmail.com> wrote:

Here are some minor review comments for patch v28-0002

======
src/sgml/ref/pgupgrade.sgml

1.
-       with the primary.)  Replication slots are not copied and must
-       be recreated.
+       with the primary.)  Replication slots on old standby are not copied.
+       Only logical slots on the primary are migrated to the new standby,
+       and other slots must be recreated.
</para>

/on old standby/on the old standby/

Fixed.

======
src/bin/pg_upgrade/info.c

2. get_old_cluster_logical_slot_infos

+void
+get_old_cluster_logical_slot_infos(void)
+{
+ int dbnum;
+
+ /* Logical slots can be migrated since PG17. */
+ if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+ return;
+
+ pg_log(PG_VERBOSE, "\nsource databases:");
+
+ for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+ {
+ DbInfo    *pDbInfo = &old_cluster.dbarr.dbs[dbnum];
+
+ get_old_cluster_logical_slot_infos_per_db(pDbInfo);
+
+ if (log_opts.verbose)
+ {
+ pg_log(PG_VERBOSE, "Database: \"%s\"", pDbInfo->db_name);
+ print_slot_infos(&pDbInfo->slot_arr);
+ }
+ }
+}

It might be worth putting an Assert before calling the
get_old_cluster_logical_slot_infos_per_db(...) just as a sanity check:
Assert(pDbInfo->slot_arr.nslots == 0);

This also helps to better document the "Note" of the
count_old_cluster_logical_slots() function comment.

I have changed the comments atop count_old_cluster_logical_slots(), and I
don't see the need for this Assert.

~~~

3. count_old_cluster_logical_slots

+/*
+ * count_old_cluster_logical_slots()
+ *
+ * Sum up and return the number of logical replication slots for all databases.
+ *
+ * Note: this function always returns 0 if the old_cluster is PG16 and prior
+ * because old_cluster.dbarr.dbs[dbnum].slot_arr is set only for PG17 and
+ * later.
+ */
+int
+count_old_cluster_logical_slots(void)

Maybe that "Note" should be expanded a bit to say who does this:

SUGGESTION

Note: This function always returns 0 if the old_cluster is PG16 and
prior because old_cluster.dbarr.dbs[dbnum].slot_arr is set only for
PG17 and later. See where get_old_cluster_logical_slot_infos_per_db()
is called.

Changed, but worded differently, because describing it in terms of the
variable name doesn't read well to me.

======
src/bin/pg_upgrade/pg_upgrade.c

4.
+ /*
+ * Logical replication slot upgrade only supported for old_cluster >=
+ * PG17.
+ *
+ * Note: This must be done after doing the pg_resetwal command because
+ * pg_resetwal would remove required WALs.
+ */
+ if (count_old_cluster_logical_slots())
+ {
+ start_postmaster(&new_cluster, true);
+ create_logical_replication_slots();
+ stop_postmaster(false);
+ }
+

4a.
I felt this comment needs a bit more detail otherwise you can't tell
how the >= PG17 version check works.

4b.
/slot upgrade only supported/slot upgrade is only supported/

~

SUGGESTION

Logical replication slot upgrade is only supported for old_cluster >=
PG17. An explicit version check is not necessary here because function
count_old_cluster_logical_slots() will always return 0 for old_cluster
<= PG16.

I don't see the need to explain anything about the version check here, so I
removed that part of the comment.

Apart from this, I have addressed some of the comments raised by you
for the 0003 patch. Please find the diff patch attached. I think we
should combine 0002 and 0003 patches.

I have another comment on the patch:
+ /* Check there are no logical replication slots with a 'lost' state. */
+ res = executeQueryOrDie(conn,
+ "SELECT slot_name FROM pg_catalog.pg_replication_slots "
+ "WHERE wal_status = 'lost' AND "
+ "temporary IS FALSE;");

In this place, shouldn't we explicitly check for slot_type as logical?
I think we should consistently check for slot_type in all the queries
used in this patch.
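
For example, the lost-slot query could look something like this (just a sketch
of what I mean, not necessarily the exact wording the patch should use):

SELECT slot_name
FROM pg_catalog.pg_replication_slots
WHERE slot_type = 'logical' AND
      wal_status = 'lost' AND
      temporary IS FALSE;

That would keep the filters consistent with the other queries that gather slot
information from the old cluster.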

--
With Regards,
Amit Kapila.

Attachments:

changes_amit.1.patchapplication/octet-stream; name=changes_amit.1.patchDownload
diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 07bb46f89c..bef107295c 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -694,7 +694,7 @@ rsync --archive --delete --hard-links --size-only --no-inc-recursive /vol1/pg_tb
        Configure the servers for log shipping.  (You do not need to run
        <function>pg_backup_start()</function> and <function>pg_backup_stop()</function>
        or take a file system backup as the standbys are still synchronized
-       with the primary.)  Replication slots on old standby are not copied.
+       with the primary.)  Replication slots on the old standby are not copied.
        Only logical slots on the primary are migrated to the new standby,
        and other slots must be recreated.
       </para>
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index a013366280..ea0fe88876 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -1507,9 +1507,12 @@ check_old_cluster_for_valid_slots(bool live_check)
 
 	conn = connectToServer(&old_cluster, active_db->db_name);
 
-	prep_status("Checking for logical replication slots");
+	prep_status("Checking for valid logical replication slots");
 
-	/* Check there are no logical replication slots with a 'lost' state. */
+	/*
+	 * We don't allow to upgrade in the presence of lost slots as we can't
+	 * migrate those.
+	 */
 	res = executeQueryOrDie(conn,
 							"SELECT slot_name FROM pg_catalog.pg_replication_slots "
 							"WHERE wal_status = 'lost' AND "
@@ -1529,10 +1532,11 @@ check_old_cluster_for_valid_slots(bool live_check)
 		pg_fatal("One or more logical replication slots with a state of 'lost' were detected.");
 
 	/*
-	 * Do additional checks if a live check is not required. This requires
-	 * that confirmed_flush_lsn of all the slots is the same as the latest
-	 * checkpoint location, but it would be satisfied only when the server has
-	 * been shut down.
+	 * Do additional checks to ensure that confirmed_flush LSN of all the slots
+	 * is the same as the latest checkpoint location.
+	 *
+	 * Note: This can be satisfied only when the old cluster has been shut
+	 * down, so we skip this for live checks.
 	 */
 	if (!live_check)
 	{
diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c
index 808156ec09..11881db84c 100644
--- a/src/bin/pg_upgrade/controldata.c
+++ b/src/bin/pg_upgrade/controldata.c
@@ -175,7 +175,8 @@ get_control_data(ClusterInfo *cluster, bool live_check)
 				/*
 				 * Read the latest checkpoint location if the cluster is PG17
 				 * or later. This is used for upgrading logical replication
-				 * slots.
+				 * slots. Currently, we need it only for the old cluster, but we
+				 * didn't add an additional check, for simplicity.
 				 */
 				if (GET_MAJOR_VERSION(cluster->major_version) >= 1700)
 				{
diff --git a/src/bin/pg_upgrade/function.c b/src/bin/pg_upgrade/function.c
index 5b4b414a91..71e3ec7a51 100644
--- a/src/bin/pg_upgrade/function.c
+++ b/src/bin/pg_upgrade/function.c
@@ -47,11 +47,8 @@ library_name_compare(const void *p1, const void *p2)
 /*
  * get_loadable_libraries()
  *
- *	Fetch the names of all old libraries:
- *	1. Name of library files containing C-language functions (for non-built-in
- *	   functions), and
- *	2. Shared object (library) names containing the logical replication output
- *	   plugins
+ *	Fetch the names of all old libraries containing either C-language functions
+ *	or logical replication output plugins.
  *
  *	We will later check that they all exist in the new installation.
  */
@@ -66,7 +63,7 @@ get_loadable_libraries(void)
 	ress = (PGresult **) pg_malloc(old_cluster.dbarr.ndbs * sizeof(PGresult *));
 	totaltups = 0;
 
-	/* Construct a query string for fetching non-built-in C functions */
+	/* distinct libraries for non-built-in C functions */
 	appendPQExpBuffer(query, "SELECT DISTINCT probin "
 					  "FROM pg_catalog.pg_proc "
 					  "WHERE prolang = %u AND "
@@ -75,10 +72,7 @@ get_loadable_libraries(void)
 					  ClanguageId,
 					  FirstNormalObjectId);
 
-	/*
-	 * If old_cluster is PG 17 or later, logical decoding output plugins must
-	 * also be included.
-	 */
+	/* upgrade of logical slots is supported since PG 17 */
 	if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
 		appendPQExpBufferStr(query, " UNION "
 							 "SELECT DISTINCT plugin "
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index c905e02d45..e7e92f17ec 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -661,8 +661,10 @@ get_old_cluster_logical_slot_infos_per_db(DbInfo *dbinfo)
  * Higher level routine to generate LogicalSlotInfoArr for all databases.
  *
  * Note: This function will not do anything if the old cluster is pre-PG17.
- * The logical slots are not saved at shutdown, and the confirmed_flush_lsn is
- * always behind the SHUTDOWN_CHECKPOINT record. Subsequent checks done in
+ * This is because before that the logical slots are not saved at shutdown, so
+ * there is no guarantee that the latest confirmed_flush_lsn is saved to disk
+ * which can lead to data loss. It is still not guaranteed for manually created
+ * slots in PG17, so subsequent checks done in
  * check_old_cluster_for_valid_slots() would raise a FATAL error if such slots
  * are included.
  */
@@ -697,8 +699,8 @@ get_old_cluster_logical_slot_infos(void)
  * Sum up and return the number of logical replication slots for all databases.
  *
  * Note: this function always returns 0 if the old_cluster is PG16 and prior
- * because old_cluster.dbarr.dbs[dbnum].slot_arr is set only for PG17 and
- * later.
+ * because we gather slot information only for cluster versions greater than or
+ * equal to PG17. See get_old_cluster_logical_slot_infos().
  */
 int
 count_old_cluster_logical_slots(void)
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index c4bf12fd6b..b267b484b1 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -190,9 +190,6 @@ main(int argc, char **argv)
 	check_ok();
 
 	/*
-	 * Logical replication slot upgrade only supported for old_cluster >=
-	 * PG17.
-	 *
 	 * Note: This must be done after doing the pg_resetwal command because
 	 * pg_resetwal would remove required WALs.
 	 */
#179Amit Kapila
amit.kapila16@gmail.com
In reply to: Peter Smith (#177)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Wed, Aug 30, 2023 at 10:58 AM Peter Smith <smithpb2250@gmail.com> wrote:

Here are some review comments for v28-0003.

======
src/bin/pg_upgrade/check.c

1. check_and_dump_old_cluster
+ /*
+ * Logical replication slots can be migrated since PG17. See comments atop
+ * get_old_cluster_logical_slot_infos().
+ */
+ if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
+ check_old_cluster_for_valid_slots(live_check);
+

IIUC we are preferring to use the <= 1600 style of version check
instead of >= 1700 where possible.

Yeah, but in this case, following the nearby code style, I think it is
okay to keep it as it is.

~

3b.
/Quick exit/Quick return/

Hmm, either way should be okay.

~

4.
+ prep_status("Checking for logical replication slots");

I felt it should add the word "valid", like:
"Checking for valid logical replication slots"

Agreed and fixed.

~~~

5.
+ /* Check there are no logical replication slots with a 'lost' state. */
+ res = executeQueryOrDie(conn,
+ "SELECT slot_name FROM pg_catalog.pg_replication_slots "
+ "WHERE wal_status = 'lost' AND "
+ "temporary IS FALSE;");

Since the SQL is checking if there *are* lost slots I felt it would be
more natural to reverse that comment.

SUGGESTION
/* Check and reject if there are any logical replication slots with a
'lost' state. */

I changed the comment, but worded it differently.

~~~

6.
+ /*
+ * Do additional checks if a live check is not required. This requires
+ * that confirmed_flush_lsn of all the slots is the same as the latest
+ * checkpoint location, but it would be satisfied only when the server has
+ * been shut down.
+ */
+ if (!live_check)

I think the comment can be rearranged slightly:

SUGGESTION
Do additional checks to ensure that 'confirmed_flush_lsn' of all the
slots is the same as the latest checkpoint location.
Note: This can be satisfied only when the old_cluster has been shut
down, so we skip this for "live" checks.

Changed as per suggestion.

======
src/bin/pg_upgrade/controldata.c

7.
+ /*
+ * Read the latest checkpoint location if the cluster is PG17
+ * or later. This is used for upgrading logical replication
+ * slots.
+ */
+ if (GET_MAJOR_VERSION(cluster->major_version) >= 1700)
+ {

Fetching this "Latest checkpoint location:" value is only needed for
the check_old_cluster_for_valid_slots validation check, isn't it? But
AFAICT this code is common for both old_cluster and new_cluster.

I am not sure what is best to do:
- Do only the minimal logic needed?
- Read the value redundantly even for new_cluster just to keep code simpler?

Either way, maybe the comment should say something about this.

Added the comment.

--
With Regards,
Amit Kapila.

#180Dilip Kumar
dilipbalaut@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#175)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Tue, Aug 29, 2023 at 5:28 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

Some comments in 0002

1.
+ res = executeQueryOrDie(conn, "SELECT slot_name "
+ "FROM pg_catalog.pg_replication_slots "
+ "WHERE slot_type = 'logical' AND "
+ "temporary IS FALSE;");

What is the reason we are ignoring temporary slots here? I think we
better explain in the comments.

2.
+ res = executeQueryOrDie(conn, "SELECT slot_name "
+ "FROM pg_catalog.pg_replication_slots "
+ "WHERE slot_type = 'logical' AND "
+ "temporary IS FALSE;");
+
+ if (PQntuples(res))
+ pg_fatal("New cluster must not have logical replication slots but
found \"%s\"",
+ PQgetvalue(res, 0, 0));

It looks a bit odd to me that it first fetches all the logical slots from the
new cluster and then prints the name of only one of them. If we are going to
print slot names, shouldn't it print all of them, or else just say that slots
already exist on the new cluster without giving any names? And if we go with
option 2, i.e. not printing the names, it would be better to put LIMIT 1 at
the end of the query.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#181Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Dilip Kumar (#180)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Dilip,

Thanks for the comments!

Some comments in 0002

1.
+ res = executeQueryOrDie(conn, "SELECT slot_name "
+ "FROM pg_catalog.pg_replication_slots "
+ "WHERE slot_type = 'logical' AND "
+ "temporary IS FALSE;");

What is the reason we are ignoring temporary slots here? I think we
better explain in the comments.

Temporary slots were deliberately excluded from the check because such slots
cannot exist after the upgrade. Before running pg_upgrade, both the old and new
clusters must be shut down, and they are started and stopped several times
during the upgrade, so any temporary slots are gone by then.
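
As a small illustration (a hypothetical psql session, using the test_decoding
plugin):

-- The third argument 'true' makes the slot temporary; it is dropped
-- automatically when the creating session ends, so it cannot survive the
-- shutdowns and restarts that pg_upgrade performs.
SELECT pg_create_logical_replication_slot('tmp_slot', 'test_decoding', true);

-- After reconnecting (or after a server restart), the slot is gone:
SELECT slot_name FROM pg_replication_slots WHERE temporary;
-- (returns zero rows)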

What do you think?

2.
+ res = executeQueryOrDie(conn, "SELECT slot_name "
+ "FROM pg_catalog.pg_replication_slots "
+ "WHERE slot_type = 'logical' AND "
+ "temporary IS FALSE;");
+
+ if (PQntuples(res))
+ pg_fatal("New cluster must not have logical replication slots but
found \"%s\"",
+ PQgetvalue(res, 0, 0));

It looks a bit odd to me that it first fetches all the logical slots from the
new cluster and then prints the name of only one of them. If we are going to
print slot names, shouldn't it print all of them, or else just say that slots
already exist on the new cluster without giving any names? And if we go with
option 2, i.e. not printing the names, it would be better to put LIMIT 1 at
the end of the query.

I'm planning to change it so that the number of existing slots is reported using count(*).
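
Roughly something like this (a sketch only; the final query and message wording
may differ):

SELECT count(*)
FROM pg_catalog.pg_replication_slots
WHERE slot_type = 'logical' AND temporary IS FALSE;

and then pg_fatal() would report the returned count instead of the name of a
single slot.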

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#182Dilip Kumar
dilipbalaut@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#181)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Thu, Aug 31, 2023 at 7:56 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

Thanks for giving comments!

Thanks

Some comments in 0002

1.
+ res = executeQueryOrDie(conn, "SELECT slot_name "
+ "FROM pg_catalog.pg_replication_slots "
+ "WHERE slot_type = 'logical' AND "
+ "temporary IS FALSE;");

What is the reason we are ignoring temporary slots here? I think we
better explain in the comments.

Temporary slots were deliberately excluded from the check because such slots
cannot exist after the upgrade. Before running pg_upgrade, both the old and new
clusters must be shut down, and they are started and stopped several times
during the upgrade, so any temporary slots are gone by then.

What do you think?

LGTM

2.
+ res = executeQueryOrDie(conn, "SELECT slot_name "
+ "FROM pg_catalog.pg_replication_slots "
+ "WHERE slot_type = 'logical' AND "
+ "temporary IS FALSE;");
+
+ if (PQntuples(res))
+ pg_fatal("New cluster must not have logical replication slots but
found \"%s\"",
+ PQgetvalue(res, 0, 0));

It looks a bit odd to me that it first fetches all the logical slots from the
new cluster and then prints the name of only one of them. If we are going to
print slot names, shouldn't it print all of them, or else just say that slots
already exist on the new cluster without giving any names? And if we go with
option 2, i.e. not printing the names, it would be better to put LIMIT 1 at
the end of the query.

I'm planning to change it so that the number of existing slots is reported using count(*).

Yeah, that seems a better option.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#183Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Peter Smith (#177)
2 attachment(s)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Peter,

Thanks for the comments! PSA the new version.
I replied only to comment 8 because the others were answered by Amit.

.../t/003_logical_replication_slots.pl

8. Consider adding one more test

Maybe there should also be some "live check" test performed (e.g.
using --check, and a running old_cluster).

This would demonstrate pg_upgrade working successfully even when the
WAL records are not consumed (because LSN checks would be skipped in
check_old_cluster_for_valid_slots function).

I had ignored that case because it did not improve code coverage, but indeed,
no one had exercised the feature that way. I'm still not sure what the test
should look like, but I added one. I want to hear your opinions.

Furthermore, based on comments from Dilip [1], I added the comment and modified
check_new_cluster_logical_replication_slots(). IIUC pg_upgrade does not have a
mechanism for handling plural forms in messages, so an if-statement was used.
If you have better options, please tell me.

[1]: /messages/by-id/CAFiTN-tgm9wCTyG4co+VZhyFTnzh-KoPtYbuH9bRFmxroJ34EQ@mail.gmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:

v29-0001-Persist-logical-slots-to-disk-during-a-shutdown-.patchapplication/octet-stream; name=v29-0001-Persist-logical-slots-to-disk-during-a-shutdown-.patchDownload
From 7d23c302aaff6ad5034ee1ee4f668de6352d865e Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Fri, 14 Apr 2023 13:49:09 +0800
Subject: [PATCH v29 1/2] Persist logical slots to disk during a shutdown
 checkpoint if required.

It's entirely possible for a logical slot to have a confirmed_flush LSN
higher than the last value saved on disk while not being marked as dirty.
Currently, it is not a major problem but a later patch adding support for
the upgrade of slots relies on that value being properly persisted to disk.

It can also help with avoiding processing the same transactions again in
some boundary cases after the clean shutdown and restart. Say, we process
some transactions for which we didn't send anything downstream (the
changes got filtered) but the confirm_flush LSN is updated due to
keepalives. As we don't flush the latest value of confirm_flush LSN,
it may lead to processing the same changes again.

Author: Julien Rouhaud, Vignesh C, Kuroda Hayato based on suggestions by
Ashutosh Bapat
Reviewed-by: Amit Kapila, Peter Smith
Discussion: http://postgr.es/m/CAA4eK1JzJagMmb_E8D4au=GYQkxox0AfNBm1FbP7sy7t4YWXPQ@mail.gmail.com
Discussion: http://postgr.es/m/TYAPR01MB58664C81887B3AF2EB6B16E3F5939@TYAPR01MB5866.jpnprd01.prod.outlook.com
---
 src/backend/access/transam/xlog.c             |   2 +-
 src/backend/replication/slot.c                |  29 +++--
 src/include/replication/slot.h                |  13 ++-
 src/test/recovery/meson.build                 |   1 +
 .../t/038_save_logical_slots_shutdown.pl      | 101 ++++++++++++++++++
 5 files changed, 133 insertions(+), 13 deletions(-)
 create mode 100644 src/test/recovery/t/038_save_logical_slots_shutdown.pl

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index f6f8adc72a..f26c8d18a6 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7039,7 +7039,7 @@ static void
 CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
 {
 	CheckPointRelationMap();
-	CheckPointReplicationSlots();
+	CheckPointReplicationSlots(flags & CHECKPOINT_IS_SHUTDOWN);
 	CheckPointSnapBuild();
 	CheckPointLogicalRewriteHeap();
 	CheckPointReplicationOrigin();
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index bb09c4010f..c075f76317 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -109,7 +109,8 @@ static void ReplicationSlotDropPtr(ReplicationSlot *slot);
 /* internal persistency functions */
 static void RestoreSlotFromDisk(const char *name);
 static void CreateSlotOnDisk(ReplicationSlot *slot);
-static void SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel);
+static void SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel,
+						   bool is_shutdown);
 
 /*
  * Report shared-memory space needed by ReplicationSlotsShmemInit.
@@ -321,6 +322,7 @@ ReplicationSlotCreate(const char *name, bool db_specific,
 	slot->candidate_xmin_lsn = InvalidXLogRecPtr;
 	slot->candidate_restart_valid = InvalidXLogRecPtr;
 	slot->candidate_restart_lsn = InvalidXLogRecPtr;
+	slot->last_saved_confirmed_flush = InvalidXLogRecPtr;
 
 	/*
 	 * Create the slot on disk.  We haven't actually marked the slot allocated
@@ -783,7 +785,7 @@ ReplicationSlotSave(void)
 	Assert(MyReplicationSlot != NULL);
 
 	sprintf(path, "pg_replslot/%s", NameStr(MyReplicationSlot->data.name));
-	SaveSlotToPath(MyReplicationSlot, path, ERROR);
+	SaveSlotToPath(MyReplicationSlot, path, ERROR, false);
 }
 
 /*
@@ -1572,11 +1574,10 @@ restart:
 /*
  * Flush all replication slots to disk.
  *
- * This needn't actually be part of a checkpoint, but it's a convenient
- * location.
+ * is_shutdown is true in case of a shutdown checkpoint.
  */
 void
-CheckPointReplicationSlots(void)
+CheckPointReplicationSlots(bool is_shutdown)
 {
 	int			i;
 
@@ -1601,7 +1602,7 @@ CheckPointReplicationSlots(void)
 
 		/* save the slot to disk, locking is handled in SaveSlotToPath() */
 		sprintf(path, "pg_replslot/%s", NameStr(s->data.name));
-		SaveSlotToPath(s, path, LOG);
+		SaveSlotToPath(s, path, LOG, is_shutdown);
 	}
 	LWLockRelease(ReplicationSlotAllocationLock);
 }
@@ -1707,7 +1708,7 @@ CreateSlotOnDisk(ReplicationSlot *slot)
 
 	/* Write the actual state file. */
 	slot->dirty = true;			/* signal that we really need to write */
-	SaveSlotToPath(slot, tmppath, ERROR);
+	SaveSlotToPath(slot, tmppath, ERROR, false);
 
 	/* Rename the directory into place. */
 	if (rename(tmppath, path) != 0)
@@ -1733,22 +1734,26 @@ CreateSlotOnDisk(ReplicationSlot *slot)
  * Shared functionality between saving and creating a replication slot.
  */
 static void
-SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel)
+SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel,
+			   bool is_shutdown)
 {
 	char		tmppath[MAXPGPATH];
 	char		path[MAXPGPATH];
 	int			fd;
 	ReplicationSlotOnDisk cp;
 	bool		was_dirty;
+	bool		confirmed_flush_has_changed;
 
 	/* first check whether there's something to write out */
 	SpinLockAcquire(&slot->mutex);
 	was_dirty = slot->dirty;
 	slot->just_dirtied = false;
+	confirmed_flush_has_changed = (slot->data.confirmed_flush != slot->last_saved_confirmed_flush);
 	SpinLockRelease(&slot->mutex);
 
-	/* and don't do anything if there's nothing to write */
-	if (!was_dirty)
+	/* Don't do anything if there's nothing to write. See ReplicationSlot. */
+	if (!was_dirty &&
+		!(is_shutdown && SlotIsLogical(slot) && confirmed_flush_has_changed))
 		return;
 
 	LWLockAcquire(&slot->io_in_progress_lock, LW_EXCLUSIVE);
@@ -1873,11 +1878,12 @@ SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel)
 
 	/*
 	 * Successfully wrote, unset dirty bit, unless somebody dirtied again
-	 * already.
+	 * already and remember the confirmed_flush LSN value.
 	 */
 	SpinLockAcquire(&slot->mutex);
 	if (!slot->just_dirtied)
 		slot->dirty = false;
+	slot->last_saved_confirmed_flush = slot->data.confirmed_flush;
 	SpinLockRelease(&slot->mutex);
 
 	LWLockRelease(&slot->io_in_progress_lock);
@@ -2074,6 +2080,7 @@ RestoreSlotFromDisk(const char *name)
 		/* initialize in memory state */
 		slot->effective_xmin = cp.slotdata.xmin;
 		slot->effective_catalog_xmin = cp.slotdata.catalog_xmin;
+		slot->last_saved_confirmed_flush = cp.slotdata.confirmed_flush;
 
 		slot->candidate_catalog_xmin = InvalidTransactionId;
 		slot->candidate_xmin_lsn = InvalidXLogRecPtr;
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index a8a89dc784..448fb8cf51 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -178,6 +178,17 @@ typedef struct ReplicationSlot
 	XLogRecPtr	candidate_xmin_lsn;
 	XLogRecPtr	candidate_restart_valid;
 	XLogRecPtr	candidate_restart_lsn;
+
+	/*
+	 * We won't ensure that the slot is persisted after the confirmed_flush
+	 * LSN is updated as that could lead to frequent writes.  However, we need
+	 * to ensure that we do persist the slots at the time of shutdown whose
+	 * confirmed_flush LSN is changed since we last saved the slot to disk.
+	 * This will help in avoiding retreat of the confirmed_flush LSN after
+	 * restart.  This variable is used to track the last saved confirmed_flush
+	 * LSN value.
+	 */
+	XLogRecPtr	last_saved_confirmed_flush;
 } ReplicationSlot;
 
 #define SlotIsPhysical(slot) ((slot)->data.database == InvalidOid)
@@ -241,7 +252,7 @@ extern void ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslo
 extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok);
 
 extern void StartupReplicationSlots(void);
-extern void CheckPointReplicationSlots(void);
+extern void CheckPointReplicationSlots(bool is_shutdown);
 
 extern void CheckSlotRequirements(void);
 extern void CheckSlotPermissions(void);
diff --git a/src/test/recovery/meson.build b/src/test/recovery/meson.build
index e7328e4894..646d6ffde4 100644
--- a/src/test/recovery/meson.build
+++ b/src/test/recovery/meson.build
@@ -43,6 +43,7 @@ tests += {
       't/035_standby_logical_decoding.pl',
       't/036_truncated_dropped.pl',
       't/037_invalid_database.pl',
+      't/038_save_logical_slots_shutdown.pl',
     ],
   },
 }
diff --git a/src/test/recovery/t/038_save_logical_slots_shutdown.pl b/src/test/recovery/t/038_save_logical_slots_shutdown.pl
new file mode 100644
index 0000000000..6e114e9b29
--- /dev/null
+++ b/src/test/recovery/t/038_save_logical_slots_shutdown.pl
@@ -0,0 +1,101 @@
+
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+# Test logical replication slots are always persisted to disk during a shutdown
+# checkpoint.
+
+use strict;
+use warnings;
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+sub compare_confirmed_flush
+{
+	my ($node, $confirmed_flush_from_log) = @_;
+
+	# Fetch Latest checkpoint location from the control file
+	my ($stdout, $stderr) =
+	  run_command([ 'pg_controldata', $node->data_dir ]);
+	my @control_data      = split("\n", $stdout);
+	my $latest_checkpoint = undef;
+	foreach (@control_data)
+	{
+		if ($_ =~ /^Latest checkpoint location:\s*(.*)$/mg)
+		{
+			$latest_checkpoint = $1;
+			last;
+		}
+	}
+	die "Latest checkpoint location not found in control file\n"
+	  unless defined($latest_checkpoint);
+
+	# Is it same as the value read from log?
+	ok( $latest_checkpoint eq $confirmed_flush_from_log,
+		"Check that the slot's confirmed_flush LSN is the same as the latest_checkpoint location"
+	);
+
+	return;
+}
+
+# Initialize publisher node
+my $node_publisher = PostgreSQL::Test::Cluster->new('pub');
+$node_publisher->init(allows_streaming => 'logical');
+# Avoid checkpoint during the test, otherwise, the latest checkpoint location
+# will change.
+$node_publisher->append_conf(
+	'postgresql.conf', q{
+checkpoint_timeout = 1h
+});
+$node_publisher->start;
+
+# Create subscriber node
+my $node_subscriber = PostgreSQL::Test::Cluster->new('sub');
+$node_subscriber->init(allows_streaming => 'logical');
+$node_subscriber->start;
+
+# Create tables
+$node_publisher->safe_psql('postgres', "CREATE TABLE test_tbl (id int)");
+$node_subscriber->safe_psql('postgres', "CREATE TABLE test_tbl (id int)");
+
+# Insert some data
+$node_publisher->safe_psql('postgres',
+	"INSERT INTO test_tbl VALUES (generate_series(1, 5));");
+
+# Setup logical replication
+my $publisher_connstr = $node_publisher->connstr . ' dbname=postgres';
+$node_publisher->safe_psql('postgres',
+	"CREATE PUBLICATION pub FOR ALL TABLES");
+$node_subscriber->safe_psql('postgres',
+	"CREATE SUBSCRIPTION sub CONNECTION '$publisher_connstr' PUBLICATION pub"
+);
+
+$node_subscriber->wait_for_subscription_sync($node_publisher, 'sub');
+
+my $result =
+  $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM test_tbl");
+
+is($result, qq(5), "check initial copy was done");
+
+my $offset = -s $node_publisher->logfile;
+
+# Restart the publisher to ensure that the slot will be persisted if required
+$node_publisher->restart();
+
+# Wait until the walsender creates decoding context
+$node_publisher->wait_for_log(
+	qr/Streaming transactions committing after ([A-F0-9]+\/[A-F0-9]+), reading WAL from ([A-F0-9]+\/[A-F0-9]+)./,
+	$offset);
+
+# Extract confirmed_flush from the logfile
+my $log_contents = slurp_file($node_publisher->logfile, $offset);
+$log_contents =~
+  qr/Streaming transactions committing after ([A-F0-9]+\/[A-F0-9]+), reading WAL from ([A-F0-9]+\/[A-F0-9]+)./
+  or die "could not get confirmed_flush_lsn";
+
+# Ensure that the slot's confirmed_flush LSN is the same as the
+# latest_checkpoint location.
+compare_confirmed_flush($node_publisher, $1);
+
+done_testing();
-- 
2.27.0

v29-0002-pg_upgrade-Allow-to-replicate-logical-replicatio.patchapplication/octet-stream; name=v29-0002-pg_upgrade-Allow-to-replicate-logical-replicatio.patchDownload
From a554c2fa8b85654b435718d81453d2068ec2e155 Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Tue, 4 Apr 2023 05:49:34 +0000
Subject: [PATCH v29 2/2] pg_upgrade: Allow to replicate logical replication
 slots to new node

This commit allows nodes with logical replication slots to be upgraded. While
reading information from the old cluster, a list of logical replication slots is
fetched. Later in the upgrade, pg_upgrade revisits the list and restores the
slots by executing pg_create_logical_replication_slot() on the new cluster.
Migration of logical replication slots is only supported when the old
cluster is version 17.0 or later.

If the old node has slots with the status 'lost' or with unconsumed WAL records,
pg_upgrade fails. These checks are needed to prevent data loss.

Note that slot restoration must be done after the final pg_resetwal command
during the upgrade because pg_resetwal will remove WALs that are required by
the slots. Due to this restriction, the timing of restoring replication slots is
different from other objects.

The significant advantage of this commit is that it makes it easy to continue
logical replication even after upgrading the publisher node. Previously,
pg_upgrade allowed copying publications to a new node. With this new commit,
adjusting the connection string to the new publisher will cause the apply
worker on the subscriber to connect to the new publisher automatically. This
enables seamless continuation of logical replication, even after an upgrade.

Author: Hayato Kuroda
Co-authored-by: Hou Zhijie
Reviewed-by: Peter Smith, Julien Rouhaud, Vignesh C, Wang Wei, Masahiko Sawada,
             Dilip Kumar
---
 doc/src/sgml/ref/pgupgrade.sgml               |  70 +++++-
 src/bin/pg_upgrade/Makefile                   |   3 +
 src/bin/pg_upgrade/check.c                    | 170 +++++++++++++
 src/bin/pg_upgrade/controldata.c              |  39 +++
 src/bin/pg_upgrade/function.c                 |  41 ++-
 src/bin/pg_upgrade/info.c                     | 144 ++++++++++-
 src/bin/pg_upgrade/meson.build                |   1 +
 src/bin/pg_upgrade/pg_upgrade.c               |  71 ++++++
 src/bin/pg_upgrade/pg_upgrade.h               |  21 ++
 .../t/003_logical_replication_slots.pl        | 238 ++++++++++++++++++
 src/tools/pgindent/typedefs.list              |   3 +
 11 files changed, 786 insertions(+), 15 deletions(-)
 create mode 100644 src/bin/pg_upgrade/t/003_logical_replication_slots.pl

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 7816b4c685..bef107295c 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -360,6 +360,71 @@ make prefix=/usr/local/pgsql.new install
     </para>
    </step>
 
+   <step>
+    <title>Prepare for publisher upgrades</title>
+
+    <para>
+     <application>pg_upgrade</application> attempts to migrate logical
+     replication slots. This helps avoid the need for manually defining the
+     same replication slots on the new publisher. Migration of logical
+     replication slots is only supported when the old cluster is version 17.0
+     or later.
+    </para>
+
+    <para>
+     Before you start upgrading the publisher cluster, ensure that the
+     subscription is temporarily disabled, by executing
+     <link linkend="sql-altersubscription"><command>ALTER SUBSCRIPTION ... DISABLE</command></link>.
+     Re-enable the subscription after the upgrade.
+    </para>
+
+    <para>
+     There are some prerequisites for <application>pg_upgrade</application> to
+     be able to upgrade the replication slots. If these are not met an error
+     will be reported.
+    </para>
+
+    <itemizedlist>
+     <listitem>
+      <para>
+       All slots on the old cluster must be usable, i.e., there are no slots
+       whose
+       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>wal_status</structfield>
+       is <literal>lost</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>confirmed_flush_lsn</structfield>
+       of all slots on the old cluster must be the same as the latest
+       checkpoint location.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The output plugins referenced by the slots on the old cluster must be
+       installed in the new PostgreSQL executable directory.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must have
+       <link linkend="guc-max-replication-slots"><varname>max_replication_slots</varname></link>
+       configured to a value greater than or equal to the number of slots
+       present in the old cluster.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must have
+       <link linkend="guc-wal-level"><varname>wal_level</varname></link> as
+       <literal>logical</literal>.
+      </para>
+     </listitem>
+    </itemizedlist>
+
+   </step>
+
    <step>
     <title>Stop both servers</title>
 
@@ -629,8 +694,9 @@ rsync --archive --delete --hard-links --size-only --no-inc-recursive /vol1/pg_tb
        Configure the servers for log shipping.  (You do not need to run
        <function>pg_backup_start()</function> and <function>pg_backup_stop()</function>
        or take a file system backup as the standbys are still synchronized
-       with the primary.)  Replication slots are not copied and must
-       be recreated.
+       with the primary.)  Replication slots on the old standby are not copied.
+       Only logical slots on the primary are migrated to the new standby,
+       and other slots must be recreated.
       </para>
      </step>
 
diff --git a/src/bin/pg_upgrade/Makefile b/src/bin/pg_upgrade/Makefile
index 5834513add..815d1a7ca1 100644
--- a/src/bin/pg_upgrade/Makefile
+++ b/src/bin/pg_upgrade/Makefile
@@ -3,6 +3,9 @@
 PGFILEDESC = "pg_upgrade - an in-place binary upgrade utility"
 PGAPPICON = win32
 
+# required for 003_logical_replication_slots.pl
+EXTRA_INSTALL=contrib/test_decoding
+
 subdir = src/bin/pg_upgrade
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 56e313f562..e303e48587 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -9,6 +9,7 @@
 
 #include "postgres_fe.h"
 
+#include "access/xlogdefs.h"
 #include "catalog/pg_authid_d.h"
 #include "catalog/pg_collation.h"
 #include "fe_utils/string_utils.h"
@@ -30,6 +31,8 @@ static void check_for_jsonb_9_4_usage(ClusterInfo *cluster);
 static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(void);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
+static void check_new_cluster_logical_replication_slots(void);
+static void check_old_cluster_for_valid_slots(bool live_check);
 
 
 /*
@@ -89,6 +92,9 @@ check_and_dump_old_cluster(bool live_check)
 	/* Extract a list of databases and tables from the old cluster */
 	get_db_and_rel_infos(&old_cluster);
 
+	/* Extract a list of logical replication slots */
+	get_old_cluster_logical_slot_infos();
+
 	init_tablespaces();
 
 	get_loadable_libraries();
@@ -104,6 +110,13 @@ check_and_dump_old_cluster(bool live_check)
 	check_for_reg_data_type_usage(&old_cluster);
 	check_for_isn_and_int8_passing_mismatch(&old_cluster);
 
+	/*
+	 * Logical replication slots can be migrated since PG17. See comments atop
+	 * get_old_cluster_logical_slot_infos().
+	 */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
+		check_old_cluster_for_valid_slots(live_check);
+
 	/*
 	 * PG 16 increased the size of the 'aclitem' type, which breaks the
 	 * on-disk format for existing data.
@@ -189,6 +202,8 @@ check_new_cluster(void)
 {
 	get_db_and_rel_infos(&new_cluster);
 
+	check_new_cluster_logical_replication_slots();
+
 	check_new_cluster_is_empty();
 
 	check_loadable_libraries();
@@ -1402,3 +1417,158 @@ check_for_user_defined_encoding_conversions(ClusterInfo *cluster)
 	else
 		check_ok();
 }
+
+/*
+ * check_new_cluster_logical_replication_slots()
+ *
+ * Make sure there are no logical replication slots on the new cluster and that
+ * the parameter settings necessary for creating slots are sufficient.
+ */
+static void
+check_new_cluster_logical_replication_slots(void)
+{
+	PGresult   *res;
+	PGconn	   *conn;
+	int			nslots = 0;
+	int			nlost_slots;
+	int			max_replication_slots;
+	char	   *wal_level;
+
+	/* Logical slots can be migrated since PG17. */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+		return;
+
+	nslots = count_old_cluster_logical_slots();
+
+	/* Quick return if there are no logical slots to be migrated. */
+	if (nslots == 0)
+		return;
+
+	conn = connectToServer(&new_cluster, "template1");
+
+	prep_status("Checking for logical replication slots");
+
+	res = executeQueryOrDie(conn, "SELECT count(*) "
+							"FROM pg_catalog.pg_replication_slots "
+							"WHERE slot_type = 'logical' AND "
+							"temporary IS FALSE;");
+
+	nlost_slots = atoi(PQgetvalue(res, 0, 0));
+
+	if (nlost_slots)
+	{
+		if (nlost_slots == 1)
+			pg_fatal("New cluster must not have logical replication slots but found a slot.");
+		else
+			pg_fatal("New cluster must not have logical replication slots but found %d slots",
+					nlost_slots);
+	}
+
+	PQclear(res);
+
+	res = executeQueryOrDie(conn, "SHOW max_replication_slots;");
+	max_replication_slots = atoi(PQgetvalue(res, 0, 0));
+
+	if (nslots > max_replication_slots)
+		pg_fatal("max_replication_slots (%d) must be greater than or equal to the number of "
+				 "logical replication slots (%d) on the old cluster.",
+				 max_replication_slots, nslots);
+
+	PQclear(res);
+
+	res = executeQueryOrDie(conn, "SHOW wal_level;");
+	wal_level = PQgetvalue(res, 0, 0);
+
+	if (strcmp(wal_level, "logical") != 0)
+		pg_fatal("wal_level must be \"logical\", but is set to \"%s\"",
+				 wal_level);
+
+	PQclear(res);
+	PQfinish(conn);
+
+	check_ok();
+}
+
+/*
+ * check_old_cluster_for_valid_slots()
+ *
+ * Make sure logical replication slots can be migrated to new cluster.
+ * Following points are checked:
+ *
+ *	- All logical replication slots are usable.
+ *	- All logical replication slots consumed all WALs, except a
+ *	  CHECKPOINT_SHUTDOWN record.
+ */
+static void
+check_old_cluster_for_valid_slots(bool live_check)
+{
+	int			i,
+				ntups,
+				i_slotname;
+	PGresult   *res;
+	DbInfo	   *active_db = &old_cluster.dbarr.dbs[0];
+	PGconn	   *conn;
+
+	/* Quick exit if the cluster does not have logical slots. */
+	if (count_old_cluster_logical_slots() == 0)
+		return;
+
+	conn = connectToServer(&old_cluster, active_db->db_name);
+
+	prep_status("Checking for valid logical replication slots");
+
+	/*
+	 * We don't allow to upgrade in the presence of lost slots as we can't
+	 * migrate those.
+	 */
+	res = executeQueryOrDie(conn,
+							"SELECT slot_name FROM pg_catalog.pg_replication_slots "
+							"WHERE slot_type = 'logical' AND "
+							"wal_status = 'lost' AND "
+							"temporary IS FALSE;");
+
+	ntups = PQntuples(res);
+	i_slotname = PQfnumber(res, "slot_name");
+
+	for (i = 0; i < ntups; i++)
+		pg_log(PG_WARNING,
+			   "\nWARNING: logical replication slot \"%s\" is in 'lost' state.",
+			   PQgetvalue(res, i, i_slotname));
+
+	PQclear(res);
+
+	if (ntups)
+		pg_fatal("One or more logical replication slots with a state of 'lost' were detected.");
+
+	/*
+	 * Do additional checks to ensure that confirmed_flush LSN of all the slots
+	 * is the same as the latest checkpoint location.
+	 *
+	 * Note: This can be satisfied only when the old cluster has been shut
+	 * down, so we skip this live checks.
+	 */
+	if (!live_check)
+	{
+		res = executeQueryOrDie(conn,
+								"SELECT slot_name FROM pg_catalog.pg_replication_slots "
+								"WHERE slot_type = 'logical' AND "
+								"confirmed_flush_lsn != '%X/%X' AND temporary IS FALSE;",
+								LSN_FORMAT_ARGS(old_cluster.controldata.chkpnt_latest));
+
+		ntups = PQntuples(res);
+		i_slotname = PQfnumber(res, "slot_name");
+
+		for (i = 0; i < ntups; i++)
+			pg_log(PG_WARNING,
+				   "\nWARNING: logical replication slot \"%s\" has not consumed WALs yet",
+				   PQgetvalue(res, i, i_slotname));
+
+		PQclear(res);
+
+		if (ntups)
+			pg_fatal("One or more logical replication slots still have unconsumed WAL records.");
+	}
+
+	PQfinish(conn);
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c
index 4beb65ab22..11881db84c 100644
--- a/src/bin/pg_upgrade/controldata.c
+++ b/src/bin/pg_upgrade/controldata.c
@@ -169,6 +169,45 @@ get_control_data(ClusterInfo *cluster, bool live_check)
 				}
 				got_cluster_state = true;
 			}
+
+			else if ((p = strstr(bufin, "Latest checkpoint location:")) != NULL)
+			{
+				/*
+				 * Read the latest checkpoint location if the cluster is PG17
+				 * or later. This is used for upgrading logical replication
+				 * slots. Currently, we need it only for the old cluster but
+				 * didn't add additional check for the similicity.
+				 */
+				if (GET_MAJOR_VERSION(cluster->major_version) >= 1700)
+				{
+					char	   *slash = NULL;
+					uint32		upper_lsn,
+								lower_lsn;
+
+					p = strchr(p, ':');
+
+					if (p == NULL || strlen(p) <= 1)
+						pg_fatal("%d: controldata retrieval problem", __LINE__);
+
+					p++;		/* remove ':' char */
+
+					p = strpbrk(p, "01234567890ABCDEF");
+
+					if (p == NULL || strlen(p) <= 1)
+						pg_fatal("%d: controldata retrieval problem", __LINE__);
+
+					/*
+					 * The upper and lower part of LSN must be read separately
+					 * because it is stored as in %X/%X format.
+					 */
+					upper_lsn = strtoul(p, &slash, 16);
+					lower_lsn = strtoul(++slash, NULL, 16);
+
+					/* And combine them */
+					cluster->controldata.chkpnt_latest =
+						((uint64) upper_lsn << 32) | lower_lsn;
+				}
+			}
 		}
 
 		rc = pclose(output);
diff --git a/src/bin/pg_upgrade/function.c b/src/bin/pg_upgrade/function.c
index dc8800c7cd..b00e6af6a9 100644
--- a/src/bin/pg_upgrade/function.c
+++ b/src/bin/pg_upgrade/function.c
@@ -11,6 +11,7 @@
 
 #include "access/transam.h"
 #include "catalog/pg_language_d.h"
+#include "fe_utils/string_utils.h"
 #include "pg_upgrade.h"
 
 /*
@@ -46,7 +47,9 @@ library_name_compare(const void *p1, const void *p2)
 /*
  * get_loadable_libraries()
  *
- *	Fetch the names of all old libraries containing C-language functions.
+ *	Fetch the names of all old libraries containing either C-language functions
+ *	or are corresponding to logical replication output plugins.
+ *
  *	We will later check that they all exist in the new installation.
  */
 void
@@ -55,32 +58,46 @@ get_loadable_libraries(void)
 	PGresult  **ress;
 	int			totaltups;
 	int			dbnum;
+	PQExpBuffer query = createPQExpBuffer();
 
 	ress = (PGresult **) pg_malloc(old_cluster.dbarr.ndbs * sizeof(PGresult *));
 	totaltups = 0;
 
+	/* distinct libraries for non-built-in C functions */
+	appendPQExpBuffer(query, "SELECT DISTINCT probin "
+					  "FROM pg_catalog.pg_proc "
+					  "WHERE prolang = %u AND "
+					  "probin IS NOT NULL AND "
+					  "oid >= %u",
+					  ClanguageId,
+					  FirstNormalObjectId);
+
+	/* upgrade of logical slots are supported since PG 17 */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
+		appendPQExpBufferStr(query, " UNION "
+							 "SELECT DISTINCT plugin "
+							 "FROM pg_catalog.pg_replication_slots "
+							 "WHERE slot_type = 'logical' AND "
+							 "wal_status <> 'lost' AND "
+							 "database = current_database() AND "
+							 "temporary IS FALSE;");
+
 	/* Fetch all library names, removing duplicates within each DB */
 	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
 	{
 		DbInfo	   *active_db = &old_cluster.dbarr.dbs[dbnum];
 		PGconn	   *conn = connectToServer(&old_cluster, active_db->db_name);
 
-		/*
-		 * Fetch all libraries containing non-built-in C functions in this DB.
-		 */
-		ress[dbnum] = executeQueryOrDie(conn,
-										"SELECT DISTINCT probin "
-										"FROM pg_catalog.pg_proc "
-										"WHERE prolang = %u AND "
-										"probin IS NOT NULL AND "
-										"oid >= %u;",
-										ClanguageId,
-										FirstNormalObjectId);
+		/* Extract a list of libraries */
+		ress[dbnum] = executeQueryOrDie(conn, "%s", query->data);
+
 		totaltups += PQntuples(ress[dbnum]);
 
 		PQfinish(conn);
 	}
 
+	destroyPQExpBuffer(query);
+
 	os_info.libraries = (LibraryInfo *) pg_malloc(totaltups * sizeof(LibraryInfo));
 	totaltups = 0;
 
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index aa5faca4d6..6944b1cd41 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -26,6 +26,7 @@ static void get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo);
 static void free_rel_infos(RelInfoArr *rel_arr);
 static void print_db_infos(DbInfoArr *db_arr);
 static void print_rel_infos(RelInfoArr *rel_arr);
+static void print_slot_infos(LogicalSlotInfoArr *slot_arr);
 
 
 /*
@@ -394,7 +395,7 @@ get_db_infos(ClusterInfo *cluster)
 	i_spclocation = PQfnumber(res, "spclocation");
 
 	ntups = PQntuples(res);
-	dbinfos = (DbInfo *) pg_malloc(sizeof(DbInfo) * ntups);
+	dbinfos = (DbInfo *) pg_malloc0(sizeof(DbInfo) * ntups);
 
 	for (tupnum = 0; tupnum < ntups; tupnum++)
 	{
@@ -600,6 +601,125 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 	dbinfo->rel_arr.nrels = num_rels;
 }
 
+/*
+ * get_old_cluster_logical_slot_infos_per_db()
+ *
+ * gets the LogicalSlotInfos for all the logical replication slots of the database
+ * referred to by "dbinfo".
+ */
+static void
+get_old_cluster_logical_slot_infos_per_db(DbInfo *dbinfo)
+{
+	PGconn	   *conn = connectToServer(&old_cluster,
+									   dbinfo->db_name);
+	PGresult   *res;
+	LogicalSlotInfo *slotinfos = NULL;
+
+	int			num_slots;
+
+	/*
+	 * The temporary slots are expressly ignored while checking because such
+	 * slots cannot exist after the upgrade. During the upgrade, clusters are
+	 * started and stopped several times so that temporary slots will be
+	 * removed.
+	 */
+	res = executeQueryOrDie(conn, "SELECT slot_name, plugin, two_phase "
+							"FROM pg_catalog.pg_replication_slots "
+							"WHERE slot_type = 'logical' AND "
+							"wal_status <> 'lost' AND "
+							"database = current_database() AND "
+							"temporary IS FALSE;");
+
+	num_slots = PQntuples(res);
+
+	if (num_slots)
+	{
+		int			slotnum;
+		int			i_slotname;
+		int			i_plugin;
+		int			i_twophase;
+
+		slotinfos = (LogicalSlotInfo *) pg_malloc(sizeof(LogicalSlotInfo) * num_slots);
+
+		i_slotname = PQfnumber(res, "slot_name");
+		i_plugin = PQfnumber(res, "plugin");
+		i_twophase = PQfnumber(res, "two_phase");
+
+		for (slotnum = 0; slotnum < num_slots; slotnum++)
+		{
+			LogicalSlotInfo *curr = &slotinfos[slotnum];
+
+			curr->slotname = pg_strdup(PQgetvalue(res, slotnum, i_slotname));
+			curr->plugin = pg_strdup(PQgetvalue(res, slotnum, i_plugin));
+			curr->two_phase = (strcmp(PQgetvalue(res, slotnum, i_twophase), "t") == 0);
+		}
+	}
+
+	PQclear(res);
+	PQfinish(conn);
+
+	dbinfo->slot_arr.slots = slotinfos;
+	dbinfo->slot_arr.nslots = num_slots;
+}
+
+/*
+ * get_old_cluster_logical_slot_infos()
+ *
+ * Higher level routine to generate LogicalSlotInfoArr for all databases.
+ *
+ * Note: This function will not do anything if the old cluster is pre-PG17.
+ * This is because before that the logical slots are not saved at shutdown, so
+ * there is no guarantee that the latest confirmed_flush_lsn is saved to disk
+ * which can lead to data loss. It is still not guaranteed for manually created
+ * slots in PG17, so subsequent checks done in
+ * check_old_cluster_for_valid_slots() would raise a FATAL error if such slots
+ * are included.
+ */
+void
+get_old_cluster_logical_slot_infos(void)
+{
+	int			dbnum;
+
+	/* Logical slots can be migrated since PG17. */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+		return;
+
+	pg_log(PG_VERBOSE, "\nsource databases:");
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		DbInfo	   *pDbInfo = &old_cluster.dbarr.dbs[dbnum];
+
+		get_old_cluster_logical_slot_infos_per_db(pDbInfo);
+
+		if (log_opts.verbose)
+		{
+			pg_log(PG_VERBOSE, "Database: \"%s\"", pDbInfo->db_name);
+			print_slot_infos(&pDbInfo->slot_arr);
+		}
+	}
+}
+
+/*
+ * count_old_cluster_logical_slots()
+ *
+ * Sum up and return the number of logical replication slots for all databases.
+ *
+ * Note: this function always returns 0 if the old_cluster is PG16 and prior
+ * because we gather slot information only for cluster versions greater than or
+ * equal to PG17. See get_old_cluster_logical_slot_infos().
+ */
+int
+count_old_cluster_logical_slots(void)
+{
+	int			dbnum;
+	int			slot_count = 0;
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+		slot_count += old_cluster.dbarr.dbs[dbnum].slot_arr.nslots;
+
+	return slot_count;
+}
 
 static void
 free_db_and_rel_infos(DbInfoArr *db_arr)
@@ -610,6 +730,12 @@ free_db_and_rel_infos(DbInfoArr *db_arr)
 	{
 		free_rel_infos(&db_arr->dbs[dbnum].rel_arr);
 		pg_free(db_arr->dbs[dbnum].db_name);
+
+		/*
+		 * Logical replication slots must not exist on the new cluster before
+		 * create_logical_replication_slots().
+		 */
+		Assert(db_arr->dbs[dbnum].slot_arr.nslots == 0);
 	}
 	pg_free(db_arr->dbs);
 	db_arr->dbs = NULL;
@@ -660,3 +786,19 @@ print_rel_infos(RelInfoArr *rel_arr)
 			   rel_arr->rels[relnum].reloid,
 			   rel_arr->rels[relnum].tablespace);
 }
+
+static void
+print_slot_infos(LogicalSlotInfoArr *slot_arr)
+{
+	int			slotnum;
+
+	for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+	{
+		LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+
+		pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\", two_phase: %d",
+			   slot_info->slotname,
+			   slot_info->plugin,
+			   slot_info->two_phase);
+	}
+}
diff --git a/src/bin/pg_upgrade/meson.build b/src/bin/pg_upgrade/meson.build
index 12a97f84e2..228f29b688 100644
--- a/src/bin/pg_upgrade/meson.build
+++ b/src/bin/pg_upgrade/meson.build
@@ -42,6 +42,7 @@ tests += {
     'tests': [
       't/001_basic.pl',
       't/002_pg_upgrade.pl',
+      't/003_logical_replication_slots.pl',
     ],
     'test_kwargs': {'priority': 40}, # pg_upgrade tests are slow
   },
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 4562dafcff..b267b484b1 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -59,6 +59,7 @@ static void copy_xact_xlog_xid(void);
 static void set_frozenxids(bool minmxid_only);
 static void make_outputdirs(char *pgdata);
 static void setup(char *argv0, bool *live_check);
+static void create_logical_replication_slots(void);
 
 ClusterInfo old_cluster,
 			new_cluster;
@@ -188,6 +189,17 @@ main(int argc, char **argv)
 			  new_cluster.pgdata);
 	check_ok();
 
+	/*
+	 * Note: This must be done after doing the pg_resetwal command because
+	 * pg_resetwal would remove required WALs.
+	 */
+	if (count_old_cluster_logical_slots())
+	{
+		start_postmaster(&new_cluster, true);
+		create_logical_replication_slots();
+		stop_postmaster(false);
+	}
+
 	if (user_opts.do_sync)
 	{
 		prep_status("Sync data directory to disk");
@@ -860,3 +872,62 @@ set_frozenxids(bool minmxid_only)
 
 	check_ok();
 }
+
+/*
+ * create_logical_replication_slots()
+ *
+ * Similar to create_new_objects() but only restores logical replication slots.
+ */
+static void
+create_logical_replication_slots(void)
+{
+	int			dbnum;
+
+	prep_status_progress("Restoring logical replication slots in the new cluster");
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		DbInfo	   *old_db = &old_cluster.dbarr.dbs[dbnum];
+		LogicalSlotInfoArr *slot_arr = &old_db->slot_arr;
+		PGconn	   *conn;
+		PQExpBuffer query;
+		int			slotnum;
+		char		log_file_name[MAXPGPATH];
+
+		/* Skip this database if there are no slots */
+		if (slot_arr->nslots == 0)
+			continue;
+
+		conn = connectToServer(&new_cluster, old_db->db_name);
+		query = createPQExpBuffer();
+
+		snprintf(log_file_name, sizeof(log_file_name),
+				 DB_DUMP_LOG_FILE_MASK, old_db->db_oid);
+
+		pg_log(PG_STATUS, "%s", old_db->db_name);
+
+		for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		{
+			LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+
+			/* Constructs a query for creating logical replication slots. */
+			appendPQExpBuffer(query, "SELECT pg_catalog.pg_create_logical_replication_slot(");
+			appendStringLiteralConn(query, slot_info->slotname, conn);
+			appendPQExpBuffer(query, ", ");
+			appendStringLiteralConn(query, slot_info->plugin, conn);
+			appendPQExpBuffer(query, ", false, %s);",
+							  slot_info->two_phase ? "true" : "false");
+
+			PQclear(executeQueryOrDie(conn, "%s", query->data));
+
+			resetPQExpBuffer(query);
+		}
+
+		PQfinish(conn);
+
+		destroyPQExpBuffer(query);
+	}
+
+	end_progress_output();
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 7afa96716e..e72318f500 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -10,6 +10,7 @@
 #include <sys/stat.h>
 #include <sys/time.h>
 
+#include "access/xlogdefs.h"
 #include "common/relpath.h"
 #include "libpq-fe.h"
 
@@ -150,6 +151,22 @@ typedef struct
 	int			nrels;
 } RelInfoArr;
 
+/*
+ * Structure to store logical replication slot information
+ */
+typedef struct
+{
+	char	   *slotname;		/* slot name */
+	char	   *plugin;			/* plugin */
+	bool		two_phase;		/* can the slot decode 2PC? */
+} LogicalSlotInfo;
+
+typedef struct
+{
+	int			nslots;			/* number of logical slot infos */
+	LogicalSlotInfo *slots;		/* array of logical slot infos */
+} LogicalSlotInfoArr;
+
 /*
  * The following structure represents a relation mapping.
  */
@@ -176,6 +193,7 @@ typedef struct
 	char		db_tablespace[MAXPGPATH];	/* database default tablespace
 											 * path */
 	RelInfoArr	rel_arr;		/* array of all user relinfos */
+	LogicalSlotInfoArr slot_arr;	/* array of all LogicalSlotInfo */
 } DbInfo;
 
 /*
@@ -225,6 +243,7 @@ typedef struct
 	bool		date_is_int;
 	bool		float8_pass_by_value;
 	uint32		data_checksum_version;
+	XLogRecPtr	chkpnt_latest;
 } ControlData;
 
 /*
@@ -400,6 +419,8 @@ FileNameMap *gen_db_file_maps(DbInfo *old_db,
 							  DbInfo *new_db, int *nmaps, const char *old_pgdata,
 							  const char *new_pgdata);
 void		get_db_and_rel_infos(ClusterInfo *cluster);
+void		get_old_cluster_logical_slot_infos(void);
+int			count_old_cluster_logical_slots(void);
 
 /* option.c */
 
diff --git a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
new file mode 100644
index 0000000000..7c6cc6d04b
--- /dev/null
+++ b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
@@ -0,0 +1,238 @@
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+# Tests for upgrading replication slots
+
+use strict;
+use warnings;
+
+use File::Path qw(rmtree);
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Can be changed to test the other modes
+my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
+
+# Initialize old cluster
+my $old_publisher = PostgreSQL::Test::Cluster->new('old_publisher');
+$old_publisher->init(allows_streaming => 'logical');
+
+# Initialize new cluster
+my $new_publisher = PostgreSQL::Test::Cluster->new('new_publisher');
+$new_publisher->init(allows_streaming => 'replica');
+
+# Initialize subscriber cluster
+my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
+$subscriber->init(allows_streaming => 'logical');
+
+my $bindir = $new_publisher->config_data('--bindir');
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when new cluster wal_level is not 'logical'
+
+# Preparations for the subsequent test:
+# 1. Create a slot on the old cluster
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT pg_create_logical_replication_slot('test_slot1', 'test_decoding', false, true);"
+);
+$old_publisher->stop;
+
+# pg_upgrade will fail because the new cluster wal_level is 'replica'
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade where the new cluster has the wrong wal_level');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when max_replication_slots on a new cluster is
+#		too low
+
+# Preparations for the subsequent test:
+# 1. Create a second slot on the old cluster
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT pg_create_logical_replication_slot('test_slot2', 'test_decoding', false, true);"
+);
+
+# 2. Consume WAL records to avoid another type of upgrade failure. It will be
+#	 tested in subsequent cases.
+$old_publisher->safe_psql('postgres',
+	"SELECT count(*) FROM pg_logical_slot_get_changes('test_slot1', NULL, NULL);"
+);
+$old_publisher->stop;
+
+# 3. max_replication_slots is set to smaller than the number of slots (2)
+#	 present on the old cluster
+$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 1");
+
+# 4. wal_level is set correctly on the new cluster
+$new_publisher->append_conf('postgresql.conf', "wal_level = 'logical'");
+
+# pg_upgrade will fail because the new cluster has insufficient max_replication_slots
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade where the new cluster has insufficient max_replication_slots');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when the slot still has unconsumed WAL records
+
+# Preparations for the subsequent test:
+# 1. Remove the slot 'test_slot2', leaving only 1 slot remaining on the old
+#	 cluster, so the new cluster config  max_replication_slots=1 will now be
+#	 enough.
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT * FROM pg_drop_replication_slot('test_slot2');"
+);
+
+# 2. Generate extra WAL records. Because these WAL records do not get consumed
+#	 it will cause the upcoming pg_upgrade test to fail.
+$old_publisher->safe_psql('postgres',
+	"CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;"
+);
+$old_publisher->stop;
+
+# pg_upgrade will fail because the slot still has unconsumed WAL records
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade of old cluster with idle replication slots');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+
+# ------------------------------
+# TEST: Successful --check command
+
+# Preparations for the subsequent test:
+# 1. Start the cluster. --check works well when an old cluster is running.
+$old_publisher->start;
+
+# Actual run, successful --check command is expected
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,        '--check'
+	],
+	'run of pg_upgrade --check for new instance');
+ok(!-d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade --check success");
+
+# Remove the remaining slot
+$old_publisher->safe_psql('postgres',
+	"SELECT * FROM pg_drop_replication_slot('test_slot1');"
+);
+
+# ------------------------------
+# TEST: Successful upgrade
+
+# Preparations for the subsequent test:
+# 1. Setup logical replication
+my $old_connstr = $old_publisher->connstr . ' dbname=postgres';
+$old_publisher->safe_psql('postgres',
+	"CREATE PUBLICATION pub FOR ALL TABLES;"
+);
+$subscriber->start;
+$subscriber->safe_psql(
+	'postgres', qq[
+	CREATE TABLE tbl (a int);
+	CREATE SUBSCRIPTION sub CONNECTION '$old_connstr' PUBLICATION pub WITH (two_phase = 'true')
+]);
+$subscriber->wait_for_subscription_sync($old_publisher, 'sub');
+
+# 2. Temporarily disable the subscription
+$subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION sub DISABLE");
+$old_publisher->stop;
+
+# Actual run, successful upgrade is expected
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade of old cluster');
+ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ removed after pg_upgrade success");
+
+# Check that the slot 'sub' has migrated to the new cluster
+$new_publisher->start;
+my $result = $new_publisher->safe_psql('postgres',
+	"SELECT slot_name, two_phase FROM pg_replication_slots");
+is($result, qq(sub|t), 'check the slot exists on new cluster');
+
+# Update the connection
+my $new_connstr = $new_publisher->connstr . ' dbname=postgres';
+$subscriber->safe_psql(
+	'postgres', qq[
+	ALTER SUBSCRIPTION sub CONNECTION '$new_connstr';
+	ALTER SUBSCRIPTION sub ENABLE;
+]);
+
+# Check whether changes on the new publisher get replicated to the subscriber
+$new_publisher->safe_psql('postgres',
+	"INSERT INTO tbl VALUES (generate_series(11, 20))");
+$new_publisher->wait_for_catchup('sub');
+$result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
+is($result, qq(20), 'check changes are replicated to the subscriber');
+
+# Clean up
+$subscriber->stop();
+$new_publisher->stop();
+
+done_testing();
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 49a33c0387..310456e032 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1501,7 +1501,10 @@ LogicalRepTupleData
 LogicalRepTyp
 LogicalRepWorker
 LogicalRepWorkerType
+LogicalReplicationSlotInfo
 LogicalRewriteMappingData
+LogicalSlotInfo
+LogicalSlotInfoArr
 LogicalTape
 LogicalTapeSet
 LsnReadQueue
-- 
2.27.0

#184Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Amit Kapila (#178)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Amit,

Thank you for giving suggestions! I think your fixes are good.
A new patch set is available in [1].

Apart from this, I have addressed some of the comments raised by you
for the 0003 patch. Please find the diff patch attached. I think we
should combine 0002 and 0003 patches.

Yeah, combined.

I have another comment on the patch:
+ /* Check there are no logical replication slots with a 'lost' state. */
+ res = executeQueryOrDie(conn,
+ "SELECT slot_name FROM pg_catalog.pg_replication_slots "
+ "WHERE wal_status = 'lost' AND "
+ "temporary IS FALSE;");

In this place, shouldn't we explicitly check for slot_type as logical?
I think we should consistently check for slot_type in all the queries
used in this patch.

Seems right; the condition has been added in all the places.
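
For example, the 'lost' check you quoted now reads:

```
	res = executeQueryOrDie(conn,
							"SELECT slot_name FROM pg_catalog.pg_replication_slots "
							"WHERE slot_type = 'logical' AND "
							"wal_status = 'lost' AND "
							"temporary IS FALSE;");
```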

[1]: /messages/by-id/TYAPR01MB5866CDC13CA9D6B9F4451606F5E4A@TYAPR01MB5866.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#185Dilip Kumar
dilipbalaut@gmail.com
In reply to: Dilip Kumar (#182)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Fri, Sep 1, 2023 at 9:47 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Thu, Aug 31, 2023 at 7:56 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

Some more comments on 0002

1.
+ conn = connectToServer(&new_cluster, "template1");
+
+ prep_status("Checking for logical replication slots");
+
+ res = executeQueryOrDie(conn, "SELECT slot_name "
+ "FROM pg_catalog.pg_replication_slots "
+ "WHERE slot_type = 'logical' AND "
+ "temporary IS FALSE;");

I think we should add some comment saying this query will only fetch
logical slots because the database name will always be NULL in the
physical slots. Otherwise looking at the query it is very confusing
how it is avoiding the physical slots.

2.
+void
+get_old_cluster_logical_slot_infos(void)
+{
+ int dbnum;
+
+ /* Logical slots can be migrated since PG17. */
+ if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+ return;
+
+ pg_log(PG_VERBOSE, "\nsource databases:");

I think we need to change some headings like "slot info source
databases:" Or add an extra message saying printing slot information.

Before this patch, we were printing all the relation information so
message ordering was quite clear e.g.

source databases:
Database: "template1"
relname: "pg_catalog.pg_largeobject", reloid: 2613, reltblspace: ""
relname: "pg_catalog.pg_largeobject_loid_pn_index", reloid: 2683,
reltblspace: ""
Database: "postgres"
relname: "pg_catalog.pg_largeobject", reloid: 2613, reltblspace: ""
relname: "pg_catalog.pg_largeobject_loid_pn_index", reloid: 2683,
reltblspace: ""

But after this patch slot information is also getting printed in a
similar fashion so it's very confusing now. Refer
get_db_and_rel_infos() for how it is fetching all the relation
information first and then printing them.

3. One more problem is that the slot information and the execute query
messages are intermingled so it becomes more confusing, see the below
example of the latest messaging. I think ideally we should execute
these queries first
and then print all slot information together instead of intermingling
the messages.

source databases:
executing: SELECT pg_catalog.set_config('search_path', '', false);
executing: SELECT slot_name, plugin, two_phase FROM
pg_catalog.pg_replication_slots WHERE wal_status <> 'lost' AND
database = current_database() AND temporary IS FALSE;
Database: "template1"
executing: SELECT pg_catalog.set_config('search_path', '', false);
executing: SELECT slot_name, plugin, two_phase FROM
pg_catalog.pg_replication_slots WHERE wal_status <> 'lost' AND
database = current_database() AND temporary IS FALSE;
Database: "postgres"
slotname: "isolation_slot1", plugin: "pgoutput", two_phase: 0

4. Looking at the above two comments I feel that now the order should be like
- Fetch all the db infos
get_db_infos()
- loop
get_rel_infos()
get_old_cluster_logical_slot_infos()

-- and now print relation and slot information per database
print_db_infos()

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#186Amit Kapila
amit.kapila16@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#183)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Fri, Sep 1, 2023 at 10:16 AM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

+ /*
+ * Note: This must be done after doing the pg_resetwal command because
+ * pg_resetwal would remove required WALs.
+ */
+ if (count_old_cluster_logical_slots())
+ {
+ start_postmaster(&new_cluster, true);
+ create_logical_replication_slots();
+ stop_postmaster(false);
+ }

Can we combine this code with the code in the function
issue_warnings_and_set_wal_level()? That will avoid starting/stopping
the server for creating slots.
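
For instance, something like this (untested sketch on top of v29, reusing the
existing helpers) would create the slots within the same start/stop cycle:

```
static void
issue_warnings_and_set_wal_level(void)
{
	/*
	 * We unconditionally start/stop the new server because pg_resetwal -o
	 * set wal_level to 'minimum'.
	 */
	start_postmaster(&new_cluster, true);

	/* Reindex hash indexes for old < 10.0 */
	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 906)
		old_9_6_invalidate_hash_indexes(&new_cluster, false);

	report_extension_updates(&new_cluster);

	/*
	 * Create the logical replication slots here; the final pg_resetwal has
	 * already run at this point, so it cannot remove WAL that the new
	 * slots require.
	 */
	if (count_old_cluster_logical_slots())
		create_logical_replication_slots();

	stop_postmaster(false);
}
```

That probably means moving issue_warnings_and_set_wal_level() into
pg_upgrade.c so it can call the static create_logical_replication_slots().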

--
With Regards,
Amit Kapila.

#187Peter Smith
smithpb2250@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#183)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

Here are some review comments for v29-0002

======
src/bin/pg_upgrade/check.c

1. check_old_cluster_for_valid_slots

+ /* Quick exit if the cluster does not have logical slots. */
+ if (count_old_cluster_logical_slots() == 0)
+ return;

/Quick exit/Quick return/

I know they are kind of the same, but the reason I previously
suggested this change was to keep it consistent with the similar
comment that is already in
check_new_cluster_logical_replication_slots().

~~~

2. check_old_cluster_for_valid_slots

+ /*
+ * Do additional checks to ensure that confirmed_flush LSN of all the slots
+ * is the same as the latest checkpoint location.
+ *
+ * Note: This can be satisfied only when the old cluster has been shut
+ * down, so we skip this live checks.
+ */
+ if (!live_check)

missing word

/skip this live checks./skip this for live checks./

======
src/bin/pg_upgrade/controldata.c

3.
+ /*
+ * Read the latest checkpoint location if the cluster is PG17
+ * or later. This is used for upgrading logical replication
+ * slots. Currently, we need it only for the old cluster but
+ * didn't add additional check for the similicity.
+ */
+ if (GET_MAJOR_VERSION(cluster->major_version) >= 1700)

/similicity/simplicity/

SUGGESTION
Currently, we need it only for the old cluster but for simplicity
chose not to have additional checks.

======
src/bin/pg_upgrade/info.c

4. get_old_cluster_logical_slot_infos_per_db

+ /*
+ * The temporary slots are expressly ignored while checking because such
+ * slots cannot exist after the upgrade. During the upgrade, clusters are
+ * started and stopped several times so that temporary slots will be
+ * removed.
+ */
+ res = executeQueryOrDie(conn, "SELECT slot_name, plugin, two_phase "
+ "FROM pg_catalog.pg_replication_slots "
+ "WHERE slot_type = 'logical' AND "
+ "wal_status <> 'lost' AND "
+ "database = current_database() AND "
+ "temporary IS FALSE;");

IIUC, the removal of temp slots is just a side-effect of the
start/stop; not the *reason* for the start/stop. So, the last sentence
needs some modification

BEFORE
During the upgrade, clusters are started and stopped several times so
that temporary slots will be removed.

SUGGESTION
During the upgrade, clusters are started and stopped several times
causing any temporary slots to be removed.

------
Kind Regards,
Peter Smith.
Fujitsu Australia

#188Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Dilip Kumar (#185)
2 attachment(s)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Dilip,

Thank you for reviewing!

1.
+ conn = connectToServer(&new_cluster, "template1");
+
+ prep_status("Checking for logical replication slots");
+
+ res = executeQueryOrDie(conn, "SELECT slot_name "
+ "FROM pg_catalog.pg_replication_slots "
+ "WHERE slot_type = 'logical' AND "
+ "temporary IS FALSE;");

I think we should add some comment saying this query will only fetch
logical slots because the database name will always be NULL in the
physical slots. Otherwise looking at the query it is very confusing
how it is avoiding the physical slots.

Hmm, the query you pointed out does not check the database of the slot at all.
We fetch only logical slots via the condition "slot_type = 'logical'"; I think
that is too trivial to describe in a comment.
Just to confirm - pg_replication_slots shows all the slots even if their database
is not the current one.

```
tmp=# SELECT slot_name, slot_type, database FROM pg_replication_slots where database != current_database();
slot_name | slot_type | database
-----------+-----------+----------
test | logical | postgres
(1 row)
```

If I misunderstood something, please tell me...

2.
+void
+get_old_cluster_logical_slot_infos(void)
+{
+ int dbnum;
+
+ /* Logical slots can be migrated since PG17. */
+ if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+ return;
+
+ pg_log(PG_VERBOSE, "\nsource databases:");

I think we need to change some headings like "slot info source
databases:" Or add an extra message saying printing slot information.

Before this patch, we were printing all the relation information so
message ordering was quite clear e.g.

source databases:
Database: "template1"
relname: "pg_catalog.pg_largeobject", reloid: 2613, reltblspace: ""
relname: "pg_catalog.pg_largeobject_loid_pn_index", reloid: 2683,
reltblspace: ""
Database: "postgres"
relname: "pg_catalog.pg_largeobject", reloid: 2613, reltblspace: ""
relname: "pg_catalog.pg_largeobject_loid_pn_index", reloid: 2683,
reltblspace: ""

But after this patch slot information is also getting printed in a
similar fashion so it's very confusing now. Refer
get_db_and_rel_infos() for how it is fetching all the relation
information first and then printing them.

3. One more problem is that the slot information and the execute query
messages are intermingled so it becomes more confusing, see the below
example of the latest messaging. I think ideally we should execute
these queries first
and then print all slot information together instead of intermingling
the messages.

source databases:
executing: SELECT pg_catalog.set_config('search_path', '', false);
executing: SELECT slot_name, plugin, two_phase FROM
pg_catalog.pg_replication_slots WHERE wal_status <> 'lost' AND
database = current_database() AND temporary IS FALSE;
Database: "template1"
executing: SELECT pg_catalog.set_config('search_path', '', false);
executing: SELECT slot_name, plugin, two_phase FROM
pg_catalog.pg_replication_slots WHERE wal_status <> 'lost' AND
database = current_database() AND temporary IS FALSE;
Database: "postgres"
slotname: "isolation_slot1", plugin: "pgoutput", two_phase: 0

4. Looking at the above two comments I feel that now the order should be like
- Fetch all the db infos
get_db_infos()
- loop
get_rel_infos()
get_old_cluster_logical_slot_infos()

-- and now print relation and slot information per database
print_db_infos()

Fixed like that; it seems we have gone back to the old style.
Now the debug output looks like below:

```
source databases:
Database: "template1"
relname: "pg_catalog.pg_largeobject", reloid: 2613, reltblspace: ""
relname: "pg_catalog.pg_largeobject_loid_pn_index", reloid: 2683, reltblspace: ""
Database: "postgres"
relname: "pg_catalog.pg_largeobject", reloid: 2613, reltblspace: ""
relname: "pg_catalog.pg_largeobject_loid_pn_index", reloid: 2683, reltblspace: ""
Logical replication slots within the database:
slotname: "old1", plugin: "test_decoding", two_phase: 0
slotname: "old2", plugin: "test_decoding", two_phase: 0
slotname: "old3", plugin: "test_decoding", two_phase: 0
```
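
I.e., the flow in info.c is now roughly like the following (just a sketch to
show the ordering; the exact code is in the attached patch):

```
	/* fetch everything first ... */
	get_db_infos(&old_cluster);

	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
	{
		DbInfo	   *pDbInfo = &old_cluster.dbarr.dbs[dbnum];

		get_rel_infos(&old_cluster, pDbInfo);

		/* slot info is gathered only when the old cluster is PG17 or later */
		if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
			get_old_cluster_logical_slot_infos_per_db(pDbInfo);
	}

	/* ... then print relations and slots per database, as above */
	print_db_infos(&old_cluster.dbarr);
```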

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:

v30-0002-pg_upgrade-Allow-to-replicate-logical-replicatio.patch (application/octet-stream)
From a15a35385e6fb9d03e3e22935dfc9045a91031f5 Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Tue, 4 Apr 2023 05:49:34 +0000
Subject: [PATCH v30 2/2] pg_upgrade: Allow to replicate logical replication
 slots to new node

This commit allows nodes with logical replication slots to be upgraded. While
reading information from the old cluster, a list of logical replication slots is
fetched. At the later part of upgrading, pg_upgrade revisits the list and
restores slots by executing pg_create_logical_replication_slot() on the new
cluster. Migration of logical replication slots is only supported when the old
cluster is version 17.0 or later.

If the old node has slots with the status 'lost' or with unconsumed WAL records,
the pg_upgrade fails. These checks are needed to prevent data loss.

Note that slot restoration must be done after the final pg_resetwal command
during the upgrade because pg_resetwal will remove WALs that are required by
the slots. Due to this restriction, the timing of restoring replication slots is
different from other objects.

The significant advantage of this commit is that it makes it easy to continue
logical replication even after upgrading the publisher node. Previously,
pg_upgrade allowed copying publications to a new node. With this new commit,
adjusting the connection string to the new publisher will cause the apply
worker on the subscriber to connect to the new publisher automatically. This
enables seamless continuation of logical replication, even after an upgrade.

Author: Hayato Kuroda
Co-authored-by: Hou Zhijie
Reviewed-by: Peter Smith, Julien Rouhaud, Vignesh C, Wang Wei, Masahiko Sawada,
             Dilip Kumar
---
 doc/src/sgml/ref/pgupgrade.sgml               |  70 +++++-
 src/bin/pg_upgrade/Makefile                   |   3 +
 src/bin/pg_upgrade/check.c                    | 188 ++++++++++++--
 src/bin/pg_upgrade/controldata.c              |  39 +++
 src/bin/pg_upgrade/function.c                 |  41 ++-
 src/bin/pg_upgrade/info.c                     | 143 ++++++++++-
 src/bin/pg_upgrade/meson.build                |   1 +
 src/bin/pg_upgrade/pg_upgrade.c               | 109 +++++++-
 src/bin/pg_upgrade/pg_upgrade.h               |  21 +-
 .../t/003_logical_replication_slots.pl        | 238 ++++++++++++++++++
 src/tools/pgindent/typedefs.list              |   3 +
 11 files changed, 812 insertions(+), 44 deletions(-)
 create mode 100644 src/bin/pg_upgrade/t/003_logical_replication_slots.pl

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 7816b4c685..bef107295c 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -360,6 +360,71 @@ make prefix=/usr/local/pgsql.new install
     </para>
    </step>
 
+   <step>
+    <title>Prepare for publisher upgrades</title>
+
+    <para>
+     <application>pg_upgrade</application> attempts to migrate logical
+     replication slots. This helps avoid the need for manually defining the
+     same replication slots on the new publisher. Migration of logical
+     replication slots is only supported when the old cluster is version 17.0
+     or later.
+    </para>
+
+    <para>
+     Before you start upgrading the publisher cluster, ensure that the
+     subscription is temporarily disabled, by executing
+     <link linkend="sql-altersubscription"><command>ALTER SUBSCRIPTION ... DISABLE</command></link>.
+     Re-enable the subscription after the upgrade.
+    </para>
+
+    <para>
+     There are some prerequisites for <application>pg_upgrade</application> to
+     be able to upgrade the replication slots. If these are not met an error
+     will be reported.
+    </para>
+
+    <itemizedlist>
+     <listitem>
+      <para>
+       All slots on the old cluster must be usable, i.e., there are no slots
+       whose
+       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>wal_status</structfield>
+       is <literal>lost</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>confirmed_flush_lsn</structfield>
+       of all slots on the old cluster must be the same as the latest
+       checkpoint location.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The output plugins referenced by the slots on the old cluster must be
+       installed in the new PostgreSQL executable directory.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must have
+       <link linkend="guc-max-replication-slots"><varname>max_replication_slots</varname></link>
+       configured to a value greater than or equal to the number of slots
+       present in the old cluster.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must have
+       <link linkend="guc-wal-level"><varname>wal_level</varname></link> as
+       <literal>logical</literal>.
+      </para>
+     </listitem>
+    </itemizedlist>
+
+   </step>
+
    <step>
     <title>Stop both servers</title>
 
@@ -629,8 +694,9 @@ rsync --archive --delete --hard-links --size-only --no-inc-recursive /vol1/pg_tb
        Configure the servers for log shipping.  (You do not need to run
        <function>pg_backup_start()</function> and <function>pg_backup_stop()</function>
        or take a file system backup as the standbys are still synchronized
-       with the primary.)  Replication slots are not copied and must
-       be recreated.
+       with the primary.)  Replication slots on the old standby are not copied.
+       Only logical slots on the primary are migrated to the new standby,
+       and other slots must be recreated.
       </para>
      </step>
 
diff --git a/src/bin/pg_upgrade/Makefile b/src/bin/pg_upgrade/Makefile
index 5834513add..815d1a7ca1 100644
--- a/src/bin/pg_upgrade/Makefile
+++ b/src/bin/pg_upgrade/Makefile
@@ -3,6 +3,9 @@
 PGFILEDESC = "pg_upgrade - an in-place binary upgrade utility"
 PGAPPICON = win32
 
+# required for 003_logical_replication_slots.pl
+EXTRA_INSTALL=contrib/test_decoding
+
 subdir = src/bin/pg_upgrade
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 56e313f562..ab723d3526 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -9,6 +9,7 @@
 
 #include "postgres_fe.h"
 
+#include "access/xlogdefs.h"
 #include "catalog/pg_authid_d.h"
 #include "catalog/pg_collation.h"
 #include "fe_utils/string_utils.h"
@@ -30,6 +31,8 @@ static void check_for_jsonb_9_4_usage(ClusterInfo *cluster);
 static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(void);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
+static void check_new_cluster_logical_replication_slots(void);
+static void check_old_cluster_for_valid_slots(bool live_check);
 
 
 /*
@@ -104,6 +107,13 @@ check_and_dump_old_cluster(bool live_check)
 	check_for_reg_data_type_usage(&old_cluster);
 	check_for_isn_and_int8_passing_mismatch(&old_cluster);
 
+	/*
+	 * Logical replication slots can be migrated since PG17. See comments atop
+	 * get_old_cluster_logical_slot_infos().
+	 */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
+		check_old_cluster_for_valid_slots(live_check);
+
 	/*
 	 * PG 16 increased the size of the 'aclitem' type, which breaks the
 	 * on-disk format for existing data.
@@ -189,6 +199,8 @@ check_new_cluster(void)
 {
 	get_db_and_rel_infos(&new_cluster);
 
+	check_new_cluster_logical_replication_slots();
+
 	check_new_cluster_is_empty();
 
 	check_loadable_libraries();
@@ -232,27 +244,6 @@ report_clusters_compatible(void)
 }
 
 
-void
-issue_warnings_and_set_wal_level(void)
-{
-	/*
-	 * We unconditionally start/stop the new server because pg_resetwal -o set
-	 * wal_level to 'minimum'.  If the user is upgrading standby servers using
-	 * the rsync instructions, they will need pg_upgrade to write its final
-	 * WAL record showing wal_level as 'replica'.
-	 */
-	start_postmaster(&new_cluster, true);
-
-	/* Reindex hash indexes for old < 10.0 */
-	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 906)
-		old_9_6_invalidate_hash_indexes(&new_cluster, false);
-
-	report_extension_updates(&new_cluster);
-
-	stop_postmaster(false);
-}
-
-
 void
 output_completion_banner(char *deletion_script_file_name)
 {
@@ -1402,3 +1393,158 @@ check_for_user_defined_encoding_conversions(ClusterInfo *cluster)
 	else
 		check_ok();
 }
+
+/*
+ * check_new_cluster_logical_replication_slots()
+ *
+ * Make sure there are no logical replication slots on the new cluster and that
+ * the parameter settings necessary for creating slots are sufficient.
+ */
+static void
+check_new_cluster_logical_replication_slots(void)
+{
+	PGresult   *res;
+	PGconn	   *conn;
+	int			nslots = 0;
+	int			nlost_slots;
+	int			max_replication_slots;
+	char	   *wal_level;
+
+	/* Logical slots can be migrated since PG17. */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+		return;
+
+	nslots = count_old_cluster_logical_slots();
+
+	/* Quick return if there are no logical slots to be migrated. */
+	if (nslots == 0)
+		return;
+
+	conn = connectToServer(&new_cluster, "template1");
+
+	prep_status("Checking for logical replication slots");
+
+	res = executeQueryOrDie(conn, "SELECT count(*) "
+							"FROM pg_catalog.pg_replication_slots "
+							"WHERE slot_type = 'logical' AND "
+							"temporary IS FALSE;");
+
+	nlost_slots = atoi(PQgetvalue(res, 0, 0));
+
+	if (nlost_slots)
+	{
+		if (nlost_slots == 1)
+			pg_fatal("New cluster must not have logical replication slots but found a slot.");
+		else
+			pg_fatal("New cluster must not have logical replication slots but found %d slots",
+					nlost_slots);
+	}
+
+	PQclear(res);
+
+	res = executeQueryOrDie(conn, "SHOW max_replication_slots;");
+	max_replication_slots = atoi(PQgetvalue(res, 0, 0));
+
+	if (nslots > max_replication_slots)
+		pg_fatal("max_replication_slots (%d) must be greater than or equal to the number of "
+				 "logical replication slots (%d) on the old cluster.",
+				 max_replication_slots, nslots);
+
+	PQclear(res);
+
+	res = executeQueryOrDie(conn, "SHOW wal_level;");
+	wal_level = PQgetvalue(res, 0, 0);
+
+	if (strcmp(wal_level, "logical") != 0)
+		pg_fatal("wal_level must be \"logical\", but is set to \"%s\"",
+				 wal_level);
+
+	PQclear(res);
+	PQfinish(conn);
+
+	check_ok();
+}
+
+/*
+ * check_old_cluster_for_valid_slots()
+ *
+ * Make sure logical replication slots can be migrated to new cluster.
+ * Following points are checked:
+ *
+ *	- All logical replication slots are usable.
+ *	- All logical replication slots consumed all WALs, except a
+ *	  CHECKPOINT_SHUTDOWN record.
+ */
+static void
+check_old_cluster_for_valid_slots(bool live_check)
+{
+	int			i,
+				ntups,
+				i_slotname;
+	PGresult   *res;
+	DbInfo	   *active_db = &old_cluster.dbarr.dbs[0];
+	PGconn	   *conn;
+
+	/* Quick return if the cluster does not have logical slots. */
+	if (count_old_cluster_logical_slots() == 0)
+		return;
+
+	conn = connectToServer(&old_cluster, active_db->db_name);
+
+	prep_status("Checking for valid logical replication slots");
+
+	/*
+	 * We don't allow to upgrade in the presence of lost slots as we can't
+	 * migrate those.
+	 */
+	res = executeQueryOrDie(conn,
+							"SELECT slot_name FROM pg_catalog.pg_replication_slots "
+							"WHERE slot_type = 'logical' AND "
+							"wal_status = 'lost' AND "
+							"temporary IS FALSE;");
+
+	ntups = PQntuples(res);
+	i_slotname = PQfnumber(res, "slot_name");
+
+	for (i = 0; i < ntups; i++)
+		pg_log(PG_WARNING,
+			   "\nWARNING: logical replication slot \"%s\" is in 'lost' state.",
+			   PQgetvalue(res, i, i_slotname));
+
+	PQclear(res);
+
+	if (ntups)
+		pg_fatal("One or more logical replication slots with a state of 'lost' were detected.");
+
+	/*
+	 * Do additional checks to ensure that confirmed_flush LSN of all the slots
+	 * is the same as the latest checkpoint location.
+	 *
+	 * Note: This can be satisfied only when the old cluster has been shut
+	 * down, so we skip this for live checks.
+	 */
+	if (!live_check)
+	{
+		res = executeQueryOrDie(conn,
+								"SELECT slot_name FROM pg_catalog.pg_replication_slots "
+								"WHERE slot_type = 'logical' AND "
+								"confirmed_flush_lsn != '%X/%X' AND temporary IS FALSE;",
+								LSN_FORMAT_ARGS(old_cluster.controldata.chkpnt_latest));
+
+		ntups = PQntuples(res);
+		i_slotname = PQfnumber(res, "slot_name");
+
+		for (i = 0; i < ntups; i++)
+			pg_log(PG_WARNING,
+				   "\nWARNING: logical replication slot \"%s\" has not consumed WALs yet",
+				   PQgetvalue(res, i, i_slotname));
+
+		PQclear(res);
+
+		if (ntups)
+			pg_fatal("One or more logical replication slots still have unconsumed WAL records.");
+	}
+
+	PQfinish(conn);
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c
index 4beb65ab22..f8f823e2be 100644
--- a/src/bin/pg_upgrade/controldata.c
+++ b/src/bin/pg_upgrade/controldata.c
@@ -169,6 +169,45 @@ get_control_data(ClusterInfo *cluster, bool live_check)
 				}
 				got_cluster_state = true;
 			}
+
+			else if ((p = strstr(bufin, "Latest checkpoint location:")) != NULL)
+			{
+				/*
+				 * Read the latest checkpoint location if the cluster is PG17
+				 * or later. This is used for upgrading logical replication
+				 * slots. Currently, we need it only for the old cluster but
+				 * for simplicity chose not to have additional checks.
+				 */
+				if (GET_MAJOR_VERSION(cluster->major_version) >= 1700)
+				{
+					char	   *slash = NULL;
+					uint32		upper_lsn,
+								lower_lsn;
+
+					p = strchr(p, ':');
+
+					if (p == NULL || strlen(p) <= 1)
+						pg_fatal("%d: controldata retrieval problem", __LINE__);
+
+					p++;		/* remove ':' char */
+
+					p = strpbrk(p, "01234567890ABCDEF");
+
+					if (p == NULL || strlen(p) <= 1)
+						pg_fatal("%d: controldata retrieval problem", __LINE__);
+
+					/*
+					 * The upper and lower part of LSN must be read separately
+					 * because it is stored as in %X/%X format.
+					 */
+					upper_lsn = strtoul(p, &slash, 16);
+					lower_lsn = strtoul(++slash, NULL, 16);
+
+					/* And combine them */
+					cluster->controldata.chkpnt_latest =
+						((uint64) upper_lsn << 32) | lower_lsn;
+				}
+			}
 		}
 
 		rc = pclose(output);
diff --git a/src/bin/pg_upgrade/function.c b/src/bin/pg_upgrade/function.c
index dc8800c7cd..b3e87ae68f 100644
--- a/src/bin/pg_upgrade/function.c
+++ b/src/bin/pg_upgrade/function.c
@@ -11,6 +11,7 @@
 
 #include "access/transam.h"
 #include "catalog/pg_language_d.h"
+#include "fe_utils/string_utils.h"
 #include "pg_upgrade.h"
 
 /*
@@ -46,7 +47,9 @@ library_name_compare(const void *p1, const void *p2)
 /*
  * get_loadable_libraries()
  *
- *	Fetch the names of all old libraries containing C-language functions.
+ *	Fetch the names of all old libraries containing either C-language functions
+ *	or correspond to logical replication output plugins.
+ *
  *	We will later check that they all exist in the new installation.
  */
 void
@@ -55,32 +58,46 @@ get_loadable_libraries(void)
 	PGresult  **ress;
 	int			totaltups;
 	int			dbnum;
+	PQExpBuffer query = createPQExpBuffer();
 
 	ress = (PGresult **) pg_malloc(old_cluster.dbarr.ndbs * sizeof(PGresult *));
 	totaltups = 0;
 
+	/* distinct libraries for non-built-in C functions */
+	appendPQExpBuffer(query, "SELECT DISTINCT probin "
+					  "FROM pg_catalog.pg_proc "
+					  "WHERE prolang = %u AND "
+					  "probin IS NOT NULL AND "
+					  "oid >= %u",
+					  ClanguageId,
+					  FirstNormalObjectId);
+
+	/* upgrade of logical slots is supported since PG 17 */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
+		appendPQExpBufferStr(query, " UNION "
+							 "SELECT DISTINCT plugin "
+							 "FROM pg_catalog.pg_replication_slots "
+							 "WHERE slot_type = 'logical' AND "
+							 "wal_status <> 'lost' AND "
+							 "database = current_database() AND "
+							 "temporary IS FALSE;");
+
 	/* Fetch all library names, removing duplicates within each DB */
 	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
 	{
 		DbInfo	   *active_db = &old_cluster.dbarr.dbs[dbnum];
 		PGconn	   *conn = connectToServer(&old_cluster, active_db->db_name);
 
-		/*
-		 * Fetch all libraries containing non-built-in C functions in this DB.
-		 */
-		ress[dbnum] = executeQueryOrDie(conn,
-										"SELECT DISTINCT probin "
-										"FROM pg_catalog.pg_proc "
-										"WHERE prolang = %u AND "
-										"probin IS NOT NULL AND "
-										"oid >= %u;",
-										ClanguageId,
-										FirstNormalObjectId);
+		/* Extract a list of libraries */
+		ress[dbnum] = executeQueryOrDie(conn, "%s", query->data);
+
 		totaltups += PQntuples(ress[dbnum]);
 
 		PQfinish(conn);
 	}
 
+	destroyPQExpBuffer(query);
+
 	os_info.libraries = (LibraryInfo *) pg_malloc(totaltups * sizeof(LibraryInfo));
 	totaltups = 0;
 
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index aa5faca4d6..6817c4517e 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -26,6 +26,8 @@ static void get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo);
 static void free_rel_infos(RelInfoArr *rel_arr);
 static void print_db_infos(DbInfoArr *db_arr);
 static void print_rel_infos(RelInfoArr *rel_arr);
+static void print_slot_infos(LogicalSlotInfoArr *slot_arr);
+static void get_old_cluster_logical_slot_infos(DbInfo *dbinfo);
 
 
 /*
@@ -283,7 +285,18 @@ get_db_and_rel_infos(ClusterInfo *cluster)
 	get_db_infos(cluster);
 
 	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
-		get_rel_infos(cluster, &cluster->dbarr.dbs[dbnum]);
+	{
+		DbInfo *pDbInfo = &cluster->dbarr.dbs[dbnum];
+
+		get_rel_infos(cluster, pDbInfo);
+
+		/*
+		 * If we are reading the old_cluster, also gather information about
+		 * its logical replication slots.
+		 */
+		if (cluster == &old_cluster)
+			get_old_cluster_logical_slot_infos(pDbInfo);
+	}
 
 	if (cluster == &old_cluster)
 		pg_log(PG_VERBOSE, "\nsource databases:");
@@ -394,7 +407,7 @@ get_db_infos(ClusterInfo *cluster)
 	i_spclocation = PQfnumber(res, "spclocation");
 
 	ntups = PQntuples(res);
-	dbinfos = (DbInfo *) pg_malloc(sizeof(DbInfo) * ntups);
+	dbinfos = (DbInfo *) pg_malloc0(sizeof(DbInfo) * ntups);
 
 	for (tupnum = 0; tupnum < ntups; tupnum++)
 	{
@@ -600,6 +613,100 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 	dbinfo->rel_arr.nrels = num_rels;
 }
 
+/*
+ * get_old_cluster_logical_slot_infos()
+ *
+ * gets the LogicalSlotInfos for all the logical replication slots of the
+ * database referred to by "dbinfo".
+ *
+ * Note: This function will not do anything if the old cluster is pre-PG17.
+ * This is because before that the logical slots are not saved at shutdown, so
+ * there is no guarantee that the latest confirmed_flush_lsn is saved to disk
+ * which can lead to data loss. It is still not guaranteed for manually created
+ * slots in PG17, so subsequent checks done in
+ * check_old_cluster_for_valid_slots() would raise a FATAL error if such slots
+ * are included.
+ */
+static void
+get_old_cluster_logical_slot_infos(DbInfo *dbinfo)
+{
+	PGconn	   *conn;
+	PGresult   *res;
+	LogicalSlotInfo *slotinfos = NULL;
+	int			num_slots;
+
+	/* Logical slots can be migrated since PG17. */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+		return;
+
+	conn = connectToServer(&old_cluster, dbinfo->db_name);
+
+	/*
+	 * The temporary slots are expressly ignored while checking because such
+	 * slots cannot exist after the upgrade. During the upgrade, clusters are
+	 * started and stopped several times causing any temporary slots to be
+	 * removed.
+	 */
+	res = executeQueryOrDie(conn, "SELECT slot_name, plugin, two_phase "
+							"FROM pg_catalog.pg_replication_slots "
+							"WHERE slot_type = 'logical' AND "
+							"wal_status <> 'lost' AND "
+							"database = current_database() AND "
+							"temporary IS FALSE;");
+
+	num_slots = PQntuples(res);
+
+	if (num_slots)
+	{
+		int			slotnum;
+		int			i_slotname;
+		int			i_plugin;
+		int			i_twophase;
+
+		slotinfos = (LogicalSlotInfo *) pg_malloc(sizeof(LogicalSlotInfo) * num_slots);
+
+		i_slotname = PQfnumber(res, "slot_name");
+		i_plugin = PQfnumber(res, "plugin");
+		i_twophase = PQfnumber(res, "two_phase");
+
+		for (slotnum = 0; slotnum < num_slots; slotnum++)
+		{
+			LogicalSlotInfo *curr = &slotinfos[slotnum];
+
+			curr->slotname = pg_strdup(PQgetvalue(res, slotnum, i_slotname));
+			curr->plugin = pg_strdup(PQgetvalue(res, slotnum, i_plugin));
+			curr->two_phase = (strcmp(PQgetvalue(res, slotnum, i_twophase), "t") == 0);
+		}
+	}
+
+	PQclear(res);
+	PQfinish(conn);
+
+	dbinfo->slot_arr.slots = slotinfos;
+	dbinfo->slot_arr.nslots = num_slots;
+}
+
+
+/*
+ * count_old_cluster_logical_slots()
+ *
+ * Sum up and return the number of logical replication slots for all databases.
+ *
+ * Note: this function always returns 0 if the old_cluster is PG16 and prior
+ * because we gather slot information only for cluster versions greater than or
+ * equal to PG17. See get_old_cluster_logical_slot_infos().
+ */
+int
+count_old_cluster_logical_slots(void)
+{
+	int			dbnum;
+	int			slot_count = 0;
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+		slot_count += old_cluster.dbarr.dbs[dbnum].slot_arr.nslots;
+
+	return slot_count;
+}
 
 static void
 free_db_and_rel_infos(DbInfoArr *db_arr)
@@ -610,6 +717,12 @@ free_db_and_rel_infos(DbInfoArr *db_arr)
 	{
 		free_rel_infos(&db_arr->dbs[dbnum].rel_arr);
 		pg_free(db_arr->dbs[dbnum].db_name);
+
+		/*
+		 * Logical replication slots must not exist on the new cluster before
+		 * create_logical_replication_slots().
+		 */
+		Assert(db_arr->dbs[dbnum].slot_arr.nslots == 0);
 	}
 	pg_free(db_arr->dbs);
 	db_arr->dbs = NULL;
@@ -642,8 +755,11 @@ print_db_infos(DbInfoArr *db_arr)
 
 	for (dbnum = 0; dbnum < db_arr->ndbs; dbnum++)
 	{
-		pg_log(PG_VERBOSE, "Database: \"%s\"", db_arr->dbs[dbnum].db_name);
-		print_rel_infos(&db_arr->dbs[dbnum].rel_arr);
+		DbInfo *pDbInfo = &db_arr->dbs[dbnum];
+
+		pg_log(PG_VERBOSE, "Database: \"%s\"", pDbInfo->db_name);
+		print_rel_infos(&pDbInfo->rel_arr);
+		print_slot_infos(&pDbInfo->slot_arr);
 	}
 }
 
@@ -660,3 +776,22 @@ print_rel_infos(RelInfoArr *rel_arr)
 			   rel_arr->rels[relnum].reloid,
 			   rel_arr->rels[relnum].tablespace);
 }
+
+static void
+print_slot_infos(LogicalSlotInfoArr *slot_arr)
+{
+	int			slotnum;
+
+	for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+	{
+		LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+
+		if (slotnum == 0)
+			pg_log(PG_VERBOSE, "Logical replication slots within the database:");
+
+		pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\", two_phase: %d",
+			   slot_info->slotname,
+			   slot_info->plugin,
+			   slot_info->two_phase);
+	}
+}
diff --git a/src/bin/pg_upgrade/meson.build b/src/bin/pg_upgrade/meson.build
index 12a97f84e2..228f29b688 100644
--- a/src/bin/pg_upgrade/meson.build
+++ b/src/bin/pg_upgrade/meson.build
@@ -42,6 +42,7 @@ tests += {
     'tests': [
       't/001_basic.pl',
       't/002_pg_upgrade.pl',
+      't/003_logical_replication_slots.pl',
     ],
     'test_kwargs': {'priority': 40}, # pg_upgrade tests are slow
   },
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 4562dafcff..f81b1d5cc8 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -59,6 +59,8 @@ static void copy_xact_xlog_xid(void);
 static void set_frozenxids(bool minmxid_only);
 static void make_outputdirs(char *pgdata);
 static void setup(char *argv0, bool *live_check);
+static void create_logical_replication_slots(void);
+static void setup_new_cluster(void);
 
 ClusterInfo old_cluster,
 			new_cluster;
@@ -188,6 +190,10 @@ main(int argc, char **argv)
 			  new_cluster.pgdata);
 	check_ok();
 
+	create_script_for_old_cluster_deletion(&deletion_script_file_name);
+
+	setup_new_cluster();
+
 	if (user_opts.do_sync)
 	{
 		prep_status("Sync data directory to disk");
@@ -197,10 +203,6 @@ main(int argc, char **argv)
 		check_ok();
 	}
 
-	create_script_for_old_cluster_deletion(&deletion_script_file_name);
-
-	issue_warnings_and_set_wal_level();
-
 	pg_log(PG_REPORT,
 		   "\n"
 		   "Upgrade Complete\n"
@@ -860,3 +862,102 @@ set_frozenxids(bool minmxid_only)
 
 	check_ok();
 }
+
+/*
+ * create_logical_replication_slots()
+ *
+ * Similar to create_new_objects() but only restores logical replication slots.
+ */
+static void
+create_logical_replication_slots(void)
+{
+	int			dbnum;
+
+	prep_status_progress("Restoring logical replication slots in the new cluster");
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		DbInfo	   *old_db = &old_cluster.dbarr.dbs[dbnum];
+		LogicalSlotInfoArr *slot_arr = &old_db->slot_arr;
+		PGconn	   *conn;
+		PQExpBuffer query;
+		int			slotnum;
+		char		log_file_name[MAXPGPATH];
+
+		/* Skip this database if there are no slots */
+		if (slot_arr->nslots == 0)
+			continue;
+
+		conn = connectToServer(&new_cluster, old_db->db_name);
+		query = createPQExpBuffer();
+
+		snprintf(log_file_name, sizeof(log_file_name),
+				 DB_DUMP_LOG_FILE_MASK, old_db->db_oid);
+
+		pg_log(PG_STATUS, "%s", old_db->db_name);
+
+		for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		{
+			LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+
+			/* Constructs a query for creating logical replication slots. */
+			appendPQExpBuffer(query, "SELECT pg_catalog.pg_create_logical_replication_slot(");
+			appendStringLiteralConn(query, slot_info->slotname, conn);
+			appendPQExpBuffer(query, ", ");
+			appendStringLiteralConn(query, slot_info->plugin, conn);
+			appendPQExpBuffer(query, ", false, %s);",
+							  slot_info->two_phase ? "true" : "false");
+
+			PQclear(executeQueryOrDie(conn, "%s", query->data));
+
+			resetPQExpBuffer(query);
+		}
+
+		PQfinish(conn);
+
+		destroyPQExpBuffer(query);
+	}
+
+	end_progress_output();
+	check_ok();
+}
+
+/*
+ *	setup_new_cluster()
+ *
+ * Starts a new cluster to update the wal_level in the control file, then
+ * performs the final setup steps. Logical slots are also created here.
+ */
+static void
+setup_new_cluster(void)
+{
+	/*
+	 * We unconditionally start/stop the new server because pg_resetwal -o set
+	 * wal_level to 'minimum'.  If the user is upgrading standby servers using
+	 * the rsync instructions, they will need pg_upgrade to write its final
+	 * WAL record showing wal_level as 'replica'.
+	 */
+	start_postmaster(&new_cluster, true);
+
+	/*
+	 * If the old cluster has logical slots, migrate them to a new cluster.
+	 *
+	 * Note: This must be done after doing the pg_resetwal command because
+	 * pg_resetwal would remove required WALs.
+	 */
+	if (count_old_cluster_logical_slots())
+		create_logical_replication_slots();
+
+	/*
+	 * Reindex hash indexes for old < 10.0. count_old_cluster_logical_slots()
+	 * can return non-zero only when the old_cluster is PG17 or later, so it's
+	 * OK to use "else if" here. See comments atop count_old_cluster_logical_slots()
+	 * and get_old_cluster_logical_slot_infos().
+	 */
+	else if (GET_MAJOR_VERSION(old_cluster.major_version) <= 906)
+		old_9_6_invalidate_hash_indexes(&new_cluster, false);
+
+	report_extension_updates(&new_cluster);
+
+	stop_postmaster(false);
+}
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 7afa96716e..32a368c08b 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -10,6 +10,7 @@
 #include <sys/stat.h>
 #include <sys/time.h>
 
+#include "access/xlogdefs.h"
 #include "common/relpath.h"
 #include "libpq-fe.h"
 
@@ -150,6 +151,22 @@ typedef struct
 	int			nrels;
 } RelInfoArr;
 
+/*
+ * Structure to store logical replication slot information
+ */
+typedef struct
+{
+	char	   *slotname;		/* slot name */
+	char	   *plugin;			/* plugin */
+	bool		two_phase;		/* can the slot decode 2PC? */
+} LogicalSlotInfo;
+
+typedef struct
+{
+	int			nslots;			/* number of logical slot infos */
+	LogicalSlotInfo *slots;		/* array of logical slot infos */
+} LogicalSlotInfoArr;
+
 /*
  * The following structure represents a relation mapping.
  */
@@ -176,6 +193,7 @@ typedef struct
 	char		db_tablespace[MAXPGPATH];	/* database default tablespace
 											 * path */
 	RelInfoArr	rel_arr;		/* array of all user relinfos */
+	LogicalSlotInfoArr slot_arr;	/* array of all LogicalSlotInfo */
 } DbInfo;
 
 /*
@@ -225,6 +243,7 @@ typedef struct
 	bool		date_is_int;
 	bool		float8_pass_by_value;
 	uint32		data_checksum_version;
+	XLogRecPtr	chkpnt_latest;
 } ControlData;
 
 /*
@@ -344,7 +363,6 @@ void		output_check_banner(bool live_check);
 void		check_and_dump_old_cluster(bool live_check);
 void		check_new_cluster(void);
 void		report_clusters_compatible(void);
-void		issue_warnings_and_set_wal_level(void);
 void		output_completion_banner(char *deletion_script_file_name);
 void		check_cluster_versions(void);
 void		check_cluster_compatibility(bool live_check);
@@ -400,6 +418,7 @@ FileNameMap *gen_db_file_maps(DbInfo *old_db,
 							  DbInfo *new_db, int *nmaps, const char *old_pgdata,
 							  const char *new_pgdata);
 void		get_db_and_rel_infos(ClusterInfo *cluster);
+int			count_old_cluster_logical_slots(void);
 
 /* option.c */
 
diff --git a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
new file mode 100644
index 0000000000..7c6cc6d04b
--- /dev/null
+++ b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
@@ -0,0 +1,238 @@
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+# Tests for upgrading replication slots
+
+use strict;
+use warnings;
+
+use File::Path qw(rmtree);
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Can be changed to test the other modes
+my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
+
+# Initialize old cluster
+my $old_publisher = PostgreSQL::Test::Cluster->new('old_publisher');
+$old_publisher->init(allows_streaming => 'logical');
+
+# Initialize new cluster
+my $new_publisher = PostgreSQL::Test::Cluster->new('new_publisher');
+$new_publisher->init(allows_streaming => 'replica');
+
+# Initialize subscriber cluster
+my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
+$subscriber->init(allows_streaming => 'logical');
+
+my $bindir = $new_publisher->config_data('--bindir');
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when new cluster wal_level is not 'logical'
+
+# Preparations for the subsequent test:
+# 1. Create a slot on the old cluster
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT pg_create_logical_replication_slot('test_slot1', 'test_decoding', false, true);"
+);
+$old_publisher->stop;
+
+# pg_upgrade will fail because the new cluster wal_level is 'replica'
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade where the new cluster has the wrong wal_level');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when max_replication_slots on a new cluster is
+#		too low
+
+# Preparations for the subsequent test:
+# 1. Create a second slot on the old cluster
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT pg_create_logical_replication_slot('test_slot2', 'test_decoding', false, true);"
+);
+
+# 2. Consume WAL records to avoid another type of upgrade failure. It will be
+#	 tested in subsequent cases.
+$old_publisher->safe_psql('postgres',
+	"SELECT count(*) FROM pg_logical_slot_get_changes('test_slot1', NULL, NULL);"
+);
+$old_publisher->stop;
+
+# 3. max_replication_slots is set to smaller than the number of slots (2)
+#	 present on the old cluster
+$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 1");
+
+# 4. wal_level is set correctly on the new cluster
+$new_publisher->append_conf('postgresql.conf', "wal_level = 'logical'");
+
+# pg_upgrade will fail because the new cluster has insufficient max_replication_slots
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade where the new cluster has insufficient max_replication_slots');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when the slot still has unconsumed WAL records
+
+# Preparations for the subsequent test:
+# 1. Remove the slot 'test_slot2', leaving only 1 slot remaining on the old
+#	 cluster, so the new cluster config max_replication_slots=1 will now be
+#	 enough.
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT * FROM pg_drop_replication_slot('test_slot2');"
+);
+
+# 2. Generate extra WAL records. Because these WAL records do not get consumed
+#	 it will cause the upcoming pg_upgrade test to fail.
+$old_publisher->safe_psql('postgres',
+	"CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;"
+);
+$old_publisher->stop;
+
+# pg_upgrade will fail because the slot still has unconsumed WAL records
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade of old cluster with idle replication slots');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+
+# ------------------------------
+# TEST: Successful --check command
+
+# Preparations for the subsequent test:
+# 1. Start the cluster. --check works well when an old cluster is running.
+$old_publisher->start;
+
+# Actual run, successful --check command is expected
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,        '--check'
+	],
+	'run of pg_upgrade --check for new instance');
+ok(!-d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ removed after pg_upgrade --check success");
+
+# Remove the remaining slot
+$old_publisher->safe_psql('postgres',
+	"SELECT * FROM pg_drop_replication_slot('test_slot1');"
+);
+
+# ------------------------------
+# TEST: Successful upgrade
+
+# Preparations for the subsequent test:
+# 1. Setup logical replication
+my $old_connstr = $old_publisher->connstr . ' dbname=postgres';
+$old_publisher->safe_psql('postgres',
+	"CREATE PUBLICATION pub FOR ALL TABLES;"
+);
+$subscriber->start;
+$subscriber->safe_psql(
+	'postgres', qq[
+	CREATE TABLE tbl (a int);
+	CREATE SUBSCRIPTION sub CONNECTION '$old_connstr' PUBLICATION pub WITH (two_phase = 'true')
+]);
+$subscriber->wait_for_subscription_sync($old_publisher, 'sub');
+
+# 2. Temporarily disable the subscription
+$subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION sub DISABLE");
+$old_publisher->stop;
+
+# Actual run, successful upgrade is expected
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade of old cluster');
+ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ removed after pg_upgrade success");
+
+# Check that the slot 'sub' has migrated to the new cluster
+$new_publisher->start;
+my $result = $new_publisher->safe_psql('postgres',
+	"SELECT slot_name, two_phase FROM pg_replication_slots");
+is($result, qq(sub|t), 'check the slot exists on new cluster');
+
+# Update the connection
+my $new_connstr = $new_publisher->connstr . ' dbname=postgres';
+$subscriber->safe_psql(
+	'postgres', qq[
+	ALTER SUBSCRIPTION sub CONNECTION '$new_connstr';
+	ALTER SUBSCRIPTION sub ENABLE;
+]);
+
+# Check whether changes on the new publisher get replicated to the subscriber
+$new_publisher->safe_psql('postgres',
+	"INSERT INTO tbl VALUES (generate_series(11, 20))");
+$new_publisher->wait_for_catchup('sub');
+$result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
+is($result, qq(20), 'check changes are replicated to the subscriber');
+
+# Clean up
+$subscriber->stop();
+$new_publisher->stop();
+
+done_testing();
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 49a33c0387..310456e032 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1501,7 +1501,10 @@ LogicalRepTupleData
 LogicalRepTyp
 LogicalRepWorker
 LogicalRepWorkerType
+LogicalReplicationSlotInfo
 LogicalRewriteMappingData
+LogicalSlotInfo
+LogicalSlotInfoArr
 LogicalTape
 LogicalTapeSet
 LsnReadQueue
-- 
2.27.0

v30-0001-Persist-logical-slots-to-disk-during-a-shutdown-.patchapplication/octet-stream; name=v30-0001-Persist-logical-slots-to-disk-during-a-shutdown-.patchDownload
From 7d23c302aaff6ad5034ee1ee4f668de6352d865e Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Fri, 14 Apr 2023 13:49:09 +0800
Subject: [PATCH v30 1/2] Persist logical slots to disk during a shutdown
 checkpoint if required.

It's entirely possible for a logical slot to have a confirmed_flush LSN
higher than the last value saved on disk while not being marked as dirty.
Currently, it is not a major problem but a later patch adding support for
the upgrade of slots relies on that value being properly persisted to disk.

It can also help with avoiding processing the same transactions again in
some boundary cases after the clean shutdown and restart. Say, we process
some transactions for which we didn't send anything downstream (the
changes got filtered) but the confirm_flush LSN is updated due to
keepalives. As we don't flush the latest value of confirm_flush LSN,
it may lead to processing the same changes again.

Author: Julien Rouhaud, Vignesh C, Kuroda Hayato based on suggestions by
Ashutosh Bapat
Reviewed-by: Amit Kapila, Peter Smith
Discussion: http://postgr.es/m/CAA4eK1JzJagMmb_E8D4au=GYQkxox0AfNBm1FbP7sy7t4YWXPQ@mail.gmail.com
Discussion: http://postgr.es/m/TYAPR01MB58664C81887B3AF2EB6B16E3F5939@TYAPR01MB5866.jpnprd01.prod.outlook.com
---
 src/backend/access/transam/xlog.c             |   2 +-
 src/backend/replication/slot.c                |  29 +++--
 src/include/replication/slot.h                |  13 ++-
 src/test/recovery/meson.build                 |   1 +
 .../t/038_save_logical_slots_shutdown.pl      | 101 ++++++++++++++++++
 5 files changed, 133 insertions(+), 13 deletions(-)
 create mode 100644 src/test/recovery/t/038_save_logical_slots_shutdown.pl

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index f6f8adc72a..f26c8d18a6 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7039,7 +7039,7 @@ static void
 CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
 {
 	CheckPointRelationMap();
-	CheckPointReplicationSlots();
+	CheckPointReplicationSlots(flags & CHECKPOINT_IS_SHUTDOWN);
 	CheckPointSnapBuild();
 	CheckPointLogicalRewriteHeap();
 	CheckPointReplicationOrigin();
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index bb09c4010f..c075f76317 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -109,7 +109,8 @@ static void ReplicationSlotDropPtr(ReplicationSlot *slot);
 /* internal persistency functions */
 static void RestoreSlotFromDisk(const char *name);
 static void CreateSlotOnDisk(ReplicationSlot *slot);
-static void SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel);
+static void SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel,
+						   bool is_shutdown);
 
 /*
  * Report shared-memory space needed by ReplicationSlotsShmemInit.
@@ -321,6 +322,7 @@ ReplicationSlotCreate(const char *name, bool db_specific,
 	slot->candidate_xmin_lsn = InvalidXLogRecPtr;
 	slot->candidate_restart_valid = InvalidXLogRecPtr;
 	slot->candidate_restart_lsn = InvalidXLogRecPtr;
+	slot->last_saved_confirmed_flush = InvalidXLogRecPtr;
 
 	/*
 	 * Create the slot on disk.  We haven't actually marked the slot allocated
@@ -783,7 +785,7 @@ ReplicationSlotSave(void)
 	Assert(MyReplicationSlot != NULL);
 
 	sprintf(path, "pg_replslot/%s", NameStr(MyReplicationSlot->data.name));
-	SaveSlotToPath(MyReplicationSlot, path, ERROR);
+	SaveSlotToPath(MyReplicationSlot, path, ERROR, false);
 }
 
 /*
@@ -1572,11 +1574,10 @@ restart:
 /*
  * Flush all replication slots to disk.
  *
- * This needn't actually be part of a checkpoint, but it's a convenient
- * location.
+ * is_shutdown is true in case of a shutdown checkpoint.
  */
 void
-CheckPointReplicationSlots(void)
+CheckPointReplicationSlots(bool is_shutdown)
 {
 	int			i;
 
@@ -1601,7 +1602,7 @@ CheckPointReplicationSlots(void)
 
 		/* save the slot to disk, locking is handled in SaveSlotToPath() */
 		sprintf(path, "pg_replslot/%s", NameStr(s->data.name));
-		SaveSlotToPath(s, path, LOG);
+		SaveSlotToPath(s, path, LOG, is_shutdown);
 	}
 	LWLockRelease(ReplicationSlotAllocationLock);
 }
@@ -1707,7 +1708,7 @@ CreateSlotOnDisk(ReplicationSlot *slot)
 
 	/* Write the actual state file. */
 	slot->dirty = true;			/* signal that we really need to write */
-	SaveSlotToPath(slot, tmppath, ERROR);
+	SaveSlotToPath(slot, tmppath, ERROR, false);
 
 	/* Rename the directory into place. */
 	if (rename(tmppath, path) != 0)
@@ -1733,22 +1734,26 @@ CreateSlotOnDisk(ReplicationSlot *slot)
  * Shared functionality between saving and creating a replication slot.
  */
 static void
-SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel)
+SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel,
+			   bool is_shutdown)
 {
 	char		tmppath[MAXPGPATH];
 	char		path[MAXPGPATH];
 	int			fd;
 	ReplicationSlotOnDisk cp;
 	bool		was_dirty;
+	bool		confirmed_flush_has_changed;
 
 	/* first check whether there's something to write out */
 	SpinLockAcquire(&slot->mutex);
 	was_dirty = slot->dirty;
 	slot->just_dirtied = false;
+	confirmed_flush_has_changed = (slot->data.confirmed_flush != slot->last_saved_confirmed_flush);
 	SpinLockRelease(&slot->mutex);
 
-	/* and don't do anything if there's nothing to write */
-	if (!was_dirty)
+	/* Don't do anything if there's nothing to write. See ReplicationSlot. */
+	if (!was_dirty &&
+		!(is_shutdown && SlotIsLogical(slot) && confirmed_flush_has_changed))
 		return;
 
 	LWLockAcquire(&slot->io_in_progress_lock, LW_EXCLUSIVE);
@@ -1873,11 +1878,12 @@ SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel)
 
 	/*
 	 * Successfully wrote, unset dirty bit, unless somebody dirtied again
-	 * already.
+	 * already and remember the confirmed_flush LSN value.
 	 */
 	SpinLockAcquire(&slot->mutex);
 	if (!slot->just_dirtied)
 		slot->dirty = false;
+	slot->last_saved_confirmed_flush = slot->data.confirmed_flush;
 	SpinLockRelease(&slot->mutex);
 
 	LWLockRelease(&slot->io_in_progress_lock);
@@ -2074,6 +2080,7 @@ RestoreSlotFromDisk(const char *name)
 		/* initialize in memory state */
 		slot->effective_xmin = cp.slotdata.xmin;
 		slot->effective_catalog_xmin = cp.slotdata.catalog_xmin;
+		slot->last_saved_confirmed_flush = cp.slotdata.confirmed_flush;
 
 		slot->candidate_catalog_xmin = InvalidTransactionId;
 		slot->candidate_xmin_lsn = InvalidXLogRecPtr;
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index a8a89dc784..448fb8cf51 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -178,6 +178,17 @@ typedef struct ReplicationSlot
 	XLogRecPtr	candidate_xmin_lsn;
 	XLogRecPtr	candidate_restart_valid;
 	XLogRecPtr	candidate_restart_lsn;
+
+	/*
+	 * We won't ensure that the slot is persisted after the confirmed_flush
+	 * LSN is updated as that could lead to frequent writes.  However, we need
+	 * to ensure that we do persist the slots at the time of shutdown whose
+	 * confirmed_flush LSN is changed since we last saved the slot to disk.
+	 * This will help in avoiding retreat of the confirmed_flush LSN after
+	 * restart.  This variable is used to track the last saved confirmed_flush
+	 * LSN value.
+	 */
+	XLogRecPtr	last_saved_confirmed_flush;
 } ReplicationSlot;
 
 #define SlotIsPhysical(slot) ((slot)->data.database == InvalidOid)
@@ -241,7 +252,7 @@ extern void ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslo
 extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok);
 
 extern void StartupReplicationSlots(void);
-extern void CheckPointReplicationSlots(void);
+extern void CheckPointReplicationSlots(bool is_shutdown);
 
 extern void CheckSlotRequirements(void);
 extern void CheckSlotPermissions(void);
diff --git a/src/test/recovery/meson.build b/src/test/recovery/meson.build
index e7328e4894..646d6ffde4 100644
--- a/src/test/recovery/meson.build
+++ b/src/test/recovery/meson.build
@@ -43,6 +43,7 @@ tests += {
       't/035_standby_logical_decoding.pl',
       't/036_truncated_dropped.pl',
       't/037_invalid_database.pl',
+      't/038_save_logical_slots_shutdown.pl',
     ],
   },
 }
diff --git a/src/test/recovery/t/038_save_logical_slots_shutdown.pl b/src/test/recovery/t/038_save_logical_slots_shutdown.pl
new file mode 100644
index 0000000000..6e114e9b29
--- /dev/null
+++ b/src/test/recovery/t/038_save_logical_slots_shutdown.pl
@@ -0,0 +1,101 @@
+
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+# Test logical replication slots are always persisted to disk during a shutdown
+# checkpoint.
+
+use strict;
+use warnings;
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+sub compare_confirmed_flush
+{
+	my ($node, $confirmed_flush_from_log) = @_;
+
+	# Fetch Latest checkpoint location from the control file
+	my ($stdout, $stderr) =
+	  run_command([ 'pg_controldata', $node->data_dir ]);
+	my @control_data      = split("\n", $stdout);
+	my $latest_checkpoint = undef;
+	foreach (@control_data)
+	{
+		if ($_ =~ /^Latest checkpoint location:\s*(.*)$/mg)
+		{
+			$latest_checkpoint = $1;
+			last;
+		}
+	}
+	die "Latest checkpoint location not found in control file\n"
+	  unless defined($latest_checkpoint);
+
+	# Is it same as the value read from log?
+	ok( $latest_checkpoint eq $confirmed_flush_from_log,
+		"Check that the slot's confirmed_flush LSN is the same as the latest_checkpoint location"
+	);
+
+	return;
+}
+
+# Initialize publisher node
+my $node_publisher = PostgreSQL::Test::Cluster->new('pub');
+$node_publisher->init(allows_streaming => 'logical');
+# Avoid checkpoint during the test, otherwise, the latest checkpoint location
+# will change.
+$node_publisher->append_conf(
+	'postgresql.conf', q{
+checkpoint_timeout = 1h
+});
+$node_publisher->start;
+
+# Create subscriber node
+my $node_subscriber = PostgreSQL::Test::Cluster->new('sub');
+$node_subscriber->init(allows_streaming => 'logical');
+$node_subscriber->start;
+
+# Create tables
+$node_publisher->safe_psql('postgres', "CREATE TABLE test_tbl (id int)");
+$node_subscriber->safe_psql('postgres', "CREATE TABLE test_tbl (id int)");
+
+# Insert some data
+$node_publisher->safe_psql('postgres',
+	"INSERT INTO test_tbl VALUES (generate_series(1, 5));");
+
+# Setup logical replication
+my $publisher_connstr = $node_publisher->connstr . ' dbname=postgres';
+$node_publisher->safe_psql('postgres',
+	"CREATE PUBLICATION pub FOR ALL TABLES");
+$node_subscriber->safe_psql('postgres',
+	"CREATE SUBSCRIPTION sub CONNECTION '$publisher_connstr' PUBLICATION pub"
+);
+
+$node_subscriber->wait_for_subscription_sync($node_publisher, 'sub');
+
+my $result =
+  $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM test_tbl");
+
+is($result, qq(5), "check initial copy was done");
+
+my $offset = -s $node_publisher->logfile;
+
+# Restart the publisher to ensure that the slot will be persisted if required
+$node_publisher->restart();
+
+# Wait until the walsender creates decoding context
+$node_publisher->wait_for_log(
+	qr/Streaming transactions committing after ([A-F0-9]+\/[A-F0-9]+), reading WAL from ([A-F0-9]+\/[A-F0-9]+)./,
+	$offset);
+
+# Extract confirmed_flush from the logfile
+my $log_contents = slurp_file($node_publisher->logfile, $offset);
+$log_contents =~
+  qr/Streaming transactions committing after ([A-F0-9]+\/[A-F0-9]+), reading WAL from ([A-F0-9]+\/[A-F0-9]+)./
+  or die "could not get confirmed_flush_lsn";
+
+# Ensure that the slot's confirmed_flush LSN is the same as the
+# latest_checkpoint location.
+compare_confirmed_flush($node_publisher, $1);
+
+done_testing();
-- 
2.27.0

#189Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Peter Smith (#187)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Peter,

Thanks for reviewing! New patch can be available in [1]/messages/by-id/TYAPR01MB5866F7D8ED15BA1E8E4A2AB0F5E4A@TYAPR01MB5866.jpnprd01.prod.outlook.com.

======
src/bin/pg_upgrade/check.c

1. check_old_cluster_for_valid_slots

+ /* Quick exit if the cluster does not have logical slots. */
+ if (count_old_cluster_logical_slots() == 0)
+ return;

/Quick exit/Quick return/

I know they are kind of the same, but the reason I previously
suggested this change was to keep it consistent with the similar
comment that is already in
check_new_cluster_logical_replication_slots().

Fixed.

2. check_old_cluster_for_valid_slots

+ /*
+ * Do additional checks to ensure that confirmed_flush LSN of all the slots
+ * is the same as the latest checkpoint location.
+ *
+ * Note: This can be satisfied only when the old cluster has been shut
+ * down, so we skip this live checks.
+ */
+ if (!live_check)

missing word

/skip this live checks./skip this for live checks./

Fixed.

src/bin/pg_upgrade/controldata.c

3.
+ /*
+ * Read the latest checkpoint location if the cluster is PG17
+ * or later. This is used for upgrading logical replication
+ * slots. Currently, we need it only for the old cluster but
+ * didn't add additional check for the similicity.
+ */
+ if (GET_MAJOR_VERSION(cluster->major_version) >= 1700)

/similicity/simplicity/

SUGGESTION
Currently, we need it only for the old cluster but for simplicity
chose not to have additional checks.

Fixed.

src/bin/pg_upgrade/info.c

4. get_old_cluster_logical_slot_infos_per_db

+ /*
+ * The temporary slots are expressly ignored while checking because such
+ * slots cannot exist after the upgrade. During the upgrade, clusters are
+ * started and stopped several times so that temporary slots will be
+ * removed.
+ */
+ res = executeQueryOrDie(conn, "SELECT slot_name, plugin, two_phase "
+ "FROM pg_catalog.pg_replication_slots "
+ "WHERE slot_type = 'logical' AND "
+ "wal_status <> 'lost' AND "
+ "database = current_database() AND "
+ "temporary IS FALSE;");

IIUC, the removal of temp slots is just a side-effect of the
start/stop; not the *reason* for the start/stop. So, the last sentence
needs some modification

BEFORE
During the upgrade, clusters are started and stopped several times so
that temporary slots will be removed.

SUGGESTION
During the upgrade, clusters are started and stopped several times
causing any temporary slots to be removed.

Fixed.

[1]: /messages/by-id/TYAPR01MB5866F7D8ED15BA1E8E4A2AB0F5E4A@TYAPR01MB5866.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#190Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Amit Kapila (#186)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Amit,

Thank you for reviewing! New patch can be available in [1]/messages/by-id/TYAPR01MB5866F7D8ED15BA1E8E4A2AB0F5E4A@TYAPR01MB5866.jpnprd01.prod.outlook.com.

+ /*
+ * Note: This must be done after doing the pg_resetwal command because
+ * pg_resetwal would remove required WALs.
+ */
+ if (count_old_cluster_logical_slots())
+ {
+ start_postmaster(&new_cluster, true);
+ create_logical_replication_slots();
+ stop_postmaster(false);
+ }

Can we combine this code with the code in the function
issue_warnings_and_set_wal_level()? That will avoid starting/stopping
the server for creating slots.

Yeah, I can. But create_logical_replication_slots() must be done before running
"initdb --sync-only", so it is placed before that step. The new function is
named setup_new_cluster().

[1]: /messages/by-id/TYAPR01MB5866F7D8ED15BA1E8E4A2AB0F5E4A@TYAPR01MB5866.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#191Dilip Kumar
dilipbalaut@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#188)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Fri, Sep 1, 2023 at 6:34 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

Dear Dilip,

Thank you for reviewing!

1.
+ conn = connectToServer(&new_cluster, "template1");
+
+ prep_status("Checking for logical replication slots");
+
+ res = executeQueryOrDie(conn, "SELECT slot_name "
+ "FROM pg_catalog.pg_replication_slots "
+ "WHERE slot_type = 'logical' AND "
+ "temporary IS FALSE;");

I think we should add some comment saying this query will only fetch
logical slots because the database name will always be NULL in the
physical slots. Otherwise looking at the query it is very confusing
how it is avoiding the physical slots.

Hmm, the query you pointed out does not check the database of the slot...
We are fetching only logical slots via the condition "slot_type = 'logical'", so
I think it is too trivial to describe in the comment.
Just to confirm - pg_replication_slots can see all the slots even if the database
is not the current one.

I think this is fine. Actually, I posted comments based on v28, where the query
inside the get_old_cluster_logical_slot_infos_per_db() function was missing the
condition slot_type = 'logical', but while commenting I quoted the wrong hunk
from the code. Anyway, the other part of the code I intended is also fixed as of
v29, so all good.
Thanks :)

4. Looking at the above two comments I feel that now the order should be like
- Fetch all the db infos
get_db_infos()
- loop
get_rel_infos()
get_old_cluster_logical_slot_infos()

-- and now print relation and slot information per database
print_db_infos()

Fixed like that. It seems that we have gone back to the old style...
Now the debug prints are like below:

```
source databases:
Database: "template1"
relname: "pg_catalog.pg_largeobject", reloid: 2613, reltblspace: ""
relname: "pg_catalog.pg_largeobject_loid_pn_index", reloid: 2683, reltblspace: ""
Database: "postgres"
relname: "pg_catalog.pg_largeobject", reloid: 2613, reltblspace: ""
relname: "pg_catalog.pg_largeobject_loid_pn_index", reloid: 2683, reltblspace: ""
Logical replication slots within the database:
slotname: "old1", plugin: "test_decoding", two_phase: 0
slotname: "old2", plugin: "test_decoding", two_phase: 0
slotname: "old3", plugin: "test_decoding", two_phase: 0

Yeah this looks good now.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#192Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Hayato Kuroda (Fujitsu) (#188)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

On Friday, September 1, 2023 9:05 PM Kuroda, Hayato/黒田 隼人 <kuroda.hayato@fujitsu.com> wrote:

Hi,

Thanks for updating the patch.
I have a comment about the check related to the wal_status.

Currently, there are a few places where we check the wal_status of slots, e.g.
check_old_cluster_for_valid_slots(), get_loadable_libraries(), and
get_old_cluster_logical_slot_infos().

But as discussed in another thread[1]/messages/by-id/CAA4eK1LLik2818uzYqS73O+He5LK_+=kthyZ6hwT6oe9TuxycA@mail.gmail.com, some kinds of WAL records will be
written while pg_upgrade is checking the old cluster, which could cause the WAL
size to exceed max_slot_wal_keep_size. In this case, a checkpoint will remove
the WAL required by the slots and invalidate them (the wal_status gets changed
as well).

Based on this, it's possible that the slots we get each time we check the
wal_status are different, because they may change in between these checks.
This may not cause serious problems for now, because we will either copy all
the slots, including ones invalidated while upgrading, or report an ERROR. But I
feel it's better to get a consistent result each time we check the slots, to
close off the possibility of problems in the future. So, I feel we could
centralize the slot fetch and the wal_status check, so that even if some slot's
status changes after that, it has no risk of affecting our check. What do you
think?

[1]: /messages/by-id/CAA4eK1LLik2818uzYqS73O+He5LK_+=kthyZ6hwT6oe9TuxycA@mail.gmail.com
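
For illustration only - a minimal, self-contained sketch of the "fetch once,
validate only from the snapshot" pattern suggested above. The names and the
stand-in data below are invented, not taken from any posted patch; in
pg_upgrade the array would be filled by a single query per database and kept
alongside the other per-database information.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical cached slot entry; a stand-in for the real LogicalSlotInfo. */
typedef struct
{
	const char *slotname;
	const char *wal_status;		/* as reported by pg_replication_slots */
} SlotSnapshot;

/*
 * Pretend this array was filled by one query against pg_replication_slots.
 * After this point the catalog is never consulted again, so a slot that
 * becomes 'lost' later cannot change the outcome of the checks.
 */
static const SlotSnapshot snapshot[] = {
	{"sub1", "reserved"},
	{"old_slot", "lost"},
};

/* A later validation step reads only the snapshot, never the catalog. */
static bool
snapshot_has_lost_slot(const SlotSnapshot *slots, int nslots)
{
	for (int i = 0; i < nslots; i++)
	{
		if (strcmp(slots[i].wal_status, "lost") == 0)
			return true;
	}
	return false;
}

int
main(void)
{
	if (snapshot_has_lost_slot(snapshot, 2))
		printf("upgrade would be refused: a slot is already lost\n");
	return 0;
}
```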

Best Regards,
Hou zj

#193Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Zhijie Hou (Fujitsu) (#192)
2 attachment(s)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Hou-san,

Based on this, it's possible that the slots we get each time we check the
wal_status are different, because they may change in between these checks.
This may not cause serious problems for now, because we will either copy all
the slots, including ones invalidated while upgrading, or report an ERROR. But I
feel it's better to get a consistent result each time we check the slots, to
close off the possibility of problems in the future. So, I feel we could
centralize the slot fetch and the wal_status check, so that even if some slot's
status changes after that, it has no risk of affecting our check. What do you
think?

Thank you for giving the suggestion! I agree with centralizing the checks, and I
had already started modifying the patch along those lines. Here is the updated patch.

In this patch, all slot information is extracted in get_old_cluster_logical_slot_infos(),
and the subsequent functions use it. Based on this change, two attributes,
confirmed_flush and wal_status, were added to LogicalSlotInfo.
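As a rough sketch of what the extended per-slot record might look like - the
field names follow the description above, but the exact types in the real patch
may differ (for instance the backend's XLogRecPtr rather than uint64_t):

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch only: a guess at the extended LogicalSlotInfo, not the patch text. */
typedef struct
{
	char	   *slotname;			/* slot name */
	char	   *plugin;				/* output plugin */
	bool		two_phase;			/* can the slot decode 2PC? */
	uint64_t	confirmed_flush;	/* confirmed_flush LSN at fetch time */
	char	   *wal_status;			/* wal_status at fetch time */
} LogicalSlotInfoSketch;
```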

IIUC we cannot use struct List in client-side code, so structures and related
functions were added in function.c. These are used for extracting unique
plugin names, but it may be overkill because check_loadable_libraries() handles
duplicated entries. If we can tolerate duplicated entries, these functions can be
removed.
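
Purely as illustration (the names below are invented, not the helpers in the
attached patch): frontend code that cannot rely on the backend's List usually
hand-rolls a small grow-as-needed array with an add-if-absent routine, roughly
like this:

```c
#include <stdlib.h>
#include <string.h>

/* Minimal unique-string collector; a stand-in for the function.c helpers. */
typedef struct
{
	char	  **items;
	int			nitems;
	int			maxitems;
} UniqueNames;

static void
unique_names_add(UniqueNames *list, const char *name)
{
	for (int i = 0; i < list->nitems; i++)
	{
		if (strcmp(list->items[i], name) == 0)
			return;				/* duplicate plugin name, nothing to do */
	}

	if (list->nitems == list->maxitems)
	{
		list->maxitems = (list->maxitems == 0) ? 8 : list->maxitems * 2;
		/* real frontend code would use pg_realloc(), which exits on failure */
		list->items = realloc(list->items, list->maxitems * sizeof(char *));
	}
	list->items[list->nitems++] = strdup(name);
}
```

Whether such a helper is worth keeping is exactly the question raised above,
since check_loadable_libraries() already copes with duplicated entries.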

Also, to simplify the code, only the first invalidated slot encountered is
reported in check_old_cluster_for_valid_slots(); the warning messages in that
function were removed. I think this may be enough because
check_new_cluster_is_empty() does a similar thing. Please tell me if it should
be reverted...

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:

v31-0001-Persist-logical-slots-to-disk-during-a-shutdown-.patchapplication/octet-stream; name=v31-0001-Persist-logical-slots-to-disk-during-a-shutdown-.patchDownload
From 5290e29ce69c48a82d19f3c3f6a4f002f46b570b Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Fri, 14 Apr 2023 13:49:09 +0800
Subject: [PATCH v31 1/2] Persist logical slots to disk during a shutdown
 checkpoint if required.

It's entirely possible for a logical slot to have a confirmed_flush LSN
higher than the last value saved on disk while not being marked as dirty.
Currently, it is not a major problem but a later patch adding support for
the upgrade of slots relies on that value being properly persisted to disk.

It can also help with avoiding processing the same transactions again in
some boundary cases after the clean shutdown and restart. Say, we process
some transactions for which we didn't send anything downstream (the
changes got filtered) but the confirm_flush LSN is updated due to
keepalives. As we don't flush the latest value of confirm_flush LSN,
it may lead to processing the same changes again.

Author: Julien Rouhaud, Vignesh C, Kuroda Hayato based on suggestions by
Ashutosh Bapat
Reviewed-by: Amit Kapila, Peter Smith
Discussion: http://postgr.es/m/CAA4eK1JzJagMmb_E8D4au=GYQkxox0AfNBm1FbP7sy7t4YWXPQ@mail.gmail.com
Discussion: http://postgr.es/m/TYAPR01MB58664C81887B3AF2EB6B16E3F5939@TYAPR01MB5866.jpnprd01.prod.outlook.com
---
 src/backend/access/transam/xlog.c             |   2 +-
 src/backend/replication/slot.c                |  29 +++--
 src/include/replication/slot.h                |  13 ++-
 src/test/recovery/meson.build                 |   1 +
 .../t/038_save_logical_slots_shutdown.pl      | 101 ++++++++++++++++++
 5 files changed, 133 insertions(+), 13 deletions(-)
 create mode 100644 src/test/recovery/t/038_save_logical_slots_shutdown.pl

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index f6f8adc72a..f26c8d18a6 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7039,7 +7039,7 @@ static void
 CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
 {
 	CheckPointRelationMap();
-	CheckPointReplicationSlots();
+	CheckPointReplicationSlots(flags & CHECKPOINT_IS_SHUTDOWN);
 	CheckPointSnapBuild();
 	CheckPointLogicalRewriteHeap();
 	CheckPointReplicationOrigin();
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index bb09c4010f..c075f76317 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -109,7 +109,8 @@ static void ReplicationSlotDropPtr(ReplicationSlot *slot);
 /* internal persistency functions */
 static void RestoreSlotFromDisk(const char *name);
 static void CreateSlotOnDisk(ReplicationSlot *slot);
-static void SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel);
+static void SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel,
+						   bool is_shutdown);
 
 /*
  * Report shared-memory space needed by ReplicationSlotsShmemInit.
@@ -321,6 +322,7 @@ ReplicationSlotCreate(const char *name, bool db_specific,
 	slot->candidate_xmin_lsn = InvalidXLogRecPtr;
 	slot->candidate_restart_valid = InvalidXLogRecPtr;
 	slot->candidate_restart_lsn = InvalidXLogRecPtr;
+	slot->last_saved_confirmed_flush = InvalidXLogRecPtr;
 
 	/*
 	 * Create the slot on disk.  We haven't actually marked the slot allocated
@@ -783,7 +785,7 @@ ReplicationSlotSave(void)
 	Assert(MyReplicationSlot != NULL);
 
 	sprintf(path, "pg_replslot/%s", NameStr(MyReplicationSlot->data.name));
-	SaveSlotToPath(MyReplicationSlot, path, ERROR);
+	SaveSlotToPath(MyReplicationSlot, path, ERROR, false);
 }
 
 /*
@@ -1572,11 +1574,10 @@ restart:
 /*
  * Flush all replication slots to disk.
  *
- * This needn't actually be part of a checkpoint, but it's a convenient
- * location.
+ * is_shutdown is true in case of a shutdown checkpoint.
  */
 void
-CheckPointReplicationSlots(void)
+CheckPointReplicationSlots(bool is_shutdown)
 {
 	int			i;
 
@@ -1601,7 +1602,7 @@ CheckPointReplicationSlots(void)
 
 		/* save the slot to disk, locking is handled in SaveSlotToPath() */
 		sprintf(path, "pg_replslot/%s", NameStr(s->data.name));
-		SaveSlotToPath(s, path, LOG);
+		SaveSlotToPath(s, path, LOG, is_shutdown);
 	}
 	LWLockRelease(ReplicationSlotAllocationLock);
 }
@@ -1707,7 +1708,7 @@ CreateSlotOnDisk(ReplicationSlot *slot)
 
 	/* Write the actual state file. */
 	slot->dirty = true;			/* signal that we really need to write */
-	SaveSlotToPath(slot, tmppath, ERROR);
+	SaveSlotToPath(slot, tmppath, ERROR, false);
 
 	/* Rename the directory into place. */
 	if (rename(tmppath, path) != 0)
@@ -1733,22 +1734,26 @@ CreateSlotOnDisk(ReplicationSlot *slot)
  * Shared functionality between saving and creating a replication slot.
  */
 static void
-SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel)
+SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel,
+			   bool is_shutdown)
 {
 	char		tmppath[MAXPGPATH];
 	char		path[MAXPGPATH];
 	int			fd;
 	ReplicationSlotOnDisk cp;
 	bool		was_dirty;
+	bool		confirmed_flush_has_changed;
 
 	/* first check whether there's something to write out */
 	SpinLockAcquire(&slot->mutex);
 	was_dirty = slot->dirty;
 	slot->just_dirtied = false;
+	confirmed_flush_has_changed = (slot->data.confirmed_flush != slot->last_saved_confirmed_flush);
 	SpinLockRelease(&slot->mutex);
 
-	/* and don't do anything if there's nothing to write */
-	if (!was_dirty)
+	/* Don't do anything if there's nothing to write. See ReplicationSlot. */
+	if (!was_dirty &&
+		!(is_shutdown && SlotIsLogical(slot) && confirmed_flush_has_changed))
 		return;
 
 	LWLockAcquire(&slot->io_in_progress_lock, LW_EXCLUSIVE);
@@ -1873,11 +1878,12 @@ SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel)
 
 	/*
 	 * Successfully wrote, unset dirty bit, unless somebody dirtied again
-	 * already.
+	 * already and remember the confirmed_flush LSN value.
 	 */
 	SpinLockAcquire(&slot->mutex);
 	if (!slot->just_dirtied)
 		slot->dirty = false;
+	slot->last_saved_confirmed_flush = slot->data.confirmed_flush;
 	SpinLockRelease(&slot->mutex);
 
 	LWLockRelease(&slot->io_in_progress_lock);
@@ -2074,6 +2080,7 @@ RestoreSlotFromDisk(const char *name)
 		/* initialize in memory state */
 		slot->effective_xmin = cp.slotdata.xmin;
 		slot->effective_catalog_xmin = cp.slotdata.catalog_xmin;
+		slot->last_saved_confirmed_flush = cp.slotdata.confirmed_flush;
 
 		slot->candidate_catalog_xmin = InvalidTransactionId;
 		slot->candidate_xmin_lsn = InvalidXLogRecPtr;
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index a8a89dc784..448fb8cf51 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -178,6 +178,17 @@ typedef struct ReplicationSlot
 	XLogRecPtr	candidate_xmin_lsn;
 	XLogRecPtr	candidate_restart_valid;
 	XLogRecPtr	candidate_restart_lsn;
+
+	/*
+	 * We won't ensure that the slot is persisted after the confirmed_flush
+	 * LSN is updated as that could lead to frequent writes.  However, we need
+	 * to ensure that we do persist the slots at the time of shutdown whose
+	 * confirmed_flush LSN is changed since we last saved the slot to disk.
+	 * This will help in avoiding retreat of the confirmed_flush LSN after
+	 * restart.  This variable is used to track the last saved confirmed_flush
+	 * LSN value.
+	 */
+	XLogRecPtr	last_saved_confirmed_flush;
 } ReplicationSlot;
 
 #define SlotIsPhysical(slot) ((slot)->data.database == InvalidOid)
@@ -241,7 +252,7 @@ extern void ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslo
 extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok);
 
 extern void StartupReplicationSlots(void);
-extern void CheckPointReplicationSlots(void);
+extern void CheckPointReplicationSlots(bool is_shutdown);
 
 extern void CheckSlotRequirements(void);
 extern void CheckSlotPermissions(void);
diff --git a/src/test/recovery/meson.build b/src/test/recovery/meson.build
index e7328e4894..646d6ffde4 100644
--- a/src/test/recovery/meson.build
+++ b/src/test/recovery/meson.build
@@ -43,6 +43,7 @@ tests += {
       't/035_standby_logical_decoding.pl',
       't/036_truncated_dropped.pl',
       't/037_invalid_database.pl',
+      't/038_save_logical_slots_shutdown.pl',
     ],
   },
 }
diff --git a/src/test/recovery/t/038_save_logical_slots_shutdown.pl b/src/test/recovery/t/038_save_logical_slots_shutdown.pl
new file mode 100644
index 0000000000..6e114e9b29
--- /dev/null
+++ b/src/test/recovery/t/038_save_logical_slots_shutdown.pl
@@ -0,0 +1,101 @@
+
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+# Test logical replication slots are always persisted to disk during a shutdown
+# checkpoint.
+
+use strict;
+use warnings;
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+sub compare_confirmed_flush
+{
+	my ($node, $confirmed_flush_from_log) = @_;
+
+	# Fetch Latest checkpoint location from the control file
+	my ($stdout, $stderr) =
+	  run_command([ 'pg_controldata', $node->data_dir ]);
+	my @control_data      = split("\n", $stdout);
+	my $latest_checkpoint = undef;
+	foreach (@control_data)
+	{
+		if ($_ =~ /^Latest checkpoint location:\s*(.*)$/mg)
+		{
+			$latest_checkpoint = $1;
+			last;
+		}
+	}
+	die "Latest checkpoint location not found in control file\n"
+	  unless defined($latest_checkpoint);
+
+	# Is it same as the value read from log?
+	ok( $latest_checkpoint eq $confirmed_flush_from_log,
+		"Check that the slot's confirmed_flush LSN is the same as the latest_checkpoint location"
+	);
+
+	return;
+}
+
+# Initialize publisher node
+my $node_publisher = PostgreSQL::Test::Cluster->new('pub');
+$node_publisher->init(allows_streaming => 'logical');
+# Avoid checkpoint during the test, otherwise, the latest checkpoint location
+# will change.
+$node_publisher->append_conf(
+	'postgresql.conf', q{
+checkpoint_timeout = 1h
+});
+$node_publisher->start;
+
+# Create subscriber node
+my $node_subscriber = PostgreSQL::Test::Cluster->new('sub');
+$node_subscriber->init(allows_streaming => 'logical');
+$node_subscriber->start;
+
+# Create tables
+$node_publisher->safe_psql('postgres', "CREATE TABLE test_tbl (id int)");
+$node_subscriber->safe_psql('postgres', "CREATE TABLE test_tbl (id int)");
+
+# Insert some data
+$node_publisher->safe_psql('postgres',
+	"INSERT INTO test_tbl VALUES (generate_series(1, 5));");
+
+# Setup logical replication
+my $publisher_connstr = $node_publisher->connstr . ' dbname=postgres';
+$node_publisher->safe_psql('postgres',
+	"CREATE PUBLICATION pub FOR ALL TABLES");
+$node_subscriber->safe_psql('postgres',
+	"CREATE SUBSCRIPTION sub CONNECTION '$publisher_connstr' PUBLICATION pub"
+);
+
+$node_subscriber->wait_for_subscription_sync($node_publisher, 'sub');
+
+my $result =
+  $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM test_tbl");
+
+is($result, qq(5), "check initial copy was done");
+
+my $offset = -s $node_publisher->logfile;
+
+# Restart the publisher to ensure that the slot will be persisted if required
+$node_publisher->restart();
+
+# Wait until the walsender creates decoding context
+$node_publisher->wait_for_log(
+	qr/Streaming transactions committing after ([A-F0-9]+\/[A-F0-9]+), reading WAL from ([A-F0-9]+\/[A-F0-9]+)./,
+	$offset);
+
+# Extract confirmed_flush from the logfile
+my $log_contents = slurp_file($node_publisher->logfile, $offset);
+$log_contents =~
+  qr/Streaming transactions committing after ([A-F0-9]+\/[A-F0-9]+), reading WAL from ([A-F0-9]+\/[A-F0-9]+)./
+  or die "could not get confirmed_flush_lsn";
+
+# Ensure that the slot's confirmed_flush LSN is the same as the
+# latest_checkpoint location.
+compare_confirmed_flush($node_publisher, $1);
+
+done_testing();
-- 
2.27.0

v31-0002-pg_upgrade-Allow-to-replicate-logical-replicatio.patch (application/octet-stream)
From 233e24f097d18de6e38b9eb6e2b971b144f205d3 Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Tue, 4 Apr 2023 05:49:34 +0000
Subject: [PATCH v31 2/2] pg_upgrade: Allow to replicate logical replication
 slots to new node

This commit allows nodes with logical replication slots to be upgraded. While
reading information from the old cluster, a list of logical replication slots is
fetched. At the later part of upgrading, pg_upgrade revisits the list and
restores slots by executing pg_create_logical_replication_slot() on the new
cluster. Migration of logical replication slots is only supported when the old
cluster is version 17.0 or later.

If the old node has slots with the status 'lost' or with unconsumed WAL records,
the pg_upgrade fails. These checks are needed to prevent data loss.

Note that slot restoration must be done after the final pg_resetwal command
during the upgrade because pg_resetwal will remove WALs that are required by
the slots. Due to this restriction, the timing of restoring replication slots is
different from other objects.

The significant advantage of this commit is that it makes it easy to continue
logical replication even after upgrading the publisher node. Previously,
pg_upgrade allowed copying publications to a new node. With this new commit,
adjusting the connection string to the new publisher will cause the apply
worker on the subscriber to connect to the new publisher automatically. This
enables seamless continuation of logical replication, even after an upgrade.

Author: Hayato Kuroda
Co-authored-by: Hou Zhijie
Reviewed-by: Peter Smith, Julien Rouhaud, Vignesh C, Wang Wei, Masahiko Sawada,
             Dilip Kumar
---
 doc/src/sgml/ref/pgupgrade.sgml               |  70 +++++-
 src/bin/pg_upgrade/Makefile                   |   3 +
 src/bin/pg_upgrade/check.c                    | 152 +++++++++--
 src/bin/pg_upgrade/controldata.c              |  72 ++++++
 src/bin/pg_upgrade/function.c                 | 149 ++++++++++-
 src/bin/pg_upgrade/info.c                     | 178 ++++++++++++-
 src/bin/pg_upgrade/meson.build                |   1 +
 src/bin/pg_upgrade/pg_upgrade.c               | 109 +++++++-
 src/bin/pg_upgrade/pg_upgrade.h               |  25 +-
 .../t/003_logical_replication_slots.pl        | 238 ++++++++++++++++++
 src/include/access/xlog.h                     |  13 -
 src/include/access/xlogdefs.h                 |  13 +
 src/tools/pgindent/typedefs.list              |   5 +
 13 files changed, 981 insertions(+), 47 deletions(-)
 create mode 100644 src/bin/pg_upgrade/t/003_logical_replication_slots.pl

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 7816b4c685..bef107295c 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -360,6 +360,71 @@ make prefix=/usr/local/pgsql.new install
     </para>
    </step>
 
+   <step>
+    <title>Prepare for publisher upgrades</title>
+
+    <para>
+     <application>pg_upgrade</application> attempts to migrate logical
+     replication slots. This helps avoid the need for manually defining the
+     same replication slots on the new publisher. Migration of logical
+     replication slots is only supported when the old cluster is version 17.0
+     or later.
+    </para>
+
+    <para>
+     Before you start upgrading the publisher cluster, ensure that the
+     subscription is temporarily disabled, by executing
+     <link linkend="sql-altersubscription"><command>ALTER SUBSCRIPTION ... DISABLE</command></link>.
+     Re-enable the subscription after the upgrade.
+    </para>
+
+    <para>
+     There are some prerequisites for <application>pg_upgrade</application> to
+     be able to upgrade the replication slots. If these are not met an error
+     will be reported.
+    </para>
+
+    <itemizedlist>
+     <listitem>
+      <para>
+       All slots on the old cluster must be usable, i.e., there are no slots
+       whose
+       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>wal_status</structfield>
+       is <literal>lost</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>confirmed_flush_lsn</structfield>
+       of all slots on the old cluster must be the same as the latest
+       checkpoint location.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The output plugins referenced by the slots on the old cluster must be
+       installed in the new PostgreSQL executable directory.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must have
+       <link linkend="guc-max-replication-slots"><varname>max_replication_slots</varname></link>
+       configured to a value greater than or equal to the number of slots
+       present in the old cluster.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must have
+       <link linkend="guc-wal-level"><varname>wal_level</varname></link> as
+       <literal>logical</literal>.
+      </para>
+     </listitem>
+    </itemizedlist>
+
+   </step>
+
    <step>
     <title>Stop both servers</title>
 
@@ -629,8 +694,9 @@ rsync --archive --delete --hard-links --size-only --no-inc-recursive /vol1/pg_tb
        Configure the servers for log shipping.  (You do not need to run
        <function>pg_backup_start()</function> and <function>pg_backup_stop()</function>
        or take a file system backup as the standbys are still synchronized
-       with the primary.)  Replication slots are not copied and must
-       be recreated.
+       with the primary.)  Replication slots on the old standby are not copied.
+       Only logical slots on the primary are migrated to the new standby,
+       and other slots must be recreated.
       </para>
      </step>
 
diff --git a/src/bin/pg_upgrade/Makefile b/src/bin/pg_upgrade/Makefile
index 5834513add..815d1a7ca1 100644
--- a/src/bin/pg_upgrade/Makefile
+++ b/src/bin/pg_upgrade/Makefile
@@ -3,6 +3,9 @@
 PGFILEDESC = "pg_upgrade - an in-place binary upgrade utility"
 PGAPPICON = win32
 
+# required for 003_logical_replication_slots.pl
+EXTRA_INSTALL=contrib/test_decoding
+
 subdir = src/bin/pg_upgrade
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 56e313f562..490196b616 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -9,6 +9,7 @@
 
 #include "postgres_fe.h"
 
+#include "access/xlogdefs.h"
 #include "catalog/pg_authid_d.h"
 #include "catalog/pg_collation.h"
 #include "fe_utils/string_utils.h"
@@ -30,6 +31,8 @@ static void check_for_jsonb_9_4_usage(ClusterInfo *cluster);
 static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(void);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
+static void check_new_cluster_logical_replication_slots(void);
+static void check_old_cluster_for_valid_slots(bool live_check);
 
 
 /*
@@ -104,6 +107,13 @@ check_and_dump_old_cluster(bool live_check)
 	check_for_reg_data_type_usage(&old_cluster);
 	check_for_isn_and_int8_passing_mismatch(&old_cluster);
 
+	/*
+	 * Logical replication slots can be migrated since PG17. See comments atop
+	 * get_old_cluster_logical_slot_infos().
+	 */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
+		check_old_cluster_for_valid_slots(live_check);
+
 	/*
 	 * PG 16 increased the size of the 'aclitem' type, which breaks the
 	 * on-disk format for existing data.
@@ -189,6 +199,8 @@ check_new_cluster(void)
 {
 	get_db_and_rel_infos(&new_cluster);
 
+	check_new_cluster_logical_replication_slots();
+
 	check_new_cluster_is_empty();
 
 	check_loadable_libraries();
@@ -232,27 +244,6 @@ report_clusters_compatible(void)
 }
 
 
-void
-issue_warnings_and_set_wal_level(void)
-{
-	/*
-	 * We unconditionally start/stop the new server because pg_resetwal -o set
-	 * wal_level to 'minimum'.  If the user is upgrading standby servers using
-	 * the rsync instructions, they will need pg_upgrade to write its final
-	 * WAL record showing wal_level as 'replica'.
-	 */
-	start_postmaster(&new_cluster, true);
-
-	/* Reindex hash indexes for old < 10.0 */
-	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 906)
-		old_9_6_invalidate_hash_indexes(&new_cluster, false);
-
-	report_extension_updates(&new_cluster);
-
-	stop_postmaster(false);
-}
-
-
 void
 output_completion_banner(char *deletion_script_file_name)
 {
@@ -1402,3 +1393,122 @@ check_for_user_defined_encoding_conversions(ClusterInfo *cluster)
 	else
 		check_ok();
 }
+
+/*
+ * check_new_cluster_logical_replication_slots()
+ *
+ * Make sure there are no logical replication slots on the new cluster and that
+ * the parameter settings necessary for creating slots are sufficient.
+ */
+static void
+check_new_cluster_logical_replication_slots(void)
+{
+	PGresult   *res;
+	PGconn	   *conn;
+	int			nslots = 0;
+	int			nlost_slots;
+	int			max_replication_slots;
+	char	   *wal_level;
+
+	/* Logical slots can be migrated since PG17. */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+		return;
+
+	nslots = count_old_cluster_logical_slots();
+
+	/* Quick return if there are no logical slots to be migrated. */
+	if (nslots == 0)
+		return;
+
+	conn = connectToServer(&new_cluster, "template1");
+
+	prep_status("Checking for logical replication slots");
+
+	res = executeQueryOrDie(conn, "SELECT count(*) "
+							"FROM pg_catalog.pg_replication_slots "
+							"WHERE slot_type = 'logical' AND "
+							"temporary IS FALSE;");
+
+	nlost_slots = atoi(PQgetvalue(res, 0, 0));
+
+	if (nlost_slots)
+	{
+		if (nlost_slots == 1)
+			pg_fatal("New cluster must not have logical replication slots but found a slot.");
+		else
+			pg_fatal("New cluster must not have logical replication slots but found %d slots",
+					 nlost_slots);
+	}
+
+	PQclear(res);
+
+	res = executeQueryOrDie(conn, "SHOW max_replication_slots;");
+	max_replication_slots = atoi(PQgetvalue(res, 0, 0));
+
+	if (nslots > max_replication_slots)
+		pg_fatal("max_replication_slots (%d) must be greater than or equal to the number of "
+				 "logical replication slots (%d) on the old cluster.",
+				 max_replication_slots, nslots);
+
+	PQclear(res);
+
+	res = executeQueryOrDie(conn, "SHOW wal_level;");
+	wal_level = PQgetvalue(res, 0, 0);
+
+	if (strcmp(wal_level, "logical") != 0)
+		pg_fatal("wal_level must be \"logical\", but is set to \"%s\"",
+				 wal_level);
+
+	PQclear(res);
+	PQfinish(conn);
+
+	check_ok();
+}
+
+/*
+ * check_old_cluster_for_valid_slots()
+ *
+ * Make sure logical replication slots can be migrated to new cluster.
+ * Following points are checked:
+ *
+ *	- All logical replication slots are usable.
+ *	- All logical replication slots consumed all WALs, except a
+ *	  CHECKPOINT_SHUTDOWN record.
+ */
+static void
+check_old_cluster_for_valid_slots(bool live_check)
+{
+	int			dbnum;
+
+	prep_status("Checking for valid logical replication slots");
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		int			slotnum;
+		LogicalSlotInfoArr *slot_arr = &old_cluster.dbarr.dbs[dbnum].slot_arr;
+
+		for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		{
+			LogicalSlotInfo *slot = &slot_arr->slots[slotnum];
+
+			/* Is the slot still usable? */
+			if (slot->wal_status == WALAVAIL_REMOVED)
+				pg_fatal("Logical replication slot \"%s\" is in 'lost' state.",
+						 slot->slotname);
+
+			/*
+			 * Do additional checks to ensure that confirmed_flush LSN of all
+			 * the slots is the same as the latest checkpoint location.
+			 *
+			 * Note: This can be satisfied only when the old cluster has been
+			 * shut down, so we skip this for live checks.
+			 */
+			if (!live_check &&
+				(slot->confirmed_flush != old_cluster.controldata.chkpnt_latest))
+				pg_fatal("logical replication slot \"%s\" has not consumed WALs yet",
+						 slot->slotname);
+		}
+	}
+
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c
index 4beb65ab22..fc08d55244 100644
--- a/src/bin/pg_upgrade/controldata.c
+++ b/src/bin/pg_upgrade/controldata.c
@@ -169,6 +169,38 @@ get_control_data(ClusterInfo *cluster, bool live_check)
 				}
 				got_cluster_state = true;
 			}
+
+			else if ((p = strstr(bufin, "Latest checkpoint location:")) != NULL)
+			{
+				/*
+				 * Read the latest checkpoint location if the cluster is PG17
+				 * or later. This is used for upgrading logical replication
+				 * slots. Currently, we need it only for the old cluster but
+				 * for simplicity chose not to have additional checks.
+				 */
+				if (GET_MAJOR_VERSION(cluster->major_version) >= 1700)
+				{
+					bool		have_error = false;
+
+					p = strchr(p, ':');
+
+					if (p == NULL || strlen(p) <= 1)
+						pg_fatal("%d: controldata retrieval problem", __LINE__);
+
+					p++;		/* remove ':' char */
+
+					p = strpbrk(p, "01234567890ABCDEF");
+
+					if (p == NULL || strlen(p) <= 1)
+						pg_fatal("%d: controldata retrieval problem", __LINE__);
+
+					cluster->controldata.chkpnt_latest =
+						strtoLSN(p, &have_error);
+
+					if (have_error)
+						pg_fatal("%d: controldata retrieval problem", __LINE__);
+				}
+			}
 		}
 
 		rc = pclose(output);
@@ -732,3 +764,43 @@ disable_old_cluster(void)
 		   "started once the new cluster has been started.",
 		   old_cluster.pgdata);
 }
+
+/*
+ * Convert String to XLogRecPtr.
+ *
+ * This function is ported from pg_lsn_in_internal(). The function cannot be
+ * called from client binaries.
+ */
+XLogRecPtr
+strtoLSN(const char *str, bool *have_error)
+{
+	int			len1,
+				len2;
+	uint32		id,
+				off;
+	XLogRecPtr	result;
+
+	Assert(have_error != NULL);
+	*have_error = false;
+
+	/* Sanity check input format. */
+	len1 = strspn(str, "0123456789abcdefABCDEF");
+	if (len1 < 1 || str[len1] != '/')
+	{
+		*have_error = true;
+		return InvalidXLogRecPtr;
+	}
+	len2 = strspn(str + len1 + 1, "0123456789abcdefABCDEF");
+	if (len2 < 1)
+	{
+		*have_error = true;
+		return InvalidXLogRecPtr;
+	}
+
+	/* Decode result. */
+	id = (uint32) strtoul(str, NULL, 16);
+	off = (uint32) strtoul(str + len1 + 1, NULL, 16);
+	result = ((uint64) id << 32) | off;
+
+	return result;
+}
diff --git a/src/bin/pg_upgrade/function.c b/src/bin/pg_upgrade/function.c
index dc8800c7cd..b00c7a8e41 100644
--- a/src/bin/pg_upgrade/function.c
+++ b/src/bin/pg_upgrade/function.c
@@ -11,8 +11,28 @@
 
 #include "access/transam.h"
 #include "catalog/pg_language_d.h"
+#include "fe_utils/string_utils.h"
 #include "pg_upgrade.h"
 
+/*
+ * Structures for listing up unique output plugins.
+ *
+ * XXX: these are not needed if we remove the function get_output_plugins().
+ * See comments atop it.
+ */
+typedef struct plugin_list_head
+{
+	struct plugin_list *head;
+	int			length;
+} plugin_list_head;
+
+typedef struct plugin_list
+{
+	int			dbnum;
+	char	   *plugin;
+	struct plugin_list *next;
+} plugin_list;
+
 /*
  * qsort comparator for pointers to library names
  *
@@ -43,10 +63,103 @@ library_name_compare(const void *p1, const void *p2)
 }
 
 
+/* Fetch list's length */
+static int
+plugin_list_length(plugin_list_head *listhead)
+{
+	return listhead ? listhead->length : 0;
+}
+
+/* Has the given plugin already been listed? */
+static bool
+is_plugin_unique(plugin_list_head *listhead, const char *plugin)
+{
+	plugin_list *point;
+
+	/* Quick return if the head is NULL */
+	if (listhead == NULL)
+		return true;
+
+	/* Seek the plugin list */
+	for (point = listhead->head; point; point = point->next)
+	{
+		if (strcmp(point->plugin, plugin) == 0)
+			return false;
+	}
+
+	return true;
+}
+
+/* Add an item to a plugin_list. */
+static void
+add_plugin_list_item(plugin_list_head **listhead, int dbnum, const char *plugin)
+{
+	plugin_list *newentry = (plugin_list *) pg_malloc(sizeof(plugin_list));
+	plugin_list *oldentry;
+
+	newentry->dbnum = dbnum;
+	newentry->plugin = pg_strdup(plugin);
+
+	/* Initialize the header if not yet */
+	if (*listhead == NULL)
+		*listhead = (plugin_list_head *) pg_malloc0(sizeof(plugin_list_head));
+
+	/*
+	 * Set the new entry as the head of the list. We do not have to consider
+	 * the ordering because they are sorted later.
+	 */
+	oldentry = (*listhead)->head;
+	(*listhead)->head = newentry;
+	newentry->next = oldentry;
+
+	/* Increment the list for plugin_list_length() */
+	(*listhead)->length++;
+}
+
+/*
+ * Load the list of unique output plugins.
+ *
+ * XXX: Currently, we extract the list of unique output plugins, but this may
+ * be overkill. The list is used for two purposes - 1) to allocate the minimal
+ * memory for the library list and 2) to skip storing duplicated plugin names.
+ * However, the consumer check_loadable_libraries() can avoid double checks for
+ * the same library. The above means that we can arrange output plugins without
+ * considering their uniqueness, so that we can remove this function.
+ */
+static plugin_list_head *
+get_output_plugins(void)
+{
+	plugin_list_head *head = NULL;
+	int			dbnum;
+
+	/* Quick return if there are no logical slots to be migrated. */
+	if (count_old_cluster_logical_slots() == 0)
+		return NULL;
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		LogicalSlotInfoArr *slot_arr = &old_cluster.dbarr.dbs[dbnum].slot_arr;
+		int			slotnum;
+
+		for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		{
+			LogicalSlotInfo *slot = &slot_arr->slots[slotnum];
+
+			/* Add to the list if the plugin has not been listed yet */
+			if (is_plugin_unique(head, slot->plugin))
+				add_plugin_list_item(&head, dbnum, slot->plugin);
+		}
+	}
+
+	return head;
+}
+
 /*
  * get_loadable_libraries()
  *
- *	Fetch the names of all old libraries containing C-language functions.
+ *	Fetch the names of all old libraries containing either C-language functions
+ *	or are corresponding to logical replication output plugins.
+ *
  *	We will later check that they all exist in the new installation.
  */
 void
@@ -55,6 +168,7 @@ get_loadable_libraries(void)
 	PGresult  **ress;
 	int			totaltups;
 	int			dbnum;
+	plugin_list_head *output_plugins = get_output_plugins();
 
 	ress = (PGresult **) pg_malloc(old_cluster.dbarr.ndbs * sizeof(PGresult *));
 	totaltups = 0;
@@ -76,12 +190,22 @@ get_loadable_libraries(void)
 										"oid >= %u;",
 										ClanguageId,
 										FirstNormalObjectId);
+
 		totaltups += PQntuples(ress[dbnum]);
 
 		PQfinish(conn);
 	}
 
-	os_info.libraries = (LibraryInfo *) pg_malloc(totaltups * sizeof(LibraryInfo));
+	/*
+	 * Allocate a minimal memory for extensions and logical replication output
+	 * plugins.
+	 *
+	 * XXX: As mentioned in comments atop get_output_plugins(), we may not
+	 * have to consider the uniqueness of entries. If so, we can use
+	 * count_old_cluster_logical_slots() instead of plugin_list_length().
+	 */
+	os_info.libraries = (LibraryInfo *) pg_malloc(
+												  (totaltups + plugin_list_length(output_plugins)) * sizeof(LibraryInfo));
 	totaltups = 0;
 
 	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
@@ -89,6 +213,7 @@ get_loadable_libraries(void)
 		PGresult   *res = ress[dbnum];
 		int			ntups;
 		int			rowno;
+		plugin_list *point;
 
 		ntups = PQntuples(res);
 		for (rowno = 0; rowno < ntups; rowno++)
@@ -101,6 +226,26 @@ get_loadable_libraries(void)
 			totaltups++;
 		}
 		PQclear(res);
+
+		/*
+		 * If the old cluster has logical replication slots, plugins used by
+		 * them must be also stored. It must be done only once, so do it at
+		 * dbnum == 0 case.
+		 */
+		if (output_plugins == NULL)
+			continue;
+
+		if (dbnum != 0)
+			continue;
+
+		for (point = output_plugins->head; point; point = point->next)
+		{
+			os_info.libraries[totaltups].name = pg_strdup(point->plugin);
+			os_info.libraries[totaltups].dbnum = point->dbnum;
+
+			totaltups++;
+		}
+
 	}
 
 	pg_free(ress);
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index aa5faca4d6..f8afcdb533 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -26,6 +26,8 @@ static void get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo);
 static void free_rel_infos(RelInfoArr *rel_arr);
 static void print_db_infos(DbInfoArr *db_arr);
 static void print_rel_infos(RelInfoArr *rel_arr);
+static void print_slot_infos(LogicalSlotInfoArr *slot_arr);
+static void get_old_cluster_logical_slot_infos(DbInfo *dbinfo);
 
 
 /*
@@ -283,7 +285,18 @@ get_db_and_rel_infos(ClusterInfo *cluster)
 	get_db_infos(cluster);
 
 	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
-		get_rel_infos(cluster, &cluster->dbarr.dbs[dbnum]);
+	{
+		DbInfo	   *pDbInfo = &cluster->dbarr.dbs[dbnum];
+
+		get_rel_infos(cluster, pDbInfo);
+
+		/*
+		 * If we are reading the old_cluster, gets infos for logical
+		 * replication slots.
+		 */
+		if (cluster == &old_cluster)
+			get_old_cluster_logical_slot_infos(pDbInfo);
+	}
 
 	if (cluster == &old_cluster)
 		pg_log(PG_VERBOSE, "\nsource databases:");
@@ -394,7 +407,7 @@ get_db_infos(ClusterInfo *cluster)
 	i_spclocation = PQfnumber(res, "spclocation");
 
 	ntups = PQntuples(res);
-	dbinfos = (DbInfo *) pg_malloc(sizeof(DbInfo) * ntups);
+	dbinfos = (DbInfo *) pg_malloc0(sizeof(DbInfo) * ntups);
 
 	for (tupnum = 0; tupnum < ntups; tupnum++)
 	{
@@ -600,6 +613,135 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 	dbinfo->rel_arr.nrels = num_rels;
 }
 
+/*
+ * Helper function for get_old_cluster_logical_slot_infos()
+ */
+static WALAvailability
+GetWALAvailabilityByString(const char *str)
+{
+	WALAvailability status = WALAVAIL_INVALID_LSN;
+
+	if (strcmp(str, "reserved") == 0)
+		status = WALAVAIL_RESERVED;
+	else if (strcmp(str, "extended") == 0)
+		status = WALAVAIL_EXTENDED;
+	else if (strcmp(str, "unreserved") == 0)
+		status = WALAVAIL_UNRESERVED;
+	else if (strcmp(str, "lost") == 0)
+		status = WALAVAIL_REMOVED;
+
+	return status;
+}
+
+/*
+ * get_old_cluster_logical_slot_infos()
+ *
+ * gets the LogicalSlotInfos for all the logical replication slots of the
+ * database referred to by "dbinfo".
+ *
+ * Note: This function will not do anything if the old cluster is pre-PG17.
+ * This is because before that the logical slots are not saved at shutdown, so
+ * there is no guarantee that the latest confirmed_flush_lsn is saved to disk
+ * which can lead to data loss. It is still not guaranteed for manually created
+ * slots in PG17, so subsequent checks done in
+ * check_old_cluster_for_valid_slots() would raise a FATAL error if such slots
+ * are included.
+ */
+static void
+get_old_cluster_logical_slot_infos(DbInfo *dbinfo)
+{
+	PGconn	   *conn;
+	PGresult   *res;
+	LogicalSlotInfo *slotinfos = NULL;
+	int			num_slots;
+
+	/* Logical slots can be migrated since PG17. */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+		return;
+
+	conn = connectToServer(&old_cluster, dbinfo->db_name);
+
+	/*
+	 * The temporary slots are expressly ignored while checking because such
+	 * slots cannot exist after the upgrade. During the upgrade, clusters are
+	 * started and stopped several times causing any temporary slots to be
+	 * removed.
+	 */
+	res = executeQueryOrDie(conn, "SELECT slot_name, plugin, two_phase, confirmed_flush_lsn, wal_status "
+							"FROM pg_catalog.pg_replication_slots "
+							"WHERE slot_type = 'logical' AND "
+							"database = current_database() AND "
+							"temporary IS FALSE;");
+
+	num_slots = PQntuples(res);
+
+	if (num_slots)
+	{
+		int			slotnum;
+		int			i_slotname;
+		int			i_plugin;
+		int			i_twophase;
+		int			i_confirmed_flush;
+		int			i_wal_status;
+
+		slotinfos = (LogicalSlotInfo *) pg_malloc(sizeof(LogicalSlotInfo) * num_slots);
+
+		i_slotname = PQfnumber(res, "slot_name");
+		i_plugin = PQfnumber(res, "plugin");
+		i_twophase = PQfnumber(res, "two_phase");
+		i_confirmed_flush = PQfnumber(res, "confirmed_flush_lsn");
+		i_wal_status = PQfnumber(res, "wal_status");
+
+		for (slotnum = 0; slotnum < num_slots; slotnum++)
+		{
+			LogicalSlotInfo *curr = &slotinfos[slotnum];
+			bool		have_error = false;
+
+			curr->slotname = pg_strdup(PQgetvalue(res, slotnum, i_slotname));
+			curr->plugin = pg_strdup(PQgetvalue(res, slotnum, i_plugin));
+			curr->two_phase = (strcmp(PQgetvalue(res, slotnum, i_twophase), "t") == 0);
+			curr->confirmed_flush = strtoLSN(
+											 PQgetvalue(res,
+														slotnum,
+														i_confirmed_flush),
+											 &have_error);
+			curr->wal_status = GetWALAvailabilityByString(
+														  PQgetvalue(res,
+																	 slotnum,
+																	 i_wal_status));
+
+			Assert(!have_error);
+		}
+	}
+
+	PQclear(res);
+	PQfinish(conn);
+
+	dbinfo->slot_arr.slots = slotinfos;
+	dbinfo->slot_arr.nslots = num_slots;
+}
+
+
+/*
+ * count_old_cluster_logical_slots()
+ *
+ * Sum up and return the number of logical replication slots for all databases.
+ *
+ * Note: this function always returns 0 if the old_cluster is PG16 and prior
+ * because we gather slot information only for cluster versions greater than or
+ * equal to PG17. See get_old_cluster_logical_slot_infos().
+ */
+int
+count_old_cluster_logical_slots(void)
+{
+	int			dbnum;
+	int			slot_count = 0;
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+		slot_count += old_cluster.dbarr.dbs[dbnum].slot_arr.nslots;
+
+	return slot_count;
+}
 
 static void
 free_db_and_rel_infos(DbInfoArr *db_arr)
@@ -610,6 +752,12 @@ free_db_and_rel_infos(DbInfoArr *db_arr)
 	{
 		free_rel_infos(&db_arr->dbs[dbnum].rel_arr);
 		pg_free(db_arr->dbs[dbnum].db_name);
+
+		/*
+		 * Logical replication slots must not exist on the new cluster before
+		 * create_logical_replication_slots().
+		 */
+		Assert(db_arr->dbs[dbnum].slot_arr.nslots == 0);
 	}
 	pg_free(db_arr->dbs);
 	db_arr->dbs = NULL;
@@ -642,8 +790,11 @@ print_db_infos(DbInfoArr *db_arr)
 
 	for (dbnum = 0; dbnum < db_arr->ndbs; dbnum++)
 	{
-		pg_log(PG_VERBOSE, "Database: \"%s\"", db_arr->dbs[dbnum].db_name);
-		print_rel_infos(&db_arr->dbs[dbnum].rel_arr);
+		DbInfo	   *pDbInfo = &db_arr->dbs[dbnum];
+
+		pg_log(PG_VERBOSE, "Database: \"%s\"", pDbInfo->db_name);
+		print_rel_infos(&pDbInfo->rel_arr);
+		print_slot_infos(&pDbInfo->slot_arr);
 	}
 }
 
@@ -660,3 +811,22 @@ print_rel_infos(RelInfoArr *rel_arr)
 			   rel_arr->rels[relnum].reloid,
 			   rel_arr->rels[relnum].tablespace);
 }
+
+static void
+print_slot_infos(LogicalSlotInfoArr *slot_arr)
+{
+	int			slotnum;
+
+	for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+	{
+		LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+
+		if (slotnum == 0)
+			pg_log(PG_VERBOSE, "Logical replication slots within the database:");
+
+		pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\", two_phase: %d",
+			   slot_info->slotname,
+			   slot_info->plugin,
+			   slot_info->two_phase);
+	}
+}
diff --git a/src/bin/pg_upgrade/meson.build b/src/bin/pg_upgrade/meson.build
index 12a97f84e2..228f29b688 100644
--- a/src/bin/pg_upgrade/meson.build
+++ b/src/bin/pg_upgrade/meson.build
@@ -42,6 +42,7 @@ tests += {
     'tests': [
       't/001_basic.pl',
       't/002_pg_upgrade.pl',
+      't/003_logical_replication_slots.pl',
     ],
     'test_kwargs': {'priority': 40}, # pg_upgrade tests are slow
   },
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 4562dafcff..f81b1d5cc8 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -59,6 +59,8 @@ static void copy_xact_xlog_xid(void);
 static void set_frozenxids(bool minmxid_only);
 static void make_outputdirs(char *pgdata);
 static void setup(char *argv0, bool *live_check);
+static void create_logical_replication_slots(void);
+static void setup_new_cluster(void);
 
 ClusterInfo old_cluster,
 			new_cluster;
@@ -188,6 +190,10 @@ main(int argc, char **argv)
 			  new_cluster.pgdata);
 	check_ok();
 
+	create_script_for_old_cluster_deletion(&deletion_script_file_name);
+
+	setup_new_cluster();
+
 	if (user_opts.do_sync)
 	{
 		prep_status("Sync data directory to disk");
@@ -197,10 +203,6 @@ main(int argc, char **argv)
 		check_ok();
 	}
 
-	create_script_for_old_cluster_deletion(&deletion_script_file_name);
-
-	issue_warnings_and_set_wal_level();
-
 	pg_log(PG_REPORT,
 		   "\n"
 		   "Upgrade Complete\n"
@@ -860,3 +862,102 @@ set_frozenxids(bool minmxid_only)
 
 	check_ok();
 }
+
+/*
+ * create_logical_replication_slots()
+ *
+ * Similar to create_new_objects() but only restores logical replication slots.
+ */
+static void
+create_logical_replication_slots(void)
+{
+	int			dbnum;
+
+	prep_status_progress("Restoring logical replication slots in the new cluster");
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		DbInfo	   *old_db = &old_cluster.dbarr.dbs[dbnum];
+		LogicalSlotInfoArr *slot_arr = &old_db->slot_arr;
+		PGconn	   *conn;
+		PQExpBuffer query;
+		int			slotnum;
+		char		log_file_name[MAXPGPATH];
+
+		/* Skip this database if there are no slots */
+		if (slot_arr->nslots == 0)
+			continue;
+
+		conn = connectToServer(&new_cluster, old_db->db_name);
+		query = createPQExpBuffer();
+
+		snprintf(log_file_name, sizeof(log_file_name),
+				 DB_DUMP_LOG_FILE_MASK, old_db->db_oid);
+
+		pg_log(PG_STATUS, "%s", old_db->db_name);
+
+		for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		{
+			LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+
+			/* Constructs a query for creating logical replication slots. */
+			appendPQExpBuffer(query, "SELECT pg_catalog.pg_create_logical_replication_slot(");
+			appendStringLiteralConn(query, slot_info->slotname, conn);
+			appendPQExpBuffer(query, ", ");
+			appendStringLiteralConn(query, slot_info->plugin, conn);
+			appendPQExpBuffer(query, ", false, %s);",
+							  slot_info->two_phase ? "true" : "false");
+
+			PQclear(executeQueryOrDie(conn, "%s", query->data));
+
+			resetPQExpBuffer(query);
+		}
+
+		PQfinish(conn);
+
+		destroyPQExpBuffer(query);
+	}
+
+	end_progress_output();
+	check_ok();
+}
+
+/*
+ *	setup_new_cluster()
+ *
+ * Starts a new cluster to update wal_level in the control file, then does
+ * the final setup steps. Logical slots are also created here.
+ */
+static void
+setup_new_cluster(void)
+{
+	/*
+	 * We unconditionally start/stop the new server because pg_resetwal -o set
+	 * wal_level to 'minimum'.  If the user is upgrading standby servers using
+	 * the rsync instructions, they will need pg_upgrade to write its final
+	 * WAL record showing wal_level as 'replica'.
+	 */
+	start_postmaster(&new_cluster, true);
+
+	/*
+	 * If the old cluster has logical slots, migrate them to a new cluster.
+	 *
+	 * Note: This must be done after doing the pg_resetwal command because
+	 * pg_resetwal would remove required WALs.
+	 */
+	if (count_old_cluster_logical_slots())
+		create_logical_replication_slots();
+
+	/*
+	 * Reindex hash indexes for old < 10.0. count_old_cluster_logical_slots()
+	 * returns non-zero when the old_cluster is PG17 and later, so it's OK to
+	 * use "else if" here. See comments atop count_old_cluster_logical_slots()
+	 * and get_old_cluster_logical_slot_infos().
+	 */
+	else if (GET_MAJOR_VERSION(old_cluster.major_version) <= 906)
+		old_9_6_invalidate_hash_indexes(&new_cluster, false);
+
+	report_extension_updates(&new_cluster);
+
+	stop_postmaster(false);
+}
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 7afa96716e..73d8a05b96 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -10,6 +10,7 @@
 #include <sys/stat.h>
 #include <sys/time.h>
 
+#include "access/xlogdefs.h"
 #include "common/relpath.h"
 #include "libpq-fe.h"
 
@@ -150,6 +151,24 @@ typedef struct
 	int			nrels;
 } RelInfoArr;
 
+/*
+ * Structure to store logical replication slot information
+ */
+typedef struct
+{
+	char	   *slotname;		/* slot name */
+	char	   *plugin;			/* plugin */
+	bool		two_phase;		/* can the slot decode 2PC? */
+	XLogRecPtr	confirmed_flush;	/* confirmed_flush_lsn of the slot */
+	WALAvailability wal_status; /* status of the slot */
+} LogicalSlotInfo;
+
+typedef struct
+{
+	int			nslots;			/* number of logical slot infos */
+	LogicalSlotInfo *slots;		/* array of logical slot infos */
+} LogicalSlotInfoArr;
+
 /*
  * The following structure represents a relation mapping.
  */
@@ -176,6 +195,7 @@ typedef struct
 	char		db_tablespace[MAXPGPATH];	/* database default tablespace
 											 * path */
 	RelInfoArr	rel_arr;		/* array of all user relinfos */
+	LogicalSlotInfoArr slot_arr;	/* array of all LogicalSlotInfo */
 } DbInfo;
 
 /*
@@ -225,6 +245,7 @@ typedef struct
 	bool		date_is_int;
 	bool		float8_pass_by_value;
 	uint32		data_checksum_version;
+	XLogRecPtr	chkpnt_latest;
 } ControlData;
 
 /*
@@ -344,7 +365,6 @@ void		output_check_banner(bool live_check);
 void		check_and_dump_old_cluster(bool live_check);
 void		check_new_cluster(void);
 void		report_clusters_compatible(void);
-void		issue_warnings_and_set_wal_level(void);
 void		output_completion_banner(char *deletion_script_file_name);
 void		check_cluster_versions(void);
 void		check_cluster_compatibility(bool live_check);
@@ -400,6 +420,7 @@ FileNameMap *gen_db_file_maps(DbInfo *old_db,
 							  DbInfo *new_db, int *nmaps, const char *old_pgdata,
 							  const char *new_pgdata);
 void		get_db_and_rel_infos(ClusterInfo *cluster);
+int			count_old_cluster_logical_slots(void);
 
 /* option.c */
 
@@ -471,3 +492,5 @@ void		parallel_transfer_all_new_dbs(DbInfoArr *old_db_arr, DbInfoArr *new_db_arr
 										  char *old_pgdata, char *new_pgdata,
 										  char *old_tablespace);
 bool		reap_child(bool wait_for_child);
+
+XLogRecPtr	strtoLSN(const char *str, bool *have_error);
diff --git a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
new file mode 100644
index 0000000000..7c6cc6d04b
--- /dev/null
+++ b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
@@ -0,0 +1,238 @@
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+# Tests for upgrading replication slots
+
+use strict;
+use warnings;
+
+use File::Path qw(rmtree);
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Can be changed to test the other modes
+my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
+
+# Initialize old cluster
+my $old_publisher = PostgreSQL::Test::Cluster->new('old_publisher');
+$old_publisher->init(allows_streaming => 'logical');
+
+# Initialize new cluster
+my $new_publisher = PostgreSQL::Test::Cluster->new('new_publisher');
+$new_publisher->init(allows_streaming => 'replica');
+
+# Initialize subscriber cluster
+my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
+$subscriber->init(allows_streaming => 'logical');
+
+my $bindir = $new_publisher->config_data('--bindir');
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when new cluster wal_level is not 'logical'
+
+# Preparations for the subsequent test:
+# 1. Create a slot on the old cluster
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT pg_create_logical_replication_slot('test_slot1', 'test_decoding', false, true);"
+);
+$old_publisher->stop;
+
+# pg_upgrade will fail because the new cluster wal_level is 'replica'
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade where the new cluster has the wrong wal_level');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when max_replication_slots on a new cluster is
+#		too low
+
+# Preparations for the subsequent test:
+# 1. Create a second slot on the old cluster
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT pg_create_logical_replication_slot('test_slot2', 'test_decoding', false, true);"
+);
+
+# 2. Consume WAL records to avoid another type of upgrade failure. It will be
+#	 tested in subsequent cases.
+$old_publisher->safe_psql('postgres',
+	"SELECT count(*) FROM pg_logical_slot_get_changes('test_slot1', NULL, NULL);"
+);
+$old_publisher->stop;
+
+# 3. max_replication_slots is set to smaller than the number of slots (2)
+#	 present on the old cluster
+$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 1");
+
+# 4. wal_level is set correctly on the new cluster
+$new_publisher->append_conf('postgresql.conf', "wal_level = 'logical'");
+
+# pg_upgrade will fail because the new cluster has insufficient max_replication_slots
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade where the new cluster has insufficient max_replication_slots');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when the slot still has unconsumed WAL records
+
+# Preparations for the subsequent test:
+# 1. Remove the slot 'test_slot2', leaving only 1 slot remaining on the old
+#	 cluster, so the new cluster config  max_replication_slots=1 will now be
+#	 enough.
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT * FROM pg_drop_replication_slot('test_slot2');"
+);
+
+# 2. Generate extra WAL records. Because these WAL records do not get consumed
+#	 it will cause the upcoming pg_upgrade test to fail.
+$old_publisher->safe_psql('postgres',
+	"CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;"
+);
+$old_publisher->stop;
+
+# pg_upgrade will fail because the slot still has unconsumed WAL records
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade of old cluster with idle replication slots');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+
+# ------------------------------
+# TEST: Successful --check command
+
+# Preparations for the subsequent test:
+# 1. Start the cluster. --check works well when an old cluster is running.
+$old_publisher->start;
+
+# Actual run, successful --check command is expected
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,        '--check'
+	],
+	'run of pg_upgrade --check for new instance');
+ok(!-d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade --check success");
+
+# Remove the remained slot
+$old_publisher->safe_psql('postgres',
+	"SELECT * FROM pg_drop_replication_slot('test_slot1');"
+);
+
+# ------------------------------
+# TEST: Successful upgrade
+
+# Preparations for the subsequent test:
+# 1. Setup logical replication
+my $old_connstr = $old_publisher->connstr . ' dbname=postgres';
+$old_publisher->safe_psql('postgres',
+	"CREATE PUBLICATION pub FOR ALL TABLES;"
+);
+$subscriber->start;
+$subscriber->safe_psql(
+	'postgres', qq[
+	CREATE TABLE tbl (a int);
+	CREATE SUBSCRIPTION sub CONNECTION '$old_connstr' PUBLICATION pub WITH (two_phase = 'true')
+]);
+$subscriber->wait_for_subscription_sync($old_publisher, 'sub');
+
+# 2. Temporarily disable the subscription
+$subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION sub DISABLE");
+$old_publisher->stop;
+
+# Actual run, successful upgrade is expected
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade of old cluster');
+ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ removed after pg_upgrade success");
+
+# Check that the slot 'sub' has migrated to the new cluster
+$new_publisher->start;
+my $result = $new_publisher->safe_psql('postgres',
+	"SELECT slot_name, two_phase FROM pg_replication_slots");
+is($result, qq(sub|t), 'check the slot exists on new cluster');
+
+# Update the connection
+my $new_connstr = $new_publisher->connstr . ' dbname=postgres';
+$subscriber->safe_psql(
+	'postgres', qq[
+	ALTER SUBSCRIPTION sub CONNECTION '$new_connstr';
+	ALTER SUBSCRIPTION sub ENABLE;
+]);
+
+# Check whether changes on the new publisher get replicated to the subscriber
+$new_publisher->safe_psql('postgres',
+	"INSERT INTO tbl VALUES (generate_series(11, 20))");
+$new_publisher->wait_for_catchup('sub');
+$result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
+is($result, qq(20), 'check changes are replicated to the subscriber');
+
+# Clean up
+$subscriber->stop();
+$new_publisher->stop();
+
+done_testing();
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 48ca852381..b3ecbdf0d8 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -177,19 +177,6 @@ typedef struct CheckpointStatsData
 
 extern PGDLLIMPORT CheckpointStatsData CheckpointStats;
 
-/*
- * GetWALAvailability return codes
- */
-typedef enum WALAvailability
-{
-	WALAVAIL_INVALID_LSN,		/* parameter error */
-	WALAVAIL_RESERVED,			/* WAL segment is within max_wal_size */
-	WALAVAIL_EXTENDED,			/* WAL segment is reserved by a slot or
-								 * wal_keep_size */
-	WALAVAIL_UNRESERVED,		/* no longer reserved, but not removed yet */
-	WALAVAIL_REMOVED			/* WAL segment has been removed */
-} WALAvailability;
-
 struct XLogRecData;
 struct XLogReaderState;
 
diff --git a/src/include/access/xlogdefs.h b/src/include/access/xlogdefs.h
index fe794c7740..b0ae6f151e 100644
--- a/src/include/access/xlogdefs.h
+++ b/src/include/access/xlogdefs.h
@@ -64,6 +64,19 @@ typedef uint32 TimeLineID;
  */
 typedef uint16 RepOriginId;
 
+/*
+ * Availability of WAL files claimed by replication slots.
+ */
+typedef enum WALAvailability
+{
+	WALAVAIL_INVALID_LSN,		/* parameter error */
+	WALAVAIL_RESERVED,			/* WAL segment is within max_wal_size */
+	WALAVAIL_EXTENDED,			/* WAL segment is reserved by a slot or
+								 * wal_keep_size */
+	WALAVAIL_UNRESERVED,		/* no longer reserved, but not removed yet */
+	WALAVAIL_REMOVED			/* WAL segment has been removed */
+} WALAvailability;
+
 /*
  * This chunk of hackery attempts to determine which file sync methods
  * are available on the current platform, and to choose an appropriate
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 49a33c0387..9028b33423 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1501,7 +1501,10 @@ LogicalRepTupleData
 LogicalRepTyp
 LogicalRepWorker
 LogicalRepWorkerType
+LogicalReplicationSlotInfo
 LogicalRewriteMappingData
+LogicalSlotInfo
+LogicalSlotInfoArr
 LogicalTape
 LogicalTapeSet
 LsnReadQueue
@@ -3612,6 +3615,8 @@ pltcl_proc_desc
 pltcl_proc_key
 pltcl_proc_ptr
 pltcl_query_desc
+plugin_list
+plugin_list_head
 pointer
 polymorphic_actuals
 pos_trgm
-- 
2.27.0

#194Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Hayato Kuroda (Fujitsu) (#193)
1 attachment(s)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

On Tuesday, September 5, 2023 3:35 PM Kuroda, Hayato/黒田 隼人 <kuroda.hayato@fujitsu.com> wrote:

Dear Hou-san,

Based on this, it's possible that the slots we get each time we check
wal_status are different, because they may have changed in between these
checks.

This may not cause serious problems for now, because we will either copy
all the slots (including invalidated ones) when upgrading, or we report an
ERROR. But I feel it's better to get a consistent result each time we check
the slots, to close off the possibility of problems in the future. So, I
feel we could centralize the wal_status check and the slot fetch, so that
even if some slot's status changes after that, there is no risk of it
affecting our check. What do you think?

Thank you for the suggestion! I agree that the checks should be centralized,
and I had already started modifying the patch accordingly. Here is the updated patch.

In this patch, all the slot information is extracted in
get_old_cluster_logical_slot_infos(), and the subsequent functions use it.
Based on this change, two attributes, confirmed_flush and wal_status, were
added to LogicalSlotInfo.

IIUC we cannot use struct List in client code, so structures and related
functions were added to function.c. These are used for extracting the unique
output plugins, but this may be overkill because check_loadable_libraries()
handles duplicated entries. If we can accept duplicated entries, these
functions can be removed.

Also, to simplify the code, only the first invalidated slot encountered is
reported in check_old_cluster_for_valid_slots(). The warning messages in that
function were removed. I think this may be enough because
check_new_cluster_is_empty() does a similar thing. Please tell me if it should be reverted...

Thanks for updating the patch! Here are a few comments.

1.

+	res = executeQueryOrDie(conn, "SHOW max_replication_slots;");
+	max_replication_slots = atoi(PQgetvalue(res, 0, 0));

+	res = executeQueryOrDie(conn, "SHOW wal_level;");
+	wal_level = PQgetvalue(res, 0, 0);

I think it would be better to do a sanity check using PQntuples() before
calling PQgetvalue() in the above places.
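
Just as a sketch, the wal_level case could look something like this (the
error message wording here is only an assumption):

----
	res = executeQueryOrDie(conn, "SHOW wal_level;");

	/* hypothetical sanity check before dereferencing the result */
	if (PQntuples(res) != 1)
		pg_fatal("could not determine wal_level of the new cluster");

	wal_level = PQgetvalue(res, 0, 0);
----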

2.

+/*
+ * Helper function for get_old_cluster_logical_slot_infos()
+ */
+static WALAvailability
+GetWALAvailabilityByString(const char *str)
+{
+	WALAvailability status = WALAVAIL_INVALID_LSN;
+
+	if (strcmp(str, "reserved") == 0)
+		status = WALAVAIL_RESERVED;

Not really a comment, but I am wondering whether we could use the conflicting
field to do this check, so that we could avoid the new conversion function and
moving the WALAvailability enum. What do you think?

3.

+			curr->confirmed_flush = strtoLSN(
+											 PQgetvalue(res,
+														slotnum,
+														i_confirmed_flush),
+											 &have_error);

The indentation looks a bit unusual.

4.
+	 * XXX: As mentioned in comments atop get_output_plugins(), we may not
+	 * have to consider the uniqueness of entries. If so, we can use
+	 * count_old_cluster_logical_slots() instead of plugin_list_length().
+	 */

I think check_loadable_libraries() will avoid loading the same library twice,
so it seems fine to skip de-duplicating the plugins, and we can save some code.

----
/* Did the library name change? Probe it. */
if (libnum == 0 || strcmp(lib, os_info.libraries[libnum - 1].name) != 0)
----

But if we think de-duplicating them is better, I feel we could use a
SimpleStringList to store and de-duplicate the plugin names.
get_output_plugins() can return an array of string lists, where each string
list holds the plugin names for one database. I have attached a rough POC
patch to show how it works; the intention is to avoid introducing our own
plugin list API.

5.

+	os_info.libraries = (LibraryInfo *) pg_malloc(
+												  (totaltups + plugin_list_length(output_plugins)) * sizeof(LibraryInfo));

If we think this looks too long, maybe using pg_malloc_array can help.
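
For example, just a sketch using the pg_malloc_array() helper from
common/fe_memutils.h:

----
	os_info.libraries = pg_malloc_array(LibraryInfo,
										totaltups + plugin_list_length(output_plugins));
----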

Best Regards,
Hou zj

Attachments:

0001-use-simple-ptr-list_topup_patch (application/octet-stream)
From 4302004aa9d6f5d7f60b10be8821b31caa1196ab Mon Sep 17 00:00:00 2001
From: Hou Zhijie <houzj.fnst@cn.fujitsu.com>
Date: Wed, 6 Sep 2023 10:21:37 +0800
Subject: [PATCH] use simple ptr list

---
 src/bin/pg_upgrade/function.c | 123 ++++++++--------------------------
 1 file changed, 27 insertions(+), 96 deletions(-)

diff --git a/src/bin/pg_upgrade/function.c b/src/bin/pg_upgrade/function.c
index b00c7a8e41..c579891bf6 100644
--- a/src/bin/pg_upgrade/function.c
+++ b/src/bin/pg_upgrade/function.c
@@ -11,28 +11,10 @@
 
 #include "access/transam.h"
 #include "catalog/pg_language_d.h"
+#include "fe_utils/simple_list.h"
 #include "fe_utils/string_utils.h"
 #include "pg_upgrade.h"
 
-/*
- * Structures for listing up unique output plugins.
- *
- * XXX: these are not needed if we remove the function get_output_plugins().
- * See comments atop it.
- */
-typedef struct plugin_list_head
-{
-	struct plugin_list *head;
-	int			length;
-} plugin_list_head;
-
-typedef struct plugin_list
-{
-	int			dbnum;
-	char	   *plugin;
-	struct plugin_list *next;
-} plugin_list;
-
 /*
  * qsort comparator for pointers to library names
  *
@@ -62,60 +44,6 @@ library_name_compare(const void *p1, const void *p2)
 			((const LibraryInfo *) p2)->dbnum;
 }
 
-
-/* Fetch list's length */
-static int
-plugin_list_length(plugin_list_head *listhead)
-{
-	return listhead ? listhead->length : 0;
-}
-
-/* Has the given plugin already been listed? */
-static bool
-is_plugin_unique(plugin_list_head *listhead, const char *plugin)
-{
-	plugin_list *point;
-
-	/* Quick return if the head is NULL */
-	if (listhead == NULL)
-		return true;
-
-	/* Seek the plugin list */
-	for (point = listhead->head; point; point = point->next)
-	{
-		if (strcmp(point->plugin, plugin) == 0)
-			return false;
-	}
-
-	return true;
-}
-
-/* Add an item to a plugin_list. */
-static void
-add_plugin_list_item(plugin_list_head **listhead, int dbnum, const char *plugin)
-{
-	plugin_list *newentry = (plugin_list *) pg_malloc(sizeof(plugin_list));
-	plugin_list *oldentry;
-
-	newentry->dbnum = dbnum;
-	newentry->plugin = pg_strdup(plugin);
-
-	/* Initialize the header if not yet */
-	if (*listhead == NULL)
-		*listhead = (plugin_list_head *) pg_malloc0(sizeof(plugin_list_head));
-
-	/*
-	 * Set the new entry as the head of the list. We do not have to consider
-	 * the ordering because they are sorted later.
-	 */
-	oldentry = (*listhead)->head;
-	(*listhead)->head = newentry;
-	newentry->next = oldentry;
-
-	/* Increment the list for plugin_list_length() */
-	(*listhead)->length++;
-}
-
 /*
  * Load the list of unique output plugins.
  *
@@ -126,19 +54,25 @@ add_plugin_list_item(plugin_list_head **listhead, int dbnum, const char *plugin)
  * the same library. The above means that we can arrange output plugins without
  * considering their uniqueness, so that we can remove this function.
  */
-static plugin_list_head *
-get_output_plugins(void)
+static SimpleStringList **
+get_output_plugins(int *pluginnum)
 {
-	plugin_list_head *head = NULL;
+	SimpleStringList **plugins_perdb;
 	int			dbnum;
 
+	Assert(pluginnum);
+	*pluginnum = 0;
+
 	/* Quick return if there are no logical slots to be migrated. */
 	if (count_old_cluster_logical_slots() == 0)
 		return NULL;
 
+	plugins_perdb = pg_malloc0_array(SimpleStringList *, old_cluster.dbarr.ndbs);
+
 	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
 	{
 		LogicalSlotInfoArr *slot_arr = &old_cluster.dbarr.dbs[dbnum].slot_arr;
+		SimpleStringList *pnames = pg_malloc0_object(SimpleStringList);
 		int			slotnum;
 
 		for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
@@ -146,12 +80,17 @@ get_output_plugins(void)
 			LogicalSlotInfo *slot = &slot_arr->slots[slotnum];
 
 			/* Add to the list if the plugin has not been listed yet */
-			if (is_plugin_unique(head, slot->plugin))
-				add_plugin_list_item(&head, dbnum, slot->plugin);
+			if (!simple_string_list_member(pnames, slot->plugin))
+			{
+				simple_string_list_append(pnames, pg_strdup(slot->plugin));
+				*pluginnum++;
+			}
 		}
+
+		plugins_perdb[dbnum] = pnames;
 	}
 
-	return head;
+	return plugins_perdb;
 }
 
 /*
@@ -168,7 +107,8 @@ get_loadable_libraries(void)
 	PGresult  **ress;
 	int			totaltups;
 	int			dbnum;
-	plugin_list_head *output_plugins = get_output_plugins();
+	int			pluginnum;
+	SimpleStringList **output_plugins = get_output_plugins(&pluginnum);
 
 	ress = (PGresult **) pg_malloc(old_cluster.dbarr.ndbs * sizeof(PGresult *));
 	totaltups = 0;
@@ -199,13 +139,8 @@ get_loadable_libraries(void)
 	/*
 	 * Allocate a minimal memory for extensions and logical replication output
 	 * plugins.
-	 *
-	 * XXX: As mentioned in comments atop get_output_plugins(), we may not
-	 * have to consider the uniqueness of entries. If so, we can use
-	 * count_old_cluster_logical_slots() instead of plugin_list_length().
 	 */
-	os_info.libraries = (LibraryInfo *) pg_malloc(
-												  (totaltups + plugin_list_length(output_plugins)) * sizeof(LibraryInfo));
+	os_info.libraries = (LibraryInfo *) pg_malloc((totaltups + pluginnum) * sizeof(LibraryInfo));
 	totaltups = 0;
 
 	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
@@ -213,7 +148,7 @@ get_loadable_libraries(void)
 		PGresult   *res = ress[dbnum];
 		int			ntups;
 		int			rowno;
-		plugin_list *point;
+		SimpleStringListCell *cell;
 
 		ntups = PQntuples(res);
 		for (rowno = 0; rowno < ntups; rowno++)
@@ -229,19 +164,15 @@ get_loadable_libraries(void)
 
 		/*
 		 * If the old cluster has logical replication slots, plugins used by
-		 * them must be also stored. It must be done only once, so do it at
-		 * dbnum == 0 case.
+		 * them must be also stored.
 		 */
-		if (output_plugins == NULL)
-			continue;
-
-		if (dbnum != 0)
+		if (output_plugins[dbnum] == NULL)
 			continue;
 
-		for (point = output_plugins->head; point; point = point->next)
+		for (cell = output_plugins[dbnum]->head; cell; cell = cell->next)
 		{
-			os_info.libraries[totaltups].name = pg_strdup(point->plugin);
-			os_info.libraries[totaltups].dbnum = point->dbnum;
+			os_info.libraries[totaltups].name = pg_strdup(cell->val);
+			os_info.libraries[totaltups].dbnum = dbnum;
 
 			totaltups++;
 		}
-- 
2.30.0.windows.2

#195Peter Smith
smithpb2250@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#193)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

Hi, here are some comments for patch v31-0002.

======
src/bin/pg_upgrade/controldata.c

1. get_control_data

+ if (GET_MAJOR_VERSION(cluster->major_version) >= 1700)
+ {
+ bool have_error = false;
+
+ p = strchr(p, ':');
+
+ if (p == NULL || strlen(p) <= 1)
+ pg_fatal("%d: controldata retrieval problem", __LINE__);
+
+ p++; /* remove ':' char */
+
+ p = strpbrk(p, "01234567890ABCDEF");
+
+ if (p == NULL || strlen(p) <= 1)
+ pg_fatal("%d: controldata retrieval problem", __LINE__);
+
+ cluster->controldata.chkpnt_latest =
+ strtoLSN(p, &have_error);

1a.
The initialization of 'have_error' in its declaration is redundant because it
gets overwritten before it is checked anyway.

~

1b.
IMO that first check logic should also be shifted to be *inside*
strtoLSN, which would just return have_error = true. This eliminates
having two pg_fatal calls that serve the same purpose.
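
Just to illustrate the idea (untested sketch only; the actual parsing would be
whatever strtoLSN already does internally):

XLogRecPtr
strtoLSN(const char *str, bool *have_error)
{
	uint32		hi;
	uint32		lo;

	/* Treat a missing or unparsable value as an error for the caller */
	if (str == NULL || sscanf(str, "%X/%X", &hi, &lo) != 2)
	{
		*have_error = true;
		return InvalidXLogRecPtr;
	}

	*have_error = false;
	return ((uint64) hi << 32) | lo;
}

Then the caller only needs a single pg_fatal when have_error comes back true.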

~~~

2. strtoLSN

+/*
+ * Convert String to XLogRecPtr.
+ *
+ * This function is ported from pg_lsn_in_internal(). The function cannot be
+ * called from client binaries.
+ */
+XLogRecPtr
+strtoLSN(const char *str, bool *have_error)

SUGGESTION (comment wording)
This function is ported from pg_lsn_in_internal() which cannot be
called from client binaries.

======
src/bin/pg_upgrade/function.c

3. struct plugin_list

+typedef struct plugin_list
+{
+ int dbnum;
+ char    *plugin;
+ struct plugin_list *next;
+} plugin_list;

I found that name confusing. IMO it should be something like 'plugin_list_elem'.

e.g. it reads strangely in subsequent code:
+ plugin_list *newentry = (plugin_list *) pg_malloc(sizeof(plugin_list));

~~~

4. is_plugin_unique

+/* Has the given plugin already been listed? */
+static bool
+is_plugin_unique(plugin_list_head *listhead, const char *plugin)
+{
+ plugin_list *point;
+
+ /* Quick return if the head is NULL */
+ if (listhead == NULL)
+ return true;
+
+ /* Seek the plugin list */
+ for (point = listhead->head; point; point = point->next)
+ {
+ if (strcmp(point->plugin, plugin) == 0)
+ return false;
+ }
+
+ return true;
+}

What's the meaning of the name 'point'? Maybe something generic like
'cur' or similar is better?

~~~

5. get_output_plugins

+/*
+ * Load the list of unique output plugins.
+ *
+ * XXX: Currently, we extract the list of unique output plugins, but this may
+ * be overkill. The list is used for two purposes - 1) to allocate the minimal
+ * memory for the library list and 2) to skip storing duplicated plugin names.
+ * However, the consumer check_loadable_libraries() can avoid double checks for
+ * the same library. The above means that we can arrange output plugins without
+ * considering their uniqueness, so that we can remove this function.
+ */
+static plugin_list_head *
+get_output_plugins(void)
+{
+ plugin_list_head *head = NULL;
+ int dbnum;
+
+ /* Quick return if there are no logical slots to be migrated. */
+ if (count_old_cluster_logical_slots() == 0)
+ return NULL;
+
+ for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+ {
+ LogicalSlotInfoArr *slot_arr = &old_cluster.dbarr.dbs[dbnum].slot_arr;
+ int slotnum;
+
+ for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+ {
+ LogicalSlotInfo *slot = &slot_arr->slots[slotnum];
+
+ /* Add to the list if the plugin has not been listed yet */
+ if (is_plugin_unique(head, slot->plugin))
+ add_plugin_list_item(&head, dbnum, slot->plugin);
+ }
+ }
+
+ return head;
+}

About the XXX: yeah, since the uniqueness seems to be checked later anyway,
all this extra code seems like overkill. Instead of the extra code, you just
need a comment mentioning how it will be sorted and checked later.

But even if you prefer to keep it, I thought those 2 functions
'is_plugin_unique()' and 'add_plugin_list_item()' could have been
combined to just have 'add_plugin_list_unique_item()'. Since order
does not matter, such a function would just add items to the end of
the list (after finding uniqueness) instead of to the head.
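
For example, something like this (untested; I am assuming plugin_list_head is
just a wrapper around the head pointer):

static void
add_plugin_list_unique_item(plugin_list_head **listhead, int dbnum,
							const char *plugin)
{
	plugin_list *cur;
	plugin_list *last = NULL;
	plugin_list *newentry;

	if (*listhead == NULL)
		*listhead = (plugin_list_head *) pg_malloc0(sizeof(plugin_list_head));

	/* Walk the list; bail out if the plugin is already present */
	for (cur = (*listhead)->head; cur; cur = cur->next)
	{
		if (strcmp(cur->plugin, plugin) == 0)
			return;
		last = cur;
	}

	newentry = (plugin_list *) pg_malloc(sizeof(plugin_list));
	newentry->dbnum = dbnum;
	newentry->plugin = pg_strdup(plugin);
	newentry->next = NULL;

	/* Append at the tail since the order does not matter */
	if (last)
		last->next = newentry;
	else
		(*listhead)->head = newentry;
}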

~~~

6. get_loadable_libraries

FirstNormalObjectId);
+
totaltups += PQntuples(ress[dbnum]);
~

The extra blank line in the existing code is not needed in this patch.

~~~

7. get_loadable_libraries

int rowno;
+ plugin_list *point;

~

Same as a prior comment #4. What's the meaning of the name 'point'?

~~~

8. get_loadable_libraries
+
+ /*
+ * If the old cluster has logical replication slots, plugins used by
+ * them must be also stored. It must be done only once, so do it at
+ * dbnum == 0 case.
+ */
+ if (output_plugins == NULL)
+ continue;
+
+ if (dbnum != 0)
+ continue;

This logic seems misplaced. If this "must be done only once", then why
is it within the db loop in the first place? Shouldn't this be done
separately outside the loop?
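
i.e. something like this after the loop ends (untested; just to show the
placement I mean):

	if (output_plugins != NULL)
	{
		plugin_list *point;

		for (point = output_plugins->head; point; point = point->next)
		{
			os_info.libraries[totaltups].name = pg_strdup(point->plugin);
			os_info.libraries[totaltups].dbnum = point->dbnum;
			totaltups++;
		}
	}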

======
src/bin/pg_upgrade/info.c

9.
+/*
+ * Helper function for get_old_cluster_logical_slot_infos()
+ */
+static WALAvailability
+GetWALAvailabilityByString(const char *str)

Should this be forward declared like the other static functions are?

~~~

10. get_old_cluster_logical_slot_infos

+ for (slotnum = 0; slotnum < num_slots; slotnum++)
+ {
+ LogicalSlotInfo *curr = &slotinfos[slotnum];
+ bool have_error = false;

This seems to be an unnecessary assignment to 'have_error' because it will
always be assigned again before it is checked.

~~~

11. get_old_cluster_logical_slot_infos

+ curr->confirmed_flush = strtoLSN(
+ PQgetvalue(res,
+ slotnum,
+ i_confirmed_flush),
+ &have_error);
+ curr->wal_status = GetWALAvailabilityByString(
+   PQgetvalue(res,
+ slotnum,
+ i_wal_status));

Can this excessive wrapping be improved? Maybe new vars are needed.
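
For example (untested; 'confirmed_flush_str' and 'wal_status_str' are just
names I made up for illustration):

	const char *confirmed_flush_str = PQgetvalue(res, slotnum, i_confirmed_flush);
	const char *wal_status_str = PQgetvalue(res, slotnum, i_wal_status);

	curr->confirmed_flush = strtoLSN(confirmed_flush_str, &have_error);
	curr->wal_status = GetWALAvailabilityByString(wal_status_str);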

~~~

12.
+static void
+print_slot_infos(LogicalSlotInfoArr *slot_arr)
+{
+ int slotnum;
+
+ for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+ {
+ LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+
+ if (slotnum == 0)
+ pg_log(PG_VERBOSE, "Logical replication slots within the database:");
+
+ pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\", two_phase: %d",
+    slot_info->slotname,
+    slot_info->plugin,
+    slot_info->two_phase);
+ }
+}

This seems an odd way to output the heading. Isn't it better to put
this outside the loop?

SUGGESTION
if (slot_arr->nslots > 0)
pg_log(PG_VERBOSE, "Logical replication slots within the database:");

======
src/bin/pg_upgrade/pg_upgrade.c

13.
+/*
+ * setup_new_cluster()
+ *
+ * Starts a new cluster for updating the wal_level in the control fine, then
+ * does final setups. Logical slots are also created here.
+ */
+static void
+setup_new_cluster(void)

typo

/control fine/control file/

------
Kind Regards,
Peter Smith.
Fujitsu Australia

#196Peter Smith
smithpb2250@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#193)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Tue, Sep 5, 2023 at 7:34 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

Also, to simplify the code, only the first invalidated slot found is output in
check_old_cluster_for_valid_slots(). Warning messages in the function were
removed. I think it may be enough because check_new_cluster_is_empty() does a
similar thing. Please tell me if it should be reverted...

Another possible idea is to show all the WARNINGS but only when in verbose mode.

-------
Kind Regards,
Peter Smith.
Fujitsu Australia

#197Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Zhijie Hou (Fujitsu) (#194)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

On Wednesday, September 6, 2023 11:18 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote:

On Tuesday, September 5, 2023 3:35 PM Kuroda, Hayato/黒田 隼人
<kuroda.hayato@fujitsu.com> wrote:

4.
+	 * XXX: As mentioned in comments atop get_output_plugins(), we may not
+	 * have to consider the uniqueness of entries. If so, we can use
+	 * count_old_cluster_logical_slots() instead of plugin_list_length().
+	 */

I think check_loadable_libraries() will avoid loading the same library, so it seems
fine to skip duplicating the plugins and we can save some codes.

Sorry, there is a typo, I mean "deduplicating" instead of " duplicating "

----
/* Did the library name change? Probe it. */
if (libnum == 0 || strcmp(lib, os_info.libraries[libnum - 1].name) != 0)
----

But if we think duplicating them would be better, I feel we could use the

Here also " duplicating " should be "deduplicating".

Best Regards,
Hou zj

#198Amit Kapila
amit.kapila16@gmail.com
In reply to: Zhijie Hou (Fujitsu) (#194)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Wed, Sep 6, 2023 at 8:47 AM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

On Tuesday, September 5, 2023 3:35 PM Kuroda, Hayato/黒田 隼人 <kuroda.hayato@fujitsu.com> wrote:

Dear Hou-san,

Based on this, it’s possible that the slots we get each time when
checking wal_status are different, because they may get changed in between
these checks.

This may not cause serious problems for now, because we will either
copy all the slots including ones invalidated when upgrading or we
report ERROR. But I feel it's better to get consistent result each
time we check the slots to close the possibility for problems in the
future. So, I feel we could centralize the check for wal_status and
slots fetch, so that even if some slots status changed after that, it
won't have a risk to affect our check. What do you think ?

Thank you for giving the suggestion! I agreed to centralize the checks, and I
had already started to modify the patch. Here is the updated version.

In this patch all slot infos are extracted in
get_old_cluster_logical_slot_infos(), and the upcoming functions use them.
Based on the change, two attributes confirmed_flush and wal_status were added
to LogicalSlotInfo.

IIUC we cannot use struct List in the client code, so structures and related
functions are added in function.c. These are used for extracting unique
plugins, but it may be overkill because check_loadable_libraries() handles
duplicated entries. If we can ignore duplicated entries, these functions can be
removed.

Also, to simplify the code, only the first invalidated slot found is output in
check_old_cluster_for_valid_slots(). Warning messages in the function were
removed. I think it may be enough because check_new_cluster_is_empty() does a
similar thing. Please tell me if it should be reverted...

Thanks for updating the patch! Here are a few comments.

1.

+       res = executeQueryOrDie(conn, "SHOW max_replication_slots;");
+       max_replication_slots = atoi(PQgetvalue(res, 0, 0));
+       res = executeQueryOrDie(conn, "SHOW wal_level;");
+       wal_level = PQgetvalue(res, 0, 0);

I think it would be better to do a sanity check using PQntuples() before
calling PQgetvalue() in above places.

2.

+/*
+ * Helper function for get_old_cluster_logical_slot_infos()
+ */
+static WALAvailability
+GetWALAvailabilityByString(const char *str)
+{
+       WALAvailability status = WALAVAIL_INVALID_LSN;
+
+       if (strcmp(str, "reserved") == 0)
+               status = WALAVAIL_RESERVED;

Not a comment, but I am wondering if we could use conflicting field to do this
check, so that we could avoid the new conversion function and structure
movement. What do you think ?

I also think referring to the conflicting field would be better not
only for the purpose of avoiding extra code but also to give accurate
information about invalidated slots for which we want to give an
error.

Additionally, I think we should try to avoid writing a new function
strtoLSN as that adds a maintainability burden. We can probably send
the value fetched from pg_controldata in the query for comparison with
confirmed_flush LSN.
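
For example, the comparison could be pushed into the query itself, something
along these lines (untested sketch; the column alias is made up):

	res = executeQueryOrDie(conn,
							"SELECT slot_name, plugin, two_phase, "
							"(confirmed_flush_lsn = '%X/%X') AS caughtup "
							"FROM pg_catalog.pg_replication_slots "
							"WHERE slot_type = 'logical' AND "
							"temporary IS FALSE;",
							LSN_FORMAT_ARGS(old_cluster.controldata.chkpnt_latest));

Then the client side only needs to read back a boolean instead of parsing an
LSN string.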

--
With Regards,
Amit Kapila.

#199Amit Kapila
amit.kapila16@gmail.com
In reply to: Peter Smith (#196)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Wed, Sep 6, 2023 at 11:01 AM Peter Smith <smithpb2250@gmail.com> wrote:

On Tue, Sep 5, 2023 at 7:34 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

Also, to simplify the code, only the first invalidated slot found is output in
check_old_cluster_for_valid_slots(). Warning messages in the function were
removed. I think it may be enough because check_new_cluster_is_empty() does a
similar thing. Please tell me if it should be reverted...

Another possible idea is to show all the WARNINGS but only when in verbose mode.

I think it would be better to write problematic slots in the script
file like we are doing in the function
check_for_composite_data_type_usage()->check_for_data_types_usage()
and give a message suggesting what the user can do as we are doing in
check_for_composite_data_type_usage(). That will be helpful for the
user to take necessary action.
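
Roughly (untested), in the per-slot loop:

	if (script == NULL &&
		(script = fopen_priv(output_path, "w")) == NULL)
		pg_fatal("could not open file \"%s\": %s",
				 output_path, strerror(errno));
	fprintf(script, "%s\n", slot->slotname);

and then a single pg_fatal() at the end that explains the possible problems and
points the user at the file, similar to what
check_for_composite_data_type_usage() reports.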

A few other comments:
=================
1.
@@ -189,6 +199,8 @@ check_new_cluster(void)
{
get_db_and_rel_infos(&new_cluster);

+ check_new_cluster_logical_replication_slots();
+
  check_new_cluster_is_empty();

check_loadable_libraries();

Why is check_new_cluster_logical_replication_slots() done before
check_new_cluster_is_empty()? At least check_new_cluster_is_empty()
would be much quicker to return an error, if any. I think if we don't
have a specific reason to position this new check here, we can do it at the
end, after check_for_new_tablespace_dir(), to avoid breaking the order
of the existing checks.

2. Shall we rename get_db_and_rel_infos() to
get_db_rel_and_slot_infos() or something like that as that function
now fetches the slot information as well?

--
With Regards,
Amit Kapila.

#200Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Zhijie Hou (Fujitsu) (#194)
2 attachment(s)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Hou,

Thank you for giving comments! PSA new version.
0001 is updated based on the forked thread.

1.

+	res = executeQueryOrDie(conn, "SHOW max_replication_slots;");
+	max_replication_slots = atoi(PQgetvalue(res, 0, 0));
+	res = executeQueryOrDie(conn, "SHOW wal_level;");
+	wal_level = PQgetvalue(res, 0, 0);

I think it would be better to do a sanity check using PQntuples() before
calling PQgetvalue() in above places.

Added.
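
I.e., the intention is to check the tuple count before reading the value,
e.g.:

	res = executeQueryOrDie(conn, "SHOW wal_level;");

	if (PQntuples(res) != 1)
		pg_fatal("could not determine wal_level");

	wal_level = PQgetvalue(res, 0, 0);

(and the same pattern for max_replication_slots).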

2.

+/*
+ * Helper function for get_old_cluster_logical_slot_infos()
+ */
+static WALAvailability
+GetWALAvailabilityByString(const char *str)
+{
+	WALAvailability status = WALAVAIL_INVALID_LSN;
+
+	if (strcmp(str, "reserved") == 0)
+		status = WALAVAIL_RESERVED;

Not a comment, but I am wondering if we could use conflicting field to do this
check, so that we could avoid the new conversion function and structure
movement. What do you think ?

I checked pg_get_replication_slots() and agreed that pg_replication_slots.conflicting
indicates whether the slot is usable or not. I can use the attribute instead of porting
WALAvailability. Fixed.

3.

+			curr->confirmed_flush = strtoLSN(
+											 PQgetvalue(res,
+														slotnum,
+														i_confirmed_flush),
+											 &have_error);

The indention looks a bit unusual.

The part is not needed anymore.

4.
+	 * XXX: As mentioned in comments atop get_output_plugins(), we may not
+	 * have to consider the uniqueness of entries. If so, we can use
+	 * count_old_cluster_logical_slots() instead of plugin_list_length().
+	 */

I think check_loadable_libraries() will avoid loading the same library, so it
seems fine to skip duplicating the plugins and we can save some codes.

----
/* Did the library name change? Probe it. */
if (libnum == 0 || strcmp(lib, os_info.libraries[libnum - 1].name) != 0)
----

But if we think duplicating them would be better, I feel we could use the
SimpleStringList to store and duplicate the plugin name. get_output_plugins can
return an array of the stringlist, each stringlist includes the plugins names
in one db. I shared a rough POC patch to show how it works, the intention is to
avoid introducing our new plugin list API.

Actually, I do not like the style either. Peter also said that we can skip checking the
uniqueness, so I removed it.

5.

+	os_info.libraries = (LibraryInfo *) pg_malloc(
+												  (totaltups + plugin_list_length(output_plugins)) * sizeof(LibraryInfo));

If we think this looks too long, maybe using pg_malloc_array can help.

I checked the whole patch and used these shorter macros where the line exceeded
80 columns.

Also, I found a cfbot failure [1], but I could not find the reason.
I will keep investigating it.

[1]: https://cirrus-ci.com/task/4634769732927488

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:

v32-0001-Persist-logical-slots-to-disk-during-a-shutdown-.patchapplication/octet-stream; name=v32-0001-Persist-logical-slots-to-disk-during-a-shutdown-.patchDownload
From faf53821a831129c1946d1bfd56a487658a72146 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Fri, 14 Apr 2023 13:49:09 +0800
Subject: [PATCH v32 1/2] Persist logical slots to disk during a shutdown
 checkpoint if required.

It's entirely possible for a logical slot to have a confirmed_flush LSN
higher than the last value saved on disk while not being marked as dirty.
Currently, it is not a major problem but a later patch adding support for
the upgrade of slots relies on that value being properly persisted to disk.

It can also help avoid processing the same transactions again in some
boundary cases after the clean shutdown and restart. Say, we process some
transactions for which we didn't send anything downstream (the changes got
filtered) but the confirm_flush LSN is updated due to keepalives. As we
don't flush the latest value of confirm_flush LSN, it may lead to
processing the same changes again without this patch.

Author: Vignesh C, Julien Rouhaud, Kuroda Hayato based on suggestions by
Ashutosh Bapat
Reviewed-by: Amit Kapila, Peter Smith, Ashutosh Bapat
Discussion: http://postgr.es/m/CAA4eK1JzJagMmb_E8D4au=GYQkxox0AfNBm1FbP7sy7t4YWXPQ@mail.gmail.com
Discussion: http://postgr.es/m/TYAPR01MB58664C81887B3AF2EB6B16E3F5939@TYAPR01MB5866.jpnprd01.prod.outlook.com
---
 src/backend/access/transam/xlog.c             |   2 +-
 src/backend/replication/slot.c                |  34 +++++-
 src/include/replication/slot.h                |   5 +-
 src/test/recovery/meson.build                 |   1 +
 .../t/038_save_logical_slots_shutdown.pl      | 102 ++++++++++++++++++
 5 files changed, 139 insertions(+), 5 deletions(-)
 create mode 100644 src/test/recovery/t/038_save_logical_slots_shutdown.pl

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index f6f8adc72a..f26c8d18a6 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7039,7 +7039,7 @@ static void
 CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
 {
 	CheckPointRelationMap();
-	CheckPointReplicationSlots();
+	CheckPointReplicationSlots(flags & CHECKPOINT_IS_SHUTDOWN);
 	CheckPointSnapBuild();
 	CheckPointLogicalRewriteHeap();
 	CheckPointReplicationOrigin();
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index bb09c4010f..6559a61753 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -321,6 +321,7 @@ ReplicationSlotCreate(const char *name, bool db_specific,
 	slot->candidate_xmin_lsn = InvalidXLogRecPtr;
 	slot->candidate_restart_valid = InvalidXLogRecPtr;
 	slot->candidate_restart_lsn = InvalidXLogRecPtr;
+	slot->last_saved_confirmed_flush = InvalidXLogRecPtr;
 
 	/*
 	 * Create the slot on disk.  We haven't actually marked the slot allocated
@@ -1573,10 +1574,10 @@ restart:
  * Flush all replication slots to disk.
  *
  * This needn't actually be part of a checkpoint, but it's a convenient
- * location.
+ * location. is_shutdown is true in case of a shutdown checkpoint.
  */
 void
-CheckPointReplicationSlots(void)
+CheckPointReplicationSlots(bool is_shutdown)
 {
 	int			i;
 
@@ -1601,6 +1602,31 @@ CheckPointReplicationSlots(void)
 
 		/* save the slot to disk, locking is handled in SaveSlotToPath() */
 		sprintf(path, "pg_replslot/%s", NameStr(s->data.name));
+
+		/*
+		 * We won't ensure that the slot is persisted after the
+		 * confirmed_flush LSN is updated as that could lead to frequent
+		 * writes.  However, we need to ensure that we do persist the slots at
+		 * the time of shutdown whose confirmed_flush LSN is changed since we
+		 * last saved the slot to disk. This will help in avoiding retreat of
+		 * the confirmed_flush LSN after restart. At other times, the walsender
+		 * keeps saving the slot from time to time as the replication
+		 * progresses, so there is no clear advantage of flushing additional
+		 * slots at the time of checkpoint.
+		 */
+		if (is_shutdown && SlotIsLogical(s))
+		{
+			SpinLockAcquire(&s->mutex);
+			if (s->data.invalidated == RS_INVAL_NONE &&
+				s->data.confirmed_flush != s->last_saved_confirmed_flush)
+			{
+				s->just_dirtied = true;
+				s->dirty = true;
+			}
+
+			SpinLockRelease(&s->mutex);
+		}
+
 		SaveSlotToPath(s, path, LOG);
 	}
 	LWLockRelease(ReplicationSlotAllocationLock);
@@ -1873,11 +1899,12 @@ SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel)
 
 	/*
 	 * Successfully wrote, unset dirty bit, unless somebody dirtied again
-	 * already.
+	 * already and remember the confirmed_flush LSN value.
 	 */
 	SpinLockAcquire(&slot->mutex);
 	if (!slot->just_dirtied)
 		slot->dirty = false;
+	slot->last_saved_confirmed_flush = cp.slotdata.confirmed_flush;
 	SpinLockRelease(&slot->mutex);
 
 	LWLockRelease(&slot->io_in_progress_lock);
@@ -2074,6 +2101,7 @@ RestoreSlotFromDisk(const char *name)
 		/* initialize in memory state */
 		slot->effective_xmin = cp.slotdata.xmin;
 		slot->effective_catalog_xmin = cp.slotdata.catalog_xmin;
+		slot->last_saved_confirmed_flush = cp.slotdata.confirmed_flush;
 
 		slot->candidate_catalog_xmin = InvalidTransactionId;
 		slot->candidate_xmin_lsn = InvalidXLogRecPtr;
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index a8a89dc784..da8978342a 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -178,6 +178,9 @@ typedef struct ReplicationSlot
 	XLogRecPtr	candidate_xmin_lsn;
 	XLogRecPtr	candidate_restart_valid;
 	XLogRecPtr	candidate_restart_lsn;
+
+	/* This is used to track the last saved confirmed_flush LSN value */
+	XLogRecPtr	last_saved_confirmed_flush;
 } ReplicationSlot;
 
 #define SlotIsPhysical(slot) ((slot)->data.database == InvalidOid)
@@ -241,7 +244,7 @@ extern void ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslo
 extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok);
 
 extern void StartupReplicationSlots(void);
-extern void CheckPointReplicationSlots(void);
+extern void CheckPointReplicationSlots(bool is_shutdown);
 
 extern void CheckSlotRequirements(void);
 extern void CheckSlotPermissions(void);
diff --git a/src/test/recovery/meson.build b/src/test/recovery/meson.build
index e7328e4894..646d6ffde4 100644
--- a/src/test/recovery/meson.build
+++ b/src/test/recovery/meson.build
@@ -43,6 +43,7 @@ tests += {
       't/035_standby_logical_decoding.pl',
       't/036_truncated_dropped.pl',
       't/037_invalid_database.pl',
+      't/038_save_logical_slots_shutdown.pl',
     ],
   },
 }
diff --git a/src/test/recovery/t/038_save_logical_slots_shutdown.pl b/src/test/recovery/t/038_save_logical_slots_shutdown.pl
new file mode 100644
index 0000000000..224a840a61
--- /dev/null
+++ b/src/test/recovery/t/038_save_logical_slots_shutdown.pl
@@ -0,0 +1,102 @@
+
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+# Test logical replication slots are always persisted to disk during a shutdown
+# checkpoint.
+
+use strict;
+use warnings;
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+sub compare_confirmed_flush
+{
+	my ($node, $confirmed_flush_from_log) = @_;
+
+	# Fetch Latest checkpoint location from the control file
+	my ($stdout, $stderr) =
+	  run_command([ 'pg_controldata', $node->data_dir ]);
+	my @control_data      = split("\n", $stdout);
+	my $latest_checkpoint = undef;
+	foreach (@control_data)
+	{
+		if ($_ =~ /^Latest checkpoint location:\s*(.*)$/mg)
+		{
+			$latest_checkpoint = $1;
+			last;
+		}
+	}
+	die "Latest checkpoint location not found in control file\n"
+	  unless defined($latest_checkpoint);
+
+	# Is it same as the value read from log?
+	ok( $latest_checkpoint eq $confirmed_flush_from_log,
+		"Check that the slot's confirmed_flush LSN is the same as the latest_checkpoint location"
+	);
+
+	return;
+}
+
+# Initialize publisher node
+my $node_publisher = PostgreSQL::Test::Cluster->new('pub');
+$node_publisher->init(allows_streaming => 'logical');
+# Avoid checkpoint during the test, otherwise, the latest checkpoint location
+# will change.
+$node_publisher->append_conf(
+	'postgresql.conf', q{
+checkpoint_timeout = 1h
+autovacuum = off
+});
+$node_publisher->start;
+
+# Create subscriber node
+my $node_subscriber = PostgreSQL::Test::Cluster->new('sub');
+$node_subscriber->init(allows_streaming => 'logical');
+$node_subscriber->start;
+
+# Create tables
+$node_publisher->safe_psql('postgres', "CREATE TABLE test_tbl (id int)");
+$node_subscriber->safe_psql('postgres', "CREATE TABLE test_tbl (id int)");
+
+# Insert some data
+$node_publisher->safe_psql('postgres',
+	"INSERT INTO test_tbl VALUES (generate_series(1, 5));");
+
+# Setup logical replication
+my $publisher_connstr = $node_publisher->connstr . ' dbname=postgres';
+$node_publisher->safe_psql('postgres',
+	"CREATE PUBLICATION pub FOR ALL TABLES");
+$node_subscriber->safe_psql('postgres',
+	"CREATE SUBSCRIPTION sub CONNECTION '$publisher_connstr' PUBLICATION pub"
+);
+
+$node_subscriber->wait_for_subscription_sync($node_publisher, 'sub');
+
+my $result =
+  $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM test_tbl");
+
+is($result, qq(5), "check initial copy was done");
+
+my $offset = -s $node_publisher->logfile;
+
+# Restart the publisher to ensure that the slot will be persisted if required
+$node_publisher->restart();
+
+# Wait until the walsender creates decoding context
+$node_publisher->wait_for_log(
+	qr/Streaming transactions committing after ([A-F0-9]+\/[A-F0-9]+), reading WAL from ([A-F0-9]+\/[A-F0-9]+)./,
+	$offset);
+
+# Extract confirmed_flush from the logfile
+my $log_contents = slurp_file($node_publisher->logfile, $offset);
+$log_contents =~
+  qr/Streaming transactions committing after ([A-F0-9]+\/[A-F0-9]+), reading WAL from ([A-F0-9]+\/[A-F0-9]+)./
+  or die "could not get confirmed_flush_lsn";
+
+# Ensure that the slot's confirmed_flush LSN is the same as the
+# latest_checkpoint location.
+compare_confirmed_flush($node_publisher, $1);
+
+done_testing();
-- 
2.27.0

v32-0002-pg_upgrade-Allow-to-replicate-logical-replicatio.patchapplication/octet-stream; name=v32-0002-pg_upgrade-Allow-to-replicate-logical-replicatio.patchDownload
From 86d450c49e028320e1b612f3e182ee3755a2c924 Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Tue, 4 Apr 2023 05:49:34 +0000
Subject: [PATCH v32 2/2] pg_upgrade: Allow to replicate logical replication
 slots to new node

This commit allows nodes with logical replication slots to be upgraded. While
reading information from the old cluster, a list of logical replication slots is
fetched. At the later part of upgrading, pg_upgrade revisits the list and
restores slots by executing pg_create_logical_replication_slot() on the new
cluster. Migration of logical replication slots is only supported when the old
cluster is version 17.0 or later.

If the old node has slots with the status 'lost' or with unconsumed WAL records,
pg_upgrade fails. These checks are needed to prevent data loss.

Note that slot restoration must be done after the final pg_resetwal command
during the upgrade because pg_resetwal will remove WALs that are required by
the slots. Due to this restriction, the timing of restoring replication slots is
different from other objects.

The significant advantage of this commit is that it makes it easy to continue
logical replication even after upgrading the publisher node. Previously,
pg_upgrade allowed copying publications to a new node. With this new commit,
adjusting the connection string to the new publisher will cause the apply
worker on the subscriber to connect to the new publisher automatically. This
enables seamless continuation of logical replication, even after an upgrade.

Author: Hayato Kuroda
Co-authored-by: Hou Zhijie
Reviewed-by: Peter Smith, Julien Rouhaud, Vignesh C, Wang Wei, Masahiko Sawada,
             Dilip Kumar
---
 doc/src/sgml/ref/pgupgrade.sgml               |  70 ++++-
 src/bin/pg_upgrade/Makefile                   |   3 +
 src/bin/pg_upgrade/check.c                    | 201 +++++++++++++--
 src/bin/pg_upgrade/controldata.c              |  39 +++
 src/bin/pg_upgrade/function.c                 |  30 ++-
 src/bin/pg_upgrade/info.c                     | 154 ++++++++++-
 src/bin/pg_upgrade/meson.build                |   1 +
 src/bin/pg_upgrade/pg_upgrade.c               | 111 +++++++-
 src/bin/pg_upgrade/pg_upgrade.h               |  28 +-
 .../t/003_logical_replication_slots.pl        | 241 ++++++++++++++++++
 src/tools/pgindent/typedefs.list              |   3 +
 11 files changed, 839 insertions(+), 42 deletions(-)
 create mode 100644 src/bin/pg_upgrade/t/003_logical_replication_slots.pl

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 7816b4c685..bef107295c 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -360,6 +360,71 @@ make prefix=/usr/local/pgsql.new install
     </para>
    </step>
 
+   <step>
+    <title>Prepare for publisher upgrades</title>
+
+    <para>
+     <application>pg_upgrade</application> attempts to migrate logical
+     replication slots. This helps avoid the need for manually defining the
+     same replication slots on the new publisher. Migration of logical
+     replication slots is only supported when the old cluster is version 17.0
+     or later.
+    </para>
+
+    <para>
+     Before you start upgrading the publisher cluster, ensure that the
+     subscription is temporarily disabled, by executing
+     <link linkend="sql-altersubscription"><command>ALTER SUBSCRIPTION ... DISABLE</command></link>.
+     Re-enable the subscription after the upgrade.
+    </para>
+
+    <para>
+     There are some prerequisites for <application>pg_upgrade</application> to
+     be able to upgrade the replication slots. If these are not met an error
+     will be reported.
+    </para>
+
+    <itemizedlist>
+     <listitem>
+      <para>
+       All slots on the old cluster must be usable, i.e., there are no slots
+       whose
+       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>wal_status</structfield>
+       is <literal>lost</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>confirmed_flush_lsn</structfield>
+       of all slots on the old cluster must be the same as the latest
+       checkpoint location.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The output plugins referenced by the slots on the old cluster must be
+       installed in the new PostgreSQL executable directory.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must have
+       <link linkend="guc-max-replication-slots"><varname>max_replication_slots</varname></link>
+       configured to a value greater than or equal to the number of slots
+       present in the old cluster.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must have
+       <link linkend="guc-wal-level"><varname>wal_level</varname></link> as
+       <literal>logical</literal>.
+      </para>
+     </listitem>
+    </itemizedlist>
+
+   </step>
+
    <step>
     <title>Stop both servers</title>
 
@@ -629,8 +694,9 @@ rsync --archive --delete --hard-links --size-only --no-inc-recursive /vol1/pg_tb
        Configure the servers for log shipping.  (You do not need to run
        <function>pg_backup_start()</function> and <function>pg_backup_stop()</function>
        or take a file system backup as the standbys are still synchronized
-       with the primary.)  Replication slots are not copied and must
-       be recreated.
+       with the primary.)  Replication slots on the old standby are not copied.
+       Only logical slots on the primary are migrated to the new standby,
+       and other slots must be recreated.
       </para>
      </step>
 
diff --git a/src/bin/pg_upgrade/Makefile b/src/bin/pg_upgrade/Makefile
index 5834513add..815d1a7ca1 100644
--- a/src/bin/pg_upgrade/Makefile
+++ b/src/bin/pg_upgrade/Makefile
@@ -3,6 +3,9 @@
 PGFILEDESC = "pg_upgrade - an in-place binary upgrade utility"
 PGAPPICON = win32
 
+# required for 003_logical_replication_slots.pl
+EXTRA_INSTALL=contrib/test_decoding
+
 subdir = src/bin/pg_upgrade
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 56e313f562..4baf2e1063 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -9,6 +9,7 @@
 
 #include "postgres_fe.h"
 
+#include "access/xlogdefs.h"
 #include "catalog/pg_authid_d.h"
 #include "catalog/pg_collation.h"
 #include "fe_utils/string_utils.h"
@@ -30,6 +31,8 @@ static void check_for_jsonb_9_4_usage(ClusterInfo *cluster);
 static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(void);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
+static void check_new_cluster_logical_replication_slots(void);
+static void check_old_cluster_for_valid_slots(bool live_check);
 
 
 /*
@@ -86,8 +89,11 @@ check_and_dump_old_cluster(bool live_check)
 	if (!live_check)
 		start_postmaster(&old_cluster, true);
 
-	/* Extract a list of databases and tables from the old cluster */
-	get_db_and_rel_infos(&old_cluster);
+	/*
+	 * Extract a list of databases, tables, and logical replication slots from
+	 * the old cluster.
+	 */
+	get_db_rel_and_slot_infos(&old_cluster);
 
 	init_tablespaces();
 
@@ -104,6 +110,13 @@ check_and_dump_old_cluster(bool live_check)
 	check_for_reg_data_type_usage(&old_cluster);
 	check_for_isn_and_int8_passing_mismatch(&old_cluster);
 
+	/*
+	 * Logical replication slots can be migrated since PG17. See comments atop
+	 * get_old_cluster_logical_slot_infos().
+	 */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
+		check_old_cluster_for_valid_slots(live_check);
+
 	/*
 	 * PG 16 increased the size of the 'aclitem' type, which breaks the
 	 * on-disk format for existing data.
@@ -187,7 +200,7 @@ check_and_dump_old_cluster(bool live_check)
 void
 check_new_cluster(void)
 {
-	get_db_and_rel_infos(&new_cluster);
+	get_db_rel_and_slot_infos(&new_cluster);
 
 	check_new_cluster_is_empty();
 
@@ -210,6 +223,8 @@ check_new_cluster(void)
 	check_for_prepared_transactions(&new_cluster);
 
 	check_for_new_tablespace_dir();
+
+	check_new_cluster_logical_replication_slots();
 }
 
 
@@ -232,27 +247,6 @@ report_clusters_compatible(void)
 }
 
 
-void
-issue_warnings_and_set_wal_level(void)
-{
-	/*
-	 * We unconditionally start/stop the new server because pg_resetwal -o set
-	 * wal_level to 'minimum'.  If the user is upgrading standby servers using
-	 * the rsync instructions, they will need pg_upgrade to write its final
-	 * WAL record showing wal_level as 'replica'.
-	 */
-	start_postmaster(&new_cluster, true);
-
-	/* Reindex hash indexes for old < 10.0 */
-	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 906)
-		old_9_6_invalidate_hash_indexes(&new_cluster, false);
-
-	report_extension_updates(&new_cluster);
-
-	stop_postmaster(false);
-}
-
-
 void
 output_completion_banner(char *deletion_script_file_name)
 {
@@ -1402,3 +1396,162 @@ check_for_user_defined_encoding_conversions(ClusterInfo *cluster)
 	else
 		check_ok();
 }
+
+/*
+ * check_new_cluster_logical_replication_slots()
+ *
+ * Make sure there are no logical replication slots on the new cluster and that
+ * the parameter settings necessary for creating slots are sufficient.
+ */
+static void
+check_new_cluster_logical_replication_slots(void)
+{
+	PGresult   *res;
+	PGconn	   *conn;
+	int			nslots_on_old;
+	int			nslots_on_new;
+	int			max_replication_slots;
+	char	   *wal_level;
+
+	/* Logical slots can be migrated since PG17. */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+		return;
+
+	nslots_on_old = count_old_cluster_logical_slots();
+
+	/* Quick return if there are no logical slots to be migrated. */
+	if (nslots_on_old == 0)
+		return;
+
+	conn = connectToServer(&new_cluster, "template1");
+
+	prep_status("Checking for logical replication slots");
+
+	res = executeQueryOrDie(conn, "SELECT count(*) "
+							"FROM pg_catalog.pg_replication_slots "
+							"WHERE slot_type = 'logical' AND "
+							"temporary IS FALSE;");
+
+	nslots_on_new = atoi(PQgetvalue(res, 0, 0));
+
+	if (nslots_on_new)
+	{
+		if (nslots_on_new == 1)
+			pg_fatal("New cluster must not have logical replication slots but found a slot.");
+		else
+			pg_fatal("New cluster must not have logical replication slots but found %d slots",
+					 nslots_on_new);
+	}
+
+	PQclear(res);
+
+	res = executeQueryOrDie(conn, "SHOW max_replication_slots;");
+	if (PQntuples(res) != 1)
+		pg_fatal("could not determine max_replication_slots");
+
+	max_replication_slots = atoi(PQgetvalue(res, 0, 0));
+
+	if (nslots_on_old > max_replication_slots)
+		pg_fatal("max_replication_slots (%d) must be greater than or equal to the number of "
+				 "logical replication slots (%d) on the old cluster.",
+				 max_replication_slots, nslots_on_old);
+
+	PQclear(res);
+
+	res = executeQueryOrDie(conn, "SHOW wal_level;");
+	if (PQntuples(res) != 1)
+		pg_fatal("could not determine wal_level");
+
+	wal_level = PQgetvalue(res, 0, 0);
+
+	if (strcmp(wal_level, "logical") != 0)
+		pg_fatal("wal_level must be \"logical\", but is set to \"%s\"",
+				 wal_level);
+
+	PQclear(res);
+	PQfinish(conn);
+
+	check_ok();
+}
+
+/*
+ * check_old_cluster_for_valid_slots()
+ *
+ * Make sure logical replication slots can be migrated to new cluster.
+ * Following points are checked:
+ *
+ *	- All logical replication slots are usable.
+ *	- All logical replication slots consumed all WALs, except a
+ *	  CHECKPOINT_SHUTDOWN record.
+ */
+static void
+check_old_cluster_for_valid_slots(bool live_check)
+{
+	int			dbnum;
+	char		output_path[MAXPGPATH];
+	FILE	   *script = NULL;
+
+	prep_status("Checking for valid logical replication slots");
+
+	snprintf(output_path, sizeof(output_path), "%s/%s",
+			 log_opts.basedir,
+			 "problematic_logical_replication_slots.txt");
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		int			slotnum;
+		LogicalSlotInfoArr *slot_arr = &old_cluster.dbarr.dbs[dbnum].slot_arr;
+
+		for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		{
+			LogicalSlotInfo *slot = &slot_arr->slots[slotnum];
+
+			/* Is the slot still usable? */
+			if (slot->conflicting)
+			{
+				if (script == NULL &&
+					(script = fopen_priv(output_path, "w")) == NULL)
+					pg_fatal("could not open file \"%s\": %s",
+							 output_path, strerror(errno));
+
+				fprintf(script,
+						"slotname :%s\tproblem: The slot is in 'lost' state\n",
+						slot->slotname);
+			}
+
+			/*
+			 * Do additional checks to ensure that confirmed_flush LSN of all
+			 * the slots is the same as the latest checkpoint location.
+			 *
+			 * Note: This can be satisfied only when the old cluster has been
+			 * shut down, so we skip this for live checks.
+			 */
+			if (!live_check && !slot->caughtup)
+			{
+				if (script == NULL &&
+					(script = fopen_priv(output_path, "w")) == NULL)
+					pg_fatal("could not open file \"%s\": %s",
+							 output_path, strerror(errno));
+
+				fprintf(script,
+						"slotname :%s\tproblem: The slot has not consumed WALs yet\n",
+						slot->slotname);
+			}
+		}
+
+		if (script)
+		{
+			fclose(script);
+
+			pg_fatal("The source cluster contains one or more problematic logical replication slots.\n"
+					 "The needed workaround depends on the problem.\n"
+					 "1) If the problem is \"The slot is in 'lost' state,\" you can drop such replication slots.\n"
+					 "2) If the problem is \"The slot has not consumed WALs yet,\" you can consume all remaining WALs.\n"
+					 "Then, you can restart the upgrade.\n"
+					 "A list of problematic logical replication slots is in the file:\n"
+					 "    %s", output_path);
+		}
+	}
+
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c
index 4beb65ab22..f8f823e2be 100644
--- a/src/bin/pg_upgrade/controldata.c
+++ b/src/bin/pg_upgrade/controldata.c
@@ -169,6 +169,45 @@ get_control_data(ClusterInfo *cluster, bool live_check)
 				}
 				got_cluster_state = true;
 			}
+
+			else if ((p = strstr(bufin, "Latest checkpoint location:")) != NULL)
+			{
+				/*
+				 * Read the latest checkpoint location if the cluster is PG17
+				 * or later. This is used for upgrading logical replication
+				 * slots. Currently, we need it only for the old cluster but
+				 * for simplicity chose not to have additional checks.
+				 */
+				if (GET_MAJOR_VERSION(cluster->major_version) >= 1700)
+				{
+					char	   *slash = NULL;
+					uint32		upper_lsn,
+								lower_lsn;
+
+					p = strchr(p, ':');
+
+					if (p == NULL || strlen(p) <= 1)
+						pg_fatal("%d: controldata retrieval problem", __LINE__);
+
+					p++;		/* remove ':' char */
+
+					p = strpbrk(p, "01234567890ABCDEF");
+
+					if (p == NULL || strlen(p) <= 1)
+						pg_fatal("%d: controldata retrieval problem", __LINE__);
+
+					/*
+					 * The upper and lower parts of the LSN must be read separately
+					 * because it is stored in %X/%X format.
+					 */
+					upper_lsn = strtoul(p, &slash, 16);
+					lower_lsn = strtoul(++slash, NULL, 16);
+
+					/* And combine them */
+					cluster->controldata.chkpnt_latest =
+						((uint64) upper_lsn << 32) | lower_lsn;
+				}
+			}
 		}
 
 		rc = pclose(output);
diff --git a/src/bin/pg_upgrade/function.c b/src/bin/pg_upgrade/function.c
index dc8800c7cd..b174b4c24c 100644
--- a/src/bin/pg_upgrade/function.c
+++ b/src/bin/pg_upgrade/function.c
@@ -11,6 +11,7 @@
 
 #include "access/transam.h"
 #include "catalog/pg_language_d.h"
+#include "fe_utils/string_utils.h"
 #include "pg_upgrade.h"
 
 /*
@@ -42,11 +43,12 @@ library_name_compare(const void *p1, const void *p2)
 			((const LibraryInfo *) p2)->dbnum;
 }
 
-
 /*
  * get_loadable_libraries()
  *
- *	Fetch the names of all old libraries containing C-language functions.
+ *	Fetch the names of all old libraries containing C-language functions, as
+ *	well as the libraries that implement logical replication output plugins.
+ *
  *	We will later check that they all exist in the new installation.
  */
 void
@@ -81,7 +83,12 @@ get_loadable_libraries(void)
 		PQfinish(conn);
 	}
 
-	os_info.libraries = (LibraryInfo *) pg_malloc(totaltups * sizeof(LibraryInfo));
+	/*
+	 * Allocate memory for extensions and logical replication output
+	 * plugins.
+	 */
+	os_info.libraries = pg_malloc_array(LibraryInfo,
+										totaltups + count_old_cluster_logical_slots());
 	totaltups = 0;
 
 	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
@@ -89,6 +96,8 @@ get_loadable_libraries(void)
 		PGresult   *res = ress[dbnum];
 		int			ntups;
 		int			rowno;
+		int			slotno;
+		LogicalSlotInfoArr *slot_arr = &old_cluster.dbarr.dbs[dbnum].slot_arr;
 
 		ntups = PQntuples(res);
 		for (rowno = 0; rowno < ntups; rowno++)
@@ -101,6 +110,21 @@ get_loadable_libraries(void)
 			totaltups++;
 		}
 		PQclear(res);
+
+		/*
+		 * Store the names of the output plugins as well. There is a possibility
+		 * that duplicate plugins are stored, but the consumer function
+		 * check_loadable_libraries() will avoid checking the same library, so
+		 * we do not have to consider their uniqueness here.
+		 */
+		for (slotno = 0; slotno < slot_arr->nslots; slotno++)
+		{
+			os_info.libraries[totaltups].name = pg_strdup(slot_arr->slots[slotno].plugin);
+			os_info.libraries[totaltups].dbnum = dbnum;
+
+			totaltups++;
+		}
+
 	}
 
 	pg_free(ress);
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index aa5faca4d6..3d8e1cb9b3 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -26,6 +26,8 @@ static void get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo);
 static void free_rel_infos(RelInfoArr *rel_arr);
 static void print_db_infos(DbInfoArr *db_arr);
 static void print_rel_infos(RelInfoArr *rel_arr);
+static void print_slot_infos(LogicalSlotInfoArr *slot_arr);
+static void get_old_cluster_logical_slot_infos(DbInfo *dbinfo);
 
 
 /*
@@ -266,13 +268,13 @@ report_unmatched_relation(const RelInfo *rel, const DbInfo *db, bool is_new_db)
 }
 
 /*
- * get_db_and_rel_infos()
+ * get_db_rel_and_slot_infos()
  *
  * higher level routine to generate dbinfos for the database running
  * on the given "port". Assumes that server is already running.
  */
 void
-get_db_and_rel_infos(ClusterInfo *cluster)
+get_db_rel_and_slot_infos(ClusterInfo *cluster)
 {
 	int			dbnum;
 
@@ -283,7 +285,18 @@ get_db_and_rel_infos(ClusterInfo *cluster)
 	get_db_infos(cluster);
 
 	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
-		get_rel_infos(cluster, &cluster->dbarr.dbs[dbnum]);
+	{
+		DbInfo	   *pDbInfo = &cluster->dbarr.dbs[dbnum];
+
+		get_rel_infos(cluster, pDbInfo);
+
+		/*
+		 * If we are reading the old_cluster, also get info for logical
+		 * replication slots.
+		 */
+		if (cluster == &old_cluster)
+			get_old_cluster_logical_slot_infos(pDbInfo);
+	}
 
 	if (cluster == &old_cluster)
 		pg_log(PG_VERBOSE, "\nsource databases:");
@@ -394,7 +407,7 @@ get_db_infos(ClusterInfo *cluster)
 	i_spclocation = PQfnumber(res, "spclocation");
 
 	ntups = PQntuples(res);
-	dbinfos = (DbInfo *) pg_malloc(sizeof(DbInfo) * ntups);
+	dbinfos = (DbInfo *) pg_malloc0(sizeof(DbInfo) * ntups);
 
 	for (tupnum = 0; tupnum < ntups; tupnum++)
 	{
@@ -600,6 +613,107 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 	dbinfo->rel_arr.nrels = num_rels;
 }
 
+/*
+ * get_old_cluster_logical_slot_infos()
+ *
+ * gets the LogicalSlotInfos for all the logical replication slots of the
+ * database referred to by "dbinfo".
+ *
+ * Note: This function will not do anything if the old cluster is pre-PG17.
+ * This is because before that the logical slots are not saved at shutdown, so
+ * there is no guarantee that the latest confirmed_flush_lsn is saved to disk
+ * which can lead to data loss. It is still not guaranteed for manually created
+ * slots in PG17, so subsequent checks done in
+ * check_old_cluster_for_valid_slots() would raise a FATAL error if such slots
+ * are included.
+ */
+static void
+get_old_cluster_logical_slot_infos(DbInfo *dbinfo)
+{
+	PGconn	   *conn;
+	PGresult   *res;
+	LogicalSlotInfo *slotinfos = NULL;
+	int			num_slots;
+
+	/* Logical slots can be migrated since PG17. */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+		return;
+
+	conn = connectToServer(&old_cluster, dbinfo->db_name);
+
+	/*
+	 * The temporary slots are expressly ignored while checking because such
+	 * slots cannot exist after the upgrade. During the upgrade, clusters are
+	 * started and stopped several times causing any temporary slots to be
+	 * removed.
+	 */
+	res = executeQueryOrDie(conn, "SELECT slot_name, plugin, two_phase, "
+							"(confirmed_flush_lsn = '%X/%X') as caughtup, conflicting "
+							"FROM pg_catalog.pg_replication_slots "
+							"WHERE slot_type = 'logical' AND "
+							"database = current_database() AND "
+							"temporary IS FALSE;",
+							LSN_FORMAT_ARGS(old_cluster.controldata.chkpnt_latest));
+
+	num_slots = PQntuples(res);
+
+	if (num_slots)
+	{
+		int			slotnum;
+		int			i_slotname;
+		int			i_plugin;
+		int			i_twophase;
+		int			i_caughtup;
+		int			i_conflicting;
+
+		slotinfos = pg_malloc_array(LogicalSlotInfo, num_slots);
+
+		i_slotname = PQfnumber(res, "slot_name");
+		i_plugin = PQfnumber(res, "plugin");
+		i_twophase = PQfnumber(res, "two_phase");
+		i_caughtup = PQfnumber(res, "caughtup");
+		i_conflicting = PQfnumber(res, "conflicting");
+
+		for (slotnum = 0; slotnum < num_slots; slotnum++)
+		{
+			LogicalSlotInfo *curr = &slotinfos[slotnum];
+
+			curr->slotname = pg_strdup(PQgetvalue(res, slotnum, i_slotname));
+			curr->plugin = pg_strdup(PQgetvalue(res, slotnum, i_plugin));
+			curr->two_phase = (strcmp(PQgetvalue(res, slotnum, i_twophase), "t") == 0);
+			curr->caughtup = (strcmp(PQgetvalue(res, slotnum, i_caughtup), "t") == 0);
+			curr->conflicting = (strcmp(PQgetvalue(res, slotnum, i_conflicting), "t") == 0);
+		}
+	}
+
+	PQclear(res);
+	PQfinish(conn);
+
+	dbinfo->slot_arr.slots = slotinfos;
+	dbinfo->slot_arr.nslots = num_slots;
+}
+
+
+/*
+ * count_old_cluster_logical_slots()
+ *
+ * Sum up and return the number of logical replication slots for all databases.
+ *
+ * Note: this function always returns 0 if the old_cluster is PG16 and prior
+ * because we gather slot information only for cluster versions greater than or
+ * equal to PG17. See get_old_cluster_logical_slot_infos().
+ */
+int
+count_old_cluster_logical_slots(void)
+{
+	int			dbnum;
+	int			slot_count = 0;
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+		slot_count += old_cluster.dbarr.dbs[dbnum].slot_arr.nslots;
+
+	return slot_count;
+}
 
 static void
 free_db_and_rel_infos(DbInfoArr *db_arr)
@@ -610,6 +724,12 @@ free_db_and_rel_infos(DbInfoArr *db_arr)
 	{
 		free_rel_infos(&db_arr->dbs[dbnum].rel_arr);
 		pg_free(db_arr->dbs[dbnum].db_name);
+
+		/*
+		 * Logical replication slots must not exist on the new cluster before
+		 * create_logical_replication_slots().
+		 */
+		Assert(db_arr->dbs[dbnum].slot_arr.nslots == 0);
 	}
 	pg_free(db_arr->dbs);
 	db_arr->dbs = NULL;
@@ -642,8 +762,11 @@ print_db_infos(DbInfoArr *db_arr)
 
 	for (dbnum = 0; dbnum < db_arr->ndbs; dbnum++)
 	{
-		pg_log(PG_VERBOSE, "Database: \"%s\"", db_arr->dbs[dbnum].db_name);
-		print_rel_infos(&db_arr->dbs[dbnum].rel_arr);
+		DbInfo	   *pDbInfo = &db_arr->dbs[dbnum];
+
+		pg_log(PG_VERBOSE, "Database: \"%s\"", pDbInfo->db_name);
+		print_rel_infos(&pDbInfo->rel_arr);
+		print_slot_infos(&pDbInfo->slot_arr);
 	}
 }
 
@@ -660,3 +783,22 @@ print_rel_infos(RelInfoArr *rel_arr)
 			   rel_arr->rels[relnum].reloid,
 			   rel_arr->rels[relnum].tablespace);
 }
+
+static void
+print_slot_infos(LogicalSlotInfoArr *slot_arr)
+{
+	int			slotnum;
+
+	if (slot_arr->nslots > 0)
+		pg_log(PG_VERBOSE, "Logical replication slots within the database:");
+
+	for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+	{
+		LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+
+		pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\", two_phase: %d",
+			   slot_info->slotname,
+			   slot_info->plugin,
+			   slot_info->two_phase);
+	}
+}
diff --git a/src/bin/pg_upgrade/meson.build b/src/bin/pg_upgrade/meson.build
index 12a97f84e2..228f29b688 100644
--- a/src/bin/pg_upgrade/meson.build
+++ b/src/bin/pg_upgrade/meson.build
@@ -42,6 +42,7 @@ tests += {
     'tests': [
       't/001_basic.pl',
       't/002_pg_upgrade.pl',
+      't/003_logical_replication_slots.pl',
     ],
     'test_kwargs': {'priority': 40}, # pg_upgrade tests are slow
   },
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 4562dafcff..6442bbffe6 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -59,6 +59,8 @@ static void copy_xact_xlog_xid(void);
 static void set_frozenxids(bool minmxid_only);
 static void make_outputdirs(char *pgdata);
 static void setup(char *argv0, bool *live_check);
+static void create_logical_replication_slots(void);
+static void setup_new_cluster(void);
 
 ClusterInfo old_cluster,
 			new_cluster;
@@ -188,6 +190,10 @@ main(int argc, char **argv)
 			  new_cluster.pgdata);
 	check_ok();
 
+	create_script_for_old_cluster_deletion(&deletion_script_file_name);
+
+	setup_new_cluster();
+
 	if (user_opts.do_sync)
 	{
 		prep_status("Sync data directory to disk");
@@ -197,10 +203,6 @@ main(int argc, char **argv)
 		check_ok();
 	}
 
-	create_script_for_old_cluster_deletion(&deletion_script_file_name);
-
-	issue_warnings_and_set_wal_level();
-
 	pg_log(PG_REPORT,
 		   "\n"
 		   "Upgrade Complete\n"
@@ -591,7 +593,7 @@ create_new_objects(void)
 		set_frozenxids(true);
 
 	/* update new_cluster info now that we have objects in the databases */
-	get_db_and_rel_infos(&new_cluster);
+	get_db_rel_and_slot_infos(&new_cluster);
 }
 
 /*
@@ -860,3 +862,102 @@ set_frozenxids(bool minmxid_only)
 
 	check_ok();
 }
+
+/*
+ * create_logical_replication_slots()
+ *
+ * Similar to create_new_objects() but only restores logical replication slots.
+ */
+static void
+create_logical_replication_slots(void)
+{
+	int			dbnum;
+
+	prep_status_progress("Restoring logical replication slots in the new cluster");
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		DbInfo	   *old_db = &old_cluster.dbarr.dbs[dbnum];
+		LogicalSlotInfoArr *slot_arr = &old_db->slot_arr;
+		PGconn	   *conn;
+		PQExpBuffer query;
+		int			slotnum;
+		char		log_file_name[MAXPGPATH];
+
+		/* Skip this database if there are no slots */
+		if (slot_arr->nslots == 0)
+			continue;
+
+		conn = connectToServer(&new_cluster, old_db->db_name);
+		query = createPQExpBuffer();
+
+		snprintf(log_file_name, sizeof(log_file_name),
+				 DB_DUMP_LOG_FILE_MASK, old_db->db_oid);
+
+		pg_log(PG_STATUS, "%s", old_db->db_name);
+
+		for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		{
+			LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+
+			/* Constructs a query for creating logical replication slots. */
+			appendPQExpBuffer(query, "SELECT pg_catalog.pg_create_logical_replication_slot(");
+			appendStringLiteralConn(query, slot_info->slotname, conn);
+			appendPQExpBuffer(query, ", ");
+			appendStringLiteralConn(query, slot_info->plugin, conn);
+			appendPQExpBuffer(query, ", false, %s);",
+							  slot_info->two_phase ? "true" : "false");
+
+			PQclear(executeQueryOrDie(conn, "%s", query->data));
+
+			resetPQExpBuffer(query);
+		}
+
+		PQfinish(conn);
+
+		destroyPQExpBuffer(query);
+	}
+
+	end_progress_output();
+	check_ok();
+}
+
+/*
+ *	setup_new_cluster()
+ *
+ * Starts a new cluster for updating the wal_level in the control file, then
+ * does final setups. Logical slots are also created here.
+ */
+static void
+setup_new_cluster(void)
+{
+	/*
+	 * We unconditionally start/stop the new server because pg_resetwal -o set
+	 * wal_level to 'minimum'.  If the user is upgrading standby servers using
+	 * the rsync instructions, they will need pg_upgrade to write its final
+	 * WAL record showing wal_level as 'replica'.
+	 */
+	start_postmaster(&new_cluster, true);
+
+	/*
+	 * If the old cluster has logical slots, migrate them to the new cluster.
+	 *
+	 * Note: This must be done after doing the pg_resetwal command because
+	 * pg_resetwal would remove required WALs.
+	 */
+	if (count_old_cluster_logical_slots())
+		create_logical_replication_slots();
+
+	/*
+	 * Reindex hash indexes for old < 10.0. count_old_cluster_logical_slots()
+	 * returns non-zero only when the old_cluster is PG17 or later, so it's OK to
+	 * use "else if" here. See comments atop count_old_cluster_logical_slots()
+	 * and get_old_cluster_logical_slot_infos().
+	 */
+	else if (GET_MAJOR_VERSION(old_cluster.major_version) <= 906)
+		old_9_6_invalidate_hash_indexes(&new_cluster, false);
+
+	report_extension_updates(&new_cluster);
+
+	stop_postmaster(false);
+}
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 7afa96716e..8c14f45295 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -10,6 +10,7 @@
 #include <sys/stat.h>
 #include <sys/time.h>
 
+#include "access/xlogdefs.h"
 #include "common/relpath.h"
 #include "libpq-fe.h"
 
@@ -150,6 +151,25 @@ typedef struct
 	int			nrels;
 } RelInfoArr;
 
+/*
+ * Structure to store logical replication slot information
+ */
+typedef struct
+{
+	char	   *slotname;		/* slot name */
+	char	   *plugin;			/* plugin */
+	bool		two_phase;		/* can the slot decode 2PC? */
+	bool		caughtup;		/* Is confirmed_flush_lsn the same as latest
+								 * checkpoint LSN? */
+	bool		conflicting;	/* Has the slot been invalidated? */
+} LogicalSlotInfo;
+
+typedef struct
+{
+	int			nslots;			/* number of logical slot infos */
+	LogicalSlotInfo *slots;		/* array of logical slot infos */
+} LogicalSlotInfoArr;
+
 /*
  * The following structure represents a relation mapping.
  */
@@ -176,6 +196,7 @@ typedef struct
 	char		db_tablespace[MAXPGPATH];	/* database default tablespace
 											 * path */
 	RelInfoArr	rel_arr;		/* array of all user relinfos */
+	LogicalSlotInfoArr slot_arr;	/* array of all LogicalSlotInfo */
 } DbInfo;
 
 /*
@@ -225,6 +246,7 @@ typedef struct
 	bool		date_is_int;
 	bool		float8_pass_by_value;
 	uint32		data_checksum_version;
+	XLogRecPtr	chkpnt_latest;
 } ControlData;
 
 /*
@@ -344,7 +366,6 @@ void		output_check_banner(bool live_check);
 void		check_and_dump_old_cluster(bool live_check);
 void		check_new_cluster(void);
 void		report_clusters_compatible(void);
-void		issue_warnings_and_set_wal_level(void);
 void		output_completion_banner(char *deletion_script_file_name);
 void		check_cluster_versions(void);
 void		check_cluster_compatibility(bool live_check);
@@ -399,7 +420,8 @@ void		check_loadable_libraries(void);
 FileNameMap *gen_db_file_maps(DbInfo *old_db,
 							  DbInfo *new_db, int *nmaps, const char *old_pgdata,
 							  const char *new_pgdata);
-void		get_db_and_rel_infos(ClusterInfo *cluster);
+void		get_db_rel_and_slot_infos(ClusterInfo *cluster);
+int			count_old_cluster_logical_slots(void);
 
 /* option.c */
 
@@ -471,3 +493,5 @@ void		parallel_transfer_all_new_dbs(DbInfoArr *old_db_arr, DbInfoArr *new_db_arr
 										  char *old_pgdata, char *new_pgdata,
 										  char *old_tablespace);
 bool		reap_child(bool wait_for_child);
+
+XLogRecPtr	strtoLSN(const char *str, bool *have_error);
diff --git a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
new file mode 100644
index 0000000000..12f4b9b88e
--- /dev/null
+++ b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
@@ -0,0 +1,241 @@
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+# Tests for upgrading replication slots
+
+use strict;
+use warnings;
+
+use File::Path qw(rmtree);
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Can be changed to test the other modes
+my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
+
+# Initialize old cluster
+my $old_publisher = PostgreSQL::Test::Cluster->new('old_publisher');
+$old_publisher->init(allows_streaming => 'logical');
+
+# Initialize new cluster
+my $new_publisher = PostgreSQL::Test::Cluster->new('new_publisher');
+$new_publisher->init(allows_streaming => 'replica');
+
+# Initialize subscriber cluster
+my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
+$subscriber->init(allows_streaming => 'logical');
+
+my $bindir = $new_publisher->config_data('--bindir');
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when new cluster wal_level is not 'logical'
+
+# Preparations for the subsequent test:
+# 1. Create a slot on the old cluster
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT pg_create_logical_replication_slot('test_slot1', 'test_decoding', false, true);"
+);
+$old_publisher->stop;
+
+# pg_upgrade will fail because the new cluster wal_level is 'replica'
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade where the new cluster has the wrong wal_level');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when max_replication_slots on a new cluster is
+#		too low
+
+# Preparations for the subsequent test:
+# 1. Create a second slot on the old cluster
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT pg_create_logical_replication_slot('test_slot2', 'test_decoding', false, true);"
+);
+
+# 2. Consume WAL records to avoid another type of upgrade failure. It will be
+#	 tested in subsequent cases.
+$old_publisher->safe_psql('postgres',
+	"SELECT count(*) FROM pg_logical_slot_get_changes('test_slot1', NULL, NULL);"
+);
+$old_publisher->stop;
+
+# 3. max_replication_slots is set to smaller than the number of slots (2)
+#	 present on the old cluster
+$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 1");
+
+# 4. wal_level is set correctly on the new cluster
+$new_publisher->append_conf('postgresql.conf', "wal_level = 'logical'");
+
+# pg_upgrade will fail because the new cluster has insufficient max_replication_slots
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade where the new cluster has insufficient max_replication_slots');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when the slot still has unconsumed WAL records
+
+# Preparations for the subsequent test:
+# 1. Remove the slot 'test_slot2', leaving only 1 slot remaining on the old
+#	 cluster, so the new cluster config  max_replication_slots=1 will now be
+#	 enough.
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT * FROM pg_drop_replication_slot('test_slot2');"
+);
+
+# 2. Generate extra WAL records. Because these WAL records do not get consumed
+#	 it will cause the upcoming pg_upgrade test to fail.
+$old_publisher->safe_psql('postgres',
+	"CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;"
+);
+$old_publisher->stop;
+
+# pg_upgrade will fail because the slot still has unconsumed WAL records
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade of old cluster with idle replication slots');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+
+# ------------------------------
+# TEST: Successful --check command
+
+# Preparations for the subsequent test:
+# 1. Start the cluster. --check works well when an old cluster is running.
+$old_publisher->start;
+
+# 2. Run a query to make sure that the cluster is started
+$old_publisher->safe_psql('postgres', q{SELECT 1});
+
+# Actual run, successful --check command is expected
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,        '--check'
+	],
+	'run of pg_upgrade --check for new instance');
+ok(!-d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade --check success");
+
+# Remove the remained slot
+$old_publisher->safe_psql('postgres',
+	"SELECT * FROM pg_drop_replication_slot('test_slot1');"
+);
+
+# ------------------------------
+# TEST: Successful upgrade
+
+# Preparations for the subsequent test:
+# 1. Setup logical replication
+my $old_connstr = $old_publisher->connstr . ' dbname=postgres';
+$old_publisher->safe_psql('postgres',
+	"CREATE PUBLICATION pub FOR ALL TABLES;"
+);
+$subscriber->start;
+$subscriber->safe_psql(
+	'postgres', qq[
+	CREATE TABLE tbl (a int);
+	CREATE SUBSCRIPTION sub CONNECTION '$old_connstr' PUBLICATION pub WITH (two_phase = 'true')
+]);
+$subscriber->wait_for_subscription_sync($old_publisher, 'sub');
+
+# 2. Temporarily disable the subscription
+$subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION sub DISABLE");
+$old_publisher->stop;
+
+# Actual run, successful upgrade is expected
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade of old cluster');
+ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ removed after pg_upgrade success");
+
+# Check that the slot 'sub' has migrated to the new cluster
+$new_publisher->start;
+my $result = $new_publisher->safe_psql('postgres',
+	"SELECT slot_name, two_phase FROM pg_replication_slots");
+is($result, qq(sub|t), 'check the slot exists on new cluster');
+
+# Update the connection
+my $new_connstr = $new_publisher->connstr . ' dbname=postgres';
+$subscriber->safe_psql(
+	'postgres', qq[
+	ALTER SUBSCRIPTION sub CONNECTION '$new_connstr';
+	ALTER SUBSCRIPTION sub ENABLE;
+]);
+
+# Check whether changes on the new publisher get replicated to the subscriber
+$new_publisher->safe_psql('postgres',
+	"INSERT INTO tbl VALUES (generate_series(11, 20))");
+$new_publisher->wait_for_catchup('sub');
+$result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
+is($result, qq(20), 'check changes are replicated to the subscriber');
+
+# Clean up
+$subscriber->stop();
+$new_publisher->stop();
+
+done_testing();
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 0656c94416..2dac50a30f 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1501,7 +1501,10 @@ LogicalRepTupleData
 LogicalRepTyp
 LogicalRepWorker
 LogicalRepWorkerType
+LogicalReplicationSlotInfo
 LogicalRewriteMappingData
+LogicalSlotInfo
+LogicalSlotInfoArr
 LogicalTape
 LogicalTapeSet
 LsnReadQueue
-- 
2.27.0

#201Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Peter Smith (#195)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Peter,

Thank you for reviewing!

======
src/bin/pg_upgrade/controldata.c

1. get_control_data

+ if (GET_MAJOR_VERSION(cluster->major_version) >= 1700)
+ {
+ bool have_error = false;
+
+ p = strchr(p, ':');
+
+ if (p == NULL || strlen(p) <= 1)
+ pg_fatal("%d: controldata retrieval problem", __LINE__);
+
+ p++; /* remove ':' char */
+
+ p = strpbrk(p, "01234567890ABCDEF");
+
+ if (p == NULL || strlen(p) <= 1)
+ pg_fatal("%d: controldata retrieval problem", __LINE__);
+
+ cluster->controldata.chkpnt_latest =
+ strtoLSN(p, &have_error);

1a.
The declaration assignment of 'have_error' is redundant because it
gets overwritten before it is checked anyhow.

~

1b.
IMO that first check logic should also be shifted to be *inside* the
strtoLSN and it would just return have_error true. This eliminates
having 2x pg_fatal that have the same purpose.

~~~

2. strtoLSN

+/*
+ * Convert String to XLogRecPtr.
+ *
+ * This function is ported from pg_lsn_in_internal(). The function cannot be
+ * called from client binaries.
+ */
+XLogRecPtr
+strtoLSN(const char *str, bool *have_error)

SUGGESTION (comment wording)
This function is ported from pg_lsn_in_internal() which cannot be
called from client binaries.

These changes have been reverted, since the strtoLSN() approach itself was dropped
(see the reply to Amit's comments in a separate message).

src/bin/pg_upgrade/function.c

3. struct plugin_list

+typedef struct plugin_list
+{
+ int dbnum;
+ char    *plugin;
+ struct plugin_list *next;
+} plugin_list;

I found that name confusing. IMO should be like 'plugin_list_elem'.

e.g. it gets too strange in subsequent code:
+ plugin_list *newentry = (plugin_list *) pg_malloc(sizeof(plugin_list));

~~~

4. is_plugin_unique

+/* Has the given plugin already been listed? */
+static bool
+is_plugin_unique(plugin_list_head *listhead, const char *plugin)
+{
+ plugin_list *point;
+
+ /* Quick return if the head is NULL */
+ if (listhead == NULL)
+ return true;
+
+ /* Seek the plugin list */
+ for (point = listhead->head; point; point = point->next)
+ {
+ if (strcmp(point->plugin, plugin) == 0)
+ return false;
+ }
+
+ return true;
+}

What's the meaning of the name 'point'? Maybe something generic like
'cur' or similar is better?

~~~

5. get_output_plugins

+/*
+ * Load the list of unique output plugins.
+ *
+ * XXX: Currently, we extract the list of unique output plugins, but this may
+ * be overkill. The list is used for two purposes - 1) to allocate the minimal
+ * memory for the library list and 2) to skip storing duplicated plugin names.
+ * However, the consumer check_loadable_libraries() can avoid double checks
for
+ * the same library. The above means that we can arrange output plugins without
+ * considering their uniqueness, so that we can remove this function.
+ */
+static plugin_list_head *
+get_output_plugins(void)
+{
+ plugin_list_head *head = NULL;
+ int dbnum;
+
+ /* Quick return if there are no logical slots to be migrated. */
+ if (count_old_cluster_logical_slots() == 0)
+ return NULL;
+
+ for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+ {
+ LogicalSlotInfoArr *slot_arr = &old_cluster.dbarr.dbs[dbnum].slot_arr;
+ int slotnum;
+
+ for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+ {
+ LogicalSlotInfo *slot = &slot_arr->slots[slotnum];
+
+ /* Add to the list if the plugin has not been listed yet */
+ if (is_plugin_unique(head, slot->plugin))
+ add_plugin_list_item(&head, dbnum, slot->plugin);
+ }
+ }
+
+ return head;
+}

About the XXX. Yeah, since the uniqueness seems checked later anyway
all this extra code seems overkill. Instead of all the extra code you
just need a comment to mention how it will be sorted and checked
later.

But even if you prefer to keep it, I thought those 2 functions
'is_plugin_unique()' and 'add_plugin_list_item()' could have been
combined to just have 'add_plugin_list_unique_item()'. Since order
does not matter, such a function would just add items to the end of
the list (after finding uniqueness) instead of to the head.

Based on suggestions from you and Hou [1], I have withdrawn the uniqueness check,
so these functions and structures have been removed. A sketch of the simplified
handling is below.
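
The plugin names are now simply appended to the library list inside the existing
per-database loop, as in the get_loadable_libraries() hunk of the v33-0002 patch
attached later in this thread (quoted here only as a reference sketch; duplicated
names are tolerated because check_loadable_libraries() skips libraries it has
already checked):

	for (slotno = 0; slotno < slot_arr->nslots; slotno++)
	{
		/* Duplicated plugin names are fine; they are filtered later. */
		os_info.libraries[totaltups].name = pg_strdup(slot_arr->slots[slotno].plugin);
		os_info.libraries[totaltups].dbnum = dbnum;

		totaltups++;
	}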

6. get_loadable_libraries

FirstNormalObjectId);
+
totaltups += PQntuples(ress[dbnum]);
~

The extra blank line in the existing code is not needed in this patch.

Removed.

7. get_loadable_libraries

int rowno;
+ plugin_list *point;

~

Same as a prior comment #4. What's the meaning of the name 'point'?

The variable was removed.

8. get_loadable_libraries
+
+ /*
+ * If the old cluster has logical replication slots, plugins used by
+ * them must be also stored. It must be done only once, so do it at
+ * dbnum == 0 case.
+ */
+ if (output_plugins == NULL)
+ continue;
+
+ if (dbnum != 0)
+ continue;

This logic seems misplaced. If this "must be done only once" then why
is it within the db loop in the first place? Shouldn't this be done
separately outside the loop?

The logic was removed.

src/bin/pg_upgrade/info.c

9.
+/*
+ * Helper function for get_old_cluster_logical_slot_infos()
+ */
+static WALAvailability
+GetWALAvailabilityByString(const char *str)

Should this be forward declared like the other static functions are?

The function was removed.

10. get_old_cluster_logical_slot_infos

+ for (slotnum = 0; slotnum < num_slots; slotnum++)
+ {
+ LogicalSlotInfo *curr = &slotinfos[slotnum];
+ bool have_error = false;

Here seems an unnecessary assignment to 'have_error' because it will
always be assigned again before it is checked.

The variable was removed.

11. get_old_cluster_logical_slot_infos

+ curr->confirmed_flush = strtoLSN(
+ PQgetvalue(res,
+ slotnum,
+ i_confirmed_flush),
+ &have_error);
+ curr->wal_status = GetWALAvailabilityByString(
+   PQgetvalue(res,
+ slotnum,
+ i_wal_status));

Can this excessive wrapping be improved? Maybe new vars are needed.

The part was removed.

12.
+static void
+print_slot_infos(LogicalSlotInfoArr *slot_arr)
+{
+ int slotnum;
+
+ for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+ {
+ LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+
+ if (slotnum == 0)
+ pg_log(PG_VERBOSE, "Logical replication slots within the database:");
+
+ pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\", two_phase: %d",
+    slot_info->slotname,
+    slot_info->plugin,
+    slot_info->two_phase);
+ }
+}

This seems an odd way to output the heading. Isn't it better to put
this outside the loop?

SUGGESTION
if (slot_arr->nslots > 0)
pg_log(PG_VERBOSE, "Logical replication slots within the database:");

Fixed.

src/bin/pg_upgrade/pg_upgrade.c

13.
+/*
+ * setup_new_cluster()
+ *
+ * Starts a new cluster for updating the wal_level in the control fine, then
+ * does final setups. Logical slots are also created here.
+ */
+static void
+setup_new_cluster(void)

typo

/control fine/control file/

Fixed.

[1]: /messages/by-id/OS0PR01MB57165A8F24BEFF5F4CCBBE5994EFA@OS0PR01MB5716.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#202Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Amit Kapila (#198)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Amit,

Thank you for reviewing!

I also think referring to the conflicting field would be better not
only for the purpose of avoiding extra code but also to give accurate
information about invalidated slots for which we want to give an
error.

Fixed.

Additionally, I think we should try to avoid writing a new function
strtoLSN as that adds a maintainability burden. We can probably send
the value fetched from pg_controldata in the query for comparison with
confirmed_flush LSN.

Changed like that. LogicalSlotInfo was also updated to hold the comparison result as a boolean.
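
Concretely, the latest checkpoint location parsed from pg_controldata is embedded
into the slot query, so the comparison happens on the server side. A sketch of the
query shape (essentially what ends up in the v33-0002 patch attached later in this
thread, where the alias is caught_up):

	res = executeQueryOrDie(conn, "SELECT slot_name, plugin, two_phase, "
							"(confirmed_flush_lsn = '%X/%X') AS caught_up, conflicting "
							"FROM pg_catalog.pg_replication_slots "
							"WHERE slot_type = 'logical' AND "
							"database = current_database() AND "
							"temporary IS FALSE;",
							LSN_FORMAT_ARGS(old_cluster.controldata.chkpnt_latest));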

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#203Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Amit Kapila (#199)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Amit,

I think it would be better to write problematic slots in the script
file like we are doing in the function
check_for_composite_data_type_usage()->check_for_data_types_usage()
and give a message suggesting what the user can do as we are doing in
check_for_composite_data_type_usage(). That will be helpful for the
user to take necessary action.

Done. I wondered how best to output the list of slots, because there are two types
of problem, but for now both are written to the same file. If you have a better
approach, please let me know.
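
For reference, each problematic slot becomes one line in a single report file,
tagged with the kind of problem, along the lines of the check added in the
v33-0002 patch attached later in this thread:

	/* slot invalidated ('lost') */
	fprintf(script,
			"slotname :%s\tproblem: The slot is in 'lost' state\n",
			slot->slotname);

	/* slot has not consumed all WAL records yet */
	fprintf(script,
			"slotname :%s\tproblem: The slot has not consumed WALs yet\n",
			slot->slotname);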

A few other comments:
=================
1.
@@ -189,6 +199,8 @@ check_new_cluster(void)
{
get_db_and_rel_infos(&new_cluster);

+ check_new_cluster_logical_replication_slots();
+
check_new_cluster_is_empty();

check_loadable_libraries();

Why check_new_cluster_logical_replication_slots is done before
check_new_cluster_is_empty? At least check_new_cluster_is_empty()
would be much quicker to return an error if any. I think if we don't
have a specific reason to position this new check, we can do it at the
end after check_for_new_tablespace_dir() to avoid breaking the order
of existing checks.

Moved to the bottom.

2. Shall we rename get_db_and_rel_infos() to
get_db_rel_and_slot_infos() or something like that as that function
now fetches the slot information as well?

Fixed. The related comments were updated as well.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#204Peter Smith
smithpb2250@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#200)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

Hi, here are my code review comments for the patch v32-0002

======
src/bin/pg_upgrade/check.c

1. check_new_cluster_logical_replication_slots

+ res = executeQueryOrDie(conn, "SHOW max_replication_slots;");
+ max_replication_slots = atoi(PQgetvalue(res, 0, 0));
+
+ if (PQntuples(res) != 1)
+ pg_fatal("could not determine max_replication_slots");

Shouldn't the PQntuples check be *before* the PQgetvalue and
assignment to max_replication_slots?

~~~

2. check_new_cluster_logical_replication_slots

+ res = executeQueryOrDie(conn, "SHOW wal_level;");
+ wal_level = PQgetvalue(res, 0, 0);
+
+ if (PQntuples(res) != 1)
+ pg_fatal("could not determine wal_level");

Shouldn't the PQntuples check be *before* the PQgetvalue and
assignment to wal_level?

~~~

3. check_old_cluster_for_valid_slots

I saw that similar code with scripts like this is doing PG_REPORT:

pg_log(PG_REPORT, "fatal");

but that PG_REPORT is missing from this function.

======
src/bin/pg_upgrade/function.c

4. get_loadable_libraries

@@ -42,11 +43,12 @@ library_name_compare(const void *p1, const void *p2)
((const LibraryInfo *) p2)->dbnum;
}

-
/*
* get_loadable_libraries()

~

Removing that blank line (above this function) should not be included
in the patch.

~~~

5. get_loadable_libraries

+ /*
+ * Allocate a memory for extensions and logical replication output
+ * plugins.
+ */
+ os_info.libraries = pg_malloc_array(LibraryInfo,
+ totaltups + count_old_cluster_logical_slots());

/Allocate a memory/Allocate memory/

~~~

6. get_loadable_libraries
+ /*
+ * Store the name of output plugins as well. There is a possibility
+ * that duplicated plugins are set, but the consumer function
+ * check_loadable_libraries() will avoid checking the same library, so
+ * we do not have to consider their uniqueness here.
+ */
+ for (slotno = 0; slotno < slot_arr->nslots; slotno++)

/Store the name/Store the names/

======
src/bin/pg_upgrade/info.c

7. get_old_cluster_logical_slot_infos

+ i_slotname = PQfnumber(res, "slot_name");
+ i_plugin = PQfnumber(res, "plugin");
+ i_twophase = PQfnumber(res, "two_phase");
+ i_caughtup = PQfnumber(res, "caughtup");
+ i_conflicting = PQfnumber(res, "conflicting");
+
+ for (slotnum = 0; slotnum < num_slots; slotnum++)
+ {
+ LogicalSlotInfo *curr = &slotinfos[slotnum];
+
+ curr->slotname = pg_strdup(PQgetvalue(res, slotnum, i_slotname));
+ curr->plugin = pg_strdup(PQgetvalue(res, slotnum, i_plugin));
+ curr->two_phase = (strcmp(PQgetvalue(res, slotnum, i_twophase), "t") == 0);
+ curr->caughtup = (strcmp(PQgetvalue(res, slotnum, i_caughtup), "t") == 0);
+ curr->conflicting = (strcmp(PQgetvalue(res, slotnum, i_conflicting),
"t") == 0);
+ }

Saying "tup" always looks like it should be something tuple-related.
IMO it will be better to call all these "caught_up" instead of
"caughtup":

"caughtup" ==> "caught_up"
i_caughtup ==> i_caught_up
curr->caughtup ==> curr->caught_up

~~~

8. print_slot_infos

+static void
+print_slot_infos(LogicalSlotInfoArr *slot_arr)
+{
+ int slotnum;
+
+ if (slot_arr->nslots > 1)
+ pg_log(PG_VERBOSE, "Logical replication slots within the database:");
+
+ for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+ {
+ LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+
+ pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\", two_phase: %d",
+    slot_info->slotname,
+    slot_info->plugin,
+    slot_info->two_phase);
+ }
+}

Although it makes no functional difference, it might be neater if the
for loop is also within that "if (slot_arr->nslots > 1)" condition.

======
src/bin/pg_upgrade/pg_upgrade.h

9.
+/*
+ * Structure to store logical replication slot information
+ */
+typedef struct
+{
+ char    *slotname; /* slot name */
+ char    *plugin; /* plugin */
+ bool two_phase; /* can the slot decode 2PC? */
+ bool caughtup; /* Is confirmed_flush_lsn the same as latest
+ * checkpoint LSN? */
+ bool conflicting; /* Is the slot usable? */
+} LogicalSlotInfo;
9a.
+ bool caughtup; /* Is confirmed_flush_lsn the same as latest
+ * checkpoint LSN? */

caughtup ==> caught_up

~

9b.
+ bool conflicting; /* Is the slot usable? */

The field name has the opposite meaning of the wording of the comment.
(e.g. it is usable when it is NOT conflicting, right?).

Maybe there needs a better field name, or a better comment, or both.
AFAICT from other code pg_fatal message 'conflicting' is always
interpreted as 'lost' so maybe the field should be called that?

------
Kind Regards,
Peter Smith.
Fujitsu Australia

#205Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Peter Smith (#204)
2 attachment(s)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Peter,

Thank you for reviewing! PSA new version.

======
src/bin/pg_upgrade/check.c

1. check_new_cluster_logical_replication_slots

+ res = executeQueryOrDie(conn, "SHOW max_replication_slots;");
+ max_replication_slots = atoi(PQgetvalue(res, 0, 0));
+
+ if (PQntuples(res) != 1)
+ pg_fatal("could not determine max_replication_slots");

Shouldn't the PQntuples check be *before* the PQgetvalue and
assignment to max_replication_slots?

Right, fixed. The same PQntuples() check was also added after the first query.

2. check_new_cluster_logical_replication_slots

+ res = executeQueryOrDie(conn, "SHOW wal_level;");
+ wal_level = PQgetvalue(res, 0, 0);
+
+ if (PQntuples(res) != 1)
+ pg_fatal("could not determine wal_level");

Shouldn't the PQntuples check be *before* the PQgetvalue and
assignment to wal_level?

Fixed.

3. check_old_cluster_for_valid_slots

I saw that similar code with scripts like this is doing PG_REPORT:

pg_log(PG_REPORT, "fatal");

but that PG_REPORT is missing from this function.

Added.

src/bin/pg_upgrade/function.c

4. get_loadable_libraries

@@ -42,11 +43,12 @@ library_name_compare(const void *p1, const void *p2)
((const LibraryInfo *) p2)->dbnum;
}

-
/*
* get_loadable_libraries()

~

Removing that blank line (above this function) should not be included
in the patch.

Restored the blank line.

5. get_loadable_libraries

+ /*
+ * Allocate a memory for extensions and logical replication output
+ * plugins.
+ */
+ os_info.libraries = pg_malloc_array(LibraryInfo,
+ totaltups + count_old_cluster_logical_slots());

/Allocate a memory/Allocate memory/

Fixed.

6. get_loadable_libraries
+ /*
+ * Store the name of output plugins as well. There is a possibility
+ * that duplicated plugins are set, but the consumer function
+ * check_loadable_libraries() will avoid checking the same library, so
+ * we do not have to consider their uniqueness here.
+ */
+ for (slotno = 0; slotno < slot_arr->nslots; slotno++)

/Store the name/Store the names/

Fixed.

src/bin/pg_upgrade/info.c

7. get_old_cluster_logical_slot_infos

+ i_slotname = PQfnumber(res, "slot_name");
+ i_plugin = PQfnumber(res, "plugin");
+ i_twophase = PQfnumber(res, "two_phase");
+ i_caughtup = PQfnumber(res, "caughtup");
+ i_conflicting = PQfnumber(res, "conflicting");
+
+ for (slotnum = 0; slotnum < num_slots; slotnum++)
+ {
+ LogicalSlotInfo *curr = &slotinfos[slotnum];
+
+ curr->slotname = pg_strdup(PQgetvalue(res, slotnum, i_slotname));
+ curr->plugin = pg_strdup(PQgetvalue(res, slotnum, i_plugin));
+ curr->two_phase = (strcmp(PQgetvalue(res, slotnum, i_twophase), "t") == 0);
+ curr->caughtup = (strcmp(PQgetvalue(res, slotnum, i_caughtup), "t") == 0);
+ curr->conflicting = (strcmp(PQgetvalue(res, slotnum, i_conflicting),
"t") == 0);
+ }

Saying "tup" always looks like it should be something tuple-related.
IMO it will be better to call all these "caught_up" instead of
"caughtup":

"caughtup" ==> "caught_up"
i_caughtup ==> i_caught_up
curr->caughtup ==> curr->caught_up

Fixed. The alias was also fixed.

8. print_slot_infos

+static void
+print_slot_infos(LogicalSlotInfoArr *slot_arr)
+{
+ int slotnum;
+
+ if (slot_arr->nslots > 1)
+ pg_log(PG_VERBOSE, "Logical replication slots within the database:");
+
+ for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+ {
+ LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+
+ pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\", two_phase: %d",
+    slot_info->slotname,
+    slot_info->plugin,
+    slot_info->two_phase);
+ }
+}

Although it makes no functional difference, it might be neater if the
for loop is also within that "if (slot_arr->nslots > 1)" condition.

Hmm, but that would make print_slot_infos() diverge further from print_rel_infos(),
and I think the two should stay similar. Instead, I added a quick return. Thoughts?
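
A sketch of the reworked function with the quick return (based on this discussion,
not necessarily the exact code in the attached patch):

	static void
	print_slot_infos(LogicalSlotInfoArr *slot_arr)
	{
		int			slotnum;

		/* Quick return if there are no logical slots. */
		if (slot_arr->nslots == 0)
			return;

		pg_log(PG_VERBOSE, "Logical replication slots within the database:");

		for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
		{
			LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];

			pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\", two_phase: %d",
				   slot_info->slotname,
				   slot_info->plugin,
				   slot_info->two_phase);
		}
	}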

src/bin/pg_upgrade/pg_upgrade.h

9.
+/*
+ * Structure to store logical replication slot information
+ */
+typedef struct
+{
+ char    *slotname; /* slot name */
+ char    *plugin; /* plugin */
+ bool two_phase; /* can the slot decode 2PC? */
+ bool caughtup; /* Is confirmed_flush_lsn the same as latest
+ * checkpoint LSN? */
+ bool conflicting; /* Is the slot usable? */
+} LogicalSlotInfo;
9a.
+ bool caughtup; /* Is confirmed_flush_lsn the same as latest
+ * checkpoint LSN? */

caughtup ==> caught_up

Fixed.

9b.
+ bool conflicting; /* Is the slot usable? */

The field name has the opposite meaning of the wording of the comment.
(e.g. it is usable when it is NOT conflicting, right?).

Maybe there needs a better field name, or a better comment, or both.
AFAICT from other code pg_fatal message 'conflicting' is always
interpreted as 'lost' so maybe the field should be called that?

Changed to "is_lost", which makes the meaning easier to understand.

Also, I fixed the following points:

* Added a trailing period to the messages in check_new_cluster_logical_replication_slots(),
except for the final line. Following other functions such as check_new_cluster_is_empty(),
the period is omitted when the escape sequence is at the end.
* Removed the --check test because it sometimes failed on Windows machines.
I reported this in another thread [1].
* Set max_slot_wal_keep_size to -1 when the old cluster is started. According to the
discussion [2], this setting is sufficient to suppress WAL removal (see the sketch
after the footnotes below).

[1]: /messages/by-id/TYAPR01MB586654E2D74B838021BE77CAF5EEA@TYAPR01MB5866.jpnprd01.prod.outlook.com
[2]: /messages/by-id/ZPl659a5hPDHPq9w@paquier.xyz
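
For the last point, the intent is that pg_upgrade appends the setting to the options
used when it starts the old server, so the checkpointer cannot recycle WAL that the
logical slots still need. The relevant server.c hunk is not quoted here, so the
following is only an assumed sketch of its shape; the pgoptions buffer and the
placement inside start_postmaster() are assumptions, not taken verbatim from the
patch:

	/*
	 * Assumed sketch only: when starting the old cluster, keep WAL required
	 * by logical replication slots from being removed during the upgrade.
	 */
	if (cluster == &old_cluster)
		appendPQExpBufferStr(&pgoptions, " -c max_slot_wal_keep_size=-1");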

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:

v33-0001-Persist-logical-slots-to-disk-during-a-shutdown-.patchapplication/octet-stream; name=v33-0001-Persist-logical-slots-to-disk-during-a-shutdown-.patchDownload
From 912f2c0f51f7c3f9712398c5f2fdd5a6cd185182 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Fri, 14 Apr 2023 13:49:09 +0800
Subject: [PATCH v33 1/2] Persist logical slots to disk during a shutdown
 checkpoint if required.

It's entirely possible for a logical slot to have a confirmed_flush LSN
higher than the last value saved on disk while not being marked as dirty.
Currently, it is not a major problem but a later patch adding support for
the upgrade of slots relies on that value being properly persisted to disk.

It can also help avoid processing the same transactions again in some
boundary cases after the clean shutdown and restart. Say, we process some
transactions for which we didn't send anything downstream (the changes got
filtered) but the confirm_flush LSN is updated due to keepalives. As we
don't flush the latest value of confirm_flush LSN, it may lead to
processing the same changes again without this patch.

Author: Vignesh C, Julien Rouhaud, Kuroda Hayato based on suggestions by
Ashutosh Bapat
Reviewed-by: Amit Kapila, Peter Smith, Ashutosh Bapat
Discussion: http://postgr.es/m/CAA4eK1JzJagMmb_E8D4au=GYQkxox0AfNBm1FbP7sy7t4YWXPQ@mail.gmail.com
Discussion: http://postgr.es/m/TYAPR01MB58664C81887B3AF2EB6B16E3F5939@TYAPR01MB5866.jpnprd01.prod.outlook.com
---
 src/backend/access/transam/xlog.c             |   2 +-
 src/backend/replication/slot.c                |  34 +++++-
 src/include/replication/slot.h                |   5 +-
 src/test/recovery/meson.build                 |   1 +
 .../t/038_save_logical_slots_shutdown.pl      | 102 ++++++++++++++++++
 5 files changed, 139 insertions(+), 5 deletions(-)
 create mode 100644 src/test/recovery/t/038_save_logical_slots_shutdown.pl

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index f6f8adc72a..f26c8d18a6 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7039,7 +7039,7 @@ static void
 CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
 {
 	CheckPointRelationMap();
-	CheckPointReplicationSlots();
+	CheckPointReplicationSlots(flags & CHECKPOINT_IS_SHUTDOWN);
 	CheckPointSnapBuild();
 	CheckPointLogicalRewriteHeap();
 	CheckPointReplicationOrigin();
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index bb09c4010f..2d22b4e956 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -321,6 +321,7 @@ ReplicationSlotCreate(const char *name, bool db_specific,
 	slot->candidate_xmin_lsn = InvalidXLogRecPtr;
 	slot->candidate_restart_valid = InvalidXLogRecPtr;
 	slot->candidate_restart_lsn = InvalidXLogRecPtr;
+	slot->last_saved_confirmed_flush = InvalidXLogRecPtr;
 
 	/*
 	 * Create the slot on disk.  We haven't actually marked the slot allocated
@@ -1573,10 +1574,10 @@ restart:
  * Flush all replication slots to disk.
  *
  * This needn't actually be part of a checkpoint, but it's a convenient
- * location.
+ * location. is_shutdown is true in case of a shutdown checkpoint.
  */
 void
-CheckPointReplicationSlots(void)
+CheckPointReplicationSlots(bool is_shutdown)
 {
 	int			i;
 
@@ -1601,6 +1602,31 @@ CheckPointReplicationSlots(void)
 
 		/* save the slot to disk, locking is handled in SaveSlotToPath() */
 		sprintf(path, "pg_replslot/%s", NameStr(s->data.name));
+
+		/*
+		 * We won't ensure that the slot is persisted after the
+		 * confirmed_flush LSN is updated as that could lead to frequent
+		 * writes.  However, we need to ensure that we do persist the slots at
+		 * the time of shutdown whose confirmed_flush LSN is changed since we
+		 * last saved the slot to disk. This will help in avoiding retreat of
+		 * the confirmed_flush LSN after restart. At other times, the
+		 * walsender keeps saving the slot from time to time as the
+		 * replication progresses, so there is no clear advantage of flushing
+		 * additional slots at the time of checkpoint.
+		 */
+		if (is_shutdown && SlotIsLogical(s))
+		{
+			SpinLockAcquire(&s->mutex);
+			if (s->data.invalidated == RS_INVAL_NONE &&
+				s->data.confirmed_flush != s->last_saved_confirmed_flush)
+			{
+				s->just_dirtied = true;
+				s->dirty = true;
+			}
+
+			SpinLockRelease(&s->mutex);
+		}
+
 		SaveSlotToPath(s, path, LOG);
 	}
 	LWLockRelease(ReplicationSlotAllocationLock);
@@ -1873,11 +1899,12 @@ SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel)
 
 	/*
 	 * Successfully wrote, unset dirty bit, unless somebody dirtied again
-	 * already.
+	 * already and remember the confirmed_flush LSN value.
 	 */
 	SpinLockAcquire(&slot->mutex);
 	if (!slot->just_dirtied)
 		slot->dirty = false;
+	slot->last_saved_confirmed_flush = cp.slotdata.confirmed_flush;
 	SpinLockRelease(&slot->mutex);
 
 	LWLockRelease(&slot->io_in_progress_lock);
@@ -2074,6 +2101,7 @@ RestoreSlotFromDisk(const char *name)
 		/* initialize in memory state */
 		slot->effective_xmin = cp.slotdata.xmin;
 		slot->effective_catalog_xmin = cp.slotdata.catalog_xmin;
+		slot->last_saved_confirmed_flush = cp.slotdata.confirmed_flush;
 
 		slot->candidate_catalog_xmin = InvalidTransactionId;
 		slot->candidate_xmin_lsn = InvalidXLogRecPtr;
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index a8a89dc784..da8978342a 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -178,6 +178,9 @@ typedef struct ReplicationSlot
 	XLogRecPtr	candidate_xmin_lsn;
 	XLogRecPtr	candidate_restart_valid;
 	XLogRecPtr	candidate_restart_lsn;
+
+	/* This is used to track the last saved confirmed_flush LSN value */
+	XLogRecPtr	last_saved_confirmed_flush;
 } ReplicationSlot;
 
 #define SlotIsPhysical(slot) ((slot)->data.database == InvalidOid)
@@ -241,7 +244,7 @@ extern void ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslo
 extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok);
 
 extern void StartupReplicationSlots(void);
-extern void CheckPointReplicationSlots(void);
+extern void CheckPointReplicationSlots(bool is_shutdown);
 
 extern void CheckSlotRequirements(void);
 extern void CheckSlotPermissions(void);
diff --git a/src/test/recovery/meson.build b/src/test/recovery/meson.build
index e7328e4894..646d6ffde4 100644
--- a/src/test/recovery/meson.build
+++ b/src/test/recovery/meson.build
@@ -43,6 +43,7 @@ tests += {
       't/035_standby_logical_decoding.pl',
       't/036_truncated_dropped.pl',
       't/037_invalid_database.pl',
+      't/038_save_logical_slots_shutdown.pl',
     ],
   },
 }
diff --git a/src/test/recovery/t/038_save_logical_slots_shutdown.pl b/src/test/recovery/t/038_save_logical_slots_shutdown.pl
new file mode 100644
index 0000000000..224a840a61
--- /dev/null
+++ b/src/test/recovery/t/038_save_logical_slots_shutdown.pl
@@ -0,0 +1,102 @@
+
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+# Test logical replication slots are always persisted to disk during a shutdown
+# checkpoint.
+
+use strict;
+use warnings;
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+sub compare_confirmed_flush
+{
+	my ($node, $confirmed_flush_from_log) = @_;
+
+	# Fetch Latest checkpoint location from the control file
+	my ($stdout, $stderr) =
+	  run_command([ 'pg_controldata', $node->data_dir ]);
+	my @control_data      = split("\n", $stdout);
+	my $latest_checkpoint = undef;
+	foreach (@control_data)
+	{
+		if ($_ =~ /^Latest checkpoint location:\s*(.*)$/mg)
+		{
+			$latest_checkpoint = $1;
+			last;
+		}
+	}
+	die "Latest checkpoint location not found in control file\n"
+	  unless defined($latest_checkpoint);
+
+	# Is it same as the value read from log?
+	ok( $latest_checkpoint eq $confirmed_flush_from_log,
+		"Check that the slot's confirmed_flush LSN is the same as the latest_checkpoint location"
+	);
+
+	return;
+}
+
+# Initialize publisher node
+my $node_publisher = PostgreSQL::Test::Cluster->new('pub');
+$node_publisher->init(allows_streaming => 'logical');
+# Avoid checkpoint during the test, otherwise, the latest checkpoint location
+# will change.
+$node_publisher->append_conf(
+	'postgresql.conf', q{
+checkpoint_timeout = 1h
+autovacuum = off
+});
+$node_publisher->start;
+
+# Create subscriber node
+my $node_subscriber = PostgreSQL::Test::Cluster->new('sub');
+$node_subscriber->init(allows_streaming => 'logical');
+$node_subscriber->start;
+
+# Create tables
+$node_publisher->safe_psql('postgres', "CREATE TABLE test_tbl (id int)");
+$node_subscriber->safe_psql('postgres', "CREATE TABLE test_tbl (id int)");
+
+# Insert some data
+$node_publisher->safe_psql('postgres',
+	"INSERT INTO test_tbl VALUES (generate_series(1, 5));");
+
+# Setup logical replication
+my $publisher_connstr = $node_publisher->connstr . ' dbname=postgres';
+$node_publisher->safe_psql('postgres',
+	"CREATE PUBLICATION pub FOR ALL TABLES");
+$node_subscriber->safe_psql('postgres',
+	"CREATE SUBSCRIPTION sub CONNECTION '$publisher_connstr' PUBLICATION pub"
+);
+
+$node_subscriber->wait_for_subscription_sync($node_publisher, 'sub');
+
+my $result =
+  $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM test_tbl");
+
+is($result, qq(5), "check initial copy was done");
+
+my $offset = -s $node_publisher->logfile;
+
+# Restart the publisher to ensure that the slot will be persisted if required
+$node_publisher->restart();
+
+# Wait until the walsender creates decoding context
+$node_publisher->wait_for_log(
+	qr/Streaming transactions committing after ([A-F0-9]+\/[A-F0-9]+), reading WAL from ([A-F0-9]+\/[A-F0-9]+)./,
+	$offset);
+
+# Extract confirmed_flush from the logfile
+my $log_contents = slurp_file($node_publisher->logfile, $offset);
+$log_contents =~
+  qr/Streaming transactions committing after ([A-F0-9]+\/[A-F0-9]+), reading WAL from ([A-F0-9]+\/[A-F0-9]+)./
+  or die "could not get confirmed_flush_lsn";
+
+# Ensure that the slot's confirmed_flush LSN is the same as the
+# latest_checkpoint location.
+compare_confirmed_flush($node_publisher, $1);
+
+done_testing();
-- 
2.27.0

v33-0002-pg_upgrade-Allow-to-replicate-logical-replicatio.patchapplication/octet-stream; name=v33-0002-pg_upgrade-Allow-to-replicate-logical-replicatio.patchDownload
From c345b0aac3293f216e9c0c62a86ce6a0075fc41a Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Tue, 4 Apr 2023 05:49:34 +0000
Subject: [PATCH v33 2/2] pg_upgrade: Allow to replicate logical replication
 slots to new node

This commit allows nodes with logical replication slots to be upgraded. While
reading information from the old cluster, a list of logical replication slots is
fetched. At the later part of upgrading, pg_upgrade revisits the list and
restores slots by executing pg_create_logical_replication_slots() on the new
cluster. Migration of logical replication slots is only supported when the old
cluster is version 17.0 or later.

If the old node has slots with the status 'lost' or with unconsumed WAL records,
the pg_upgrade fails. These checks are needed to prevent data loss.

Note that slot restoration must be done after the final pg_resetwal command
during the upgrade because pg_resetwal will remove WALs that are required by
the slots. Due to this restriction, the timing of restoring replication slots is
different from other objects.

The significant advantage of this commit is that it makes it easy to continue
logical replication even after upgrading the publisher node. Previously,
pg_upgrade allowed copying publications to a new node. With this new commit,
adjusting the connection string to the new publisher will cause the apply
worker on the subscriber to connect to the new publisher automatically. This
enables seamless continuation of logical replication, even after an upgrade.

Author: Hayato Kuroda
Co-authored-by: Hou Zhijie
Reviewed-by: Peter Smith, Julien Rouhaud, Vignesh C, Wang Wei, Masahiko Sawada,
             Dilip Kumar
---
 doc/src/sgml/ref/pgupgrade.sgml               |  70 +++++-
 src/bin/pg_upgrade/Makefile                   |   3 +
 src/bin/pg_upgrade/check.c                    | 207 +++++++++++++++--
 src/bin/pg_upgrade/controldata.c              |  39 ++++
 src/bin/pg_upgrade/function.c                 |  28 ++-
 src/bin/pg_upgrade/info.c                     | 157 ++++++++++++-
 src/bin/pg_upgrade/meson.build                |   1 +
 src/bin/pg_upgrade/pg_upgrade.c               | 111 ++++++++-
 src/bin/pg_upgrade/pg_upgrade.h               |  28 ++-
 src/bin/pg_upgrade/server.c                   |   7 +-
 .../t/003_logical_replication_slots.pl        | 214 ++++++++++++++++++
 src/tools/pgindent/typedefs.list              |   3 +
 12 files changed, 826 insertions(+), 42 deletions(-)
 create mode 100644 src/bin/pg_upgrade/t/003_logical_replication_slots.pl

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index bea0d1b93f..4f25a64c04 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -383,6 +383,71 @@ make prefix=/usr/local/pgsql.new install
     </para>
    </step>
 
+   <step>
+    <title>Prepare for publisher upgrades</title>
+
+    <para>
+     <application>pg_upgrade</application> attempts to migrate logical
+     replication slots. This helps avoid the need for manually defining the
+     same replication slots on the new publisher. Migration of logical
+     replication slots is only supported when the old cluster is version 17.0
+     or later.
+    </para>
+
+    <para>
+     Before you start upgrading the publisher cluster, ensure that the
+     subscription is temporarily disabled, by executing
+     <link linkend="sql-altersubscription"><command>ALTER SUBSCRIPTION ... DISABLE</command></link>.
+     Re-enable the subscription after the upgrade.
+    </para>
+
+    <para>
+     There are some prerequisites for <application>pg_upgrade</application> to
+     be able to upgrade the replication slots. If these are not met an error
+     will be reported.
+    </para>
+
+    <itemizedlist>
+     <listitem>
+      <para>
+       All slots on the old cluster must be usable, i.e., there are no slots
+       whose
+       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>wal_status</structfield>
+       is <literal>lost</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>confirmed_flush_lsn</structfield>
+       of all slots on the old cluster must be the same as the latest
+       checkpoint location.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The output plugins referenced by the slots on the old cluster must be
+       installed in the new PostgreSQL executable directory.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must have
+       <link linkend="guc-max-replication-slots"><varname>max_replication_slots</varname></link>
+       configured to a value greater than or equal to the number of slots
+       present in the old cluster.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must have
+       <link linkend="guc-wal-level"><varname>wal_level</varname></link> as
+       <literal>logical</literal>.
+      </para>
+     </listitem>
+    </itemizedlist>
+
+   </step>
+
    <step>
     <title>Stop both servers</title>
 
@@ -652,8 +717,9 @@ rsync --archive --delete --hard-links --size-only --no-inc-recursive /vol1/pg_tb
        Configure the servers for log shipping.  (You do not need to run
        <function>pg_backup_start()</function> and <function>pg_backup_stop()</function>
        or take a file system backup as the standbys are still synchronized
-       with the primary.)  Replication slots are not copied and must
-       be recreated.
+       with the primary.)  Replication slots on the old standby are not copied.
+       Only logical slots on the primary are migrated to the new standby,
+       and other slots must be recreated.
       </para>
      </step>
 
diff --git a/src/bin/pg_upgrade/Makefile b/src/bin/pg_upgrade/Makefile
index 5834513add..815d1a7ca1 100644
--- a/src/bin/pg_upgrade/Makefile
+++ b/src/bin/pg_upgrade/Makefile
@@ -3,6 +3,9 @@
 PGFILEDESC = "pg_upgrade - an in-place binary upgrade utility"
 PGAPPICON = win32
 
+# required for 003_logical_replication_slots.pl
+EXTRA_INSTALL=contrib/test_decoding
+
 subdir = src/bin/pg_upgrade
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 56e313f562..8cbd9e9ede 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -9,6 +9,7 @@
 
 #include "postgres_fe.h"
 
+#include "access/xlogdefs.h"
 #include "catalog/pg_authid_d.h"
 #include "catalog/pg_collation.h"
 #include "fe_utils/string_utils.h"
@@ -30,6 +31,8 @@ static void check_for_jsonb_9_4_usage(ClusterInfo *cluster);
 static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(void);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
+static void check_new_cluster_logical_replication_slots(void);
+static void check_old_cluster_for_valid_slots(bool live_check);
 
 
 /*
@@ -86,8 +89,11 @@ check_and_dump_old_cluster(bool live_check)
 	if (!live_check)
 		start_postmaster(&old_cluster, true);
 
-	/* Extract a list of databases and tables from the old cluster */
-	get_db_and_rel_infos(&old_cluster);
+	/*
+	 * Extract a list of databases, tables, and logical replication slots from
+	 * the old cluster.
+	 */
+	get_db_rel_and_slot_infos(&old_cluster);
 
 	init_tablespaces();
 
@@ -104,6 +110,13 @@ check_and_dump_old_cluster(bool live_check)
 	check_for_reg_data_type_usage(&old_cluster);
 	check_for_isn_and_int8_passing_mismatch(&old_cluster);
 
+	/*
+	 * Logical replication slots can be migrated since PG17. See comments atop
+	 * get_old_cluster_logical_slot_infos().
+	 */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
+		check_old_cluster_for_valid_slots(live_check);
+
 	/*
 	 * PG 16 increased the size of the 'aclitem' type, which breaks the
 	 * on-disk format for existing data.
@@ -187,7 +200,7 @@ check_and_dump_old_cluster(bool live_check)
 void
 check_new_cluster(void)
 {
-	get_db_and_rel_infos(&new_cluster);
+	get_db_rel_and_slot_infos(&new_cluster);
 
 	check_new_cluster_is_empty();
 
@@ -210,6 +223,8 @@ check_new_cluster(void)
 	check_for_prepared_transactions(&new_cluster);
 
 	check_for_new_tablespace_dir();
+
+	check_new_cluster_logical_replication_slots();
 }
 
 
@@ -232,27 +247,6 @@ report_clusters_compatible(void)
 }
 
 
-void
-issue_warnings_and_set_wal_level(void)
-{
-	/*
-	 * We unconditionally start/stop the new server because pg_resetwal -o set
-	 * wal_level to 'minimum'.  If the user is upgrading standby servers using
-	 * the rsync instructions, they will need pg_upgrade to write its final
-	 * WAL record showing wal_level as 'replica'.
-	 */
-	start_postmaster(&new_cluster, true);
-
-	/* Reindex hash indexes for old < 10.0 */
-	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 906)
-		old_9_6_invalidate_hash_indexes(&new_cluster, false);
-
-	report_extension_updates(&new_cluster);
-
-	stop_postmaster(false);
-}
-
-
 void
 output_completion_banner(char *deletion_script_file_name)
 {
@@ -1402,3 +1396,168 @@ check_for_user_defined_encoding_conversions(ClusterInfo *cluster)
 	else
 		check_ok();
 }
+
+/*
+ * check_new_cluster_logical_replication_slots()
+ *
+ * Make sure there are no logical replication slots on the new cluster and that
+ * the parameter settings necessary for creating slots are sufficient.
+ */
+static void
+check_new_cluster_logical_replication_slots(void)
+{
+	PGresult   *res;
+	PGconn	   *conn;
+	int			nslots_on_old;
+	int			nslots_on_new;
+	int			max_replication_slots;
+	char	   *wal_level;
+
+	/* Logical slots can be migrated since PG17. */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+		return;
+
+	nslots_on_old = count_old_cluster_logical_slots();
+
+	/* Quick return if there are no logical slots to be migrated. */
+	if (nslots_on_old == 0)
+		return;
+
+	conn = connectToServer(&new_cluster, "template1");
+
+	prep_status("Checking for logical replication slots");
+
+	res = executeQueryOrDie(conn, "SELECT count(*) "
+							"FROM pg_catalog.pg_replication_slots "
+							"WHERE slot_type = 'logical' AND "
+							"temporary IS FALSE;");
+
+	if (PQntuples(res) != 1)
+		pg_fatal("could not count the number of logical replication slots.");
+
+	nslots_on_new = atoi(PQgetvalue(res, 0, 0));
+
+	if (nslots_on_new)
+	{
+		if (nslots_on_new == 1)
+			pg_fatal("New cluster must not have logical replication slots but found a slot.");
+		else
+			pg_fatal("New cluster must not have logical replication slots but found %d slots.",
+					 nslots_on_new);
+	}
+
+	PQclear(res);
+
+	res = executeQueryOrDie(conn, "SHOW max_replication_slots;");
+
+	if (PQntuples(res) != 1)
+		pg_fatal("could not determine max_replication_slots.");
+
+	max_replication_slots = atoi(PQgetvalue(res, 0, 0));
+
+	if (nslots_on_old > max_replication_slots)
+		pg_fatal("max_replication_slots (%d) must be greater than or equal to the number of "
+				 "logical replication slots (%d) on the old cluster.",
+				 max_replication_slots, nslots_on_old);
+
+	PQclear(res);
+
+	res = executeQueryOrDie(conn, "SHOW wal_level;");
+
+	if (PQntuples(res) != 1)
+		pg_fatal("could not determine wal_level.");
+
+	wal_level = PQgetvalue(res, 0, 0);
+
+	if (strcmp(wal_level, "logical") != 0)
+		pg_fatal("wal_level must be \"logical\", but is set to \"%s\"",
+				 wal_level);
+
+	PQclear(res);
+	PQfinish(conn);
+
+	check_ok();
+}
+
+/*
+ * check_old_cluster_for_valid_slots()
+ *
+ * Make sure logical replication slots can be migrated to new cluster.
+ * Following points are checked:
+ *
+ *	- All logical replication slots are usable.
+ *	- All logical replication slots consumed all WALs, except a
+ *	  CHECKPOINT_SHUTDOWN record.
+ */
+static void
+check_old_cluster_for_valid_slots(bool live_check)
+{
+	int			dbnum;
+	char		output_path[MAXPGPATH];
+	FILE	   *script = NULL;
+
+	prep_status("Checking for valid logical replication slots");
+
+	snprintf(output_path, sizeof(output_path), "%s/%s",
+			 log_opts.basedir,
+			 "problematic_logical_relication_slots.txt");
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		int			slotnum;
+		LogicalSlotInfoArr *slot_arr = &old_cluster.dbarr.dbs[dbnum].slot_arr;
+
+		for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		{
+			LogicalSlotInfo *slot = &slot_arr->slots[slotnum];
+
+			/* Is the slot still usable? */
+			if (slot->is_lost)
+			{
+				if (script == NULL &&
+					(script = fopen_priv(output_path, "w")) == NULL)
+					pg_fatal("could not open file \"%s\": %s",
+							 output_path, strerror(errno));
+
+				fprintf(script,
+						"slotname :%s\tproblem: The slot is in 'lost' state\n",
+						slot->slotname);
+			}
+
+			/*
+			 * Do additional checks to ensure that confirmed_flush LSN of all
+			 * the slots is the same as the latest checkpoint location.
+			 *
+			 * Note: This can be satisfied only when the old cluster has been
+			 * shut down, so we skip this for live checks.
+			 */
+			if (!live_check && !slot->caught_up)
+			{
+				if (script == NULL &&
+					(script = fopen_priv(output_path, "w")) == NULL)
+					pg_fatal("could not open file \"%s\": %s",
+							 output_path, strerror(errno));
+
+				fprintf(script,
+						"slotname :%s\tproblem: The slot has not consumed WALs yet\n",
+						slot->slotname);
+			}
+		}
+
+		if (script)
+		{
+			fclose(script);
+
+			pg_log(PG_REPORT, "fatal");
+			pg_fatal("The source cluster contains one or more problematic logical replication slots.\n"
+					 "The needed workaround depends on the problem.\n"
+					 "1) If the problem is \"The slot is in 'lost' state,\" You can drop such replication slots.\n"
+					 "2) If the problem is \"The slot has not consumed WALs yet,\" you can consume all remaining WALs.\n"
+					 "Then, you can restart the upgrade.\n"
+					 "A list of problematic logical replication slots is in the file:\n"
+					 "    %s", output_path);
+		}
+	}
+
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c
index 4beb65ab22..f8f823e2be 100644
--- a/src/bin/pg_upgrade/controldata.c
+++ b/src/bin/pg_upgrade/controldata.c
@@ -169,6 +169,45 @@ get_control_data(ClusterInfo *cluster, bool live_check)
 				}
 				got_cluster_state = true;
 			}
+
+			else if ((p = strstr(bufin, "Latest checkpoint location:")) != NULL)
+			{
+				/*
+				 * Read the latest checkpoint location if the cluster is PG17
+				 * or later. This is used for upgrading logical replication
+				 * slots. Currently, we need it only for the old cluster but
+				 * for simplicity chose not to have additional checks.
+				 */
+				if (GET_MAJOR_VERSION(cluster->major_version) >= 1700)
+				{
+					char	   *slash = NULL;
+					uint32		upper_lsn,
+								lower_lsn;
+
+					p = strchr(p, ':');
+
+					if (p == NULL || strlen(p) <= 1)
+						pg_fatal("%d: controldata retrieval problem", __LINE__);
+
+					p++;		/* remove ':' char */
+
+					p = strpbrk(p, "01234567890ABCDEF");
+
+					if (p == NULL || strlen(p) <= 1)
+						pg_fatal("%d: controldata retrieval problem", __LINE__);
+
+					/*
+					 * The upper and lower part of LSN must be read separately
+					 * because it is stored as in %X/%X format.
+					 */
+					upper_lsn = strtoul(p, &slash, 16);
+					lower_lsn = strtoul(++slash, NULL, 16);
+
+					/* And combine them */
+					cluster->controldata.chkpnt_latest =
+						((uint64) upper_lsn << 32) | lower_lsn;
+				}
+			}
 		}
 
 		rc = pclose(output);
diff --git a/src/bin/pg_upgrade/function.c b/src/bin/pg_upgrade/function.c
index dc8800c7cd..1f3d90d3bf 100644
--- a/src/bin/pg_upgrade/function.c
+++ b/src/bin/pg_upgrade/function.c
@@ -11,6 +11,7 @@
 
 #include "access/transam.h"
 #include "catalog/pg_language_d.h"
+#include "fe_utils/string_utils.h"
 #include "pg_upgrade.h"
 
 /*
@@ -46,7 +47,9 @@ library_name_compare(const void *p1, const void *p2)
 /*
  * get_loadable_libraries()
  *
- *	Fetch the names of all old libraries containing C-language functions.
+ *	Fetch the names of all old libraries containing either C-language functions
+ *	or are corresponding to logical replication output plugins.
+ *
  *	We will later check that they all exist in the new installation.
  */
 void
@@ -81,7 +84,11 @@ get_loadable_libraries(void)
 		PQfinish(conn);
 	}
 
-	os_info.libraries = (LibraryInfo *) pg_malloc(totaltups * sizeof(LibraryInfo));
+	/*
+	 * Allocate memory for extensions and logical replication output plugins.
+	 */
+	os_info.libraries = pg_malloc_array(LibraryInfo,
+										totaltups + count_old_cluster_logical_slots());
 	totaltups = 0;
 
 	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
@@ -89,6 +96,8 @@ get_loadable_libraries(void)
 		PGresult   *res = ress[dbnum];
 		int			ntups;
 		int			rowno;
+		int			slotno;
+		LogicalSlotInfoArr *slot_arr = &old_cluster.dbarr.dbs[dbnum].slot_arr;
 
 		ntups = PQntuples(res);
 		for (rowno = 0; rowno < ntups; rowno++)
@@ -101,6 +110,21 @@ get_loadable_libraries(void)
 			totaltups++;
 		}
 		PQclear(res);
+
+		/*
+		 * Store the names of output plugins as well. There is a possibility
+		 * that duplicated plugins are set, but the consumer function
+		 * check_loadable_libraries() will avoid checking the same library, so
+		 * we do not have to consider their uniqueness here.
+		 */
+		for (slotno = 0; slotno < slot_arr->nslots; slotno++)
+		{
+			os_info.libraries[totaltups].name = pg_strdup(slot_arr->slots[slotno].plugin);
+			os_info.libraries[totaltups].dbnum = dbnum;
+
+			totaltups++;
+		}
+
 	}
 
 	pg_free(ress);
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index aa5faca4d6..df1149a3d6 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -26,6 +26,8 @@ static void get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo);
 static void free_rel_infos(RelInfoArr *rel_arr);
 static void print_db_infos(DbInfoArr *db_arr);
 static void print_rel_infos(RelInfoArr *rel_arr);
+static void print_slot_infos(LogicalSlotInfoArr *slot_arr);
+static void get_old_cluster_logical_slot_infos(DbInfo *dbinfo);
 
 
 /*
@@ -266,13 +268,13 @@ report_unmatched_relation(const RelInfo *rel, const DbInfo *db, bool is_new_db)
 }
 
 /*
- * get_db_and_rel_infos()
+ * get_db_rel_and_slot_infos()
  *
  * higher level routine to generate dbinfos for the database running
  * on the given "port". Assumes that server is already running.
  */
 void
-get_db_and_rel_infos(ClusterInfo *cluster)
+get_db_rel_and_slot_infos(ClusterInfo *cluster)
 {
 	int			dbnum;
 
@@ -283,7 +285,18 @@ get_db_and_rel_infos(ClusterInfo *cluster)
 	get_db_infos(cluster);
 
 	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
-		get_rel_infos(cluster, &cluster->dbarr.dbs[dbnum]);
+	{
+		DbInfo	   *pDbInfo = &cluster->dbarr.dbs[dbnum];
+
+		get_rel_infos(cluster, pDbInfo);
+
+		/*
+		 * If we are reading the old_cluster, gets infos for logical
+		 * replication slots.
+		 */
+		if (cluster == &old_cluster)
+			get_old_cluster_logical_slot_infos(pDbInfo);
+	}
 
 	if (cluster == &old_cluster)
 		pg_log(PG_VERBOSE, "\nsource databases:");
@@ -394,7 +407,7 @@ get_db_infos(ClusterInfo *cluster)
 	i_spclocation = PQfnumber(res, "spclocation");
 
 	ntups = PQntuples(res);
-	dbinfos = (DbInfo *) pg_malloc(sizeof(DbInfo) * ntups);
+	dbinfos = (DbInfo *) pg_malloc0(sizeof(DbInfo) * ntups);
 
 	for (tupnum = 0; tupnum < ntups; tupnum++)
 	{
@@ -600,6 +613,107 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 	dbinfo->rel_arr.nrels = num_rels;
 }
 
+/*
+ * get_old_cluster_logical_slot_infos()
+ *
+ * gets the LogicalSlotInfos for all the logical replication slots of the
+ * database referred to by "dbinfo".
+ *
+ * Note: This function will not do anything if the old cluster is pre-PG17.
+ * This is because before that the logical slots are not saved at shutdown, so
+ * there is no guarantee that the latest confirmed_flush_lsn is saved to disk
+ * which can lead to data loss. It is still not guaranteed for manually created
+ * slots in PG17, so subsequent checks done in
+ * check_old_cluster_for_valid_slots() would raise a FATAL error if such slots
+ * are included.
+ */
+static void
+get_old_cluster_logical_slot_infos(DbInfo *dbinfo)
+{
+	PGconn	   *conn;
+	PGresult   *res;
+	LogicalSlotInfo *slotinfos = NULL;
+	int			num_slots;
+
+	/* Logical slots can be migrated since PG17. */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+		return;
+
+	conn = connectToServer(&old_cluster, dbinfo->db_name);
+
+	/*
+	 * The temporary slots are expressly ignored while checking because such
+	 * slots cannot exist after the upgrade. During the upgrade, clusters are
+	 * started and stopped several times causing any temporary slots to be
+	 * removed.
+	 */
+	res = executeQueryOrDie(conn, "SELECT slot_name, plugin, two_phase, "
+							"(confirmed_flush_lsn = '%X/%X') as caught_up, conflicting "
+							"FROM pg_catalog.pg_replication_slots "
+							"WHERE slot_type = 'logical' AND "
+							"database = current_database() AND "
+							"temporary IS FALSE;",
+							LSN_FORMAT_ARGS(old_cluster.controldata.chkpnt_latest));
+
+	num_slots = PQntuples(res);
+
+	if (num_slots)
+	{
+		int			slotnum;
+		int			i_slotname;
+		int			i_plugin;
+		int			i_twophase;
+		int			i_caught_up;
+		int			i_is_lost;
+
+		slotinfos = pg_malloc_array(LogicalSlotInfo, num_slots);
+
+		i_slotname = PQfnumber(res, "slot_name");
+		i_plugin = PQfnumber(res, "plugin");
+		i_twophase = PQfnumber(res, "two_phase");
+		i_caught_up = PQfnumber(res, "caught_up");
+		i_is_lost = PQfnumber(res, "conflicting");
+
+		for (slotnum = 0; slotnum < num_slots; slotnum++)
+		{
+			LogicalSlotInfo *curr = &slotinfos[slotnum];
+
+			curr->slotname = pg_strdup(PQgetvalue(res, slotnum, i_slotname));
+			curr->plugin = pg_strdup(PQgetvalue(res, slotnum, i_plugin));
+			curr->two_phase = (strcmp(PQgetvalue(res, slotnum, i_twophase), "t") == 0);
+			curr->caught_up = (strcmp(PQgetvalue(res, slotnum, i_caught_up), "t") == 0);
+			curr->is_lost = (strcmp(PQgetvalue(res, slotnum, i_is_lost), "t") == 0);
+		}
+	}
+
+	PQclear(res);
+	PQfinish(conn);
+
+	dbinfo->slot_arr.slots = slotinfos;
+	dbinfo->slot_arr.nslots = num_slots;
+}
+
+
+/*
+ * count_old_cluster_logical_slots()
+ *
+ * Sum up and return the number of logical replication slots for all databases.
+ *
+ * Note: this function always returns 0 if the old_cluster is PG16 and prior
+ * because we gather slot information only for cluster versions greater than or
+ * equal to PG17. See get_old_cluster_logical_slot_infos().
+ */
+int
+count_old_cluster_logical_slots(void)
+{
+	int			dbnum;
+	int			slot_count = 0;
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+		slot_count += old_cluster.dbarr.dbs[dbnum].slot_arr.nslots;
+
+	return slot_count;
+}
 
 static void
 free_db_and_rel_infos(DbInfoArr *db_arr)
@@ -610,6 +724,12 @@ free_db_and_rel_infos(DbInfoArr *db_arr)
 	{
 		free_rel_infos(&db_arr->dbs[dbnum].rel_arr);
 		pg_free(db_arr->dbs[dbnum].db_name);
+
+		/*
+		 * Logical replication slots must not exist on the new cluster before
+		 * create_logical_replication_slots().
+		 */
+		Assert(db_arr->dbs[dbnum].slot_arr.nslots == 0);
 	}
 	pg_free(db_arr->dbs);
 	db_arr->dbs = NULL;
@@ -642,8 +762,11 @@ print_db_infos(DbInfoArr *db_arr)
 
 	for (dbnum = 0; dbnum < db_arr->ndbs; dbnum++)
 	{
-		pg_log(PG_VERBOSE, "Database: \"%s\"", db_arr->dbs[dbnum].db_name);
-		print_rel_infos(&db_arr->dbs[dbnum].rel_arr);
+		DbInfo	   *pDbInfo = &db_arr->dbs[dbnum];
+
+		pg_log(PG_VERBOSE, "Database: \"%s\"", pDbInfo->db_name);
+		print_rel_infos(&pDbInfo->rel_arr);
+		print_slot_infos(&pDbInfo->slot_arr);
 	}
 }
 
@@ -660,3 +783,25 @@ print_rel_infos(RelInfoArr *rel_arr)
 			   rel_arr->rels[relnum].reloid,
 			   rel_arr->rels[relnum].tablespace);
 }
+
+static void
+print_slot_infos(LogicalSlotInfoArr *slot_arr)
+{
+	int			slotnum;
+
+	/* Quick return if there are no logical slots. */
+	if (slot_arr->nslots == 0)
+		return;
+
+	pg_log(PG_VERBOSE, "Logical replication slots within the database:");
+
+	for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+	{
+		LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+
+		pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\", two_phase: %d",
+			   slot_info->slotname,
+			   slot_info->plugin,
+			   slot_info->two_phase);
+	}
+}
diff --git a/src/bin/pg_upgrade/meson.build b/src/bin/pg_upgrade/meson.build
index 12a97f84e2..228f29b688 100644
--- a/src/bin/pg_upgrade/meson.build
+++ b/src/bin/pg_upgrade/meson.build
@@ -42,6 +42,7 @@ tests += {
     'tests': [
       't/001_basic.pl',
       't/002_pg_upgrade.pl',
+      't/003_logical_replication_slots.pl',
     ],
     'test_kwargs': {'priority': 40}, # pg_upgrade tests are slow
   },
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 96bfb67167..7c995ff58e 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -59,6 +59,8 @@ static void copy_xact_xlog_xid(void);
 static void set_frozenxids(bool minmxid_only);
 static void make_outputdirs(char *pgdata);
 static void setup(char *argv0, bool *live_check);
+static void create_logical_replication_slots(void);
+static void setup_new_cluster(void);
 
 ClusterInfo old_cluster,
 			new_cluster;
@@ -188,6 +190,10 @@ main(int argc, char **argv)
 			  new_cluster.pgdata);
 	check_ok();
 
+	create_script_for_old_cluster_deletion(&deletion_script_file_name);
+
+	setup_new_cluster();
+
 	if (user_opts.do_sync)
 	{
 		prep_status("Sync data directory to disk");
@@ -199,10 +205,6 @@ main(int argc, char **argv)
 		check_ok();
 	}
 
-	create_script_for_old_cluster_deletion(&deletion_script_file_name);
-
-	issue_warnings_and_set_wal_level();
-
 	pg_log(PG_REPORT,
 		   "\n"
 		   "Upgrade Complete\n"
@@ -593,7 +595,7 @@ create_new_objects(void)
 		set_frozenxids(true);
 
 	/* update new_cluster info now that we have objects in the databases */
-	get_db_and_rel_infos(&new_cluster);
+	get_db_rel_and_slot_infos(&new_cluster);
 }
 
 /*
@@ -862,3 +864,102 @@ set_frozenxids(bool minmxid_only)
 
 	check_ok();
 }
+
+/*
+ * create_logical_replication_slots()
+ *
+ * Similar to create_new_objects() but only restores logical replication slots.
+ */
+static void
+create_logical_replication_slots(void)
+{
+	int			dbnum;
+
+	prep_status_progress("Restoring logical replication slots in the new cluster");
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		DbInfo	   *old_db = &old_cluster.dbarr.dbs[dbnum];
+		LogicalSlotInfoArr *slot_arr = &old_db->slot_arr;
+		PGconn	   *conn;
+		PQExpBuffer query;
+		int			slotnum;
+		char		log_file_name[MAXPGPATH];
+
+		/* Skip this database if there are no slots */
+		if (slot_arr->nslots == 0)
+			continue;
+
+		conn = connectToServer(&new_cluster, old_db->db_name);
+		query = createPQExpBuffer();
+
+		snprintf(log_file_name, sizeof(log_file_name),
+				 DB_DUMP_LOG_FILE_MASK, old_db->db_oid);
+
+		pg_log(PG_STATUS, "%s", old_db->db_name);
+
+		for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		{
+			LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+
+			/* Constructs a query for creating logical replication slots. */
+			appendPQExpBuffer(query, "SELECT pg_catalog.pg_create_logical_replication_slot(");
+			appendStringLiteralConn(query, slot_info->slotname, conn);
+			appendPQExpBuffer(query, ", ");
+			appendStringLiteralConn(query, slot_info->plugin, conn);
+			appendPQExpBuffer(query, ", false, %s);",
+							  slot_info->two_phase ? "true" : "false");
+
+			PQclear(executeQueryOrDie(conn, "%s", query->data));
+
+			resetPQExpBuffer(query);
+		}
+
+		PQfinish(conn);
+
+		destroyPQExpBuffer(query);
+	}
+
+	end_progress_output();
+	check_ok();
+}
+
+/*
+ *	setup_new_cluster()
+ *
+ * Starts a new cluster for updating the wal_level in the control file, then
+ * does final setups. Logical slots are also created here.
+ */
+static void
+setup_new_cluster(void)
+{
+	/*
+	 * We unconditionally start/stop the new server because pg_resetwal -o set
+	 * wal_level to 'minimum'.  If the user is upgrading standby servers using
+	 * the rsync instructions, they will need pg_upgrade to write its final
+	 * WAL record showing wal_level as 'replica'.
+	 */
+	start_postmaster(&new_cluster, true);
+
+	/*
+	 * If the old cluster has logical slots, migrate them to a new cluster.
+	 *
+	 * Note: This must be done after doing the pg_resetwal command because
+	 * pg_resetwal would remove required WALs.
+	 */
+	if (count_old_cluster_logical_slots())
+		create_logical_replication_slots();
+
+	/*
+	 * Reindex hash indexes for old < 10.0. count_old_cluster_logical_slots()
+	 * returns non-zero when the old_cluster is PG17 and later, so it's OK to
+	 * use "else if" here. See comments atop count_old_cluster_logical_slots()
+	 * and get_old_cluster_logical_slot_infos().
+	 */
+	else if (GET_MAJOR_VERSION(old_cluster.major_version) <= 906)
+		old_9_6_invalidate_hash_indexes(&new_cluster, false);
+
+	report_extension_updates(&new_cluster);
+
+	stop_postmaster(false);
+}
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 842f3b6cd3..54a5f66424 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -10,6 +10,7 @@
 #include <sys/stat.h>
 #include <sys/time.h>
 
+#include "access/xlogdefs.h"
 #include "common/relpath.h"
 #include "libpq-fe.h"
 
@@ -150,6 +151,25 @@ typedef struct
 	int			nrels;
 } RelInfoArr;
 
+/*
+ * Structure to store logical replication slot information
+ */
+typedef struct
+{
+	char	   *slotname;		/* slot name */
+	char	   *plugin;			/* plugin */
+	bool		two_phase;		/* can the slot decode 2PC? */
+	bool		caught_up;		/* Is confirmed_flush_lsn the same as latest
+								 * checkpoint LSN? */
+	bool		is_lost;		/* Is the slot in 'lost'? */
+} LogicalSlotInfo;
+
+typedef struct
+{
+	int			nslots;			/* number of logical slot infos */
+	LogicalSlotInfo *slots;		/* array of logical slot infos */
+} LogicalSlotInfoArr;
+
 /*
  * The following structure represents a relation mapping.
  */
@@ -176,6 +196,7 @@ typedef struct
 	char		db_tablespace[MAXPGPATH];	/* database default tablespace
 											 * path */
 	RelInfoArr	rel_arr;		/* array of all user relinfos */
+	LogicalSlotInfoArr slot_arr;	/* array of all LogicalSlotInfo */
 } DbInfo;
 
 /*
@@ -225,6 +246,7 @@ typedef struct
 	bool		date_is_int;
 	bool		float8_pass_by_value;
 	uint32		data_checksum_version;
+	XLogRecPtr	chkpnt_latest;
 } ControlData;
 
 /*
@@ -345,7 +367,6 @@ void		output_check_banner(bool live_check);
 void		check_and_dump_old_cluster(bool live_check);
 void		check_new_cluster(void);
 void		report_clusters_compatible(void);
-void		issue_warnings_and_set_wal_level(void);
 void		output_completion_banner(char *deletion_script_file_name);
 void		check_cluster_versions(void);
 void		check_cluster_compatibility(bool live_check);
@@ -400,7 +421,8 @@ void		check_loadable_libraries(void);
 FileNameMap *gen_db_file_maps(DbInfo *old_db,
 							  DbInfo *new_db, int *nmaps, const char *old_pgdata,
 							  const char *new_pgdata);
-void		get_db_and_rel_infos(ClusterInfo *cluster);
+void		get_db_rel_and_slot_infos(ClusterInfo *cluster);
+int			count_old_cluster_logical_slots(void);
 
 /* option.c */
 
@@ -472,3 +494,5 @@ void		parallel_transfer_all_new_dbs(DbInfoArr *old_db_arr, DbInfoArr *new_db_arr
 										  char *old_pgdata, char *new_pgdata,
 										  char *old_tablespace);
 bool		reap_child(bool wait_for_child);
+
+XLogRecPtr	strtoLSN(const char *str, bool *have_error);
diff --git a/src/bin/pg_upgrade/server.c b/src/bin/pg_upgrade/server.c
index 0bc3d2806b..a7c4dfb005 100644
--- a/src/bin/pg_upgrade/server.c
+++ b/src/bin/pg_upgrade/server.c
@@ -234,6 +234,10 @@ start_postmaster(ClusterInfo *cluster, bool report_and_exit_on_error)
 	 * we only modify the new cluster, so only use it there.  If there is a
 	 * crash, the new cluster has to be recreated anyway.  fsync=off is a big
 	 * win on ext4.
+	 *
+	 * As for the old cluster, the max_slot_wal_keep_size is set to -1 to
+	 * prevent the WAL removal required by logical slots. The setting could
+	 * avoid the invalidation of slots during the upgrade.
 	 */
 	snprintf(cmd, sizeof(cmd),
 			 "\"%s/pg_ctl\" -w -l \"%s/%s\" -D \"%s\" -o \"-p %d -b%s %s%s\" start",
@@ -241,7 +245,8 @@ start_postmaster(ClusterInfo *cluster, bool report_and_exit_on_error)
 			 log_opts.logdir,
 			 SERVER_LOG_FILE, cluster->pgconfig, cluster->port,
 			 (cluster == &new_cluster) ?
-			 " -c synchronous_commit=off -c fsync=off -c full_page_writes=off" : "",
+			 " -c synchronous_commit=off -c fsync=off -c full_page_writes=off" :
+			 " -c max_slot_wal_keep_size=-1",
 			 cluster->pgopts ? cluster->pgopts : "", socket_string);
 
 	/*
diff --git a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
new file mode 100644
index 0000000000..640964c4e1
--- /dev/null
+++ b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
@@ -0,0 +1,214 @@
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+# Tests for upgrading replication slots
+
+use strict;
+use warnings;
+
+use File::Path qw(rmtree);
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Can be changed to test the other modes
+my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
+
+# Initialize old cluster
+my $old_publisher = PostgreSQL::Test::Cluster->new('old_publisher');
+$old_publisher->init(allows_streaming => 'logical');
+
+# Initialize new cluster
+my $new_publisher = PostgreSQL::Test::Cluster->new('new_publisher');
+$new_publisher->init(allows_streaming => 'replica');
+
+# Initialize subscriber cluster
+my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
+$subscriber->init(allows_streaming => 'logical');
+
+my $bindir = $new_publisher->config_data('--bindir');
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when new cluster wal_level is not 'logical'
+
+# Preparations for the subsequent test:
+# 1. Create a slot on the old cluster
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT pg_create_logical_replication_slot('test_slot1', 'test_decoding', false, true);"
+);
+$old_publisher->stop;
+
+# pg_upgrade will fail because the new cluster wal_level is 'replica'
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade where the new cluster has the wrong wal_level');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when max_replication_slots on a new cluster is
+#		too low
+
+# Preparations for the subsequent test:
+# 1. Create a second slot on the old cluster
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT pg_create_logical_replication_slot('test_slot2', 'test_decoding', false, true);"
+);
+
+# 2. Consume WAL records to avoid another type of upgrade failure. It will be
+#	 tested in subsequent cases.
+$old_publisher->safe_psql('postgres',
+	"SELECT count(*) FROM pg_logical_slot_get_changes('test_slot1', NULL, NULL);"
+);
+$old_publisher->stop;
+
+# 3. max_replication_slots is set to smaller than the number of slots (2)
+#	 present on the old cluster
+$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 1");
+
+# 4. wal_level is set correctly on the new cluster
+$new_publisher->append_conf('postgresql.conf', "wal_level = 'logical'");
+
+# pg_upgrade will fail because the new cluster has insufficient max_replication_slots
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade where the new cluster has insufficient max_replication_slots');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when the slot still has unconsumed WAL records
+
+# Preparations for the subsequent test:
+# 1. Remove the slot 'test_slot2', leaving only 1 slot remaining on the old
+#	 cluster, so the new cluster config  max_replication_slots=1 will now be
+#	 enough.
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT * FROM pg_drop_replication_slot('test_slot2');"
+);
+
+# 2. Generate extra WAL records. Because these WAL records do not get consumed
+#	 it will cause the upcoming pg_upgrade test to fail.
+$old_publisher->safe_psql('postgres',
+	"CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;"
+);
+$old_publisher->stop;
+
+# pg_upgrade will fail because the slot still has unconsumed WAL records
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade of old cluster with idle replication slots');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+# Remove the remained slot
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT * FROM pg_drop_replication_slot('test_slot1');"
+);
+
+# ------------------------------
+# TEST: Successful upgrade
+
+# Preparations for the subsequent test:
+# 1. Setup logical replication
+my $old_connstr = $old_publisher->connstr . ' dbname=postgres';
+$old_publisher->safe_psql('postgres',
+	"CREATE PUBLICATION pub FOR ALL TABLES;"
+);
+$subscriber->start;
+$subscriber->safe_psql(
+	'postgres', qq[
+	CREATE TABLE tbl (a int);
+	CREATE SUBSCRIPTION sub CONNECTION '$old_connstr' PUBLICATION pub WITH (two_phase = 'true')
+]);
+$subscriber->wait_for_subscription_sync($old_publisher, 'sub');
+
+# 2. Temporarily disable the subscription
+$subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION sub DISABLE");
+$old_publisher->stop;
+
+# Actual run, successful upgrade is expected
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade of old cluster');
+ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ removed after pg_upgrade success");
+
+# Check that the slot 'sub' has migrated to the new cluster
+$new_publisher->start;
+my $result = $new_publisher->safe_psql('postgres',
+	"SELECT slot_name, two_phase FROM pg_replication_slots");
+is($result, qq(sub|t), 'check the slot exists on new cluster');
+
+# Update the connection
+my $new_connstr = $new_publisher->connstr . ' dbname=postgres';
+$subscriber->safe_psql(
+	'postgres', qq[
+	ALTER SUBSCRIPTION sub CONNECTION '$new_connstr';
+	ALTER SUBSCRIPTION sub ENABLE;
+]);
+
+# Check whether changes on the new publisher get replicated to the subscriber
+$new_publisher->safe_psql('postgres',
+	"INSERT INTO tbl VALUES (generate_series(11, 20))");
+$new_publisher->wait_for_catchup('sub');
+$result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
+is($result, qq(20), 'check changes are replicated to the subscriber');
+
+# Clean up
+$subscriber->stop();
+$new_publisher->stop();
+
+done_testing();
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index f2af84d7ca..98c01fa05f 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1502,7 +1502,10 @@ LogicalRepTupleData
 LogicalRepTyp
 LogicalRepWorker
 LogicalRepWorkerType
+LogicalReplicationSlotInfo
 LogicalRewriteMappingData
+LogicalSlotInfo
+LogicalSlotInfoArr
 LogicalTape
 LogicalTapeSet
 LsnReadQueue
-- 
2.27.0

#206Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Hayato Kuroda (Fujitsu) (#205)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

On Thursday, September 7, 2023 8:24 PM Kuroda, Hayato/黒田 隼人 <kuroda.hayato@fujitsu.com> wrote:

Dear Peter,

Thank you for reviewing! PSA new version.

Thanks for updating the patches!

Here are some comments:

1.

 bool		reap_child(bool wait_for_child);
+
+XLogRecPtr	strtoLSN(const char *str, bool *have_error);

This function has been removed, so this declaration is no longer needed.

2.

+	if (nslots_on_new)
+	{
+		if (nslots_on_new == 1)
+			pg_fatal("New cluster must not have logical replication slots but found a slot.");
+		else
+			pg_fatal("New cluster must not have logical replication slots but found %d slots.",
+					 nslots_on_new);

We could try ngettext() here:
pg_log_warning(ngettext("New cluster must not have logical replication slots but found %d slot.",
"New cluster must not have logical replication slots but found %d slots",
nslots_on_new)

3.
- create_script_for_old_cluster_deletion(&deletion_script_file_name);
-

Is there a reason for reordering this function? Sorry if I missed some
previous discussions.

4.

@@ -610,6 +724,12 @@ free_db_and_rel_infos(DbInfoArr *db_arr)
 	{
 		free_rel_infos(&db_arr->dbs[dbnum].rel_arr);
 		pg_free(db_arr->dbs[dbnum].db_name);
+
+		/*
+		 * Logical replication slots must not exist on the new cluster before
+		 * create_logical_replication_slots().
+		 */
+		Assert(db_arr->dbs[dbnum].slot_arr.nslots == 0);

I think the assert is not necessary, as the patch will check the new cluster's
slots in another function. Besides, this function is not used only for the new
cluster, but the comment mentions only the new cluster, which seems a bit
inconsistent. So, how about removing it?

5.
 			 (cluster == &new_cluster) ?
-			 " -c synchronous_commit=off -c fsync=off -c full_page_writes=off" : "",
+			 " -c synchronous_commit=off -c fsync=off -c full_page_writes=off" :
+			 " -c max_slot_wal_keep_size=-1",

I think we need to set max_slot_wal_keep_size on the new cluster as well;
otherwise it's possible that the newly created slots get invalidated during
the upgrade. What do you think?

6.

+	bool		is_lost;		/* Is the slot in 'lost'? */
+} LogicalSlotInfo;

Would it be better to use 'invalidated', as the same term is used in the error
message of ReportSlotInvalidation() and in logicaldecoding.sgml?

7.
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
	...
+		if (script)
+		{
+			fclose(script);
+
+			pg_log(PG_REPORT, "fatal");
+			pg_fatal("The source cluster contains one or more problematic logical replication slots.\n"

I think we should issue this pg_fatal outside the for() loop; otherwise we cannot
collect all the problematic slots.

Best Regards,
Hou zj

#207Amit Kapila
amit.kapila16@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#205)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Thu, Sep 7, 2023 at 5:54 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

Thank you for reviewing! PSA new version.

Few comments:
=============
1.
<para>
+       All slots on the old cluster must be usable, i.e., there are no slots
+       whose
+       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>wal_status</structfield>
+       is <literal>lost</literal>.
+      </para>

Shall we refer to the conflicting flag here instead of wal_status?

2.
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -9,6 +9,7 @@

#include "postgres_fe.h"

+#include "access/xlogdefs.h"

This include doesn't seem to be required as we already include this
file via pg_upgrade.h.

3.
+ res = executeQueryOrDie(conn, "SHOW wal_level;");
+
+ if (PQntuples(res) != 1)
+ pg_fatal("could not determine wal_level.");
+
+ wal_level = PQgetvalue(res, 0, 0);
+
+ if (strcmp(wal_level, "logical") != 0)
+ pg_fatal("wal_level must be \"logical\", but is set to \"%s\"",
+ wal_level);

wal_level should be checked before checking the required number of slots.

4.
@@ -81,7 +84,11 @@ get_loadable_libraries(void)
{
...
+ totaltups++;
+ }
+
  }

Spurious new line in the above code.

5.
- os_info.libraries = (LibraryInfo *) pg_malloc(totaltups *
sizeof(LibraryInfo));
+ /*
+ * Allocate memory for extensions and logical replication output plugins.
+ */
+ os_info.libraries = pg_malloc_array(LibraryInfo,

We haven't referred to extensions previously in this function, so how
about changing the comment to: "Allocate memory for required libraries
and logical replication output plugins."?

6.
+ /*
+ * If we are reading the old_cluster, gets infos for logical
+ * replication slots.
+ */

How about changing the comment to: "Retrieve the logical replication
slots infos for the old cluster."?

7.
+ /*
+ * The temporary slots are expressly ignored while checking because such
+ * slots cannot exist after the upgrade. During the upgrade, clusters are
+ * started and stopped several times causing any temporary slots to be
+ * removed.
+ */

/expressly/explicitly

8.
+/*
+ * count_old_cluster_logical_slots()
+ *
+ * Sum up and return the number of logical replication slots for all databases.

I think it would be better to just say: "Returns the number of logical
replication slots for all databases."

9.
+ * Note: This must be done after doing the pg_resetwal command because
+ * pg_resetwal would remove required WALs.
+ */
+ if (count_old_cluster_logical_slots())
+ create_logical_replication_slots();

We can slightly change the Note to: "This must be done after executing
pg_resetwal command in the caller because pg_resetwal would remove
required WALs."

--
With Regards,
Amit Kapila.

#208Amit Kapila
amit.kapila16@gmail.com
In reply to: Zhijie Hou (Fujitsu) (#206)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Fri, Sep 8, 2023 at 2:12 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

2.

+       if (nslots_on_new)
+       {
+               if (nslots_on_new == 1)
+                       pg_fatal("New cluster must not have logical replication slots but found a slot.");
+               else
+                       pg_fatal("New cluster must not have logical replication slots but found %d slots.",
+                                        nslots_on_new);

We could try ngettext() here:
pg_log_warning(ngettext("New cluster must not have logical replication slots but found %d slot.",
"New cluster must not have logical replication slots but found %d slots",
nslots_on_new)

Will using pg_log_warning suffice for the purpose of exiting the
upgrade process? I don't think the intention here is to continue after
finding such a case.

4.

@@ -610,6 +724,12 @@ free_db_and_rel_infos(DbInfoArr *db_arr)
{
free_rel_infos(&db_arr->dbs[dbnum].rel_arr);
pg_free(db_arr->dbs[dbnum].db_name);
+
+               /*
+                * Logical replication slots must not exist on the new cluster before
+                * create_logical_replication_slots().
+                */
+               Assert(db_arr->dbs[dbnum].slot_arr.nslots == 0);

I think the assert is not necessary, as the patch will check the new cluster's
slots in another function. Besides, this function is not only used for new
cluster, but the comment only mentioned the new cluster which seems a bit
inconsistent. So, how about removing it ?

Yeah, I also find it odd.

5.
(cluster == &new_cluster) ?
-                        " -c synchronous_commit=off -c fsync=off -c full_page_writes=off" : "",
+                        " -c synchronous_commit=off -c fsync=off -c full_page_writes=off" :
+                        " -c max_slot_wal_keep_size=-1",

I think we need to set max_slot_wal_keep_size on new cluster as well, otherwise
it's possible that the new created slots get invalidated during upgrade, what
do you think ?

I also think that would be better.

6.

+       bool            is_lost;                /* Is the slot in 'lost'? */
+} LogicalSlotInfo;

Would it be better to use 'invalidated',

Or how about simply 'invalid'?

A few other points:
1.
  ntups = PQntuples(res);
- dbinfos = (DbInfo *) pg_malloc(sizeof(DbInfo) * ntups);
+ dbinfos = (DbInfo *) pg_malloc0(sizeof(DbInfo) * ntups);

Can we write a comment to say why we need zero memory here?
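(For illustration only, a comment along these lines would explain it, given that
slot information is collected only for an old cluster of PG17 or later and the
slot array must read as empty everywhere else:)

	/*
	 * Zero-initialize the DbInfo array so that slot_arr is {NULL, 0} for
	 * databases whose slot information is never collected (the new cluster,
	 * or an old cluster before PG17); later code reads slot_arr.nslots
	 * unconditionally when printing, counting, and freeing.
	 */
	dbinfos = (DbInfo *) pg_malloc0(sizeof(DbInfo) * ntups);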

2. Why does get_old_cluster_logical_slot_infos() need to use
pg_malloc_array() whereas get_rel_infos() uses pg_malloc() for similar stuff?

--
With Regards,
Amit Kapila.

#209Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Zhijie Hou (Fujitsu) (#206)
2 attachment(s)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Hou,

Thank you for reviewing! PSA new version.

Here are some comments:

1.

bool		reap_child(bool wait_for_child);
+
+XLogRecPtr	strtoLSN(const char *str, bool *have_error);

This function has be removed.

Removed.

2.

+	if (nslots_on_new)
+	{
+		if (nslots_on_new == 1)
+			pg_fatal("New cluster must not have logical replication
slots but found a slot.");
+		else
+			pg_fatal("New cluster must not have logical replication
slots but found %d slots.",
+					 nslots_on_new);

We could try ngettext() here:
pg_log_warning(ngettext("New cluster must not have logical
replication slots but found %d slot.",
"New
cluster must not have logical replication slots but found %d slots",

nslots_on_new)

I agreed to use ngettext(), but I disagree with changing it to a warning.
Changed to use ngettext().
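The check in check.c of the attached v34-0002 now reads as below; note that the
singular form also uses %d so that both messages take the same argument:

	if (nslots_on_new)
		pg_fatal(ngettext("New cluster must not have logical replication slots but found %d slot.",
						  "New cluster must not have logical replication slots but found %d slots.",
						  nslots_on_new),
				 nslots_on_new);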

3.
- create_script_for_old_cluster_deletion(&deletion_script_file_name);
-

Is there a reason for reordering this function ? Sorry If I missed some
previous discussions.

We discussed moving create_logical_replication_slots(), but not
create_script_for_old_cluster_deletion(). Restored.

4.

@@ -610,6 +724,12 @@ free_db_and_rel_infos(DbInfoArr *db_arr)
{
free_rel_infos(&db_arr->dbs[dbnum].rel_arr);
pg_free(db_arr->dbs[dbnum].db_name);
+
+		/*
+		 * Logical replication slots must not exist on the new cluster
before
+		 * create_logical_replication_slots().
+		 */
+		Assert(db_arr->dbs[dbnum].slot_arr.nslots == 0);

I think the assert is not necessary, as the patch will check the new cluster's
slots in another function. Besides, this function is not only used for new
cluster, but the comment only mentioned the new cluster which seems a bit
inconsistent. So, how about removing it ?

Amit also pointed this out, so I removed the assertion and the comment.

5.
(cluster == &new_cluster) ?
-			 " -c synchronous_commit=off -c fsync=off -c
full_page_writes=off" : "",
+			 " -c synchronous_commit=off -c fsync=off -c
full_page_writes=off" :
+			 " -c max_slot_wal_keep_size=-1",

I think we need to set max_slot_wal_keep_size on new cluster as well, otherwise
it's possible that the new created slots get invalidated during upgrade, what
do you think ?

Added.
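A rough sketch of the resulting start_postmaster() options (the exact hunk is in
server.c of the attached v34-0002): max_slot_wal_keep_size=-1 is now passed for
both clusters, while the fsync-related options remain new-cluster only.

	snprintf(cmd, sizeof(cmd),
			 "\"%s/pg_ctl\" -w -l \"%s/%s\" -D \"%s\" -o \"-p %d -b%s %s%s\" start",
			 cluster->bindir,
			 log_opts.logdir,
			 SERVER_LOG_FILE, cluster->pgconfig, cluster->port,
			 (cluster == &new_cluster) ?
			 " -c synchronous_commit=off -c fsync=off -c full_page_writes=off"
			 " -c max_slot_wal_keep_size=-1" :
			 " -c max_slot_wal_keep_size=-1",
			 cluster->pgopts ? cluster->pgopts : "", socket_string);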

6.

+	bool		is_lost;		/* Is the slot in 'lost'? */
+} LogicalSlotInfo;

Would it be better to use 'invalidated', as the same is used in error message
of ReportSlotInvalidation() and logicaldecoding.sgml.

Per suggestion from Amit, changed to 'invalid'.

7.
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
...
+		if (script)
+		{
+			fclose(script);
+
+			pg_log(PG_REPORT, "fatal");
+			pg_fatal("The source cluster contains one or more
problematic logical replication slots.\n"

I think we should do this pg_fatal out of the for() loop, otherwise we cannot
collect all the problematic slots.

Yeah, agreed. Fixed.

Also, based on the discussion [1]/messages/by-id/CAA4eK1+WBphnmvMpjrxceymzuoMuyV2_pMGaJq-zNODiJqAa7Q@mail.gmail.com, I added an elog(ERROR) in InvalidatePossiblyObsoleteSlot().

[1]: /messages/by-id/CAA4eK1+WBphnmvMpjrxceymzuoMuyV2_pMGaJq-zNODiJqAa7Q@mail.gmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:

v34-0001-Persist-logical-slots-to-disk-during-a-shutdown-.patchapplication/octet-stream; name=v34-0001-Persist-logical-slots-to-disk-during-a-shutdown-.patchDownload
From b2c36993807e70e05000c54777540b93bea5ebf7 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Fri, 14 Apr 2023 13:49:09 +0800
Subject: [PATCH v34 1/2] Persist logical slots to disk during a shutdown
 checkpoint if required.

It's entirely possible for a logical slot to have a confirmed_flush LSN
higher than the last value saved on disk while not being marked as dirty.
Currently, it is not a major problem but a later patch adding support for
the upgrade of slots relies on that value being properly persisted to disk.

It can also help avoid processing the same transactions again in some
boundary cases after the clean shutdown and restart. Say, we process some
transactions for which we didn't send anything downstream (the changes got
filtered) but the confirm_flush LSN is updated due to keepalives. As we
don't flush the latest value of confirm_flush LSN, it may lead to
processing the same changes again without this patch.

Author: Vignesh C, Julien Rouhaud, Kuroda Hayato based on suggestions by
Ashutosh Bapat
Reviewed-by: Amit Kapila, Dilip Kumar, Ashutosh Bapat, Peter Smith
Discussion: http://postgr.es/m/CAA4eK1JzJagMmb_E8D4au=GYQkxox0AfNBm1FbP7sy7t4YWXPQ@mail.gmail.com
Discussion: http://postgr.es/m/TYAPR01MB58664C81887B3AF2EB6B16E3F5939@TYAPR01MB5866.jpnprd01.prod.outlook.com
---
 src/backend/access/transam/xlog.c             |   2 +-
 src/backend/replication/slot.c                |  38 ++++++-
 src/include/replication/slot.h                |   8 +-
 src/test/recovery/meson.build                 |   1 +
 .../t/038_save_logical_slots_shutdown.pl      | 102 ++++++++++++++++++
 5 files changed, 145 insertions(+), 6 deletions(-)
 create mode 100644 src/test/recovery/t/038_save_logical_slots_shutdown.pl

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index f6f8adc72a..f26c8d18a6 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7039,7 +7039,7 @@ static void
 CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
 {
 	CheckPointRelationMap();
-	CheckPointReplicationSlots();
+	CheckPointReplicationSlots(flags & CHECKPOINT_IS_SHUTDOWN);
 	CheckPointSnapBuild();
 	CheckPointLogicalRewriteHeap();
 	CheckPointReplicationOrigin();
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index bb09c4010f..5a6d376cfa 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -321,6 +321,7 @@ ReplicationSlotCreate(const char *name, bool db_specific,
 	slot->candidate_xmin_lsn = InvalidXLogRecPtr;
 	slot->candidate_restart_valid = InvalidXLogRecPtr;
 	slot->candidate_restart_lsn = InvalidXLogRecPtr;
+	slot->last_saved_confirmed_flush = InvalidXLogRecPtr;
 
 	/*
 	 * Create the slot on disk.  We haven't actually marked the slot allocated
@@ -1572,11 +1573,14 @@ restart:
 /*
  * Flush all replication slots to disk.
  *
- * This needn't actually be part of a checkpoint, but it's a convenient
- * location.
+ * We can flush dirty replication slots at regular intervals by any
+ * background process like bgwriter but checkpoint is a convenient location.
+ * Additionally, in case of a shutdown checkpoint, we also identify the
+ * slots for which confirmed_flush LSN has been updated since the last time
+ * it persisted and flush them.
  */
 void
-CheckPointReplicationSlots(void)
+CheckPointReplicationSlots(bool is_shutdown)
 {
 	int			i;
 
@@ -1601,6 +1605,30 @@ CheckPointReplicationSlots(void)
 
 		/* save the slot to disk, locking is handled in SaveSlotToPath() */
 		sprintf(path, "pg_replslot/%s", NameStr(s->data.name));
+
+		/*
+		 * We won't ensure that the slot is persisted after the
+		 * confirmed_flush LSN is updated as that could lead to frequent
+		 * writes. However, we need to ensure that we do persist the slots at
+		 * the time of shutdown whose confirmed_flush LSN is changed since we
+		 * last saved the slot to disk. This will help in avoiding retreat of
+		 * the confirmed_flush LSN after restart. At other times, the
+		 * walsender keeps saving the slot from time to time as the
+		 * replication progresses, so there is no clear advantage of flushing
+		 * additional slots at the time of checkpoint.
+		 */
+		if (is_shutdown && SlotIsLogical(s))
+		{
+			SpinLockAcquire(&s->mutex);
+			if (s->data.invalidated == RS_INVAL_NONE &&
+				s->data.confirmed_flush != s->last_saved_confirmed_flush)
+			{
+				s->just_dirtied = true;
+				s->dirty = true;
+			}
+			SpinLockRelease(&s->mutex);
+		}
+
 		SaveSlotToPath(s, path, LOG);
 	}
 	LWLockRelease(ReplicationSlotAllocationLock);
@@ -1873,11 +1901,12 @@ SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel)
 
 	/*
 	 * Successfully wrote, unset dirty bit, unless somebody dirtied again
-	 * already.
+	 * already and remember the confirmed_flush LSN value.
 	 */
 	SpinLockAcquire(&slot->mutex);
 	if (!slot->just_dirtied)
 		slot->dirty = false;
+	slot->last_saved_confirmed_flush = cp.slotdata.confirmed_flush;
 	SpinLockRelease(&slot->mutex);
 
 	LWLockRelease(&slot->io_in_progress_lock);
@@ -2074,6 +2103,7 @@ RestoreSlotFromDisk(const char *name)
 		/* initialize in memory state */
 		slot->effective_xmin = cp.slotdata.xmin;
 		slot->effective_catalog_xmin = cp.slotdata.catalog_xmin;
+		slot->last_saved_confirmed_flush = cp.slotdata.confirmed_flush;
 
 		slot->candidate_catalog_xmin = InvalidTransactionId;
 		slot->candidate_xmin_lsn = InvalidXLogRecPtr;
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index a8a89dc784..4c4fe10e57 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -178,6 +178,12 @@ typedef struct ReplicationSlot
 	XLogRecPtr	candidate_xmin_lsn;
 	XLogRecPtr	candidate_restart_valid;
 	XLogRecPtr	candidate_restart_lsn;
+
+	/*
+	 * This is used to track the last persisted confirmed_flush LSN value to
+	 * detect any divergence in the in-memory and on-disk values for the same.
+	 */
+	XLogRecPtr	last_saved_confirmed_flush;
 } ReplicationSlot;
 
 #define SlotIsPhysical(slot) ((slot)->data.database == InvalidOid)
@@ -241,7 +247,7 @@ extern void ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslo
 extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok);
 
 extern void StartupReplicationSlots(void);
-extern void CheckPointReplicationSlots(void);
+extern void CheckPointReplicationSlots(bool is_shutdown);
 
 extern void CheckSlotRequirements(void);
 extern void CheckSlotPermissions(void);
diff --git a/src/test/recovery/meson.build b/src/test/recovery/meson.build
index e7328e4894..646d6ffde4 100644
--- a/src/test/recovery/meson.build
+++ b/src/test/recovery/meson.build
@@ -43,6 +43,7 @@ tests += {
       't/035_standby_logical_decoding.pl',
       't/036_truncated_dropped.pl',
       't/037_invalid_database.pl',
+      't/038_save_logical_slots_shutdown.pl',
     ],
   },
 }
diff --git a/src/test/recovery/t/038_save_logical_slots_shutdown.pl b/src/test/recovery/t/038_save_logical_slots_shutdown.pl
new file mode 100644
index 0000000000..224a840a61
--- /dev/null
+++ b/src/test/recovery/t/038_save_logical_slots_shutdown.pl
@@ -0,0 +1,102 @@
+
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+# Test logical replication slots are always persisted to disk during a shutdown
+# checkpoint.
+
+use strict;
+use warnings;
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+sub compare_confirmed_flush
+{
+	my ($node, $confirmed_flush_from_log) = @_;
+
+	# Fetch Latest checkpoint location from the control file
+	my ($stdout, $stderr) =
+	  run_command([ 'pg_controldata', $node->data_dir ]);
+	my @control_data      = split("\n", $stdout);
+	my $latest_checkpoint = undef;
+	foreach (@control_data)
+	{
+		if ($_ =~ /^Latest checkpoint location:\s*(.*)$/mg)
+		{
+			$latest_checkpoint = $1;
+			last;
+		}
+	}
+	die "Latest checkpoint location not found in control file\n"
+	  unless defined($latest_checkpoint);
+
+	# Is it same as the value read from log?
+	ok( $latest_checkpoint eq $confirmed_flush_from_log,
+		"Check that the slot's confirmed_flush LSN is the same as the latest_checkpoint location"
+	);
+
+	return;
+}
+
+# Initialize publisher node
+my $node_publisher = PostgreSQL::Test::Cluster->new('pub');
+$node_publisher->init(allows_streaming => 'logical');
+# Avoid checkpoint during the test, otherwise, the latest checkpoint location
+# will change.
+$node_publisher->append_conf(
+	'postgresql.conf', q{
+checkpoint_timeout = 1h
+autovacuum = off
+});
+$node_publisher->start;
+
+# Create subscriber node
+my $node_subscriber = PostgreSQL::Test::Cluster->new('sub');
+$node_subscriber->init(allows_streaming => 'logical');
+$node_subscriber->start;
+
+# Create tables
+$node_publisher->safe_psql('postgres', "CREATE TABLE test_tbl (id int)");
+$node_subscriber->safe_psql('postgres', "CREATE TABLE test_tbl (id int)");
+
+# Insert some data
+$node_publisher->safe_psql('postgres',
+	"INSERT INTO test_tbl VALUES (generate_series(1, 5));");
+
+# Setup logical replication
+my $publisher_connstr = $node_publisher->connstr . ' dbname=postgres';
+$node_publisher->safe_psql('postgres',
+	"CREATE PUBLICATION pub FOR ALL TABLES");
+$node_subscriber->safe_psql('postgres',
+	"CREATE SUBSCRIPTION sub CONNECTION '$publisher_connstr' PUBLICATION pub"
+);
+
+$node_subscriber->wait_for_subscription_sync($node_publisher, 'sub');
+
+my $result =
+  $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM test_tbl");
+
+is($result, qq(5), "check initial copy was done");
+
+my $offset = -s $node_publisher->logfile;
+
+# Restart the publisher to ensure that the slot will be persisted if required
+$node_publisher->restart();
+
+# Wait until the walsender creates decoding context
+$node_publisher->wait_for_log(
+	qr/Streaming transactions committing after ([A-F0-9]+\/[A-F0-9]+), reading WAL from ([A-F0-9]+\/[A-F0-9]+)./,
+	$offset);
+
+# Extract confirmed_flush from the logfile
+my $log_contents = slurp_file($node_publisher->logfile, $offset);
+$log_contents =~
+  qr/Streaming transactions committing after ([A-F0-9]+\/[A-F0-9]+), reading WAL from ([A-F0-9]+\/[A-F0-9]+)./
+  or die "could not get confirmed_flush_lsn";
+
+# Ensure that the slot's confirmed_flush LSN is the same as the
+# latest_checkpoint location.
+compare_confirmed_flush($node_publisher, $1);
+
+done_testing();
-- 
2.27.0

v34-0002-pg_upgrade-Allow-to-replicate-logical-replicatio.patchapplication/octet-stream; name=v34-0002-pg_upgrade-Allow-to-replicate-logical-replicatio.patchDownload
From 390d874d5af6f4ff47208650a75e6e9dca2055b6 Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Tue, 4 Apr 2023 05:49:34 +0000
Subject: [PATCH v34 2/2] pg_upgrade: Allow to replicate logical replication
 slots to new node

This commit allows nodes with logical replication slots to be upgraded. While
reading information from the old cluster, a list of logical replication slots is
fetched. At the later part of upgrading, pg_upgrade revisits the list and
restores slots by executing pg_create_logical_replication_slots() on the new
cluster. Migration of logical replication slots is only supported when the old
cluster is version 17.0 or later.

If the old node has slots with the status 'lost' or with unconsumed WAL records,
the pg_upgrade fails. These checks are needed to prevent data loss.

Note that slot restoration must be done after the final pg_resetwal command
during the upgrade because pg_resetwal will remove WALs that are required by
the slots. Due to this restriction, the timing of restoring replication slots is
different from other objects.

The significant advantage of this commit is that it makes it easy to continue
logical replication even after upgrading the publisher node. Previously,
pg_upgrade allowed copying publications to a new node. With this new commit,
adjusting the connection string to the new publisher will cause the apply
worker on the subscriber to connect to the new publisher automatically. This
enables seamless continuation of logical replication, even after an upgrade.

Author: Hayato Kuroda
Co-authored-by: Hou Zhijie
Reviewed-by: Peter Smith, Julien Rouhaud, Vignesh C, Wang Wei, Masahiko Sawada,
             Dilip Kumar
---
 doc/src/sgml/ref/pgupgrade.sgml               |  70 +++++-
 src/backend/replication/slot.c                |   8 +
 src/bin/pg_upgrade/Makefile                   |   3 +
 src/bin/pg_upgrade/check.c                    | 203 +++++++++++++++--
 src/bin/pg_upgrade/controldata.c              |  39 ++++
 src/bin/pg_upgrade/function.c                 |  28 ++-
 src/bin/pg_upgrade/info.c                     | 148 +++++++++++-
 src/bin/pg_upgrade/meson.build                |   1 +
 src/bin/pg_upgrade/pg_upgrade.c               | 107 ++++++++-
 src/bin/pg_upgrade/pg_upgrade.h               |  26 ++-
 src/bin/pg_upgrade/server.c                   |   7 +-
 .../t/003_logical_replication_slots.pl        | 214 ++++++++++++++++++
 src/tools/pgindent/typedefs.list              |   3 +
 13 files changed, 818 insertions(+), 39 deletions(-)
 create mode 100644 src/bin/pg_upgrade/t/003_logical_replication_slots.pl

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index bea0d1b93f..9668aaa7a9 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -383,6 +383,71 @@ make prefix=/usr/local/pgsql.new install
     </para>
    </step>
 
+   <step>
+    <title>Prepare for publisher upgrades</title>
+
+    <para>
+     <application>pg_upgrade</application> attempts to migrate logical
+     replication slots. This helps avoid the need for manually defining the
+     same replication slots on the new publisher. Migration of logical
+     replication slots is only supported when the old cluster is version 17.0
+     or later.
+    </para>
+
+    <para>
+     Before you start upgrading the publisher cluster, ensure that the
+     subscription is temporarily disabled, by executing
+     <link linkend="sql-altersubscription"><command>ALTER SUBSCRIPTION ... DISABLE</command></link>.
+     Re-enable the subscription after the upgrade.
+    </para>
+
+    <para>
+     There are some prerequisites for <application>pg_upgrade</application> to
+     be able to upgrade the replication slots. If these are not met an error
+     will be reported.
+    </para>
+
+    <itemizedlist>
+     <listitem>
+      <para>
+       All slots on the old cluster must be usable, i.e., there are no slots
+       whose
+       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>conflicting</structfield>
+       is <literal>true</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>confirmed_flush_lsn</structfield>
+       of all slots on the old cluster must be the same as the latest
+       checkpoint location.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The output plugins referenced by the slots on the old cluster must be
+       installed in the new PostgreSQL executable directory.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must have
+       <link linkend="guc-max-replication-slots"><varname>max_replication_slots</varname></link>
+       configured to a value greater than or equal to the number of slots
+       present in the old cluster.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must have
+       <link linkend="guc-wal-level"><varname>wal_level</varname></link> as
+       <literal>logical</literal>.
+      </para>
+     </listitem>
+    </itemizedlist>
+
+   </step>
+
    <step>
     <title>Stop both servers</title>
 
@@ -652,8 +717,9 @@ rsync --archive --delete --hard-links --size-only --no-inc-recursive /vol1/pg_tb
        Configure the servers for log shipping.  (You do not need to run
        <function>pg_backup_start()</function> and <function>pg_backup_stop()</function>
        or take a file system backup as the standbys are still synchronized
-       with the primary.)  Replication slots are not copied and must
-       be recreated.
+       with the primary.)  Replication slots on the old standby are not copied.
+       Only logical slots on the primary are migrated to the new standby,
+       and other slots must be recreated.
       </para>
      </step>
 
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 5a6d376cfa..e0e7f53e28 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1423,6 +1423,14 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause,
 
 		SpinLockRelease(&s->mutex);
 
+		/*
+		 * Raise an ERROR if the logical replication slot is invalidating. It
+		 * would not happen because max_slot_wal_keep_size is set to -1 during
+		 * the upgrade, but it stays safe.
+		 */
+		if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade)
+			elog(ERROR, "Replication slots must not be invalidated during the upgrade.");
+
 		if (active_pid != 0)
 		{
 			/*
diff --git a/src/bin/pg_upgrade/Makefile b/src/bin/pg_upgrade/Makefile
index 5834513add..815d1a7ca1 100644
--- a/src/bin/pg_upgrade/Makefile
+++ b/src/bin/pg_upgrade/Makefile
@@ -3,6 +3,9 @@
 PGFILEDESC = "pg_upgrade - an in-place binary upgrade utility"
 PGAPPICON = win32
 
+# required for 003_logical_replication_slots.pl
+EXTRA_INSTALL=contrib/test_decoding
+
 subdir = src/bin/pg_upgrade
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 56e313f562..b9ea906823 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -30,6 +30,8 @@ static void check_for_jsonb_9_4_usage(ClusterInfo *cluster);
 static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(void);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
+static void check_new_cluster_logical_replication_slots(void);
+static void check_old_cluster_for_valid_slots(bool live_check);
 
 
 /*
@@ -86,8 +88,11 @@ check_and_dump_old_cluster(bool live_check)
 	if (!live_check)
 		start_postmaster(&old_cluster, true);
 
-	/* Extract a list of databases and tables from the old cluster */
-	get_db_and_rel_infos(&old_cluster);
+	/*
+	 * Extract a list of databases, tables, and logical replication slots from
+	 * the old cluster.
+	 */
+	get_db_rel_and_slot_infos(&old_cluster);
 
 	init_tablespaces();
 
@@ -104,6 +109,13 @@ check_and_dump_old_cluster(bool live_check)
 	check_for_reg_data_type_usage(&old_cluster);
 	check_for_isn_and_int8_passing_mismatch(&old_cluster);
 
+	/*
+	 * Logical replication slots can be migrated since PG17. See comments atop
+	 * get_old_cluster_logical_slot_infos().
+	 */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
+		check_old_cluster_for_valid_slots(live_check);
+
 	/*
 	 * PG 16 increased the size of the 'aclitem' type, which breaks the
 	 * on-disk format for existing data.
@@ -187,7 +199,7 @@ check_and_dump_old_cluster(bool live_check)
 void
 check_new_cluster(void)
 {
-	get_db_and_rel_infos(&new_cluster);
+	get_db_rel_and_slot_infos(&new_cluster);
 
 	check_new_cluster_is_empty();
 
@@ -210,6 +222,8 @@ check_new_cluster(void)
 	check_for_prepared_transactions(&new_cluster);
 
 	check_for_new_tablespace_dir();
+
+	check_new_cluster_logical_replication_slots();
 }
 
 
@@ -232,27 +246,6 @@ report_clusters_compatible(void)
 }
 
 
-void
-issue_warnings_and_set_wal_level(void)
-{
-	/*
-	 * We unconditionally start/stop the new server because pg_resetwal -o set
-	 * wal_level to 'minimum'.  If the user is upgrading standby servers using
-	 * the rsync instructions, they will need pg_upgrade to write its final
-	 * WAL record showing wal_level as 'replica'.
-	 */
-	start_postmaster(&new_cluster, true);
-
-	/* Reindex hash indexes for old < 10.0 */
-	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 906)
-		old_9_6_invalidate_hash_indexes(&new_cluster, false);
-
-	report_extension_updates(&new_cluster);
-
-	stop_postmaster(false);
-}
-
-
 void
 output_completion_banner(char *deletion_script_file_name)
 {
@@ -1402,3 +1395,165 @@ check_for_user_defined_encoding_conversions(ClusterInfo *cluster)
 	else
 		check_ok();
 }
+
+/*
+ * check_new_cluster_logical_replication_slots()
+ *
+ * Make sure there are no logical replication slots on the new cluster and that
+ * the parameter settings necessary for creating slots are sufficient.
+ */
+static void
+check_new_cluster_logical_replication_slots(void)
+{
+	PGresult   *res;
+	PGconn	   *conn;
+	int			nslots_on_old;
+	int			nslots_on_new;
+	int			max_replication_slots;
+	char	   *wal_level;
+
+	/* Logical slots can be migrated since PG17. */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+		return;
+
+	nslots_on_old = count_old_cluster_logical_slots();
+
+	/* Quick return if there are no logical slots to be migrated. */
+	if (nslots_on_old == 0)
+		return;
+
+	conn = connectToServer(&new_cluster, "template1");
+
+	prep_status("Checking for logical replication slots");
+
+	res = executeQueryOrDie(conn, "SELECT count(*) "
+							"FROM pg_catalog.pg_replication_slots "
+							"WHERE slot_type = 'logical' AND "
+							"temporary IS FALSE;");
+
+	if (PQntuples(res) != 1)
+		pg_fatal("could not count the number of logical replication slots.");
+
+	nslots_on_new = atoi(PQgetvalue(res, 0, 0));
+
+	if (nslots_on_new)
+		pg_fatal(ngettext("New cluster must not have logical replication slots but found %d slot.",
+						  "New cluster must not have logical replication slots but found %d slots.",
+						  nslots_on_new),
+				 nslots_on_new);
+
+	PQclear(res);
+
+	res = executeQueryOrDie(conn, "SHOW wal_level;");
+
+	if (PQntuples(res) != 1)
+		pg_fatal("could not determine wal_level.");
+
+	wal_level = PQgetvalue(res, 0, 0);
+
+	if (strcmp(wal_level, "logical") != 0)
+		pg_fatal("wal_level must be \"logical\", but is set to \"%s\"",
+				 wal_level);
+
+	PQclear(res);
+
+	res = executeQueryOrDie(conn, "SHOW max_replication_slots;");
+
+	if (PQntuples(res) != 1)
+		pg_fatal("could not determine max_replication_slots.");
+
+	max_replication_slots = atoi(PQgetvalue(res, 0, 0));
+
+	if (nslots_on_old > max_replication_slots)
+		pg_fatal("max_replication_slots (%d) must be greater than or equal to the number of "
+				 "logical replication slots (%d) on the old cluster.",
+				 max_replication_slots, nslots_on_old);
+
+	PQclear(res);
+	PQfinish(conn);
+
+	check_ok();
+}
+
+/*
+ * check_old_cluster_for_valid_slots()
+ *
+ * Make sure logical replication slots can be migrated to new cluster.
+ * Following points are checked:
+ *
+ *	- All logical replication slots are usable.
+ *	- All logical replication slots consumed all WALs, except a
+ *	  CHECKPOINT_SHUTDOWN record.
+ */
+static void
+check_old_cluster_for_valid_slots(bool live_check)
+{
+	int			dbnum;
+	char		output_path[MAXPGPATH];
+	FILE	   *script = NULL;
+
+	prep_status("Checking for valid logical replication slots");
+
+	snprintf(output_path, sizeof(output_path), "%s/%s",
+			 log_opts.basedir,
+			 "problematic_logical_relication_slots.txt");
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		int			slotnum;
+		LogicalSlotInfoArr *slot_arr = &old_cluster.dbarr.dbs[dbnum].slot_arr;
+
+		for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		{
+			LogicalSlotInfo *slot = &slot_arr->slots[slotnum];
+
+			/* Is the slot still usable? */
+			if (slot->invalid)
+			{
+				if (script == NULL &&
+					(script = fopen_priv(output_path, "w")) == NULL)
+					pg_fatal("could not open file \"%s\": %s",
+							 output_path, strerror(errno));
+
+				fprintf(script,
+						"slotname :%s\tproblem: The slot is unusable\n",
+						slot->slotname);
+			}
+
+			/*
+			 * Do additional checks to ensure that confirmed_flush LSN of all
+			 * the slots is the same as the latest checkpoint location.
+			 *
+			 * Note: This can be satisfied only when the old cluster has been
+			 * shut down, so we skip this for live checks.
+			 */
+			if (!live_check && !slot->caught_up)
+			{
+				if (script == NULL &&
+					(script = fopen_priv(output_path, "w")) == NULL)
+					pg_fatal("could not open file \"%s\": %s",
+							 output_path, strerror(errno));
+
+				fprintf(script,
+						"slotname :%s\tproblem: The slot has not consumed WALs yet\n",
+						slot->slotname);
+			}
+		}
+	}
+
+	if (script)
+	{
+		fclose(script);
+
+		pg_log(PG_REPORT, "fatal");
+		pg_fatal("The source cluster contains one or more problematic logical replication slots.\n"
+				 "The needed workaround depends on the problem.\n"
+				 "1) If the problem is \"The slot is unusable,\" You can drop such replication slots.\n"
+				 "2) If the problem is \"The slot has not consumed WALs yet,\" you can consume all remaining WALs.\n"
+				 "Then, you can restart the upgrade.\n"
+				 "A list of problematic logical replication slots is in the file:\n"
+				 "    %s", output_path);
+	}
+
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c
index 4beb65ab22..f8f823e2be 100644
--- a/src/bin/pg_upgrade/controldata.c
+++ b/src/bin/pg_upgrade/controldata.c
@@ -169,6 +169,45 @@ get_control_data(ClusterInfo *cluster, bool live_check)
 				}
 				got_cluster_state = true;
 			}
+
+			else if ((p = strstr(bufin, "Latest checkpoint location:")) != NULL)
+			{
+				/*
+				 * Read the latest checkpoint location if the cluster is PG17
+				 * or later. This is used for upgrading logical replication
+				 * slots. Currently, we need it only for the old cluster but
+				 * for simplicity chose not to have additional checks.
+				 */
+				if (GET_MAJOR_VERSION(cluster->major_version) >= 1700)
+				{
+					char	   *slash = NULL;
+					uint32		upper_lsn,
+								lower_lsn;
+
+					p = strchr(p, ':');
+
+					if (p == NULL || strlen(p) <= 1)
+						pg_fatal("%d: controldata retrieval problem", __LINE__);
+
+					p++;		/* remove ':' char */
+
+					p = strpbrk(p, "01234567890ABCDEF");
+
+					if (p == NULL || strlen(p) <= 1)
+						pg_fatal("%d: controldata retrieval problem", __LINE__);
+
+					/*
+					 * The upper and lower parts of the LSN must be read separately
+					 * because it is stored in %X/%X format.
+					 */
+					upper_lsn = strtoul(p, &slash, 16);
+					lower_lsn = strtoul(++slash, NULL, 16);
+
+					/* And combine them */
+					cluster->controldata.chkpnt_latest =
+						((uint64) upper_lsn << 32) | lower_lsn;
+				}
+			}
 		}
 
 		rc = pclose(output);
diff --git a/src/bin/pg_upgrade/function.c b/src/bin/pg_upgrade/function.c
index dc8800c7cd..f31d5a8fe1 100644
--- a/src/bin/pg_upgrade/function.c
+++ b/src/bin/pg_upgrade/function.c
@@ -11,6 +11,7 @@
 
 #include "access/transam.h"
 #include "catalog/pg_language_d.h"
+#include "fe_utils/string_utils.h"
 #include "pg_upgrade.h"
 
 /*
@@ -46,7 +47,9 @@ library_name_compare(const void *p1, const void *p2)
 /*
  * get_loadable_libraries()
  *
- *	Fetch the names of all old libraries containing C-language functions.
+ *	Fetch the names of all old libraries that either contain C-language functions
+ *	or correspond to logical replication output plugins.
+ *
  *	We will later check that they all exist in the new installation.
  */
 void
@@ -81,7 +84,12 @@ get_loadable_libraries(void)
 		PQfinish(conn);
 	}
 
-	os_info.libraries = (LibraryInfo *) pg_malloc(totaltups * sizeof(LibraryInfo));
+	/*
+	 * Allocate memory for required libraries and logical replication output
+	 * plugins.
+	 */
+	os_info.libraries = pg_malloc_array(LibraryInfo,
+										totaltups + count_old_cluster_logical_slots());
 	totaltups = 0;
 
 	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
@@ -89,6 +97,8 @@ get_loadable_libraries(void)
 		PGresult   *res = ress[dbnum];
 		int			ntups;
 		int			rowno;
+		int			slotno;
+		LogicalSlotInfoArr *slot_arr = &old_cluster.dbarr.dbs[dbnum].slot_arr;
 
 		ntups = PQntuples(res);
 		for (rowno = 0; rowno < ntups; rowno++)
@@ -101,6 +111,20 @@ get_loadable_libraries(void)
 			totaltups++;
 		}
 		PQclear(res);
+
+		/*
+		 * Store the names of output plugins as well. There is a possibility
+		 * that duplicated plugins are set, but the consumer function
+		 * check_loadable_libraries() will avoid checking the same library, so
+		 * we do not have to consider their uniqueness here.
+		 */
+		for (slotno = 0; slotno < slot_arr->nslots; slotno++)
+		{
+			os_info.libraries[totaltups].name = pg_strdup(slot_arr->slots[slotno].plugin);
+			os_info.libraries[totaltups].dbnum = dbnum;
+
+			totaltups++;
+		}
 	}
 
 	pg_free(ress);
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index aa5faca4d6..15fa8d5c01 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -26,6 +26,8 @@ static void get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo);
 static void free_rel_infos(RelInfoArr *rel_arr);
 static void print_db_infos(DbInfoArr *db_arr);
 static void print_rel_infos(RelInfoArr *rel_arr);
+static void print_slot_infos(LogicalSlotInfoArr *slot_arr);
+static void get_old_cluster_logical_slot_infos(DbInfo *dbinfo);
 
 
 /*
@@ -266,13 +268,13 @@ report_unmatched_relation(const RelInfo *rel, const DbInfo *db, bool is_new_db)
 }
 
 /*
- * get_db_and_rel_infos()
+ * get_db_rel_and_slot_infos()
  *
  * higher level routine to generate dbinfos for the database running
  * on the given "port". Assumes that server is already running.
  */
 void
-get_db_and_rel_infos(ClusterInfo *cluster)
+get_db_rel_and_slot_infos(ClusterInfo *cluster)
 {
 	int			dbnum;
 
@@ -283,7 +285,17 @@ get_db_and_rel_infos(ClusterInfo *cluster)
 	get_db_infos(cluster);
 
 	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
-		get_rel_infos(cluster, &cluster->dbarr.dbs[dbnum]);
+	{
+		DbInfo	   *pDbInfo = &cluster->dbarr.dbs[dbnum];
+
+		get_rel_infos(cluster, pDbInfo);
+
+		/*
+		 * Retrieve the logical replication slots infos for the old cluster.
+		 */
+		if (cluster == &old_cluster)
+			get_old_cluster_logical_slot_infos(pDbInfo);
+	}
 
 	if (cluster == &old_cluster)
 		pg_log(PG_VERBOSE, "\nsource databases:");
@@ -600,6 +612,107 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 	dbinfo->rel_arr.nrels = num_rels;
 }
 
+/*
+ * get_old_cluster_logical_slot_infos()
+ *
+ * gets the LogicalSlotInfos for all the logical replication slots of the
+ * database referred to by "dbinfo".
+ *
+ * Note: This function will not do anything if the old cluster is pre-PG17.
+ * This is because before that the logical slots are not saved at shutdown, so
+ * there is no guarantee that the latest confirmed_flush_lsn is saved to disk
+ * which can lead to data loss. It is still not guaranteed for manually created
+ * slots in PG17, so subsequent checks done in
+ * check_old_cluster_for_valid_slots() would raise a FATAL error if such slots
+ * are included.
+ */
+static void
+get_old_cluster_logical_slot_infos(DbInfo *dbinfo)
+{
+	PGconn	   *conn;
+	PGresult   *res;
+	LogicalSlotInfo *slotinfos = NULL;
+	int			num_slots;
+
+	/* Logical slots can be migrated since PG17. */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+		return;
+
+	conn = connectToServer(&old_cluster, dbinfo->db_name);
+
+	/*
+	 * The temporary slots are explicitly ignored while checking because such
+	 * slots cannot exist after the upgrade. During the upgrade, clusters are
+	 * started and stopped several times causing any temporary slots to be
+	 * removed.
+	 */
+	res = executeQueryOrDie(conn, "SELECT slot_name, plugin, two_phase, "
+							"(confirmed_flush_lsn = '%X/%X') as caught_up, conflicting "
+							"FROM pg_catalog.pg_replication_slots "
+							"WHERE slot_type = 'logical' AND "
+							"database = current_database() AND "
+							"temporary IS FALSE;",
+							LSN_FORMAT_ARGS(old_cluster.controldata.chkpnt_latest));
+
+	num_slots = PQntuples(res);
+
+	if (num_slots)
+	{
+		int			slotnum;
+		int			i_slotname;
+		int			i_plugin;
+		int			i_twophase;
+		int			i_caught_up;
+		int			i_invalid;
+
+		slotinfos = pg_malloc_array(LogicalSlotInfo, num_slots);
+
+		i_slotname = PQfnumber(res, "slot_name");
+		i_plugin = PQfnumber(res, "plugin");
+		i_twophase = PQfnumber(res, "two_phase");
+		i_caught_up = PQfnumber(res, "caught_up");
+		i_invalid = PQfnumber(res, "conflicting");
+
+		for (slotnum = 0; slotnum < num_slots; slotnum++)
+		{
+			LogicalSlotInfo *curr = &slotinfos[slotnum];
+
+			curr->slotname = pg_strdup(PQgetvalue(res, slotnum, i_slotname));
+			curr->plugin = pg_strdup(PQgetvalue(res, slotnum, i_plugin));
+			curr->two_phase = (strcmp(PQgetvalue(res, slotnum, i_twophase), "t") == 0);
+			curr->caught_up = (strcmp(PQgetvalue(res, slotnum, i_caught_up), "t") == 0);
+			curr->invalid = (strcmp(PQgetvalue(res, slotnum, i_invalid), "t") == 0);
+		}
+	}
+
+	PQclear(res);
+	PQfinish(conn);
+
+	dbinfo->slot_arr.slots = slotinfos;
+	dbinfo->slot_arr.nslots = num_slots;
+}
+
+
+/*
+ * count_old_cluster_logical_slots()
+ *
+ * Returns the number of logical replication slots for all databases.
+ *
+ * Note: this function always returns 0 if the old_cluster is PG16 and prior
+ * because we gather slot information only for cluster versions greater than or
+ * equal to PG17. See get_old_cluster_logical_slot_infos().
+ */
+int
+count_old_cluster_logical_slots(void)
+{
+	int			dbnum;
+	int			slot_count = 0;
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+		slot_count += old_cluster.dbarr.dbs[dbnum].slot_arr.nslots;
+
+	return slot_count;
+}
 
 static void
 free_db_and_rel_infos(DbInfoArr *db_arr)
@@ -642,8 +755,11 @@ print_db_infos(DbInfoArr *db_arr)
 
 	for (dbnum = 0; dbnum < db_arr->ndbs; dbnum++)
 	{
-		pg_log(PG_VERBOSE, "Database: \"%s\"", db_arr->dbs[dbnum].db_name);
-		print_rel_infos(&db_arr->dbs[dbnum].rel_arr);
+		DbInfo	   *pDbInfo = &db_arr->dbs[dbnum];
+
+		pg_log(PG_VERBOSE, "Database: \"%s\"", pDbInfo->db_name);
+		print_rel_infos(&pDbInfo->rel_arr);
+		print_slot_infos(&pDbInfo->slot_arr);
 	}
 }
 
@@ -660,3 +776,25 @@ print_rel_infos(RelInfoArr *rel_arr)
 			   rel_arr->rels[relnum].reloid,
 			   rel_arr->rels[relnum].tablespace);
 }
+
+static void
+print_slot_infos(LogicalSlotInfoArr *slot_arr)
+{
+	int			slotnum;
+
+	/* Quick return if there are no logical slots. */
+	if (slot_arr->nslots == 0)
+		return;
+
+	pg_log(PG_VERBOSE, "Logical replication slots within the database:");
+
+	for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+	{
+		LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+
+		pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\", two_phase: %d",
+			   slot_info->slotname,
+			   slot_info->plugin,
+			   slot_info->two_phase);
+	}
+}
diff --git a/src/bin/pg_upgrade/meson.build b/src/bin/pg_upgrade/meson.build
index 12a97f84e2..228f29b688 100644
--- a/src/bin/pg_upgrade/meson.build
+++ b/src/bin/pg_upgrade/meson.build
@@ -42,6 +42,7 @@ tests += {
     'tests': [
       't/001_basic.pl',
       't/002_pg_upgrade.pl',
+      't/003_logical_replication_slots.pl',
     ],
     'test_kwargs': {'priority': 40}, # pg_upgrade tests are slow
   },
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 96bfb67167..6be236dc9a 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -59,6 +59,8 @@ static void copy_xact_xlog_xid(void);
 static void set_frozenxids(bool minmxid_only);
 static void make_outputdirs(char *pgdata);
 static void setup(char *argv0, bool *live_check);
+static void create_logical_replication_slots(void);
+static void setup_new_cluster(void);
 
 ClusterInfo old_cluster,
 			new_cluster;
@@ -188,6 +190,8 @@ main(int argc, char **argv)
 			  new_cluster.pgdata);
 	check_ok();
 
+	setup_new_cluster();
+
 	if (user_opts.do_sync)
 	{
 		prep_status("Sync data directory to disk");
@@ -201,8 +205,6 @@ main(int argc, char **argv)
 
 	create_script_for_old_cluster_deletion(&deletion_script_file_name);
 
-	issue_warnings_and_set_wal_level();
-
 	pg_log(PG_REPORT,
 		   "\n"
 		   "Upgrade Complete\n"
@@ -593,7 +595,7 @@ create_new_objects(void)
 		set_frozenxids(true);
 
 	/* update new_cluster info now that we have objects in the databases */
-	get_db_and_rel_infos(&new_cluster);
+	get_db_rel_and_slot_infos(&new_cluster);
 }
 
 /*
@@ -862,3 +864,102 @@ set_frozenxids(bool minmxid_only)
 
 	check_ok();
 }
+
+/*
+ * create_logical_replication_slots()
+ *
+ * Similar to create_new_objects() but only restores logical replication slots.
+ */
+static void
+create_logical_replication_slots(void)
+{
+	int			dbnum;
+
+	prep_status_progress("Restoring logical replication slots in the new cluster");
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		DbInfo	   *old_db = &old_cluster.dbarr.dbs[dbnum];
+		LogicalSlotInfoArr *slot_arr = &old_db->slot_arr;
+		PGconn	   *conn;
+		PQExpBuffer query;
+		int			slotnum;
+		char		log_file_name[MAXPGPATH];
+
+		/* Skip this database if there are no slots */
+		if (slot_arr->nslots == 0)
+			continue;
+
+		conn = connectToServer(&new_cluster, old_db->db_name);
+		query = createPQExpBuffer();
+
+		snprintf(log_file_name, sizeof(log_file_name),
+				 DB_DUMP_LOG_FILE_MASK, old_db->db_oid);
+
+		pg_log(PG_STATUS, "%s", old_db->db_name);
+
+		for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		{
+			LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+
+			/* Constructs a query for creating logical replication slots. */
+			appendPQExpBuffer(query, "SELECT pg_catalog.pg_create_logical_replication_slot(");
+			appendStringLiteralConn(query, slot_info->slotname, conn);
+			appendPQExpBuffer(query, ", ");
+			appendStringLiteralConn(query, slot_info->plugin, conn);
+			appendPQExpBuffer(query, ", false, %s);",
+							  slot_info->two_phase ? "true" : "false");
+
+			PQclear(executeQueryOrDie(conn, "%s", query->data));
+
+			resetPQExpBuffer(query);
+		}
+
+		PQfinish(conn);
+
+		destroyPQExpBuffer(query);
+	}
+
+	end_progress_output();
+	check_ok();
+}
+
+/*
+ *	setup_new_cluster()
+ *
+ * Starts the new cluster to update wal_level in the control file, then
+ * performs the final setup steps. Logical slots are also created here.
+ */
+static void
+setup_new_cluster(void)
+{
+	/*
+	 * We unconditionally start/stop the new server because pg_resetwal -o set
+	 * wal_level to 'minimum'.  If the user is upgrading standby servers using
+	 * the rsync instructions, they will need pg_upgrade to write its final
+	 * WAL record showing wal_level as 'replica'.
+	 */
+	start_postmaster(&new_cluster, true);
+
+	/*
+	 * If the old cluster has logical slots, migrate them to a new cluster.
+	 *
+	 * Note: This must be done after executing pg_resetwal command in the
+	 * caller because pg_resetwal would remove required WALs.
+	 */
+	if (count_old_cluster_logical_slots())
+		create_logical_replication_slots();
+
+	/*
+	 * Reindex hash indexes for old < 10.0. count_old_cluster_logical_slots()
+	 * returns non-zero when the old_cluster is PG17 and later, so it's OK to
+	 * use "else if" here. See comments atop count_old_cluster_logical_slots()
+	 * and get_old_cluster_logical_slot_infos().
+	 */
+	else if (GET_MAJOR_VERSION(old_cluster.major_version) <= 906)
+		old_9_6_invalidate_hash_indexes(&new_cluster, false);
+
+	report_extension_updates(&new_cluster);
+
+	stop_postmaster(false);
+}
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 842f3b6cd3..7ae37cc458 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -10,6 +10,7 @@
 #include <sys/stat.h>
 #include <sys/time.h>
 
+#include "access/xlogdefs.h"
 #include "common/relpath.h"
 #include "libpq-fe.h"
 
@@ -150,6 +151,25 @@ typedef struct
 	int			nrels;
 } RelInfoArr;
 
+/*
+ * Structure to store logical replication slot information
+ */
+typedef struct
+{
+	char	   *slotname;		/* slot name */
+	char	   *plugin;			/* plugin */
+	bool		two_phase;		/* can the slot decode 2PC? */
+	bool		caught_up;		/* Is confirmed_flush_lsn the same as latest
+								 * checkpoint LSN? */
+	bool		invalid;		/* Is the slot usable? */
+} LogicalSlotInfo;
+
+typedef struct
+{
+	int			nslots;			/* number of logical slot infos */
+	LogicalSlotInfo *slots;		/* array of logical slot infos */
+} LogicalSlotInfoArr;
+
 /*
  * The following structure represents a relation mapping.
  */
@@ -176,6 +196,7 @@ typedef struct
 	char		db_tablespace[MAXPGPATH];	/* database default tablespace
 											 * path */
 	RelInfoArr	rel_arr;		/* array of all user relinfos */
+	LogicalSlotInfoArr slot_arr;	/* array of all LogicalSlotInfo */
 } DbInfo;
 
 /*
@@ -225,6 +246,7 @@ typedef struct
 	bool		date_is_int;
 	bool		float8_pass_by_value;
 	uint32		data_checksum_version;
+	XLogRecPtr	chkpnt_latest;
 } ControlData;
 
 /*
@@ -345,7 +367,6 @@ void		output_check_banner(bool live_check);
 void		check_and_dump_old_cluster(bool live_check);
 void		check_new_cluster(void);
 void		report_clusters_compatible(void);
-void		issue_warnings_and_set_wal_level(void);
 void		output_completion_banner(char *deletion_script_file_name);
 void		check_cluster_versions(void);
 void		check_cluster_compatibility(bool live_check);
@@ -400,7 +421,8 @@ void		check_loadable_libraries(void);
 FileNameMap *gen_db_file_maps(DbInfo *old_db,
 							  DbInfo *new_db, int *nmaps, const char *old_pgdata,
 							  const char *new_pgdata);
-void		get_db_and_rel_infos(ClusterInfo *cluster);
+void		get_db_rel_and_slot_infos(ClusterInfo *cluster);
+int			count_old_cluster_logical_slots(void);
 
 /* option.c */
 
diff --git a/src/bin/pg_upgrade/server.c b/src/bin/pg_upgrade/server.c
index 0bc3d2806b..b591b36efe 100644
--- a/src/bin/pg_upgrade/server.c
+++ b/src/bin/pg_upgrade/server.c
@@ -234,6 +234,10 @@ start_postmaster(ClusterInfo *cluster, bool report_and_exit_on_error)
 	 * we only modify the new cluster, so only use it there.  If there is a
 	 * crash, the new cluster has to be recreated anyway.  fsync=off is a big
 	 * win on ext4.
+	 *
+	 * Also, the max_slot_wal_keep_size is set to -1 to prevent the WAL removal
+	 * required by logical slots. The setting could avoid the invalidation of
+	 * slots during the upgrade.
 	 */
 	snprintf(cmd, sizeof(cmd),
 			 "\"%s/pg_ctl\" -w -l \"%s/%s\" -D \"%s\" -o \"-p %d -b%s %s%s\" start",
@@ -241,7 +245,8 @@ start_postmaster(ClusterInfo *cluster, bool report_and_exit_on_error)
 			 log_opts.logdir,
 			 SERVER_LOG_FILE, cluster->pgconfig, cluster->port,
 			 (cluster == &new_cluster) ?
-			 " -c synchronous_commit=off -c fsync=off -c full_page_writes=off" : "",
+			 " -c synchronous_commit=off -c fsync=off -c full_page_writes=off -c max_slot_wal_keep_size=-1 " :
+			 " -c max_slot_wal_keep_size=-1",
 			 cluster->pgopts ? cluster->pgopts : "", socket_string);
 
 	/*
diff --git a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
new file mode 100644
index 0000000000..640964c4e1
--- /dev/null
+++ b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
@@ -0,0 +1,214 @@
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+# Tests for upgrading replication slots
+
+use strict;
+use warnings;
+
+use File::Path qw(rmtree);
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Can be changed to test the other modes
+my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
+
+# Initialize old cluster
+my $old_publisher = PostgreSQL::Test::Cluster->new('old_publisher');
+$old_publisher->init(allows_streaming => 'logical');
+
+# Initialize new cluster
+my $new_publisher = PostgreSQL::Test::Cluster->new('new_publisher');
+$new_publisher->init(allows_streaming => 'replica');
+
+# Initialize subscriber cluster
+my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
+$subscriber->init(allows_streaming => 'logical');
+
+my $bindir = $new_publisher->config_data('--bindir');
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when new cluster wal_level is not 'logical'
+
+# Preparations for the subsequent test:
+# 1. Create a slot on the old cluster
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT pg_create_logical_replication_slot('test_slot1', 'test_decoding', false, true);"
+);
+$old_publisher->stop;
+
+# pg_upgrade will fail because the new cluster wal_level is 'replica'
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade where the new cluster has the wrong wal_level');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when max_replication_slots on a new cluster is
+#		too low
+
+# Preparations for the subsequent test:
+# 1. Create a second slot on the old cluster
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT pg_create_logical_replication_slot('test_slot2', 'test_decoding', false, true);"
+);
+
+# 2. Consume WAL records to avoid another type of upgrade failure. It will be
+#	 tested in subsequent cases.
+$old_publisher->safe_psql('postgres',
+	"SELECT count(*) FROM pg_logical_slot_get_changes('test_slot1', NULL, NULL);"
+);
+$old_publisher->stop;
+
+# 3. max_replication_slots is set to smaller than the number of slots (2)
+#	 present on the old cluster
+$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 1");
+
+# 4. wal_level is set correctly on the new cluster
+$new_publisher->append_conf('postgresql.conf', "wal_level = 'logical'");
+
+# pg_upgrade will fail because the new cluster has insufficient max_replication_slots
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade where the new cluster has insufficient max_replication_slots');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when the slot still has unconsumed WAL records
+
+# Preparations for the subsequent test:
+# 1. Remove the slot 'test_slot2', leaving only 1 slot remaining on the old
+#	 cluster, so the new cluster config  max_replication_slots=1 will now be
+#	 enough.
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT * FROM pg_drop_replication_slot('test_slot2');"
+);
+
+# 2. Generate extra WAL records. Because these WAL records do not get consumed
+#	 it will cause the upcoming pg_upgrade test to fail.
+$old_publisher->safe_psql('postgres',
+	"CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;"
+);
+$old_publisher->stop;
+
+# pg_upgrade will fail because the slot still has unconsumed WAL records
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade of old cluster with idle replication slots');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+# Remove the remaining slot
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT * FROM pg_drop_replication_slot('test_slot1');"
+);
+
+# ------------------------------
+# TEST: Successful upgrade
+
+# Preparations for the subsequent test:
+# 1. Setup logical replication
+my $old_connstr = $old_publisher->connstr . ' dbname=postgres';
+$old_publisher->safe_psql('postgres',
+	"CREATE PUBLICATION pub FOR ALL TABLES;"
+);
+$subscriber->start;
+$subscriber->safe_psql(
+	'postgres', qq[
+	CREATE TABLE tbl (a int);
+	CREATE SUBSCRIPTION sub CONNECTION '$old_connstr' PUBLICATION pub WITH (two_phase = 'true')
+]);
+$subscriber->wait_for_subscription_sync($old_publisher, 'sub');
+
+# 2. Temporarily disable the subscription
+$subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION sub DISABLE");
+$old_publisher->stop;
+
+# Actual run, successful upgrade is expected
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade of old cluster');
+ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ removed after pg_upgrade success");
+
+# Check that the slot 'sub' has migrated to the new cluster
+$new_publisher->start;
+my $result = $new_publisher->safe_psql('postgres',
+	"SELECT slot_name, two_phase FROM pg_replication_slots");
+is($result, qq(sub|t), 'check the slot exists on new cluster');
+
+# Update the connection
+my $new_connstr = $new_publisher->connstr . ' dbname=postgres';
+$subscriber->safe_psql(
+	'postgres', qq[
+	ALTER SUBSCRIPTION sub CONNECTION '$new_connstr';
+	ALTER SUBSCRIPTION sub ENABLE;
+]);
+
+# Check whether changes on the new publisher get replicated to the subscriber
+$new_publisher->safe_psql('postgres',
+	"INSERT INTO tbl VALUES (generate_series(11, 20))");
+$new_publisher->wait_for_catchup('sub');
+$result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
+is($result, qq(20), 'check changes are replicated to the subscriber');
+
+# Clean up
+$subscriber->stop();
+$new_publisher->stop();
+
+done_testing();
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index f2af84d7ca..98c01fa05f 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1502,7 +1502,10 @@ LogicalRepTupleData
 LogicalRepTyp
 LogicalRepWorker
 LogicalRepWorkerType
+LogicalReplicationSlotInfo
 LogicalRewriteMappingData
+LogicalSlotInfo
+LogicalSlotInfoArr
 LogicalTape
 LogicalTapeSet
 LsnReadQueue
-- 
2.27.0

#210Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Amit Kapila (#207)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Amit,

Thank you for reviewing!

Few comments:
=============
1.
<para>
+       All slots on the old cluster must be usable, i.e., there are no slots
+       whose
+       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>wal_status</structfield>
+       is <literal>lost</literal>.
+      </para>

Shall we refer to conflicting flag here instead of wal_status?

Changed. The word 'lost' had also been used in check_old_cluster_for_valid_slots()
because of that documentation line, so I changed those places accordingly as well.
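
For reference, the same condition can be checked manually on the old cluster
before running pg_upgrade. This is only an illustrative query (it needs PG16
or later for the conflicting column), not the exact code in the patch:

    SELECT slot_name, plugin, wal_status, conflicting
    FROM pg_catalog.pg_replication_slots
    WHERE slot_type = 'logical' AND temporary IS FALSE;

A slot whose wal_status is 'lost' (or whose conflicting flag is true, which is
what the patch's query looks at) would be reported as unusable by
check_old_cluster_for_valid_slots().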

2.
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -9,6 +9,7 @@

#include "postgres_fe.h"

+#include "access/xlogdefs.h"

This include doesn't seem to be required as we already include this
file via pg_upgrade.h.

I preferred to include it explicitly... but fixed.

3.
+ res = executeQueryOrDie(conn, "SHOW wal_level;");
+
+ if (PQntuples(res) != 1)
+ pg_fatal("could not determine wal_level.");
+
+ wal_level = PQgetvalue(res, 0, 0);
+
+ if (strcmp(wal_level, "logical") != 0)
+ pg_fatal("wal_level must be \"logical\", but is set to \"%s\"",
+ wal_level);

wal_level should be checked before the number of slots required.

Moved.
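
As an aside, the two settings that check_new_cluster_logical_replication_slots()
verifies are now checked in that order, and they can also be confirmed manually
on the new cluster. This is just an illustration of the ordering, not the code
itself:

    SHOW wal_level;               -- must be 'logical'
    SHOW max_replication_slots;   -- must be >= the number of logical slots
                                  -- on the old cluster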

4.
@@ -81,7 +84,11 @@ get_loadable_libraries(void)
{
...
+ totaltups++;
+ }
+
}

Spurious new line in the above code.

Removed.

5.
- os_info.libraries = (LibraryInfo *) pg_malloc(totaltups * sizeof(LibraryInfo));
+ /*
+ * Allocate memory for extensions and logical replication output plugins.
+ */
+ os_info.libraries = pg_malloc_array(LibraryInfo,

We haven't referred to extensions previously in this function, so how
about changing the comment to: "Allocate memory for required libraries
and logical replication output plugins."?

Changed.

6.
+ /*
+ * If we are reading the old_cluster, gets infos for logical
+ * replication slots.
+ */

How about changing the comment to: "Retrieve the logical replication
slots infos for the old cluster."?

Changed.

7.
+ /*
+ * The temporary slots are expressly ignored while checking because such
+ * slots cannot exist after the upgrade. During the upgrade, clusters are
+ * started and stopped several times causing any temporary slots to be
+ * removed.
+ */

/expressly/explicitly

Replaced.

8.
+/*
+ * count_old_cluster_logical_slots()
+ *
+ * Sum up and return the number of logical replication slots for all databases.

I think it would be better to just say: "Returns the number of logical
replication slots for all databases."

Changed.

9.
+ * Note: This must be done after doing the pg_resetwal command because
+ * pg_resetwal would remove required WALs.
+ */
+ if (count_old_cluster_logical_slots())
+ create_logical_replication_slots();

We can slightly change the Note to: "This must be done after executing
pg_resetwal command in the caller because pg_resetwal would remove
required WALs."

Reworded.
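
To make the ordering concrete: once pg_resetwal has run, the new
create_logical_replication_slots() connects to each database on the new
cluster and effectively executes, for every migrated slot, something like the
following (the slot name and plugin are just examples here):

    SELECT pg_catalog.pg_create_logical_replication_slot('sub', 'pgoutput',
                                                          false, true);

The third argument is temporary = false and the fourth is the two_phase flag
taken from the old slot. Because this happens after pg_resetwal, the WAL that
the new slot's LSNs point at is not removed by that command.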

You can see the new version in [1]/messages/by-id/TYAPR01MB5866AB60B4CF404419D9373DF5EDA@TYAPR01MB5866.jpnprd01.prod.outlook.com.

[1]: /messages/by-id/TYAPR01MB5866AB60B4CF404419D9373DF5EDA@TYAPR01MB5866.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#211Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Amit Kapila (#208)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Amit,

On Fri, Sep 8, 2023 at 2:12 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

2.

+       if (nslots_on_new)
+       {
+               if (nslots_on_new == 1)
+                       pg_fatal("New cluster must not have logical replication slots but found a slot.");
+               else
+                       pg_fatal("New cluster must not have logical replication slots but found %d slots.",
+                                nslots_on_new);

We could try ngettext() here:
pg_log_warning(ngettext("New cluster must not have logical replication slots but found %d slot.",
                        "New cluster must not have logical replication slots but found %d slots",
                        nslots_on_new)

Will using pg_log_warning suffice for the purpose of exiting the
upgrade process? I don't think the intention here is to continue after
finding such a case.

I also think that pg_log_warning is not good.

4.

@@ -610,6 +724,12 @@ free_db_and_rel_infos(DbInfoArr *db_arr)
{
free_rel_infos(&db_arr->dbs[dbnum].rel_arr);
pg_free(db_arr->dbs[dbnum].db_name);
+
+               /*
+                * Logical replication slots must not exist on the new cluster before
+                * create_logical_replication_slots().
+                */
+               Assert(db_arr->dbs[dbnum].slot_arr.nslots == 0);

I think the assert is not necessary, as the patch will check the new cluster's
slots in another function. Besides, this function is not only used for new
cluster, but the comment only mentioned the new cluster which seems a bit
inconsistent. So, how about removing it ?

Yeah, I also find it odd.

Removed. Based on the decision, your new comment 1 is not needed anymore.

5.
(cluster == &new_cluster) ?
- " -c synchronous_commit=off -c fsync=off -c full_page_writes=off" : "",
+ " -c synchronous_commit=off -c fsync=off -c full_page_writes=off" :
+ " -c max_slot_wal_keep_size=-1",

I think we need to set max_slot_wal_keep_size on new cluster as well, otherwise
it's possible that the new created slots get invalidated during upgrade, what
do you think ?

I also think that would be better.

Added.

6.

+       bool            is_lost;                /* Is the slot in 'lost'? */
+} LogicalSlotInfo;

Would it be better to use 'invalidated',

Or how about simply 'invalid'?

Used the word invalid.

A few other points:
1.
ntups = PQntuples(res);
- dbinfos = (DbInfo *) pg_malloc(sizeof(DbInfo) * ntups);
+ dbinfos = (DbInfo *) pg_malloc0(sizeof(DbInfo) * ntups);

Can we write a comment to say why we need zero memory here?

Reverted the change. Originally it was needed to pass the Assert()
in the free_db_and_rel_infos(), but it was removed per above.

2. Why get_old_cluster_logical_slot_infos() need to use
pg_malloc_array whereas for similar stuff get_rel_infos() use
pg_malloc()?

They do the same thing. I used the pg_malloc_array() macro to keep the code
within 80 columns.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#212Amit Kapila
amit.kapila16@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#211)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Fri, Sep 8, 2023 at 6:36 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

2. Why get_old_cluster_logical_slot_infos() need to use
pg_malloc_array whereas for similar stuff get_rel_infos() use
pg_malloc()?

They do the same thing. I used the pg_malloc_array() macro to keep the code
within 80 columns.

I think it is better to be consistent with the existing code in this
case. Also, see, if the usage in get_loadable_libraries() can also be
changed back to use pg_malloc().

--
With Regards,
Amit Kapila.

#213Dilip Kumar
dilipbalaut@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#209)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Fri, Sep 8, 2023 at 6:31 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

Comments on the latest patch.

1.
Note that slot restoration must be done after the final pg_resetwal command
during the upgrade because pg_resetwal will remove WALs that are required by
the slots. Due to this restriction, the timing of restoring replication slots is
different from other objects.

This comment in the commit message is confusing. I understand the
reason, but from this it is not very clear why it is good to create the
slots after pg_resetwal if pg_resetwal removes the WAL we need. I
think we should make it clear that creating a new slot sets its
restart_lsn to the current WAL location, and a pg_resetwal run after
that could remove the WAL that the slot's restart_lsn is pointing to....

2.

+    <itemizedlist>
+     <listitem>
+      <para>
+       All slots on the old cluster must be usable, i.e., there are no slots
+       whose
+       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>wal_status</structfield>
+       is <literal>lost</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>confirmed_flush_lsn</structfield>
+       of all slots on the old cluster must be the same as the latest
+       checkpoint location.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The output plugins referenced by the slots on the old cluster must be
+       installed in the new PostgreSQL executable directory.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must have
+       <link linkend="guc-max-replication-slots"><varname>max_replication_slots</varname></link>
+       configured to a value greater than or equal to the number of slots
+       present in the old cluster.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must have
+       <link linkend="guc-wal-level"><varname>wal_level</varname></link> as
+       <literal>logical</literal>.
+      </para>
+     </listitem>
+    </itemizedlist>

I think we should also add that the new cluster should not have any
existing permanent logical replication slots.

3.
-       with the primary.)  Replication slots are not copied and must
-       be recreated.
+       with the primary.)  Replication slots on the old standby are not copied.
+       Only logical slots on the primary are migrated to the new standby,
+       and other slots must be recreated.

This paragraph should be rephrased. I mean first stating that
"Replication slots on the old standby are not copied" and then saying
Only logical slots are migrated doesn't seem like the best way. Maybe
we can just say "Only logical slots on the primary are migrated to the
new standby, and other slots must be recreated."

4.
+ /*
+ * Raise an ERROR if the logical replication slot is invalidating. It
+ * would not happen because max_slot_wal_keep_size is set to -1 during
+ * the upgrade, but it stays safe.
+ */
+ if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade)
+ elog(ERROR, "Replication slots must not be invalidated during the upgrade.");

Rephrase the first line as -> Raise an ERROR if the logical
replication slot is invalidating during an upgrade.

5.
+ /* Logical slots can be migrated since PG17. */
+ if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+ return;

For readability change this to if
(GET_MAJOR_VERSION(old_cluster.major_version) < 1700), because in most
of the checks related to this, we are using 1700 so better be
consistent in this.

6.
+ if (nslots_on_new)
+ pg_fatal(ngettext("New cluster must not have logical replication
slots but found %d slot.",
+   "New cluster must not have logical replication slots but found %d slots.",
+   nslots_on_new),
+ nslots_on_new);
...
+ if (PQntuples(res) != 1)
+ pg_fatal("could not determine wal_level.");
+
+ wal_level = PQgetvalue(res, 0, 0);
+
+ if (strcmp(wal_level, "logical") != 0)
+ pg_fatal("wal_level must be \"logical\", but is set to \"%s\"",
+ wal_level);

I have noticed that the case of the first letter in the pg_fatal
message is not consistent.

7.
+
+ /* Is the slot still usable? */
+ if (slot->invalid)
+ {

Why comment says "Is the slot still usable?" I think it should be "Is
the slot usable?" otherwise it appears that we have first fetched the
slots and now we are refetching it and checking whether it is still
usable.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#214Amit Kapila
amit.kapila16@gmail.com
In reply to: Dilip Kumar (#213)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Mon, Sep 11, 2023 at 10:39 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

3.
-       with the primary.)  Replication slots are not copied and must
-       be recreated.
+       with the primary.)  Replication slots on the old standby are not copied.
+       Only logical slots on the primary are migrated to the new standby,
+       and other slots must be recreated.

This paragraph should be rephrased. I mean first stating that
"Replication slots on the old standby are not copied" and then saying
Only logical slots are migrated doesn't seem like the best way. Maybe
we can just say "Only logical slots on the primary are migrated to the
new standby, and other slots must be recreated."

It is fine to combine these sentences but let's be a bit more
explicit: "Only logical slots on the primary are migrated to the new
standby, and other slots on the old standby must be recreated as they
are not copied."

4.
+ /*
+ * Raise an ERROR if the logical replication slot is invalidating. It
+ * would not happen because max_slot_wal_keep_size is set to -1 during
+ * the upgrade, but it stays safe.
+ */
+ if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade)
+ elog(ERROR, "Replication slots must not be invalidated during the upgrade.");

Rephrase the first line as -> Raise an ERROR if the logical
replication slot is invalidating during an upgrade.

I think it would be better to write something like: "The logical
replication slots shouldn't be invalidated as max_slot_wal_keep_size
is set to -1 during the upgrade."

5.
+ /* Logical slots can be migrated since PG17. */
+ if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+ return;

For readability change this to if
(GET_MAJOR_VERSION(old_cluster.major_version) < 1700), because in most
of the checks related to this, we are using 1700 so better be
consistent in this.

But the current check is consistent with what we do at other places
during the upgrade. I think the patch is trying to be consistent with
existing code as much as possible.

--
With Regards,
Amit Kapila.

#215Dilip Kumar
dilipbalaut@gmail.com
In reply to: Amit Kapila (#214)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Mon, Sep 11, 2023 at 11:16 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Sep 11, 2023 at 10:39 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

3.
-       with the primary.)  Replication slots are not copied and must
-       be recreated.
+       with the primary.)  Replication slots on the old standby are not copied.
+       Only logical slots on the primary are migrated to the new standby,
+       and other slots must be recreated.

This paragraph should be rephrased. I mean first stating that
"Replication slots on the old standby are not copied" and then saying
Only logical slots are migrated doesn't seem like the best way. Maybe
we can just say "Only logical slots on the primary are migrated to the
new standby, and other slots must be recreated."

It is fine to combine these sentences but let's be a bit more
explicit: "Only logical slots on the primary are migrated to the new
standby, and other slots on the old standby must be recreated as they
are not copied."

Fine with this.

4.
+ /*
+ * Raise an ERROR if the logical replication slot is invalidating. It
+ * would not happen because max_slot_wal_keep_size is set to -1 during
+ * the upgrade, but it stays safe.
+ */
+ if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade)
+ elog(ERROR, "Replication slots must not be invalidated during the upgrade.");

Rephrase the first line as -> Raise an ERROR if the logical
replication slot is invalidating during an upgrade.

I think it would be better to write something like: "The logical
replication slots shouldn't be invalidated as max_slot_wal_keep_size
is set to -1 during the upgrade."

This makes it much clear.

5.
+ /* Logical slots can be migrated since PG17. */
+ if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+ return;

For readability change this to if
(GET_MAJOR_VERSION(old_cluster.major_version) < 1700), because in most
of the checks related to this, we are using 1700 so better be
consistent in this.

But the current check is consistent with what we do at other places
during the upgrade. I think the patch is trying to be consistent with
existing code as much as possible.

Okay, I see. Thanks for pointing that out.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#216Amit Kapila
amit.kapila16@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#209)
1 attachment(s)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Fri, Sep 8, 2023 at 6:31 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

Thank you for reviewing! PSA new version.

Few comments:
==============
1.
+       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>confirmed_flush_lsn</structfield>
+       of all slots on the old cluster must be the same as the latest
+       checkpoint location.

We can add something like: "This ensures that all the data has been
replicated before the upgrade." to make it clear why this test is
important.

2. Move the wal_level related restriction before max_replication_slots.

3.
+ /* Is the slot still usable? */
+ if (slot->invalid)
+ {
+ if (script == NULL &&
+ (script = fopen_priv(output_path, "w")) == NULL)
+ pg_fatal("could not open file \"%s\": %s",
+ output_path, strerror(errno));
+
+ fprintf(script,
+ "slotname :%s\tproblem: The slot is unusable\n",
+ slot->slotname);
+ }
+
+ /*
+ * Do additional checks to ensure that confirmed_flush LSN of all
+ * the slots is the same as the latest checkpoint location.
+ *
+ * Note: This can be satisfied only when the old cluster has been
+ * shut down, so we skip this for live checks.
+ */
+ if (!live_check && !slot->caught_up)

Isn't it better to continue for the next slot once we find that slot
is invalid instead of checking other conditions?

4.
+
+ fprintf(script,
+ "slotname :%s\tproblem: The slot is unusable\n",
+ slot->slotname);

Let's keep it as one string and change the message to: "The slot
"\"%s\" is invalid"

+ fprintf(script,
+ "slotname :%s\tproblem: The slot has not consumed WALs yet\n",
+ slot->slotname);
+ }

On a similar line, we can change this to: "The slot "\"%s\" has not
consumed the WAL yet"

5.
+ snprintf(output_path, sizeof(output_path), "%s/%s",
+ log_opts.basedir,
+ "problematic_logical_relication_slots.txt");

I think we can name this file as "invalid_logical_replication_slots"
or simply "logical_replication_slots"

6.
+ pg_fatal("The source cluster contains one or more problematic
logical replication slots.\n"
+ "The needed workaround depends on the problem.\n"
+ "1) If the problem is \"The slot is unusable,\" You can drop such
replication slots.\n"
+ "2) If the problem is \"The slot has not consumed WALs yet,\" you
can consume all remaining WALs.\n"
+ "Then, you can restart the upgrade.\n"
+ "A list of problematic logical replication slots is in the file:\n"
+ "    %s", output_path);

This doesn't match the similar existing comments. So, let's change it
to something like:

"Your installation contains invalid logical replication slots. These
slots can't be copied so this cluster cannot currently be upgraded.
Consider either removing such slots or consuming the pending WAL if
any and then restart the upgrade. A list of invalid logical
replication slots is in the file:"
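
(As a side note, "consuming the pending WAL" here amounts to advancing the
slot until its confirmed_flush_lsn catches up, either by letting the
subscriber apply the remaining changes or manually on the old cluster, as the
new TAP test does, with something like the following; the slot name is just an
example.)

    SELECT count(*) FROM pg_logical_slot_get_changes('test_slot1', NULL, NULL);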

Apart from the above, I have edited a few other comments in the patch.
See attached.

--
With Regards,
Amit Kapila.

Attachments:

cosmetic_improvements_amit.1.patchapplication/octet-stream; name=cosmetic_improvements_amit.1.patchDownload
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index e0e7f53e28..4a800dd4bc 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1424,9 +1424,8 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause,
 		SpinLockRelease(&s->mutex);
 
 		/*
-		 * Raise an ERROR if the logical replication slot is invalidating. It
-		 * would not happen because max_slot_wal_keep_size is set to -1 during
-		 * the upgrade, but it stays safe.
+		 * The logical replication slots shouldn't be invalidated as
+		 * max_slot_wal_keep_size is set to -1 during the upgrade.
 		 */
 		if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade)
 			elog(ERROR, "Replication slots must not be invalidated during the upgrade.");
diff --git a/src/bin/pg_upgrade/server.c b/src/bin/pg_upgrade/server.c
index b591b36efe..d083a001f1 100644
--- a/src/bin/pg_upgrade/server.c
+++ b/src/bin/pg_upgrade/server.c
@@ -235,9 +235,9 @@ start_postmaster(ClusterInfo *cluster, bool report_and_exit_on_error)
 	 * crash, the new cluster has to be recreated anyway.  fsync=off is a big
 	 * win on ext4.
 	 *
-	 * Also, the max_slot_wal_keep_size is set to -1 to prevent the WAL removal
-	 * required by logical slots. The setting could avoid the invalidation of
-	 * slots during the upgrade.
+	 * Use max_slot_wal_keep_size as -1 to prevent the WAL removal that is
+	 * required by logical slots.  This would avoid the invalidation of slots
+	 * during the upgrade.
 	 */
 	snprintf(cmd, sizeof(cmd),
 			 "\"%s/pg_ctl\" -w -l \"%s/%s\" -D \"%s\" -o \"-p %d -b%s %s%s\" start",
#217Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Dilip Kumar (#213)
2 attachment(s)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Dilip,

Thank you for reviewing! PSA new version.

1.
Note that slot restoration must be done after the final pg_resetwal command
during the upgrade because pg_resetwal will remove WALs that are required by
the slots. Due to this restriction, the timing of restoring replication slots is
different from other objects.

This comment in the commit message is confusing. I understand the
reason, but from this it is not very clear why it is good to create the
slots after pg_resetwal if pg_resetwal removes the WAL we need. I
think we should make it clear that creating a new slot sets its
restart_lsn to the current WAL location, and a pg_resetwal run after
that could remove the WAL that the slot's restart_lsn is pointing to....

Just to confirm - WAL records must not be removed at any time while they are
referenced as a slot's restart_lsn. The reason the slot creation is done after
pg_resetwal is that this way the required WALs are not removed by the command. See [1]/messages/by-id/TYAPR01MB58664C81887B3AF2EB6B16E3F5939@TYAPR01MB5866.jpnprd01.prod.outlook.com.
Moreover, I clarified this further in the commit message.
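
For reference, the LSNs involved can be inspected with an illustrative query
like the one below. Note that pg_upgrade itself reads the latest checkpoint
location from pg_controldata after the old cluster has been cleanly shut down,
so on a running server this is only an approximation of the check:

    SELECT s.slot_name, s.restart_lsn, s.confirmed_flush_lsn,
           s.confirmed_flush_lsn = c.checkpoint_lsn AS caught_up
    FROM pg_catalog.pg_replication_slots AS s,
         pg_control_checkpoint() AS c
    WHERE s.slot_type = 'logical' AND s.temporary IS FALSE;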

2.

+    <itemizedlist>
+     <listitem>
+      <para>
+       All slots on the old cluster must be usable, i.e., there are no slots
+       whose
+       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>wal_status</structfield>
+       is <literal>lost</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>confirmed_flush_lsn</structfield>
+       of all slots on the old cluster must be the same as the latest
+       checkpoint location.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The output plugins referenced by the slots on the old cluster must be
+       installed in the new PostgreSQL executable directory.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must have
+       <link linkend="guc-max-replication-slots"><varname>max_replication_slots</varname></link>
+       configured to a value greater than or equal to the number of slots
+       present in the old cluster.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must have
+       <link linkend="guc-wal-level"><varname>wal_level</varname></link> as
+       <literal>logical</literal>.
+      </para>
+     </listitem>
+    </itemizedlist>

I think we should also add that the new cluster should not have any
existing permanent logical replication slots.

Hmm, I wondered whether it is really needed. Tables must not exist on the
new cluster either, but that is not documented. It might be a trivial thing. Anyway, added.

FYI - the restriction was not introduced by this patch. I reported it independently [2]/messages/by-id/TYAPR01MB5866D277F6BEDEA4223B3559F5E6A@TYAPR01MB5866.jpnprd01.prod.outlook.com,
but no one has responded so far...
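
For completeness, the check added for the new cluster boils down to a query
like this returning zero (illustrative only; the patch runs it against
template1):

    SELECT count(*)
    FROM pg_catalog.pg_replication_slots
    WHERE slot_type = 'logical' AND temporary IS FALSE;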

3.
-       with the primary.)  Replication slots are not copied and must
-       be recreated.
+       with the primary.)  Replication slots on the old standby are not copied.
+       Only logical slots on the primary are migrated to the new standby,
+       and other slots must be recreated.

This paragraph should be rephrased. I mean first stating that
"Replication slots on the old standby are not copied" and then saying
Only logical slots are migrated doesn't seem like the best way. Maybe
we can just say "Only logical slots on the primary are migrated to the
new standby, and other slots must be recreated."

Per discussion on [3]/messages/by-id/CAFiTN-vs53SqZiZN1GcSuKLmMY=0d14wJDDm1aKmoBONwnqaGg@mail.gmail.com, I used different wording. Thanks for the suggestion.

4.
+ /*
+ * Raise an ERROR if the logical replication slot is invalidating. It
+ * would not happen because max_slot_wal_keep_size is set to -1 during
+ * the upgrade, but it stays safe.
+ */
+ if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade)
+ elog(ERROR, "Replication slots must not be invalidated during the upgrade.");

Rephrase the first line as -> Raise an ERROR if the logical
replication slot is invalidating during an upgrade.

Per discussion on [3]/messages/by-id/CAFiTN-vs53SqZiZN1GcSuKLmMY=0d14wJDDm1aKmoBONwnqaGg@mail.gmail.com, I used different wording. Thanks for the suggestion.

5.
+ /* Logical slots can be migrated since PG17. */
+ if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+ return;

For readability change this to if
(GET_MAJOR_VERSION(old_cluster.major_version) < 1700), because in most
of the checks related to this, we are using 1700 so better be
consistent in this.

Per discussion on [3]/messages/by-id/CAFiTN-vs53SqZiZN1GcSuKLmMY=0d14wJDDm1aKmoBONwnqaGg@mail.gmail.com, I did not change this.

6.
+ if (nslots_on_new)
+ pg_fatal(ngettext("New cluster must not have logical replication
slots but found %d slot.",
+   "New cluster must not have logical replication slots but found %d slots.",
+   nslots_on_new),
+ nslots_on_new);
...
+ if (PQntuples(res) != 1)
+ pg_fatal("could not determine wal_level.");
+
+ wal_level = PQgetvalue(res, 0, 0);
+
+ if (strcmp(wal_level, "logical") != 0)
+ pg_fatal("wal_level must be \"logical\", but is set to \"%s\"",
+ wal_level);

I have noticed that the case of the first letter in the pg_fatal
message is not consistent.

Actually there is some inconsistency even in check.c, so I devised the rules
below. What do you think?

* Incomplete sentences start with a lower-case letter
(e.g., "could not open", "could not determine").
* Identifiers such as GUC names keep their lower-case spelling even at the start
(e.g., "template0 must not allow...", "wal_level must be...").
* Otherwise, the sentence starts with an upper-case letter.

7.
+
+ /* Is the slot still usable? */
+ if (slot->invalid)
+ {

Why comment says "Is the slot still usable?" I think it should be "Is
the slot usable?" otherwise it appears that we have first fetched the
slots and now we are refetching it and checking whether it is still
usable.

Changed.

[1]: /messages/by-id/TYAPR01MB58664C81887B3AF2EB6B16E3F5939@TYAPR01MB5866.jpnprd01.prod.outlook.com
[2]: /messages/by-id/TYAPR01MB5866D277F6BEDEA4223B3559F5E6A@TYAPR01MB5866.jpnprd01.prod.outlook.com
[3]: /messages/by-id/CAFiTN-vs53SqZiZN1GcSuKLmMY=0d14wJDDm1aKmoBONwnqaGg@mail.gmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:

v35-0001-Persist-logical-slots-to-disk-during-a-shutdown-.patchapplication/octet-stream; name=v35-0001-Persist-logical-slots-to-disk-during-a-shutdown-.patchDownload
From 050b79806ca182f74f6a4248fdb13d9bf7b5cf24 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Fri, 14 Apr 2023 13:49:09 +0800
Subject: [PATCH v35 1/2] Persist logical slots to disk during a shutdown
 checkpoint if required.

It's entirely possible for a logical slot to have a confirmed_flush LSN
higher than the last value saved on disk while not being marked as dirty.
Currently, it is not a major problem but a later patch adding support for
the upgrade of slots relies on that value being properly persisted to disk.

It can also help avoid processing the same transactions again in some
boundary cases after the clean shutdown and restart. Say, we process some
transactions for which we didn't send anything downstream (the changes got
filtered) but the confirm_flush LSN is updated due to keepalives. As we
don't flush the latest value of confirm_flush LSN, it may lead to
processing the same changes again without this patch.

Author: Vignesh C, Julien Rouhaud, Kuroda Hayato based on suggestions by
Ashutosh Bapat
Reviewed-by: Amit Kapila, Dilip Kumar, Ashutosh Bapat, Michael Paquier, Peter Smith
Discussion: http://postgr.es/m/CAA4eK1JzJagMmb_E8D4au=GYQkxox0AfNBm1FbP7sy7t4YWXPQ@mail.gmail.com
Discussion: http://postgr.es/m/TYAPR01MB58664C81887B3AF2EB6B16E3F5939@TYAPR01MB5866.jpnprd01.prod.outlook.com
---
 src/backend/access/transam/xlog.c             |   2 +-
 src/backend/replication/slot.c                |  38 ++++++-
 src/include/replication/slot.h                |   9 +-
 src/test/recovery/meson.build                 |   1 +
 .../t/038_save_logical_slots_shutdown.pl      | 102 ++++++++++++++++++
 5 files changed, 146 insertions(+), 6 deletions(-)
 create mode 100644 src/test/recovery/t/038_save_logical_slots_shutdown.pl

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index f6f8adc72a..f26c8d18a6 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7039,7 +7039,7 @@ static void
 CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
 {
 	CheckPointRelationMap();
-	CheckPointReplicationSlots();
+	CheckPointReplicationSlots(flags & CHECKPOINT_IS_SHUTDOWN);
 	CheckPointSnapBuild();
 	CheckPointLogicalRewriteHeap();
 	CheckPointReplicationOrigin();
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index bb09c4010f..63b80d321c 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -321,6 +321,7 @@ ReplicationSlotCreate(const char *name, bool db_specific,
 	slot->candidate_xmin_lsn = InvalidXLogRecPtr;
 	slot->candidate_restart_valid = InvalidXLogRecPtr;
 	slot->candidate_restart_lsn = InvalidXLogRecPtr;
+	slot->last_saved_confirmed_flush = InvalidXLogRecPtr;
 
 	/*
 	 * Create the slot on disk.  We haven't actually marked the slot allocated
@@ -1572,11 +1573,14 @@ restart:
 /*
  * Flush all replication slots to disk.
  *
- * This needn't actually be part of a checkpoint, but it's a convenient
- * location.
+ * We can flush dirty replication slots at regular intervals by any
+ * background process but checkpoint is a convenient location. Additionally,
+ * in case of a shutdown checkpoint, we also identify the slots for which
+ * confirmed_flush LSN has been updated since the last time it persisted and
+ * flush them.
  */
 void
-CheckPointReplicationSlots(void)
+CheckPointReplicationSlots(bool is_shutdown)
 {
 	int			i;
 
@@ -1601,6 +1605,30 @@ CheckPointReplicationSlots(void)
 
 		/* save the slot to disk, locking is handled in SaveSlotToPath() */
 		sprintf(path, "pg_replslot/%s", NameStr(s->data.name));
+
+		/*
+		 * We won't ensure that the slot is persisted after the
+		 * confirmed_flush LSN is updated as that could lead to frequent
+		 * writes. However, we need to ensure that we do persist the slots at
+		 * the time of shutdown whose confirmed_flush LSN is changed since we
+		 * last saved the slot to disk. This will help in avoiding retreat of
+		 * the confirmed_flush LSN after restart. At other times, the
+		 * walsender keeps saving the slot from time to time as the
+		 * replication progresses, so there is no clear advantage of flushing
+		 * additional slots at the time of checkpoint.
+		 */
+		if (is_shutdown && SlotIsLogical(s))
+		{
+			SpinLockAcquire(&s->mutex);
+			if (s->data.invalidated == RS_INVAL_NONE &&
+				s->data.confirmed_flush != s->last_saved_confirmed_flush)
+			{
+				s->just_dirtied = true;
+				s->dirty = true;
+			}
+			SpinLockRelease(&s->mutex);
+		}
+
 		SaveSlotToPath(s, path, LOG);
 	}
 	LWLockRelease(ReplicationSlotAllocationLock);
@@ -1873,11 +1901,12 @@ SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel)
 
 	/*
 	 * Successfully wrote, unset dirty bit, unless somebody dirtied again
-	 * already.
+	 * already and remember the confirmed_flush LSN value.
 	 */
 	SpinLockAcquire(&slot->mutex);
 	if (!slot->just_dirtied)
 		slot->dirty = false;
+	slot->last_saved_confirmed_flush = cp.slotdata.confirmed_flush;
 	SpinLockRelease(&slot->mutex);
 
 	LWLockRelease(&slot->io_in_progress_lock);
@@ -2074,6 +2103,7 @@ RestoreSlotFromDisk(const char *name)
 		/* initialize in memory state */
 		slot->effective_xmin = cp.slotdata.xmin;
 		slot->effective_catalog_xmin = cp.slotdata.catalog_xmin;
+		slot->last_saved_confirmed_flush = cp.slotdata.confirmed_flush;
 
 		slot->candidate_catalog_xmin = InvalidTransactionId;
 		slot->candidate_xmin_lsn = InvalidXLogRecPtr;
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index a8a89dc784..758ca79a81 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -178,6 +178,13 @@ typedef struct ReplicationSlot
 	XLogRecPtr	candidate_xmin_lsn;
 	XLogRecPtr	candidate_restart_valid;
 	XLogRecPtr	candidate_restart_lsn;
+
+	/*
+	 * This value tracks the last confirmed_flush LSN flushed which is used
+	 * during a shutdown checkpoint to decide if logical's slot data should be
+	 * forcibly flushed or not.
+	 */
+	XLogRecPtr	last_saved_confirmed_flush;
 } ReplicationSlot;
 
 #define SlotIsPhysical(slot) ((slot)->data.database == InvalidOid)
@@ -241,7 +248,7 @@ extern void ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslo
 extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok);
 
 extern void StartupReplicationSlots(void);
-extern void CheckPointReplicationSlots(void);
+extern void CheckPointReplicationSlots(bool is_shutdown);
 
 extern void CheckSlotRequirements(void);
 extern void CheckSlotPermissions(void);
diff --git a/src/test/recovery/meson.build b/src/test/recovery/meson.build
index e7328e4894..646d6ffde4 100644
--- a/src/test/recovery/meson.build
+++ b/src/test/recovery/meson.build
@@ -43,6 +43,7 @@ tests += {
       't/035_standby_logical_decoding.pl',
       't/036_truncated_dropped.pl',
       't/037_invalid_database.pl',
+      't/038_save_logical_slots_shutdown.pl',
     ],
   },
 }
diff --git a/src/test/recovery/t/038_save_logical_slots_shutdown.pl b/src/test/recovery/t/038_save_logical_slots_shutdown.pl
new file mode 100644
index 0000000000..224a840a61
--- /dev/null
+++ b/src/test/recovery/t/038_save_logical_slots_shutdown.pl
@@ -0,0 +1,102 @@
+
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+# Test logical replication slots are always persisted to disk during a shutdown
+# checkpoint.
+
+use strict;
+use warnings;
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+sub compare_confirmed_flush
+{
+	my ($node, $confirmed_flush_from_log) = @_;
+
+	# Fetch Latest checkpoint location from the control file
+	my ($stdout, $stderr) =
+	  run_command([ 'pg_controldata', $node->data_dir ]);
+	my @control_data      = split("\n", $stdout);
+	my $latest_checkpoint = undef;
+	foreach (@control_data)
+	{
+		if ($_ =~ /^Latest checkpoint location:\s*(.*)$/mg)
+		{
+			$latest_checkpoint = $1;
+			last;
+		}
+	}
+	die "Latest checkpoint location not found in control file\n"
+	  unless defined($latest_checkpoint);
+
+	# Is it same as the value read from log?
+	ok( $latest_checkpoint eq $confirmed_flush_from_log,
+		"Check that the slot's confirmed_flush LSN is the same as the latest_checkpoint location"
+	);
+
+	return;
+}
+
+# Initialize publisher node
+my $node_publisher = PostgreSQL::Test::Cluster->new('pub');
+$node_publisher->init(allows_streaming => 'logical');
+# Avoid checkpoint during the test, otherwise, the latest checkpoint location
+# will change.
+$node_publisher->append_conf(
+	'postgresql.conf', q{
+checkpoint_timeout = 1h
+autovacuum = off
+});
+$node_publisher->start;
+
+# Create subscriber node
+my $node_subscriber = PostgreSQL::Test::Cluster->new('sub');
+$node_subscriber->init(allows_streaming => 'logical');
+$node_subscriber->start;
+
+# Create tables
+$node_publisher->safe_psql('postgres', "CREATE TABLE test_tbl (id int)");
+$node_subscriber->safe_psql('postgres', "CREATE TABLE test_tbl (id int)");
+
+# Insert some data
+$node_publisher->safe_psql('postgres',
+	"INSERT INTO test_tbl VALUES (generate_series(1, 5));");
+
+# Setup logical replication
+my $publisher_connstr = $node_publisher->connstr . ' dbname=postgres';
+$node_publisher->safe_psql('postgres',
+	"CREATE PUBLICATION pub FOR ALL TABLES");
+$node_subscriber->safe_psql('postgres',
+	"CREATE SUBSCRIPTION sub CONNECTION '$publisher_connstr' PUBLICATION pub"
+);
+
+$node_subscriber->wait_for_subscription_sync($node_publisher, 'sub');
+
+my $result =
+  $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM test_tbl");
+
+is($result, qq(5), "check initial copy was done");
+
+my $offset = -s $node_publisher->logfile;
+
+# Restart the publisher to ensure that the slot will be persisted if required
+$node_publisher->restart();
+
+# Wait until the walsender creates decoding context
+$node_publisher->wait_for_log(
+	qr/Streaming transactions committing after ([A-F0-9]+\/[A-F0-9]+), reading WAL from ([A-F0-9]+\/[A-F0-9]+)./,
+	$offset);
+
+# Extract confirmed_flush from the logfile
+my $log_contents = slurp_file($node_publisher->logfile, $offset);
+$log_contents =~
+  qr/Streaming transactions committing after ([A-F0-9]+\/[A-F0-9]+), reading WAL from ([A-F0-9]+\/[A-F0-9]+)./
+  or die "could not get confirmed_flush_lsn";
+
+# Ensure that the slot's confirmed_flush LSN is the same as the
+# latest_checkpoint location.
+compare_confirmed_flush($node_publisher, $1);
+
+done_testing();
-- 
2.27.0

v35-0002-pg_upgrade-Allow-to-replicate-logical-replicatio.patchapplication/octet-stream; name=v35-0002-pg_upgrade-Allow-to-replicate-logical-replicatio.patchDownload
From c69cd70431c5a349c762da2d5fcabee93419f8e8 Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Tue, 4 Apr 2023 05:49:34 +0000
Subject: [PATCH v35 2/2] pg_upgrade: Allow to replicate logical replication
 slots to new node

This commit allows nodes with logical replication slots to be upgraded. While
reading information from the old cluster, a list of logical replication slots is
fetched. At the later part of upgrading, pg_upgrade revisits the list and
restores slots by executing pg_create_logical_replication_slots() on the new
cluster. Migration of logical replication slots is only supported when the old
cluster is version 17.0 or later.

If the old node has slots with the status 'lost' or with unconsumed WAL records,
the pg_upgrade fails. These checks are needed to prevent data loss.

Note that the pg_resetwal command would remove WAL files, which are required as
restart_lsn. If WALs required by logical replication slots are removed, they are
unusable. Therefore, during the upgrade, slot restoration is done after the final
pg_resetwal command. The workflow ensures that required WALs are retained.

The significant advantage of this commit is that it makes it easy to continue
logical replication even after upgrading the publisher node. Previously,
pg_upgrade allowed copying publications to a new node. With this new commit,
adjusting the connection string to the new publisher will cause the apply
worker on the subscriber to connect to the new publisher automatically. This
enables seamless continuation of logical replication, even after an upgrade.

Author: Hayato Kuroda
Co-authored-by: Hou Zhijie
Reviewed-by: Peter Smith, Julien Rouhaud, Vignesh C, Wang Wei, Masahiko Sawada,
             Dilip Kumar
---
 doc/src/sgml/ref/pgupgrade.sgml               |  78 ++++++-
 src/backend/replication/slot.c                |   7 +
 src/bin/pg_upgrade/Makefile                   |   3 +
 src/bin/pg_upgrade/check.c                    | 204 +++++++++++++++--
 src/bin/pg_upgrade/controldata.c              |  39 ++++
 src/bin/pg_upgrade/function.c                 |  29 ++-
 src/bin/pg_upgrade/info.c                     | 148 +++++++++++-
 src/bin/pg_upgrade/meson.build                |   1 +
 src/bin/pg_upgrade/pg_upgrade.c               | 107 ++++++++-
 src/bin/pg_upgrade/pg_upgrade.h               |  26 ++-
 src/bin/pg_upgrade/server.c                   |   7 +-
 .../t/003_logical_replication_slots.pl        | 214 ++++++++++++++++++
 src/tools/pgindent/typedefs.list              |   3 +
 13 files changed, 827 insertions(+), 39 deletions(-)
 create mode 100644 src/bin/pg_upgrade/t/003_logical_replication_slots.pl

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index bea0d1b93f..0752efcd75 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -383,6 +383,79 @@ make prefix=/usr/local/pgsql.new install
     </para>
    </step>
 
+   <step>
+    <title>Prepare for publisher upgrades</title>
+
+    <para>
+     <application>pg_upgrade</application> attempts to migrate logical
+     replication slots. This helps avoid the need for manually defining the
+     same replication slots on the new publisher. Migration of logical
+     replication slots is only supported when the old cluster is version 17.0
+     or later.
+    </para>
+
+    <para>
+     Before you start upgrading the publisher cluster, ensure that the
+     subscription is temporarily disabled, by executing
+     <link linkend="sql-altersubscription"><command>ALTER SUBSCRIPTION ... DISABLE</command></link>.
+     Re-enable the subscription after the upgrade.
+    </para>
+
+    <para>
+     There are some prerequisites for <application>pg_upgrade</application> to
+     be able to upgrade the replication slots. If these are not met an error
+     will be reported.
+    </para>
+
+    <itemizedlist>
+     <listitem>
+      <para>
+       All slots on the old cluster must be usable, i.e., there are no slots
+       whose
+       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>conflicting</structfield>
+       is <literal>true</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>confirmed_flush_lsn</structfield>
+       of all slots on the old cluster must be the same as the latest
+       checkpoint location. This ensures that all the data has been replicated
+       before the upgrade.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The output plugins referenced by the slots on the old cluster must be
+       installed in the new PostgreSQL executable directory.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must not have permanent logical replication slots, i.e.,
+       there are no slots whose
+       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>temporary</structfield>
+       is <literal>false</literal>.
+      </para>
+     </listitem>
+      <para>
+       The new cluster must have
+       <link linkend="guc-wal-level"><varname>wal_level</varname></link> as
+       <literal>logical</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must have
+       <link linkend="guc-max-replication-slots"><varname>max_replication_slots</varname></link>
+       configured to a value greater than or equal to the number of slots
+       present in the old cluster.
+      </para>
+     </listitem>
+    </itemizedlist>
+
+   </step>
+
    <step>
     <title>Stop both servers</title>
 
@@ -652,8 +725,9 @@ rsync --archive --delete --hard-links --size-only --no-inc-recursive /vol1/pg_tb
        Configure the servers for log shipping.  (You do not need to run
        <function>pg_backup_start()</function> and <function>pg_backup_stop()</function>
        or take a file system backup as the standbys are still synchronized
-       with the primary.)  Replication slots are not copied and must
-       be recreated.
+       with the primary.) Only logical slots on the primary are migrated to the
+       new standby, and other slots on the old standby must be recreated as
+       they are not copied.
       </para>
      </step>
 
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 63b80d321c..2bdaeb93b2 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1423,6 +1423,13 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause,
 
 		SpinLockRelease(&s->mutex);
 
+		/*
+		 * The logical replication slots shouldn't be invalidated as
+		 * max_slot_wal_keep_size is set to -1 during the upgrade.
+		 */
+		if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade)
+			elog(ERROR, "Replication slots must not be invalidated during the upgrade.");
+
 		if (active_pid != 0)
 		{
 			/*
diff --git a/src/bin/pg_upgrade/Makefile b/src/bin/pg_upgrade/Makefile
index 5834513add..815d1a7ca1 100644
--- a/src/bin/pg_upgrade/Makefile
+++ b/src/bin/pg_upgrade/Makefile
@@ -3,6 +3,9 @@
 PGFILEDESC = "pg_upgrade - an in-place binary upgrade utility"
 PGAPPICON = win32
 
+# required for 003_logical_replication_slots.pl
+EXTRA_INSTALL=contrib/test_decoding
+
 subdir = src/bin/pg_upgrade
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 56e313f562..698f9052bf 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -30,6 +30,8 @@ static void check_for_jsonb_9_4_usage(ClusterInfo *cluster);
 static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(void);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
+static void check_new_cluster_logical_replication_slots(void);
+static void check_old_cluster_for_valid_slots(bool live_check);
 
 
 /*
@@ -86,8 +88,11 @@ check_and_dump_old_cluster(bool live_check)
 	if (!live_check)
 		start_postmaster(&old_cluster, true);
 
-	/* Extract a list of databases and tables from the old cluster */
-	get_db_and_rel_infos(&old_cluster);
+	/*
+	 * Extract a list of databases, tables, and logical replication slots from
+	 * the old cluster.
+	 */
+	get_db_rel_and_slot_infos(&old_cluster);
 
 	init_tablespaces();
 
@@ -104,6 +109,13 @@ check_and_dump_old_cluster(bool live_check)
 	check_for_reg_data_type_usage(&old_cluster);
 	check_for_isn_and_int8_passing_mismatch(&old_cluster);
 
+	/*
+	 * Logical replication slots can be migrated since PG17. See comments atop
+	 * get_old_cluster_logical_slot_infos().
+	 */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
+		check_old_cluster_for_valid_slots(live_check);
+
 	/*
 	 * PG 16 increased the size of the 'aclitem' type, which breaks the
 	 * on-disk format for existing data.
@@ -187,7 +199,7 @@ check_and_dump_old_cluster(bool live_check)
 void
 check_new_cluster(void)
 {
-	get_db_and_rel_infos(&new_cluster);
+	get_db_rel_and_slot_infos(&new_cluster);
 
 	check_new_cluster_is_empty();
 
@@ -210,6 +222,8 @@ check_new_cluster(void)
 	check_for_prepared_transactions(&new_cluster);
 
 	check_for_new_tablespace_dir();
+
+	check_new_cluster_logical_replication_slots();
 }
 
 
@@ -232,27 +246,6 @@ report_clusters_compatible(void)
 }
 
 
-void
-issue_warnings_and_set_wal_level(void)
-{
-	/*
-	 * We unconditionally start/stop the new server because pg_resetwal -o set
-	 * wal_level to 'minimum'.  If the user is upgrading standby servers using
-	 * the rsync instructions, they will need pg_upgrade to write its final
-	 * WAL record showing wal_level as 'replica'.
-	 */
-	start_postmaster(&new_cluster, true);
-
-	/* Reindex hash indexes for old < 10.0 */
-	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 906)
-		old_9_6_invalidate_hash_indexes(&new_cluster, false);
-
-	report_extension_updates(&new_cluster);
-
-	stop_postmaster(false);
-}
-
-
 void
 output_completion_banner(char *deletion_script_file_name)
 {
@@ -1402,3 +1395,166 @@ check_for_user_defined_encoding_conversions(ClusterInfo *cluster)
 	else
 		check_ok();
 }
+
+/*
+ * check_new_cluster_logical_replication_slots()
+ *
+ * Make sure there are no logical replication slots on the new cluster and that
+ * the parameter settings necessary for creating slots are sufficient.
+ */
+static void
+check_new_cluster_logical_replication_slots(void)
+{
+	PGresult   *res;
+	PGconn	   *conn;
+	int			nslots_on_old;
+	int			nslots_on_new;
+	int			max_replication_slots;
+	char	   *wal_level;
+
+	/* Logical slots can be migrated since PG17. */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+		return;
+
+	nslots_on_old = count_old_cluster_logical_slots();
+
+	/* Quick return if there are no logical slots to be migrated. */
+	if (nslots_on_old == 0)
+		return;
+
+	conn = connectToServer(&new_cluster, "template1");
+
+	prep_status("Checking for logical replication slots");
+
+	res = executeQueryOrDie(conn, "SELECT count(*) "
+							"FROM pg_catalog.pg_replication_slots "
+							"WHERE slot_type = 'logical' AND "
+							"temporary IS FALSE;");
+
+	if (PQntuples(res) != 1)
+		pg_fatal("could not count the number of logical replication slots");
+
+	nslots_on_new = atoi(PQgetvalue(res, 0, 0));
+
+	if (nslots_on_new)
+		pg_fatal(ngettext("New cluster must not have logical replication slots but found %d slot.",
+						  "New cluster must not have logical replication slots but found %d slots.",
+						  nslots_on_new),
+				 nslots_on_new);
+
+	PQclear(res);
+
+	res = executeQueryOrDie(conn, "SHOW wal_level;");
+
+	if (PQntuples(res) != 1)
+		pg_fatal("could not determine wal_level");
+
+	wal_level = PQgetvalue(res, 0, 0);
+
+	if (strcmp(wal_level, "logical") != 0)
+		pg_fatal("wal_level must be \"logical\", but is set to \"%s\"",
+				 wal_level);
+
+	PQclear(res);
+
+	res = executeQueryOrDie(conn, "SHOW max_replication_slots;");
+
+	if (PQntuples(res) != 1)
+		pg_fatal("could not determine max_replication_slots");
+
+	max_replication_slots = atoi(PQgetvalue(res, 0, 0));
+
+	if (nslots_on_old > max_replication_slots)
+		pg_fatal("max_replication_slots (%d) must be greater than or equal to the number of "
+				 "logical replication slots (%d) on the old cluster.",
+				 max_replication_slots, nslots_on_old);
+
+	PQclear(res);
+	PQfinish(conn);
+
+	check_ok();
+}
+
+/*
+ * check_old_cluster_for_valid_slots()
+ *
+ * Make sure logical replication slots can be migrated to new cluster.
+ * Following points are checked:
+ *
+ *	- All logical replication slots are usable.
+ *	- All logical replication slots consumed all WALs, except a
+ *	  CHECKPOINT_SHUTDOWN record.
+ */
+static void
+check_old_cluster_for_valid_slots(bool live_check)
+{
+	int			dbnum;
+	char		output_path[MAXPGPATH];
+	FILE	   *script = NULL;
+
+	prep_status("Checking for valid logical replication slots");
+
+	snprintf(output_path, sizeof(output_path), "%s/%s",
+			 log_opts.basedir,
+			 "invalid_logical_relication_slots.txt");
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		int			slotnum;
+		LogicalSlotInfoArr *slot_arr = &old_cluster.dbarr.dbs[dbnum].slot_arr;
+
+		for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		{
+			LogicalSlotInfo *slot = &slot_arr->slots[slotnum];
+
+			/* Is the slot usable? */
+			if (slot->invalid)
+			{
+				if (script == NULL &&
+					(script = fopen_priv(output_path, "w")) == NULL)
+					pg_fatal("could not open file \"%s\": %s",
+							 output_path, strerror(errno));
+
+				fprintf(script, "The slot \"%s\" is invalid\n",
+						slot->slotname);
+
+				/* No need to check this slot, seek to new one */
+				continue;
+			}
+
+			/*
+			 * Do additional checks to ensure that confirmed_flush LSN of all
+			 * the slots is the same as the latest checkpoint location.
+			 *
+			 * Note: This can be satisfied only when the old cluster has been
+			 * shut down, so we skip this for live checks.
+			 */
+			if (!live_check && !slot->caught_up)
+			{
+				if (script == NULL &&
+					(script = fopen_priv(output_path, "w")) == NULL)
+					pg_fatal("could not open file \"%s\": %s",
+							 output_path, strerror(errno));
+
+				fprintf(script,
+						"The slot \"%s\" has not consumed the WAL yet\n",
+						slot->slotname);
+			}
+		}
+	}
+
+	if (script)
+	{
+		fclose(script);
+
+		pg_log(PG_REPORT, "fatal");
+		pg_fatal("Your installation contains invalid logical replication slots.\n"
+				 "These slots can't be copied, so this cluster cannot be upgraded.\n"
+				 "Consider removing such slots or consuming the pending WAL if any,\n"
+				 "and then restart the upgrade.\n"
+				 "A list of invalid logical replication slots is in the file:\n"
+				 "    %s", output_path);
+	}
+
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c
index 4beb65ab22..f8f823e2be 100644
--- a/src/bin/pg_upgrade/controldata.c
+++ b/src/bin/pg_upgrade/controldata.c
@@ -169,6 +169,45 @@ get_control_data(ClusterInfo *cluster, bool live_check)
 				}
 				got_cluster_state = true;
 			}
+
+			else if ((p = strstr(bufin, "Latest checkpoint location:")) != NULL)
+			{
+				/*
+				 * Read the latest checkpoint location if the cluster is PG17
+				 * or later. This is used for upgrading logical replication
+				 * slots. Currently, we need it only for the old cluster but
+				 * for simplicity chose not to have additional checks.
+				 */
+				if (GET_MAJOR_VERSION(cluster->major_version) >= 1700)
+				{
+					char	   *slash = NULL;
+					uint32		upper_lsn,
+								lower_lsn;
+
+					p = strchr(p, ':');
+
+					if (p == NULL || strlen(p) <= 1)
+						pg_fatal("%d: controldata retrieval problem", __LINE__);
+
+					p++;		/* remove ':' char */
+
+					p = strpbrk(p, "01234567890ABCDEF");
+
+					if (p == NULL || strlen(p) <= 1)
+						pg_fatal("%d: controldata retrieval problem", __LINE__);
+
+					/*
+					 * The upper and lower part of LSN must be read separately
+					 * because it is stored as in %X/%X format.
+					 */
+					upper_lsn = strtoul(p, &slash, 16);
+					lower_lsn = strtoul(++slash, NULL, 16);
+
+					/* And combine them */
+					cluster->controldata.chkpnt_latest =
+						((uint64) upper_lsn << 32) | lower_lsn;
+				}
+			}
 		}
 
 		rc = pclose(output);
diff --git a/src/bin/pg_upgrade/function.c b/src/bin/pg_upgrade/function.c
index dc8800c7cd..6a7c0a2733 100644
--- a/src/bin/pg_upgrade/function.c
+++ b/src/bin/pg_upgrade/function.c
@@ -11,6 +11,7 @@
 
 #include "access/transam.h"
 #include "catalog/pg_language_d.h"
+#include "fe_utils/string_utils.h"
 #include "pg_upgrade.h"
 
 /*
@@ -46,7 +47,9 @@ library_name_compare(const void *p1, const void *p2)
 /*
  * get_loadable_libraries()
  *
- *	Fetch the names of all old libraries containing C-language functions.
+ *	Fetch the names of all old libraries containing either C-language functions
+ *	or are corresponding to logical replication output plugins.
+ *
  *	We will later check that they all exist in the new installation.
  */
 void
@@ -55,6 +58,7 @@ get_loadable_libraries(void)
 	PGresult  **ress;
 	int			totaltups;
 	int			dbnum;
+	int			array_size;
 
 	ress = (PGresult **) pg_malloc(old_cluster.dbarr.ndbs * sizeof(PGresult *));
 	totaltups = 0;
@@ -81,7 +85,12 @@ get_loadable_libraries(void)
 		PQfinish(conn);
 	}
 
-	os_info.libraries = (LibraryInfo *) pg_malloc(totaltups * sizeof(LibraryInfo));
+	/*
+	 * Allocate memory for required libraries and logical replication output
+	 * plugins.
+	 */
+	array_size = totaltups + count_old_cluster_logical_slots();
+	os_info.libraries = (LibraryInfo *) pg_malloc(sizeof(LibraryInfo) * (array_size));
 	totaltups = 0;
 
 	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
@@ -89,6 +98,8 @@ get_loadable_libraries(void)
 		PGresult   *res = ress[dbnum];
 		int			ntups;
 		int			rowno;
+		int			slotno;
+		LogicalSlotInfoArr *slot_arr = &old_cluster.dbarr.dbs[dbnum].slot_arr;
 
 		ntups = PQntuples(res);
 		for (rowno = 0; rowno < ntups; rowno++)
@@ -101,6 +112,20 @@ get_loadable_libraries(void)
 			totaltups++;
 		}
 		PQclear(res);
+
+		/*
+		 * Store the names of output plugins as well. There is a possibility
+		 * that duplicated plugins are set, but the consumer function
+		 * check_loadable_libraries() will avoid checking the same library, so
+		 * we do not have to consider their uniqueness here.
+		 */
+		for (slotno = 0; slotno < slot_arr->nslots; slotno++)
+		{
+			os_info.libraries[totaltups].name = pg_strdup(slot_arr->slots[slotno].plugin);
+			os_info.libraries[totaltups].dbnum = dbnum;
+
+			totaltups++;
+		}
 	}
 
 	pg_free(ress);
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index aa5faca4d6..3829c3c355 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -26,6 +26,8 @@ static void get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo);
 static void free_rel_infos(RelInfoArr *rel_arr);
 static void print_db_infos(DbInfoArr *db_arr);
 static void print_rel_infos(RelInfoArr *rel_arr);
+static void print_slot_infos(LogicalSlotInfoArr *slot_arr);
+static void get_old_cluster_logical_slot_infos(DbInfo *dbinfo);
 
 
 /*
@@ -266,13 +268,13 @@ report_unmatched_relation(const RelInfo *rel, const DbInfo *db, bool is_new_db)
 }
 
 /*
- * get_db_and_rel_infos()
+ * get_db_rel_and_slot_infos()
  *
  * higher level routine to generate dbinfos for the database running
  * on the given "port". Assumes that server is already running.
  */
 void
-get_db_and_rel_infos(ClusterInfo *cluster)
+get_db_rel_and_slot_infos(ClusterInfo *cluster)
 {
 	int			dbnum;
 
@@ -283,7 +285,17 @@ get_db_and_rel_infos(ClusterInfo *cluster)
 	get_db_infos(cluster);
 
 	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
-		get_rel_infos(cluster, &cluster->dbarr.dbs[dbnum]);
+	{
+		DbInfo	   *pDbInfo = &cluster->dbarr.dbs[dbnum];
+
+		get_rel_infos(cluster, pDbInfo);
+
+		/*
+		 * Retrieve the logical replication slots infos for the old cluster.
+		 */
+		if (cluster == &old_cluster)
+			get_old_cluster_logical_slot_infos(pDbInfo);
+	}
 
 	if (cluster == &old_cluster)
 		pg_log(PG_VERBOSE, "\nsource databases:");
@@ -600,6 +612,107 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 	dbinfo->rel_arr.nrels = num_rels;
 }
 
+/*
+ * get_old_cluster_logical_slot_infos()
+ *
+ * gets the LogicalSlotInfos for all the logical replication slots of the
+ * database referred to by "dbinfo".
+ *
+ * Note: This function will not do anything if the old cluster is pre-PG17.
+ * This is because before that the logical slots are not saved at shutdown, so
+ * there is no guarantee that the latest confirmed_flush_lsn is saved to disk
+ * which can lead to data loss. It is still not guaranteed for manually created
+ * slots in PG17, so subsequent checks done in
+ * check_old_cluster_for_valid_slots() would raise a FATAL error if such slots
+ * are included.
+ */
+static void
+get_old_cluster_logical_slot_infos(DbInfo *dbinfo)
+{
+	PGconn	   *conn;
+	PGresult   *res;
+	LogicalSlotInfo *slotinfos = NULL;
+	int			num_slots;
+
+	/* Logical slots can be migrated since PG17. */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+		return;
+
+	conn = connectToServer(&old_cluster, dbinfo->db_name);
+
+	/*
+	 * The temporary slots are explicitly ignored while checking because such
+	 * slots cannot exist after the upgrade. During the upgrade, clusters are
+	 * started and stopped several times causing any temporary slots to be
+	 * removed.
+	 */
+	res = executeQueryOrDie(conn, "SELECT slot_name, plugin, two_phase, "
+							"(confirmed_flush_lsn = '%X/%X') as caught_up, conflicting "
+							"FROM pg_catalog.pg_replication_slots "
+							"WHERE slot_type = 'logical' AND "
+							"database = current_database() AND "
+							"temporary IS FALSE;",
+							LSN_FORMAT_ARGS(old_cluster.controldata.chkpnt_latest));
+
+	num_slots = PQntuples(res);
+
+	if (num_slots)
+	{
+		int			slotnum;
+		int			i_slotname;
+		int			i_plugin;
+		int			i_twophase;
+		int			i_caught_up;
+		int			i_invalid;
+
+		slotinfos = (LogicalSlotInfo *) pg_malloc(sizeof(LogicalSlotInfo) * num_slots);
+
+		i_slotname = PQfnumber(res, "slot_name");
+		i_plugin = PQfnumber(res, "plugin");
+		i_twophase = PQfnumber(res, "two_phase");
+		i_caught_up = PQfnumber(res, "caught_up");
+		i_invalid = PQfnumber(res, "conflicting");
+
+		for (slotnum = 0; slotnum < num_slots; slotnum++)
+		{
+			LogicalSlotInfo *curr = &slotinfos[slotnum];
+
+			curr->slotname = pg_strdup(PQgetvalue(res, slotnum, i_slotname));
+			curr->plugin = pg_strdup(PQgetvalue(res, slotnum, i_plugin));
+			curr->two_phase = (strcmp(PQgetvalue(res, slotnum, i_twophase), "t") == 0);
+			curr->caught_up = (strcmp(PQgetvalue(res, slotnum, i_caught_up), "t") == 0);
+			curr->invalid = (strcmp(PQgetvalue(res, slotnum, i_invalid), "t") == 0);
+		}
+	}
+
+	PQclear(res);
+	PQfinish(conn);
+
+	dbinfo->slot_arr.slots = slotinfos;
+	dbinfo->slot_arr.nslots = num_slots;
+}
+
+
+/*
+ * count_old_cluster_logical_slots()
+ *
+ * Returns the number of logical replication slots for all databases.
+ *
+ * Note: this function always returns 0 if the old_cluster is PG16 and prior
+ * because we gather slot information only for cluster versions greater than or
+ * equal to PG17. See get_old_cluster_logical_slot_infos().
+ */
+int
+count_old_cluster_logical_slots(void)
+{
+	int			dbnum;
+	int			slot_count = 0;
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+		slot_count += old_cluster.dbarr.dbs[dbnum].slot_arr.nslots;
+
+	return slot_count;
+}
 
 static void
 free_db_and_rel_infos(DbInfoArr *db_arr)
@@ -642,8 +755,11 @@ print_db_infos(DbInfoArr *db_arr)
 
 	for (dbnum = 0; dbnum < db_arr->ndbs; dbnum++)
 	{
-		pg_log(PG_VERBOSE, "Database: \"%s\"", db_arr->dbs[dbnum].db_name);
-		print_rel_infos(&db_arr->dbs[dbnum].rel_arr);
+		DbInfo	   *pDbInfo = &db_arr->dbs[dbnum];
+
+		pg_log(PG_VERBOSE, "Database: \"%s\"", pDbInfo->db_name);
+		print_rel_infos(&pDbInfo->rel_arr);
+		print_slot_infos(&pDbInfo->slot_arr);
 	}
 }
 
@@ -660,3 +776,25 @@ print_rel_infos(RelInfoArr *rel_arr)
 			   rel_arr->rels[relnum].reloid,
 			   rel_arr->rels[relnum].tablespace);
 }
+
+static void
+print_slot_infos(LogicalSlotInfoArr *slot_arr)
+{
+	int			slotnum;
+
+	/* Quick return if there are no logical slots. */
+	if (slot_arr->nslots == 0)
+		return;
+
+	pg_log(PG_VERBOSE, "Logical replication slots within the database:");
+
+	for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+	{
+		LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+
+		pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\", two_phase: %d",
+			   slot_info->slotname,
+			   slot_info->plugin,
+			   slot_info->two_phase);
+	}
+}
diff --git a/src/bin/pg_upgrade/meson.build b/src/bin/pg_upgrade/meson.build
index 12a97f84e2..228f29b688 100644
--- a/src/bin/pg_upgrade/meson.build
+++ b/src/bin/pg_upgrade/meson.build
@@ -42,6 +42,7 @@ tests += {
     'tests': [
       't/001_basic.pl',
       't/002_pg_upgrade.pl',
+      't/003_logical_replication_slots.pl',
     ],
     'test_kwargs': {'priority': 40}, # pg_upgrade tests are slow
   },
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 96bfb67167..6be236dc9a 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -59,6 +59,8 @@ static void copy_xact_xlog_xid(void);
 static void set_frozenxids(bool minmxid_only);
 static void make_outputdirs(char *pgdata);
 static void setup(char *argv0, bool *live_check);
+static void create_logical_replication_slots(void);
+static void setup_new_cluster(void);
 
 ClusterInfo old_cluster,
 			new_cluster;
@@ -188,6 +190,8 @@ main(int argc, char **argv)
 			  new_cluster.pgdata);
 	check_ok();
 
+	setup_new_cluster();
+
 	if (user_opts.do_sync)
 	{
 		prep_status("Sync data directory to disk");
@@ -201,8 +205,6 @@ main(int argc, char **argv)
 
 	create_script_for_old_cluster_deletion(&deletion_script_file_name);
 
-	issue_warnings_and_set_wal_level();
-
 	pg_log(PG_REPORT,
 		   "\n"
 		   "Upgrade Complete\n"
@@ -593,7 +595,7 @@ create_new_objects(void)
 		set_frozenxids(true);
 
 	/* update new_cluster info now that we have objects in the databases */
-	get_db_and_rel_infos(&new_cluster);
+	get_db_rel_and_slot_infos(&new_cluster);
 }
 
 /*
@@ -862,3 +864,102 @@ set_frozenxids(bool minmxid_only)
 
 	check_ok();
 }
+
+/*
+ * create_logical_replication_slots()
+ *
+ * Similar to create_new_objects() but only restores logical replication slots.
+ */
+static void
+create_logical_replication_slots(void)
+{
+	int			dbnum;
+
+	prep_status_progress("Restoring logical replication slots in the new cluster");
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		DbInfo	   *old_db = &old_cluster.dbarr.dbs[dbnum];
+		LogicalSlotInfoArr *slot_arr = &old_db->slot_arr;
+		PGconn	   *conn;
+		PQExpBuffer query;
+		int			slotnum;
+		char		log_file_name[MAXPGPATH];
+
+		/* Skip this database if there are no slots */
+		if (slot_arr->nslots == 0)
+			continue;
+
+		conn = connectToServer(&new_cluster, old_db->db_name);
+		query = createPQExpBuffer();
+
+		snprintf(log_file_name, sizeof(log_file_name),
+				 DB_DUMP_LOG_FILE_MASK, old_db->db_oid);
+
+		pg_log(PG_STATUS, "%s", old_db->db_name);
+
+		for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		{
+			LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+
+			/* Constructs a query for creating logical replication slots. */
+			appendPQExpBuffer(query, "SELECT pg_catalog.pg_create_logical_replication_slot(");
+			appendStringLiteralConn(query, slot_info->slotname, conn);
+			appendPQExpBuffer(query, ", ");
+			appendStringLiteralConn(query, slot_info->plugin, conn);
+			appendPQExpBuffer(query, ", false, %s);",
+							  slot_info->two_phase ? "true" : "false");
+
+			PQclear(executeQueryOrDie(conn, "%s", query->data));
+
+			resetPQExpBuffer(query);
+		}
+
+		PQfinish(conn);
+
+		destroyPQExpBuffer(query);
+	}
+
+	end_progress_output();
+	check_ok();
+}
+
+/*
+ *	setup_new_cluster()
+ *
+ * Starts a new cluster for updating the wal_level in the control file, then
+ * does final setups. Logical slots are also created here.
+ */
+static void
+setup_new_cluster(void)
+{
+	/*
+	 * We unconditionally start/stop the new server because pg_resetwal -o set
+	 * wal_level to 'minimum'.  If the user is upgrading standby servers using
+	 * the rsync instructions, they will need pg_upgrade to write its final
+	 * WAL record showing wal_level as 'replica'.
+	 */
+	start_postmaster(&new_cluster, true);
+
+	/*
+	 * If the old cluster has logical slots, migrate them to a new cluster.
+	 *
+	 * Note: This must be done after executing pg_resetwal command in the
+	 * caller because pg_resetwal would remove required WALs.
+	 */
+	if (count_old_cluster_logical_slots())
+		create_logical_replication_slots();
+
+	/*
+	 * Reindex hash indexes for old < 10.0. count_old_cluster_logical_slots()
+	 * returns non-zero when the old_cluster is PG17 and later, so it's OK to
+	 * use "else if" here. See comments atop count_old_cluster_logical_slots()
+	 * and get_old_cluster_logical_slot_infos().
+	 */
+	else if (GET_MAJOR_VERSION(old_cluster.major_version) <= 906)
+		old_9_6_invalidate_hash_indexes(&new_cluster, false);
+
+	report_extension_updates(&new_cluster);
+
+	stop_postmaster(false);
+}
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 842f3b6cd3..7ae37cc458 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -10,6 +10,7 @@
 #include <sys/stat.h>
 #include <sys/time.h>
 
+#include "access/xlogdefs.h"
 #include "common/relpath.h"
 #include "libpq-fe.h"
 
@@ -150,6 +151,25 @@ typedef struct
 	int			nrels;
 } RelInfoArr;
 
+/*
+ * Structure to store logical replication slot information
+ */
+typedef struct
+{
+	char	   *slotname;		/* slot name */
+	char	   *plugin;			/* plugin */
+	bool		two_phase;		/* can the slot decode 2PC? */
+	bool		caught_up;		/* Is confirmed_flush_lsn the same as latest
+								 * checkpoint LSN? */
+	bool		invalid;		/* Is the slot usable? */
+} LogicalSlotInfo;
+
+typedef struct
+{
+	int			nslots;			/* number of logical slot infos */
+	LogicalSlotInfo *slots;		/* array of logical slot infos */
+} LogicalSlotInfoArr;
+
 /*
  * The following structure represents a relation mapping.
  */
@@ -176,6 +196,7 @@ typedef struct
 	char		db_tablespace[MAXPGPATH];	/* database default tablespace
 											 * path */
 	RelInfoArr	rel_arr;		/* array of all user relinfos */
+	LogicalSlotInfoArr slot_arr;	/* array of all LogicalSlotInfo */
 } DbInfo;
 
 /*
@@ -225,6 +246,7 @@ typedef struct
 	bool		date_is_int;
 	bool		float8_pass_by_value;
 	uint32		data_checksum_version;
+	XLogRecPtr	chkpnt_latest;
 } ControlData;
 
 /*
@@ -345,7 +367,6 @@ void		output_check_banner(bool live_check);
 void		check_and_dump_old_cluster(bool live_check);
 void		check_new_cluster(void);
 void		report_clusters_compatible(void);
-void		issue_warnings_and_set_wal_level(void);
 void		output_completion_banner(char *deletion_script_file_name);
 void		check_cluster_versions(void);
 void		check_cluster_compatibility(bool live_check);
@@ -400,7 +421,8 @@ void		check_loadable_libraries(void);
 FileNameMap *gen_db_file_maps(DbInfo *old_db,
 							  DbInfo *new_db, int *nmaps, const char *old_pgdata,
 							  const char *new_pgdata);
-void		get_db_and_rel_infos(ClusterInfo *cluster);
+void		get_db_rel_and_slot_infos(ClusterInfo *cluster);
+int			count_old_cluster_logical_slots(void);
 
 /* option.c */
 
diff --git a/src/bin/pg_upgrade/server.c b/src/bin/pg_upgrade/server.c
index 0bc3d2806b..d083a001f1 100644
--- a/src/bin/pg_upgrade/server.c
+++ b/src/bin/pg_upgrade/server.c
@@ -234,6 +234,10 @@ start_postmaster(ClusterInfo *cluster, bool report_and_exit_on_error)
 	 * we only modify the new cluster, so only use it there.  If there is a
 	 * crash, the new cluster has to be recreated anyway.  fsync=off is a big
 	 * win on ext4.
+	 *
+	 * Use max_slot_wal_keep_size as -1 to prevent the WAL removal that is
+	 * required by logical slots.  This would avoid the invalidation of slots
+	 * during the upgrade.
 	 */
 	snprintf(cmd, sizeof(cmd),
 			 "\"%s/pg_ctl\" -w -l \"%s/%s\" -D \"%s\" -o \"-p %d -b%s %s%s\" start",
@@ -241,7 +245,8 @@ start_postmaster(ClusterInfo *cluster, bool report_and_exit_on_error)
 			 log_opts.logdir,
 			 SERVER_LOG_FILE, cluster->pgconfig, cluster->port,
 			 (cluster == &new_cluster) ?
-			 " -c synchronous_commit=off -c fsync=off -c full_page_writes=off" : "",
+			 " -c synchronous_commit=off -c fsync=off -c full_page_writes=off -c max_slot_wal_keep_size=-1 " :
+			 " -c max_slot_wal_keep_size=-1",
 			 cluster->pgopts ? cluster->pgopts : "", socket_string);
 
 	/*
diff --git a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
new file mode 100644
index 0000000000..640964c4e1
--- /dev/null
+++ b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
@@ -0,0 +1,214 @@
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+# Tests for upgrading replication slots
+
+use strict;
+use warnings;
+
+use File::Path qw(rmtree);
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Can be changed to test the other modes
+my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
+
+# Initialize old cluster
+my $old_publisher = PostgreSQL::Test::Cluster->new('old_publisher');
+$old_publisher->init(allows_streaming => 'logical');
+
+# Initialize new cluster
+my $new_publisher = PostgreSQL::Test::Cluster->new('new_publisher');
+$new_publisher->init(allows_streaming => 'replica');
+
+# Initialize subscriber cluster
+my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
+$subscriber->init(allows_streaming => 'logical');
+
+my $bindir = $new_publisher->config_data('--bindir');
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when new cluster wal_level is not 'logical'
+
+# Preparations for the subsequent test:
+# 1. Create a slot on the old cluster
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT pg_create_logical_replication_slot('test_slot1', 'test_decoding', false, true);"
+);
+$old_publisher->stop;
+
+# pg_upgrade will fail because the new cluster wal_level is 'replica'
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade where the new cluster has the wrong wal_level');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when max_replication_slots on a new cluster is
+#		too low
+
+# Preparations for the subsequent test:
+# 1. Create a second slot on the old cluster
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT pg_create_logical_replication_slot('test_slot2', 'test_decoding', false, true);"
+);
+
+# 2. Consume WAL records to avoid another type of upgrade failure. It will be
+#	 tested in subsequent cases.
+$old_publisher->safe_psql('postgres',
+	"SELECT count(*) FROM pg_logical_slot_get_changes('test_slot1', NULL, NULL);"
+);
+$old_publisher->stop;
+
+# 3. max_replication_slots is set to smaller than the number of slots (2)
+#	 present on the old cluster
+$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 1");
+
+# 4. wal_level is set correctly on the new cluster
+$new_publisher->append_conf('postgresql.conf', "wal_level = 'logical'");
+
+# pg_upgrade will fail because the new cluster has insufficient max_replication_slots
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade where the new cluster has insufficient max_replication_slots');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when the slot still has unconsumed WAL records
+
+# Preparations for the subsequent test:
+# 1. Remove the slot 'test_slot2', leaving only 1 slot remaining on the old
+#	 cluster, so the new cluster config  max_replication_slots=1 will now be
+#	 enough.
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT * FROM pg_drop_replication_slot('test_slot2');"
+);
+
+# 2. Generate extra WAL records. Because these WAL records do not get consumed
+#	 it will cause the upcoming pg_upgrade test to fail.
+$old_publisher->safe_psql('postgres',
+	"CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;"
+);
+$old_publisher->stop;
+
+# pg_upgrade will fail because the slot still has unconsumed WAL records
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade of old cluster with idle replication slots');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+# Remove the remained slot
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT * FROM pg_drop_replication_slot('test_slot1');"
+);
+
+# ------------------------------
+# TEST: Successful upgrade
+
+# Preparations for the subsequent test:
+# 1. Setup logical replication
+my $old_connstr = $old_publisher->connstr . ' dbname=postgres';
+$old_publisher->safe_psql('postgres',
+	"CREATE PUBLICATION pub FOR ALL TABLES;"
+);
+$subscriber->start;
+$subscriber->safe_psql(
+	'postgres', qq[
+	CREATE TABLE tbl (a int);
+	CREATE SUBSCRIPTION sub CONNECTION '$old_connstr' PUBLICATION pub WITH (two_phase = 'true')
+]);
+$subscriber->wait_for_subscription_sync($old_publisher, 'sub');
+
+# 2. Temporarily disable the subscription
+$subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION sub DISABLE");
+$old_publisher->stop;
+
+# Actual run, successful upgrade is expected
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade of old cluster');
+ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ removed after pg_upgrade success");
+
+# Check that the slot 'sub' has migrated to the new cluster
+$new_publisher->start;
+my $result = $new_publisher->safe_psql('postgres',
+	"SELECT slot_name, two_phase FROM pg_replication_slots");
+is($result, qq(sub|t), 'check the slot exists on new cluster');
+
+# Update the connection
+my $new_connstr = $new_publisher->connstr . ' dbname=postgres';
+$subscriber->safe_psql(
+	'postgres', qq[
+	ALTER SUBSCRIPTION sub CONNECTION '$new_connstr';
+	ALTER SUBSCRIPTION sub ENABLE;
+]);
+
+# Check whether changes on the new publisher get replicated to the subscriber
+$new_publisher->safe_psql('postgres',
+	"INSERT INTO tbl VALUES (generate_series(11, 20))");
+$new_publisher->wait_for_catchup('sub');
+$result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
+is($result, qq(20), 'check changes are replicated to the subscriber');
+
+# Clean up
+$subscriber->stop();
+$new_publisher->stop();
+
+done_testing();
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index f2af84d7ca..98c01fa05f 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1502,7 +1502,10 @@ LogicalRepTupleData
 LogicalRepTyp
 LogicalRepWorker
 LogicalRepWorkerType
+LogicalReplicationSlotInfo
 LogicalRewriteMappingData
+LogicalSlotInfo
+LogicalSlotInfoArr
 LogicalTape
 LogicalTapeSet
 LsnReadQueue
-- 
2.27.0

#218Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Amit Kapila (#212)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Amit,

Thank you for giving a suggestion!

2. Why does get_old_cluster_logical_slot_infos() need to use
pg_malloc_array() whereas for similar stuff get_rel_infos() uses
pg_malloc()?

They do the same thing. I used the pg_malloc_array() macro to keep the code
within 80 columns.

I think it is better to be consistent with the existing code in this
case. Also, see, if the usage in get_loadable_libraries() can also be
changed back to use pg_malloc().

Fixed as you said. The line became too long, so a new variable was introduced.
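
For reference, a minimal sketch of the two spellings discussed above, assuming
pg_malloc_array(type, count) is the wrapper that expands to
pg_malloc(sizeof(type) * (count)). The first call is a reconstruction of the
earlier form; the last two lines are what v35 does in get_loadable_libraries():

/* earlier form: type-safe macro, mainly to stay within 80 columns */
os_info.libraries = pg_malloc_array(LibraryInfo,
                                    totaltups + count_old_cluster_logical_slots());

/* v35: plain pg_malloc(), with the new array_size variable keeping the line short */
array_size = totaltups + count_old_cluster_logical_slots();
os_info.libraries = (LibraryInfo *) pg_malloc(sizeof(LibraryInfo) * (array_size));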

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#219Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Amit Kapila (#216)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Amit,

Thank you for reviewing!

Few comments:
==============
1.
+       <link
linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>c
onfirmed_flush_lsn</structfield>
+       of all slots on the old cluster must be the same as the latest
+       checkpoint location.

We can add something like: "This ensures that all the data has been
replicated before the upgrade." to make it clear why this test is
important.

Added.

2. Move the wal_level related restriction before max_replication_slots.

3.
+ /* Is the slot still usable? */
+ if (slot->invalid)
+ {
+ if (script == NULL &&
+ (script = fopen_priv(output_path, "w")) == NULL)
+ pg_fatal("could not open file \"%s\": %s",
+ output_path, strerror(errno));
+
+ fprintf(script,
+ "slotname :%s\tproblem: The slot is unusable\n",
+ slot->slotname);
+ }
+
+ /*
+ * Do additional checks to ensure that confirmed_flush LSN of all
+ * the slots is the same as the latest checkpoint location.
+ *
+ * Note: This can be satisfied only when the old cluster has been
+ * shut down, so we skip this for live checks.
+ */
+ if (!live_check && !slot->caught_up)

Isn't it better to continue for the next slot once we find that slot
is invalid instead of checking other conditions?

Right, fixed.

4.
+
+ fprintf(script,
+ "slotname :%s\tproblem: The slot is unusable\n",
+ slot->slotname);

Let's keep it as one string and change the message to: "The slot
"\"%s\" is invalid"

Changed.

+ fprintf(script,
+ "slotname :%s\tproblem: The slot has not consumed WALs yet\n",
+ slot->slotname);
+ }

On a similar line, we can change this to: "The slot "\"%s\" has not
consumed the WAL yet"

Changed.

5.
+ snprintf(output_path, sizeof(output_path), "%s/%s",
+ log_opts.basedir,
+ "problematic_logical_relication_slots.txt");

I think we can name this file as "invalid_logical_replication_slots"
or simply "logical_replication_slots"

The latter seems too general to me, so "invalid_..." was chosen.

6.
+ pg_fatal("The source cluster contains one or more problematic
logical replication slots.\n"
+ "The needed workaround depends on the problem.\n"
+ "1) If the problem is \"The slot is unusable,\" You can drop such
replication slots.\n"
+ "2) If the problem is \"The slot has not consumed WALs yet,\" you
can consume all remaining WALs.\n"
+ "Then, you can restart the upgrade.\n"
+ "A list of problematic logical replication slots is in the file:\n"
+ "    %s", output_path);

This doesn't match the similar existing comments. So, let's change it
to something like:

"Your installation contains invalid logical replication slots. These
slots can't be copied so this cluster cannot currently be upgraded.
Consider either removing such slots or consuming the pending WAL if
any and then restart the upgrade. A list of invalid logical
replication slots is in the file:"

I basically changed it to your suggestion, but slightly reworded it based on
what Grammarly said.

Apart from the above, I have edited a few other comments in the patch.
See attached.

Thanks for attaching! Included.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#220Peter Smith
smithpb2250@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#209)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

Hi Kuroda-san, here are my review comments for v34-0002

There is likely to be some overlap because others have modified and/or
commented on some of the same points as me, and v35 was already posted
before this review. I'll leave it to you to sort out any clashes and
ignore them where appropriate.

======
1. GENERAL -- Cluster Terminology

This is not really a problem of your patch, but during message review,
I noticed the terms old/new cluster VERSUS source/target cluster and
both were used many times:

For example:
".*new cluster" --> 44 occurrences
".*old cluster" --> 21 occurrences
".*source cluster" --> 6 occurrences
".*target cluster" --> 12 occurrences

Perhaps there should be a new thread/patch to use consistent terms.

Thoughts?

~~~

2. GENERAL - Error message cases

Just FYI, there is a lot of inconsistent capitalising in these patch
messages, but then the same is also true for the HEAD code. It's a bit
messy, but generally I think your capitalisation is aligned with
what I saw in HEAD, so I didn't comment anywhere about it.

======
src/backend/replication/slot.c

3. InvalidatePossiblyObsoleteSlot

+ /*
+ * Raise an ERROR if the logical replication slot is invalidating. It
+ * would not happen because max_slot_wal_keep_size is set to -1 during
+ * the upgrade, but it stays safe.
+ */
+ if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade)
+ elog(ERROR, "Replication slots must not be invalidated during the upgrade.");

3a.
That comment didn't seem right. I think you mean something like the suggestion below.

SUGGESTION
It should not be possible for logical replication slots to be
invalidated because max_slot_wal_keep_size is set to -1 during the
upgrade. The following is just for sanity-checking.

~

3b.
I wasn't sure if the 'max_slot_wal_keep_size' GUC is accessible in this
scope, but if it is available then maybe
Assert(max_slot_wal_keep_size_mb == -1); should also be included in
this sanity check.
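
A rough sketch of how that combined sanity check might look (this is just the
suggestion, not the v35 code, and it assumes max_slot_wal_keep_size_mb is
visible in slot.c):

if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade)
{
    /* pg_upgrade forces max_slot_wal_keep_size to -1, so this should be unreachable */
    Assert(max_slot_wal_keep_size_mb == -1);
    elog(ERROR, "replication slots must not be invalidated during the upgrade");
}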

======
src/bin/pg_upgrade/check.c

4. check_new_cluster_logical_replication_slots

+ conn = connectToServer(&new_cluster, "template1");
+
+ prep_status("Checking for logical replication slots");

There is some inconsistency with all the subsequent pg_fatals within
this function -- some of them mention "New cluster" but most of them
do not.

Meanwhile, Kuroda-san showed me sample output like:

Checking for presence of required libraries ok
Checking database user is the install user ok
Checking for prepared transactions ok
Checking for new cluster tablespace directories ok
Checking for logical replication slots
New cluster must not have logical replication slots but found 1 slot.
Failure, exiting

So, I felt the log message title ("Checking...") should be changed to
include the words "new cluster" just like the log preceding it:

"Checking for logical replication slots" ==> "Checking for new cluster
logical replication slots"

Now all the subsequent pg_fatals clearly are for "new cluster"

~

5. check_new_cluster_logical_replication_slots

+ if (nslots_on_new)
+ pg_fatal(ngettext("New cluster must not have logical replication
slots but found %d slot.",
+   "New cluster must not have logical replication slots but found %d slots.",
+   nslots_on_new),
+ nslots_on_new);

5a.
TBH, I didn't see why you go to unnecessary trouble to have a plural
message here. The message could just be like:
"New cluster must have 0 logical replication slots but found %d."

~

5b.
However, now (from the previous review comment #4) if "New cluster" is
already explicit in the log, the pg_fatal message can become just:
"New cluster must have ..." ==> "Expected 0 logical replication slots
but found %d."

~~~

6. check_old_cluster_for_valid_slots

+ if (script)
+ {
+ fclose(script);
+
+ pg_log(PG_REPORT, "fatal");
+ pg_fatal("The source cluster contains one or more problematic
logical replication slots.\n"
+ "The needed workaround depends on the problem.\n"
+ "1) If the problem is \"The slot is unusable,\" You can drop such
replication slots.\n"
+ "2) If the problem is \"The slot has not consumed WALs yet,\" you
can consume all remaining WALs.\n"
+ "Then, you can restart the upgrade.\n"
+ "A list of problematic logical replication slots is in the file:\n"
+ "    %s", output_path);
+ }

This needs fixing but I saw it has been updated in v35, so I'll check
it there later.

======
src/bin/pg_upgrade/info.c

7. get_db_rel_and_slot_infos

void
get_db_rel_and_slot_infos(ClusterInfo *cluster)
{
int dbnum;

if (cluster->dbarr.dbs != NULL)
free_db_and_rel_infos(&cluster->dbarr);

~

Judging from the HEAD code this function was intended to be reentrant
-- e.g. it does cleanup code free_db_and_rel_infos in case there was
something there from before.

IIUC there is no such cleanup for the slot_arr. I forget why this was
removed. Sure, you might be able to survive the memory leaks, but
choosing NOT to clean up the slot_arr seems to contradict the
intention of HEAD calling free_db_and_rel_infos.

~~~

8. get_db_infos

I noticed the pg_malloc0 is reverted in this function.

- dbinfos = (DbInfo *) pg_malloc(sizeof(DbInfo) * ntups);
+ dbinfos = (DbInfo *) pg_malloc0(sizeof(DbInfo) * ntups);

IMO it is better to do pg_malloc0 here.

Sure, everything probably works OK for the current code, but it seems
unnecessarily risky to assume that functions will forever be called in
a specific order. AFAICT if someone (e.g. for debugging) calls
count_old_cluster_logical_slots() or calls print_slot_infos() then the
behaviour is undefined because slot_arr.nslots remains uninitialized.

~~~

9. get_old_cluster_logical_slot_infos

+ i_slotname = PQfnumber(res, "slot_name");
+ i_plugin = PQfnumber(res, "plugin");
+ i_twophase = PQfnumber(res, "two_phase");
+ i_caught_up = PQfnumber(res, "caught_up");
+ i_invalid = PQfnumber(res, "conflicting");

IMO SQL should be using an alias for this column, so you can say:
i_invalid = PQfnumber(res, "invalid")

which seems better than switching the wording in code.
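
For example, something along these lines (sketch only; I've trimmed the other
columns and conditions from your query here):

res = executeQueryOrDie(conn,
						"SELECT slot_name, plugin, two_phase, "
						"conflicting AS invalid "
						"FROM pg_catalog.pg_replication_slots "
						"WHERE slot_type = 'logical' AND "
						"temporary IS FALSE;");

i_invalid = PQfnumber(res, "invalid");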

======
src/bin/pg_upgrade/pg_upgrade.h

10. LogicalSlotInfo

+typedef struct
+{
+ char    *slotname; /* slot name */
+ char    *plugin; /* plugin */
+ bool two_phase; /* can the slot decode 2PC? */
+ bool caught_up; /* Is confirmed_flush_lsn the same as latest
+ * checkpoint LSN? */
+ bool invalid; /* Is the slot usable? */
+} LogicalSlotInfo;

~

+ bool invalid; /* Is the slot usable? */
This field name and comment have opposite meanings. Invalid means NOT usable.

SUGGESTION
/* If true, the slot is unusable. */

======
src/bin/pg_upgrade/server.c

11. start_postmaster

  * we only modify the new cluster, so only use it there.  If there is a
  * crash, the new cluster has to be recreated anyway.  fsync=off is a big
  * win on ext4.
+ *
+ * Also, the max_slot_wal_keep_size is set to -1 to prevent the WAL removal
+ * required by logical slots. The setting could avoid the invalidation of
+ * slots during the upgrade.
  */
~

IMO this comment "to prevent the WAL removal required by logical
slots" is ambiguous about how it could be interpreted. Needs
rearranging for clarity.

~

12. start_postmaster

  (cluster == &new_cluster) ?
- " -c synchronous_commit=off -c fsync=off -c full_page_writes=off" : "",
+ " -c synchronous_commit=off -c fsync=off -c full_page_writes=off -c
max_slot_wal_keep_size=-1 " :
+ " -c max_slot_wal_keep_size=-1",

Instead of putting the same option on both sides of the ternary, I was
wondering if it might be better to hardwire the max_slot_wal_keep_size
just 1 time in the format string?

======
.../pg_upgrade/t/003_logical_replication_slots.pl

13.
# Remove the remained slot

/remained/remaining/

------
Kind Regards,
Peter Smith.
Fujitsu Australia

#221Peter Smith
smithpb2250@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#217)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

Hi Kuroda-san.

Here are some additional review comments for v35-0002 (and because we
overlapped, my v34-0002 review comments have not been addressed yet)

======
Commit message

1.
Note that the pg_resetwal command would remove WAL files, which are required as
restart_lsn. If WALs required by logical replication slots are removed, they are
unusable. Therefore, during the upgrade, slot restoration is done
after the final
pg_resetwal command. The workflow ensures that required WALs are remained.

~

SUGGESTION (minor wording and /required as/required for/ and
/remained/retained/)
Note that the pg_resetwal command would remove WAL files, which are
required for restart_lsn. If WALs required by logical replication
slots are removed, the slots are unusable. Therefore, during the
upgrade, slot restoration is done after the final pg_resetwal command.
The workflow ensures that required WALs are retained.

======
doc/src/sgml/ref/pgupgrade.sgml

2.
The SGML is malformed, so I am unable to build the PG docs. Please try
building the docs before posting the patch.

ref/pgupgrade.sgml:446: parser error : Opening and ending tag
mismatch: itemizedlist line 410 and listitem
</listitem>
^

~~~

3.
+     <listitem>
+      <para>
+       The new cluster must not have permanent logical replication slots, i.e.,
+       there are no slots whose
+       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>temporary</structfield>
+       is <literal>false</literal>.
+      </para>
+     </listitem>

/there are no slots whose/there must be no slots where/

~~~

4.
        or take a file system backup as the standbys are still synchronized
-       with the primary.)  Replication slots are not copied and must
-       be recreated.
+       with the primary.) Only logical slots on the primary are migrated to the
+       new standby, and other slots on the old standby must be recreated as
+       they are not copied.
       </para>

Mixing the terms "migrated" and "copied" seems to complicate this.
Does the following suggestion work better instead?

SUGGESTION (??)
Only logical slots on the primary are migrated to the new standby. Any
other slots present on the old standby must be recreated.

======
src/backend/replication/slot.c

5. InvalidatePossiblyObsoleteSlot

+ /*
+ * The logical replication slots shouldn't be invalidated as
+ * max_slot_wal_keep_size is set to -1 during the upgrade.
+ */
+ if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade)
+ elog(ERROR, "Replication slots must not be invalidated during the upgrade.");
+

I felt the comment could have another sentence like "The following is
just a sanity check."

======
src/bin/pg_upgrade/function.c

6. get_loadable_libraries

+ array_size = totaltups + count_old_cluster_logical_slots();
+ os_info.libraries = (LibraryInfo *) pg_malloc(sizeof(LibraryInfo) *
(array_size));
  totaltups = 0;

6a.
Maybe something like 'n_libinfos' would be a more meaningful name than
'array_size'?

~

6b.
+ os_info.libraries = (LibraryInfo *) pg_malloc(sizeof(LibraryInfo) *
(array_size));

Those extra parentheses around "(array_size)" seem overkill.
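
i.e. combining 6a and 6b, something like (sketch):

n_libinfos = totaltups + count_old_cluster_logical_slots();
os_info.libraries = (LibraryInfo *) pg_malloc(sizeof(LibraryInfo) * n_libinfos);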

------
Kind Regards,
Peter Smith.
Fujitsu Australia

#222Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Hayato Kuroda (Fujitsu) (#217)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

On Monday, September 11, 2023 9:22 PM Kuroda, Hayato/黒田 隼人 <kuroda.hayato@fujitsu.com> wrote:

Thank you for reviewing! PSA new version.

Thanks for updating the patch; a few cosmetic comments:

1.

#include "access/transam.h"
#include "catalog/pg_language_d.h"
+#include "fe_utils/string_utils.h"
#include "pg_upgrade.h"

It seems we don't need this header file anymore.

2.
+		if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade)
+			elog(ERROR, "Replication slots must not be invalidated during the upgrade.");

I think normally the first letter is lowercase, and we can avoid the period.

Best Regards,
Hou zj

#223Michael Paquier
michael@paquier.xyz
In reply to: Zhijie Hou (Fujitsu) (#222)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Tue, Sep 12, 2023 at 02:33:25AM +0000, Zhijie Hou (Fujitsu) wrote:

2.
+		if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade)
+			elog(ERROR, "Replication slots must not be invalidated during the upgrade.");

I think normally the first letter is lowercase, and we can avoid the period.

Documentation is your friend:
https://www.postgresql.org/docs/current/error-style-guide.html
--
Michael

#224Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Peter Smith (#220)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Peter,

Thank you for reviewing! Before posting the new patch set, I want to respond
to some of the comments.

======
1. GENERAL -- Cluster Terminology

This is not really a problem of your patch, but during message review,
I noticed the terms old/new cluster VERSUS source/target cluster and
both were used many times:

For example:
".*new cluster" --> 44 occurrences
".*old cluster" --> 21 occurrences
".*source cluster" --> 6 occurrences
".*target cluster" --> 12 occurrences

Perhaps there should be a new thread/patch to use consistent terms.

Thoughts?

I preferred the terms new/old because I could not find the terms source/target
in the pg_upgrade documentation. (IIUC I used new/old in my patch.)
Anyway, it should be discussed in another thread.

2. GENERAL - Error message cases

Just FYI, there is a lot of inconsistent capitalisation in these patch
messages, but the same is also true for the HEAD code. It's a bit
messy, but generally, I think your capitalisation was aligned with
what I saw in HEAD, so I didn't comment anywhere about it.

Yeah, the rule is broken even in HEAD. I determined a rule in [1], which seems
consistent with other parts of the file.
Michael kindly pointed out the error message formatting guide [2], and the patch
basically follows that style. (IIUC, pg_fatal("Your installation...") follows the
"Detail and hint messages" rule.)

======
src/bin/pg_upgrade/info.c

7. get_db_rel_and_slot_infos

void
get_db_rel_and_slot_infos(ClusterInfo *cluster)
{
int dbnum;

if (cluster->dbarr.dbs != NULL)
free_db_and_rel_infos(&cluster->dbarr);

~

Judging from the HEAD code this function was intended to be reentrant
-- e.g. it does cleanup code free_db_and_rel_infos in case there was
something there from before.

IIUC there is no such cleanup for the slot_arr. I forget why this was
removed. Sure, you might be able to survive the memory leaks, but
choosing NOT to clean up the slot_arr seems to contradict the
intention of HEAD calling free_db_and_rel_infos.

free_db_and_rel_infos() is called if get_db_rel_and_slot_infos() is called
several times for the same cluster. The following are the callers:

* check_and_dump_old_cluster(), target is old_cluster
* check_new_cluster(), target is new_cluster
* create_new_objects(), target is new_cluster

And we require that new_cluster must not have logical slots; this restriction
cannot be eased. Therefore, there is no possibility that slot_arr must be free()'d,
which is why I removed it (see the similar discussion [3]). I think we should not add no-op code.
In an older version there was an Assert() instead, but it was removed based on the comment [4].

8. get_db_infos

I noticed the pg_malloc0 is reverted in this function.

- dbinfos = (DbInfo *) pg_malloc(sizeof(DbInfo) * ntups);
+ dbinfos = (DbInfo *) pg_malloc0(sizeof(DbInfo) * ntups);

IMO it is better to do pg_malloc0 here.

Sure, everything probably works OK for the current code,

Yes, it works well. No one checks slot_arr before
get_old_cluster_logical_slot_infos(). In the old version, it was checked like
(slot_arr == NULL) in free_db_and_rel_infos(), but that check was removed.

but it seems
unnecessarily risky to assume that functions will forever be called in
a specific order. AFAICT if someone (e.g. for debugging) calls
count_old_cluster_logical_slots() or calls print_slot_infos() then the
behaviour is undefined because slot_arr.nslots remains uninitialized.

Hmm, I do not think such an assumption is needed. In the current code pg_malloc() is
used in get_db_infos(), so there is already a possibility that print_rel_infos() is
executed for debugging. The behavior is undefined there too - this is the same
situation as you described, and that code has been alive for a long time. Based on
that, I think we can accept the risk and reduce operations instead. If you know of
another example, please share it here...

[1]: /messages/by-id/TYAPR01MB586642D33208D190F67CDD7BF5F2A@TYAPR01MB5866.jpnprd01.prod.outlook.com
[2]: https://www.postgresql.org/docs/devel/error-style-guide.html#ERROR-STYLE-GUIDE-GRAMMAR-PUNCTUATION
[3]: /messages/by-id/TYAPR01MB5866732D30ABB976992BDECCF5789@TYAPR01MB5866.jpnprd01.prod.outlook.com
[4]: /messages/by-id/OS0PR01MB5716670FE547BA87FDEF895E94EDA@OS0PR01MB5716.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#225Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Michael Paquier (#223)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Michael,

On Tue, Sep 12, 2023 at 02:33:25AM +0000, Zhijie Hou (Fujitsu) wrote:

2.
+		if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade)
+			elog(ERROR, "Replication slots must not be invalidated during the upgrade.");

I think normally the first letter is lowercase, and we can avoid the period.

Documentation is your friend:
https://www.postgresql.org/docs/current/error-style-guide.html

Thank you for the information! It is quite helpful for me.
(Some fatal errors start with a capital letter, like "Your installation contains...",
but I regarded those as detail or hint messages.)

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#226Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Peter Smith (#220)
2 attachment(s)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Peter,

Thank you for reviewing! PSA new version.

src/backend/replication/slot.c

3. InvalidatePossiblyObsoleteSlot

+ /*
+ * Raise an ERROR if the logical replication slot is invalidating. It
+ * would not happen because max_slot_wal_keep_size is set to -1 during
+ * the upgrade, but it stays safe.
+ */
+ if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade)
+ elog(ERROR, "Replication slots must not be invalidated during the upgrade.");

3a.
That comment didn't seem good. I think you mean like in the suggestion below.

SUGGESTION
It should not be possible for logical replication slots to be
invalidated because max_slot_wal_keep_size is set to -1 during the
upgrade. The following is just for sanity-checking.

This part was updated in v35. Please tell me if the current version is still bad...

3b.
I wasn't sure if 'max_slot_wal_keep_size' GUC is accessible in this
scope, but if it is available then maybe
Assert(max_slot_wal_keep_size_mb == -1); should also be included in
this sanity check.

IIUC, GUC parameters are visible from all postgres processes.
Added.

src/bin/pg_upgrade/check.c

4. check_new_cluster_logical_replication_slots

+ conn = connectToServer(&new_cluster, "template1");
+
+ prep_status("Checking for logical replication slots");

There is some inconsistency with all the subsequent pg_fatals within
this function -- some of them mention "New cluster" but most of them
do not.

Meanwhile, Kuroda-san showed me sample output like:

Checking for presence of required libraries ok
Checking database user is the install user ok
Checking for prepared transactions ok
Checking for new cluster tablespace directories ok
Checking for logical replication slots
New cluster must not have logical replication slots but found 1 slot.
Failure, exiting

So, I felt the log message title ("Checking...") should be changed to
include the words "new cluster" just like the log preceding it:

"Checking for logical replication slots" ==> "Checking for new cluster
logical replication slots"

Now all the subsequent pg_fatals clearly are for "new cluster"

Changed.

5. check_new_cluster_logical_replication_slots

+ if (nslots_on_new)
+ pg_fatal(ngettext("New cluster must not have logical replication
slots but found %d slot.",
+   "New cluster must not have logical replication slots but found %d slots.",
+   nslots_on_new),
+ nslots_on_new);

5a.
TBH, I didn't see why you go to unnecessary trouble to have a plural
message here. The message could just be like:
"New cluster must have 0 logical replication slots but found %d."

~

5b.
However, now (from the previous review comment #4) if "New cluster" is
already explicit in the log, the pg_fatal message can become just:
"New cluster must have ..." ==> "Expected 0 logical replication slots
but found %d."

Basically it's better, but the initial character should be lowercase and a period
is not needed. Modified like that.

9. get_old_cluster_logical_slot_infos

+ i_slotname = PQfnumber(res, "slot_name");
+ i_plugin = PQfnumber(res, "plugin");
+ i_twophase = PQfnumber(res, "two_phase");
+ i_caught_up = PQfnumber(res, "caught_up");
+ i_invalid = PQfnumber(res, "conflicting");

IMO SQL should be using an alias for this column, so you can say:
i_invalid = PQfnumber(res, "invalid")

which seems better than switching the wording in code.

Modified. The argument of PQfnumber() must be the same as the column name, so the
alias "as invalid" was added to the SQL.

src/bin/pg_upgrade/pg_upgrade.h

10. LogicalSlotInfo

+typedef struct
+{
+ char    *slotname; /* slot name */
+ char    *plugin; /* plugin */
+ bool two_phase; /* can the slot decode 2PC? */
+ bool caught_up; /* Is confirmed_flush_lsn the same as latest
+ * checkpoint LSN? */
+ bool invalid; /* Is the slot usable? */
+} LogicalSlotInfo;

~

+ bool invalid; /* Is the slot usable? */
This field name and comment have opposite meanings. Invalid means NOT usable.

SUGGESTION
/* If true, the slot is unusable. */

Fixed.

src/bin/pg_upgrade/server.c

11. start_postmaster

* we only modify the new cluster, so only use it there.  If there is a
* crash, the new cluster has to be recreated anyway.  fsync=off is a big
* win on ext4.
+ *
+ * Also, the max_slot_wal_keep_size is set to -1 to prevent the WAL removal
+ * required by logical slots. The setting could avoid the invalidation of
+ * slots during the upgrade.
*/
~

IMO this comment "to prevent the WAL removal required by logical
slots" is ambiguous about how it could be interpreted. Needs
rearranging for clarity.

The description was changed. What do you think?

12. start_postmaster

(cluster == &new_cluster) ?
- " -c synchronous_commit=off -c fsync=off -c full_page_writes=off" : "",
+ " -c synchronous_commit=off -c fsync=off -c full_page_writes=off -c
max_slot_wal_keep_size=-1 " :
+ " -c max_slot_wal_keep_size=-1",

Instead of putting the same option on both sides of the ternary, I was
wondering if it might be better to hardwire the max_slot_wal_keep_size
just 1 time in the format string?

Fixed.

.../pg_upgrade/t/003_logical_replication_slots.pl

13.
# Remove the remained slot

/remained/remaining/

Fixed.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:

v36-0001-Flush-logical-slots-to-disk-during-a-shutdown-ch.patch
From df52f15709a3c03d09d7db6e53aaaad01aba65ef Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Fri, 14 Apr 2023 13:49:09 +0800
Subject: [PATCH v36 1/2] Flush logical slots to disk during a shutdown
 checkpoint if required.

It's entirely possible for a logical slot to have a confirmed_flush LSN
higher than the last value saved on disk while not being marked as dirty.
Currently, it is not a major problem but a later patch adding support for
the upgrade of slots relies on that value being properly flushed to disk.

It can also help avoid processing the same transactions again in some
boundary cases after the clean shutdown and restart.  Say, we process
some transactions for which we didn't send anything downstream (the
changes got filtered) but the confirm_flush LSN is updated due to
keepalives.  As we don't flush the latest value of confirm_flush LSN, it
may lead to processing the same changes again without this patch.

The approach taken by this patch has been suggested by Ashutosh Bapat.

Author: Vignesh C, Julien Rouhaud, Kuroda Hayato
Reviewed-by: Amit Kapila, Dilip Kumar, Ashutosh Bapat, Peter Smith
Discussion: http://postgr.es/m/CAA4eK1JzJagMmb_E8D4au=GYQkxox0AfNBm1FbP7sy7t4YWXPQ@mail.gmail.com
Discussion: http://postgr.es/m/TYAPR01MB58664C81887B3AF2EB6B16E3F5939@TYAPR01MB5866.jpnprd01.prod.outlook.com
---
 src/backend/access/transam/xlog.c             |   2 +-
 src/backend/replication/slot.c                |  32 +++++-
 src/include/replication/slot.h                |   8 +-
 src/test/recovery/meson.build                 |   1 +
 .../t/038_save_logical_slots_shutdown.pl      | 102 ++++++++++++++++++
 5 files changed, 140 insertions(+), 5 deletions(-)
 create mode 100644 src/test/recovery/t/038_save_logical_slots_shutdown.pl

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index f6f8adc72a..f26c8d18a6 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7039,7 +7039,7 @@ static void
 CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
 {
 	CheckPointRelationMap();
-	CheckPointReplicationSlots();
+	CheckPointReplicationSlots(flags & CHECKPOINT_IS_SHUTDOWN);
 	CheckPointSnapBuild();
 	CheckPointLogicalRewriteHeap();
 	CheckPointReplicationOrigin();
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index bb09c4010f..a31c7867cf 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -321,6 +321,7 @@ ReplicationSlotCreate(const char *name, bool db_specific,
 	slot->candidate_xmin_lsn = InvalidXLogRecPtr;
 	slot->candidate_restart_valid = InvalidXLogRecPtr;
 	slot->candidate_restart_lsn = InvalidXLogRecPtr;
+	slot->last_saved_confirmed_flush = InvalidXLogRecPtr;
 
 	/*
 	 * Create the slot on disk.  We haven't actually marked the slot allocated
@@ -1573,10 +1574,12 @@ restart:
  * Flush all replication slots to disk.
  *
  * This needn't actually be part of a checkpoint, but it's a convenient
- * location.
+ * location.  Additionally, in case of a shutdown checkpoint, we also identify
+ * the slots for which the confirmed_flush LSN has been updated since the last
+ * time it was saved and flush them.
  */
 void
-CheckPointReplicationSlots(void)
+CheckPointReplicationSlots(bool is_shutdown)
 {
 	int			i;
 
@@ -1601,6 +1604,27 @@ CheckPointReplicationSlots(void)
 
 		/* save the slot to disk, locking is handled in SaveSlotToPath() */
 		sprintf(path, "pg_replslot/%s", NameStr(s->data.name));
+
+		/*
+		 * Slot's data is not flushed each time the confirmed_flush LSN is
+		 * updated as that could lead to frequent writes.  However, we decide
+		 * to force a flush of all logical slot's data at the time of shutdown
+		 * if the confirmed_flush LSN is changed since we last flushed it to
+		 * disk.  This helps in avoiding an unnecessary retreat of the
+		 * confirmed_flush LSN after restart.
+		 */
+		if (is_shutdown && SlotIsLogical(s))
+		{
+			SpinLockAcquire(&s->mutex);
+			if (s->data.invalidated == RS_INVAL_NONE &&
+				s->data.confirmed_flush > s->last_saved_confirmed_flush)
+			{
+				s->just_dirtied = true;
+				s->dirty = true;
+			}
+			SpinLockRelease(&s->mutex);
+		}
+
 		SaveSlotToPath(s, path, LOG);
 	}
 	LWLockRelease(ReplicationSlotAllocationLock);
@@ -1873,11 +1897,12 @@ SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel)
 
 	/*
 	 * Successfully wrote, unset dirty bit, unless somebody dirtied again
-	 * already.
+	 * already and remember the confirmed_flush LSN value.
 	 */
 	SpinLockAcquire(&slot->mutex);
 	if (!slot->just_dirtied)
 		slot->dirty = false;
+	slot->last_saved_confirmed_flush = cp.slotdata.confirmed_flush;
 	SpinLockRelease(&slot->mutex);
 
 	LWLockRelease(&slot->io_in_progress_lock);
@@ -2074,6 +2099,7 @@ RestoreSlotFromDisk(const char *name)
 		/* initialize in memory state */
 		slot->effective_xmin = cp.slotdata.xmin;
 		slot->effective_catalog_xmin = cp.slotdata.catalog_xmin;
+		slot->last_saved_confirmed_flush = cp.slotdata.confirmed_flush;
 
 		slot->candidate_catalog_xmin = InvalidTransactionId;
 		slot->candidate_xmin_lsn = InvalidXLogRecPtr;
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index a8a89dc784..5e60030234 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -178,6 +178,12 @@ typedef struct ReplicationSlot
 	XLogRecPtr	candidate_xmin_lsn;
 	XLogRecPtr	candidate_restart_valid;
 	XLogRecPtr	candidate_restart_lsn;
+
+	/*
+	 * LSN used to track the last confirmed_flush LSN where the slot's data
+	 * has been flushed to disk.
+	 */
+	XLogRecPtr	last_saved_confirmed_flush;
 } ReplicationSlot;
 
 #define SlotIsPhysical(slot) ((slot)->data.database == InvalidOid)
@@ -241,7 +247,7 @@ extern void ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslo
 extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok);
 
 extern void StartupReplicationSlots(void);
-extern void CheckPointReplicationSlots(void);
+extern void CheckPointReplicationSlots(bool is_shutdown);
 
 extern void CheckSlotRequirements(void);
 extern void CheckSlotPermissions(void);
diff --git a/src/test/recovery/meson.build b/src/test/recovery/meson.build
index e7328e4894..646d6ffde4 100644
--- a/src/test/recovery/meson.build
+++ b/src/test/recovery/meson.build
@@ -43,6 +43,7 @@ tests += {
       't/035_standby_logical_decoding.pl',
       't/036_truncated_dropped.pl',
       't/037_invalid_database.pl',
+      't/038_save_logical_slots_shutdown.pl',
     ],
   },
 }
diff --git a/src/test/recovery/t/038_save_logical_slots_shutdown.pl b/src/test/recovery/t/038_save_logical_slots_shutdown.pl
new file mode 100644
index 0000000000..de19829560
--- /dev/null
+++ b/src/test/recovery/t/038_save_logical_slots_shutdown.pl
@@ -0,0 +1,102 @@
+
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+# Test logical replication slots are always flushed to disk during a shutdown
+# checkpoint.
+
+use strict;
+use warnings;
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+sub compare_confirmed_flush
+{
+	my ($node, $confirmed_flush_from_log) = @_;
+
+	# Fetch Latest checkpoint location from the control file
+	my ($stdout, $stderr) =
+	  run_command([ 'pg_controldata', $node->data_dir ]);
+	my @control_data = split("\n", $stdout);
+	my $latest_checkpoint = undef;
+	foreach (@control_data)
+	{
+		if ($_ =~ /^Latest checkpoint location:\s*(.*)$/mg)
+		{
+			$latest_checkpoint = $1;
+			last;
+		}
+	}
+	die "Latest checkpoint location not found in control file\n"
+	  unless defined($latest_checkpoint);
+
+	# Is it same as the value read from log?
+	ok( $latest_checkpoint eq $confirmed_flush_from_log,
+		"Check that the slot's confirmed_flush LSN is the same as the latest_checkpoint location"
+	);
+
+	return;
+}
+
+# Initialize publisher node
+my $node_publisher = PostgreSQL::Test::Cluster->new('pub');
+$node_publisher->init(allows_streaming => 'logical');
+# Avoid checkpoint during the test, otherwise, the latest checkpoint location
+# will change.
+$node_publisher->append_conf(
+	'postgresql.conf', q{
+checkpoint_timeout = 1h
+autovacuum = off
+});
+$node_publisher->start;
+
+# Create subscriber node
+my $node_subscriber = PostgreSQL::Test::Cluster->new('sub');
+$node_subscriber->init(allows_streaming => 'logical');
+$node_subscriber->start;
+
+# Create tables
+$node_publisher->safe_psql('postgres', "CREATE TABLE test_tbl (id int)");
+$node_subscriber->safe_psql('postgres', "CREATE TABLE test_tbl (id int)");
+
+# Insert some data
+$node_publisher->safe_psql('postgres',
+	"INSERT INTO test_tbl VALUES (generate_series(1, 5));");
+
+# Setup logical replication
+my $publisher_connstr = $node_publisher->connstr . ' dbname=postgres';
+$node_publisher->safe_psql('postgres',
+	"CREATE PUBLICATION pub FOR ALL TABLES");
+$node_subscriber->safe_psql('postgres',
+	"CREATE SUBSCRIPTION sub CONNECTION '$publisher_connstr' PUBLICATION pub"
+);
+
+$node_subscriber->wait_for_subscription_sync($node_publisher, 'sub');
+
+my $result =
+  $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM test_tbl");
+
+is($result, qq(5), "check initial copy was done");
+
+my $offset = -s $node_publisher->logfile;
+
+# Restart the publisher to ensure that the slot will be flushed if required
+$node_publisher->restart();
+
+# Wait until the walsender creates decoding context
+$node_publisher->wait_for_log(
+	qr/Streaming transactions committing after ([A-F0-9]+\/[A-F0-9]+), reading WAL from ([A-F0-9]+\/[A-F0-9]+)./,
+	$offset);
+
+# Extract confirmed_flush from the logfile
+my $log_contents = slurp_file($node_publisher->logfile, $offset);
+$log_contents =~
+  qr/Streaming transactions committing after ([A-F0-9]+\/[A-F0-9]+), reading WAL from ([A-F0-9]+\/[A-F0-9]+)./
+  or die "could not get confirmed_flush_lsn";
+
+# Ensure that the slot's confirmed_flush LSN is the same as the
+# latest_checkpoint location.
+compare_confirmed_flush($node_publisher, $1);
+
+done_testing();
-- 
2.27.0

v36-0002-pg_upgrade-Allow-to-replicate-logical-replicatio.patch
From 75afb9e9aaf6123e3f85c2f7674f1b51b11fce97 Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Tue, 4 Apr 2023 05:49:34 +0000
Subject: [PATCH v36 2/2] pg_upgrade: Allow to replicate logical replication
 slots to new node

This commit allows nodes with logical replication slots to be upgraded. While
reading information from the old cluster, a list of logical replication slots is
fetched. At a later part of the upgrade, pg_upgrade revisits the list and
restores slots by executing pg_create_logical_replication_slot() on the new
cluster. Migration of logical replication slots is only supported when the old
cluster is version 17.0 or later.

If the old node has slots with the status 'lost' or with unconsumed WAL records,
the pg_upgrade fails. These checks are needed to prevent data loss.

Note that the pg_resetwal command would remove WAL files, which are required for
restart_lsn. If WALs required by logical replication slots are removed, the
slots are unusable. Therefore, during the upgrade, slot restoration is done
after the final pg_resetwal command. The workflow ensures that required WALs are
retained.

The significant advantage of this commit is that it makes it easy to continue
logical replication even after upgrading the publisher node. Previously,
pg_upgrade allowed copying publications to a new node. With this new commit,
adjusting the connection string to the new publisher will cause the apply
worker on the subscriber to connect to the new publisher automatically. This
enables seamless continuation of logical replication, even after an upgrade.

Author: Hayato Kuroda
Co-authored-by: Hou Zhijie
Reviewed-by: Peter Smith, Julien Rouhaud, Vignesh C, Wang Wei, Masahiko Sawada,
             Dilip Kumar
---
 doc/src/sgml/ref/pgupgrade.sgml               |  79 ++++++-
 src/backend/replication/slot.c                |  12 +
 src/bin/pg_upgrade/Makefile                   |   3 +
 src/bin/pg_upgrade/check.c                    | 202 +++++++++++++++--
 src/bin/pg_upgrade/controldata.c              |  39 ++++
 src/bin/pg_upgrade/function.c                 |  28 ++-
 src/bin/pg_upgrade/info.c                     | 148 +++++++++++-
 src/bin/pg_upgrade/meson.build                |   1 +
 src/bin/pg_upgrade/pg_upgrade.c               | 107 ++++++++-
 src/bin/pg_upgrade/pg_upgrade.h               |  26 ++-
 src/bin/pg_upgrade/server.c                   |  13 +-
 .../t/003_logical_replication_slots.pl        | 214 ++++++++++++++++++
 src/tools/pgindent/typedefs.list              |   3 +
 13 files changed, 834 insertions(+), 41 deletions(-)
 create mode 100644 src/bin/pg_upgrade/t/003_logical_replication_slots.pl

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index bea0d1b93f..64242a2d82 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -383,6 +383,80 @@ make prefix=/usr/local/pgsql.new install
     </para>
    </step>
 
+   <step>
+    <title>Prepare for publisher upgrades</title>
+
+    <para>
+     <application>pg_upgrade</application> attempts to migrate logical
+     replication slots. This helps avoid the need for manually defining the
+     same replication slots on the new publisher. Migration of logical
+     replication slots is only supported when the old cluster is version 17.0
+     or later.
+    </para>
+
+    <para>
+     Before you start upgrading the publisher cluster, ensure that the
+     subscription is temporarily disabled, by executing
+     <link linkend="sql-altersubscription"><command>ALTER SUBSCRIPTION ... DISABLE</command></link>.
+     Re-enable the subscription after the upgrade.
+    </para>
+
+    <para>
+     There are some prerequisites for <application>pg_upgrade</application> to
+     be able to upgrade the replication slots. If these are not met an error
+     will be reported.
+    </para>
+
+    <itemizedlist>
+     <listitem>
+      <para>
+       All slots on the old cluster must be usable, i.e., there are no slots
+       whose
+       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>conflicting</structfield>
+       is <literal>true</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>confirmed_flush_lsn</structfield>
+       of all slots on the old cluster must be the same as the latest
+       checkpoint location. This ensures that all the data has been replicated
+       before the upgrade.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The output plugins referenced by the slots on the old cluster must be
+       installed in the new PostgreSQL executable directory.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must not have permanent logical replication slots, i.e.,
+       there must be no slots where
+       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>temporary</structfield>
+       is <literal>false</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must have
+       <link linkend="guc-wal-level"><varname>wal_level</varname></link> as
+       <literal>logical</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must have
+       <link linkend="guc-max-replication-slots"><varname>max_replication_slots</varname></link>
+       configured to a value greater than or equal to the number of slots
+       present in the old cluster.
+      </para>
+     </listitem>
+    </itemizedlist>
+
+   </step>
+
    <step>
     <title>Stop both servers</title>
 
@@ -652,8 +726,9 @@ rsync --archive --delete --hard-links --size-only --no-inc-recursive /vol1/pg_tb
        Configure the servers for log shipping.  (You do not need to run
        <function>pg_backup_start()</function> and <function>pg_backup_stop()</function>
        or take a file system backup as the standbys are still synchronized
-       with the primary.)  Replication slots are not copied and must
-       be recreated.
+       with the primary.) Only logical slots on the primary are copied to the
+       new standby, and other slots on the old standby must be recreated
+       as they are not copied.
       </para>
      </step>
 
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index a31c7867cf..860a36a305 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1423,6 +1423,18 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause,
 
 		SpinLockRelease(&s->mutex);
 
+		/*
+		 * The logical replication slots shouldn't be invalidated as
+		 * max_slot_wal_keep_size GUC is set to -1 during the upgrade.
+		 *
+		 * The following is just a sanity check.
+		 */
+		if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade)
+		{
+			Assert(max_slot_wal_keep_size_mb == -1);
+			elog(ERROR, "replication slots must not be invalidated during the upgrade");
+		}
+
 		if (active_pid != 0)
 		{
 			/*
diff --git a/src/bin/pg_upgrade/Makefile b/src/bin/pg_upgrade/Makefile
index 5834513add..815d1a7ca1 100644
--- a/src/bin/pg_upgrade/Makefile
+++ b/src/bin/pg_upgrade/Makefile
@@ -3,6 +3,9 @@
 PGFILEDESC = "pg_upgrade - an in-place binary upgrade utility"
 PGAPPICON = win32
 
+# required for 003_logical_replication_slots.pl
+EXTRA_INSTALL=contrib/test_decoding
+
 subdir = src/bin/pg_upgrade
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 56e313f562..b1424fdf9c 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -30,6 +30,8 @@ static void check_for_jsonb_9_4_usage(ClusterInfo *cluster);
 static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(void);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
+static void check_new_cluster_logical_replication_slots(void);
+static void check_old_cluster_for_valid_slots(bool live_check);
 
 
 /*
@@ -86,8 +88,11 @@ check_and_dump_old_cluster(bool live_check)
 	if (!live_check)
 		start_postmaster(&old_cluster, true);
 
-	/* Extract a list of databases and tables from the old cluster */
-	get_db_and_rel_infos(&old_cluster);
+	/*
+	 * Extract a list of databases, tables, and logical replication slots from
+	 * the old cluster.
+	 */
+	get_db_rel_and_slot_infos(&old_cluster);
 
 	init_tablespaces();
 
@@ -104,6 +109,13 @@ check_and_dump_old_cluster(bool live_check)
 	check_for_reg_data_type_usage(&old_cluster);
 	check_for_isn_and_int8_passing_mismatch(&old_cluster);
 
+	/*
+	 * Logical replication slots can be migrated since PG17. See comments atop
+	 * get_old_cluster_logical_slot_infos().
+	 */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
+		check_old_cluster_for_valid_slots(live_check);
+
 	/*
 	 * PG 16 increased the size of the 'aclitem' type, which breaks the
 	 * on-disk format for existing data.
@@ -187,7 +199,7 @@ check_and_dump_old_cluster(bool live_check)
 void
 check_new_cluster(void)
 {
-	get_db_and_rel_infos(&new_cluster);
+	get_db_rel_and_slot_infos(&new_cluster);
 
 	check_new_cluster_is_empty();
 
@@ -210,6 +222,8 @@ check_new_cluster(void)
 	check_for_prepared_transactions(&new_cluster);
 
 	check_for_new_tablespace_dir();
+
+	check_new_cluster_logical_replication_slots();
 }
 
 
@@ -232,27 +246,6 @@ report_clusters_compatible(void)
 }
 
 
-void
-issue_warnings_and_set_wal_level(void)
-{
-	/*
-	 * We unconditionally start/stop the new server because pg_resetwal -o set
-	 * wal_level to 'minimum'.  If the user is upgrading standby servers using
-	 * the rsync instructions, they will need pg_upgrade to write its final
-	 * WAL record showing wal_level as 'replica'.
-	 */
-	start_postmaster(&new_cluster, true);
-
-	/* Reindex hash indexes for old < 10.0 */
-	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 906)
-		old_9_6_invalidate_hash_indexes(&new_cluster, false);
-
-	report_extension_updates(&new_cluster);
-
-	stop_postmaster(false);
-}
-
-
 void
 output_completion_banner(char *deletion_script_file_name)
 {
@@ -1402,3 +1395,164 @@ check_for_user_defined_encoding_conversions(ClusterInfo *cluster)
 	else
 		check_ok();
 }
+
+/*
+ * check_new_cluster_logical_replication_slots()
+ *
+ * Make sure there are no logical replication slots on the new cluster and that
+ * the parameter settings necessary for creating slots are sufficient.
+ */
+static void
+check_new_cluster_logical_replication_slots(void)
+{
+	PGresult   *res;
+	PGconn	   *conn;
+	int			nslots_on_old;
+	int			nslots_on_new;
+	int			max_replication_slots;
+	char	   *wal_level;
+
+	/* Logical slots can be migrated since PG17. */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+		return;
+
+	nslots_on_old = count_old_cluster_logical_slots();
+
+	/* Quick return if there are no logical slots to be migrated. */
+	if (nslots_on_old == 0)
+		return;
+
+	conn = connectToServer(&new_cluster, "template1");
+
+	prep_status("Checking for new cluster logical replication slots");
+
+	res = executeQueryOrDie(conn, "SELECT count(*) "
+							"FROM pg_catalog.pg_replication_slots "
+							"WHERE slot_type = 'logical' AND "
+							"temporary IS FALSE;");
+
+	if (PQntuples(res) != 1)
+		pg_fatal("could not count the number of logical replication slots");
+
+	nslots_on_new = atoi(PQgetvalue(res, 0, 0));
+
+	if (nslots_on_new)
+		pg_fatal("expected 0 logical replication slots but found %d",
+				 nslots_on_new);
+
+	PQclear(res);
+
+	res = executeQueryOrDie(conn, "SHOW wal_level;");
+
+	if (PQntuples(res) != 1)
+		pg_fatal("could not determine wal_level");
+
+	wal_level = PQgetvalue(res, 0, 0);
+
+	if (strcmp(wal_level, "logical") != 0)
+		pg_fatal("wal_level must be \"logical\", but is set to \"%s\"",
+				 wal_level);
+
+	PQclear(res);
+
+	res = executeQueryOrDie(conn, "SHOW max_replication_slots;");
+
+	if (PQntuples(res) != 1)
+		pg_fatal("could not determine max_replication_slots");
+
+	max_replication_slots = atoi(PQgetvalue(res, 0, 0));
+
+	if (nslots_on_old > max_replication_slots)
+		pg_fatal("max_replication_slots (%d) must be greater than or equal to the number of "
+				 "logical replication slots (%d) on the old cluster",
+				 max_replication_slots, nslots_on_old);
+
+	PQclear(res);
+	PQfinish(conn);
+
+	check_ok();
+}
+
+/*
+ * check_old_cluster_for_valid_slots()
+ *
+ * Make sure logical replication slots can be migrated to the new cluster.
+ * The following points are checked:
+ *
+ *	- All logical replication slots are usable.
+ *	- All logical replication slots consumed all WALs, except a
+ *	  CHECKPOINT_SHUTDOWN record.
+ */
+static void
+check_old_cluster_for_valid_slots(bool live_check)
+{
+	int			dbnum;
+	char		output_path[MAXPGPATH];
+	FILE	   *script = NULL;
+
+	prep_status("Checking for valid logical replication slots");
+
+	snprintf(output_path, sizeof(output_path), "%s/%s",
+			 log_opts.basedir,
+			 "invalid_logical_replication_slots.txt");
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		int			slotnum;
+		LogicalSlotInfoArr *slot_arr = &old_cluster.dbarr.dbs[dbnum].slot_arr;
+
+		for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		{
+			LogicalSlotInfo *slot = &slot_arr->slots[slotnum];
+
+			/* Is the slot usable? */
+			if (slot->invalid)
+			{
+				if (script == NULL &&
+					(script = fopen_priv(output_path, "w")) == NULL)
+					pg_fatal("could not open file \"%s\": %s",
+							 output_path, strerror(errno));
+
+				fprintf(script, "The slot \"%s\" is invalid\n",
+						slot->slotname);
+
+				/* No need to check this slot, seek to new one */
+				continue;
+			}
+
+			/*
+			 * Do additional checks to ensure that confirmed_flush LSN of all
+			 * the slots is the same as the latest checkpoint location.
+			 *
+			 * Note: This can be satisfied only when the old cluster has been
+			 * shut down, so we skip this for live checks.
+			 */
+			if (!live_check && !slot->caught_up)
+			{
+				if (script == NULL &&
+					(script = fopen_priv(output_path, "w")) == NULL)
+					pg_fatal("could not open file \"%s\": %s",
+							 output_path, strerror(errno));
+
+				fprintf(script,
+						"The slot \"%s\" has not consumed the WAL yet\n",
+						slot->slotname);
+			}
+		}
+	}
+
+	if (script)
+	{
+		fclose(script);
+
+		pg_log(PG_REPORT, "fatal");
+		pg_fatal("Your installation contains invalid logical replication slots.\n"
+				 "These slots can't be copied, so this cluster cannot be upgraded.\n"
+				 "Consider removing such slots or consuming the pending WAL if any,\n"
+				 "and then restart the upgrade.\n"
+				 "A list of invalid logical replication slots is in the file:\n"
+				 "    %s", output_path);
+	}
+
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c
index 4beb65ab22..f8f823e2be 100644
--- a/src/bin/pg_upgrade/controldata.c
+++ b/src/bin/pg_upgrade/controldata.c
@@ -169,6 +169,45 @@ get_control_data(ClusterInfo *cluster, bool live_check)
 				}
 				got_cluster_state = true;
 			}
+
+			else if ((p = strstr(bufin, "Latest checkpoint location:")) != NULL)
+			{
+				/*
+				 * Read the latest checkpoint location if the cluster is PG17
+				 * or later. This is used for upgrading logical replication
+				 * slots. Currently, we need it only for the old cluster but
+				 * for simplicity chose not to have additional checks.
+				 */
+				if (GET_MAJOR_VERSION(cluster->major_version) >= 1700)
+				{
+					char	   *slash = NULL;
+					uint32		upper_lsn,
+								lower_lsn;
+
+					p = strchr(p, ':');
+
+					if (p == NULL || strlen(p) <= 1)
+						pg_fatal("%d: controldata retrieval problem", __LINE__);
+
+					p++;		/* remove ':' char */
+
+					p = strpbrk(p, "01234567890ABCDEF");
+
+					if (p == NULL || strlen(p) <= 1)
+						pg_fatal("%d: controldata retrieval problem", __LINE__);
+
+					/*
+					 * The upper and lower parts of the LSN must be read separately
+					 * because it is stored in %X/%X format.
+					 */
+					upper_lsn = strtoul(p, &slash, 16);
+					lower_lsn = strtoul(++slash, NULL, 16);
+
+					/* And combine them */
+					cluster->controldata.chkpnt_latest =
+						((uint64) upper_lsn << 32) | lower_lsn;
+				}
+			}
 		}
 
 		rc = pclose(output);
diff --git a/src/bin/pg_upgrade/function.c b/src/bin/pg_upgrade/function.c
index dc8800c7cd..b414ff2156 100644
--- a/src/bin/pg_upgrade/function.c
+++ b/src/bin/pg_upgrade/function.c
@@ -46,7 +46,9 @@ library_name_compare(const void *p1, const void *p2)
 /*
  * get_loadable_libraries()
  *
- *	Fetch the names of all old libraries containing C-language functions.
+ *	Fetch the names of all old libraries that either contain C-language
+ *	functions or correspond to logical replication output plugins.
+ *
  *	We will later check that they all exist in the new installation.
  */
 void
@@ -55,6 +57,7 @@ get_loadable_libraries(void)
 	PGresult  **ress;
 	int			totaltups;
 	int			dbnum;
+	int			n_libinfos;
 
 	ress = (PGresult **) pg_malloc(old_cluster.dbarr.ndbs * sizeof(PGresult *));
 	totaltups = 0;
@@ -81,7 +84,12 @@ get_loadable_libraries(void)
 		PQfinish(conn);
 	}
 
-	os_info.libraries = (LibraryInfo *) pg_malloc(totaltups * sizeof(LibraryInfo));
+	/*
+	 * Allocate memory for required libraries and logical replication output
+	 * plugins.
+	 */
+	n_libinfos = totaltups + count_old_cluster_logical_slots();
+	os_info.libraries = (LibraryInfo *) pg_malloc(sizeof(LibraryInfo) * n_libinfos);
 	totaltups = 0;
 
 	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
@@ -89,6 +97,8 @@ get_loadable_libraries(void)
 		PGresult   *res = ress[dbnum];
 		int			ntups;
 		int			rowno;
+		int			slotno;
+		LogicalSlotInfoArr *slot_arr = &old_cluster.dbarr.dbs[dbnum].slot_arr;
 
 		ntups = PQntuples(res);
 		for (rowno = 0; rowno < ntups; rowno++)
@@ -101,6 +111,20 @@ get_loadable_libraries(void)
 			totaltups++;
 		}
 		PQclear(res);
+
+		/*
+		 * Store the names of output plugins as well. There is a possibility
+		 * that duplicated plugins are set, but the consumer function
+		 * check_loadable_libraries() will avoid checking the same library, so
+		 * we do not have to consider their uniqueness here.
+		 */
+		for (slotno = 0; slotno < slot_arr->nslots; slotno++)
+		{
+			os_info.libraries[totaltups].name = pg_strdup(slot_arr->slots[slotno].plugin);
+			os_info.libraries[totaltups].dbnum = dbnum;
+
+			totaltups++;
+		}
 	}
 
 	pg_free(ress);
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index aa5faca4d6..f7b0deca87 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -26,6 +26,8 @@ static void get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo);
 static void free_rel_infos(RelInfoArr *rel_arr);
 static void print_db_infos(DbInfoArr *db_arr);
 static void print_rel_infos(RelInfoArr *rel_arr);
+static void print_slot_infos(LogicalSlotInfoArr *slot_arr);
+static void get_old_cluster_logical_slot_infos(DbInfo *dbinfo);
 
 
 /*
@@ -266,13 +268,13 @@ report_unmatched_relation(const RelInfo *rel, const DbInfo *db, bool is_new_db)
 }
 
 /*
- * get_db_and_rel_infos()
+ * get_db_rel_and_slot_infos()
  *
  * higher level routine to generate dbinfos for the database running
  * on the given "port". Assumes that server is already running.
  */
 void
-get_db_and_rel_infos(ClusterInfo *cluster)
+get_db_rel_and_slot_infos(ClusterInfo *cluster)
 {
 	int			dbnum;
 
@@ -283,7 +285,17 @@ get_db_and_rel_infos(ClusterInfo *cluster)
 	get_db_infos(cluster);
 
 	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
-		get_rel_infos(cluster, &cluster->dbarr.dbs[dbnum]);
+	{
+		DbInfo	   *pDbInfo = &cluster->dbarr.dbs[dbnum];
+
+		get_rel_infos(cluster, pDbInfo);
+
+		/*
+		 * Retrieve the logical replication slots infos for the old cluster.
+		 */
+		if (cluster == &old_cluster)
+			get_old_cluster_logical_slot_infos(pDbInfo);
+	}
 
 	if (cluster == &old_cluster)
 		pg_log(PG_VERBOSE, "\nsource databases:");
@@ -600,6 +612,107 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 	dbinfo->rel_arr.nrels = num_rels;
 }
 
+/*
+ * get_old_cluster_logical_slot_infos()
+ *
+ * gets the LogicalSlotInfos for all the logical replication slots of the
+ * database referred to by "dbinfo".
+ *
+ * Note: This function will not do anything if the old cluster is pre-PG17.
+ * This is because before that the logical slots are not saved at shutdown, so
+ * there is no guarantee that the latest confirmed_flush_lsn is saved to disk
+ * which can lead to data loss. It is still not guaranteed for manually created
+ * slots in PG17, so subsequent checks done in
+ * check_old_cluster_for_valid_slots() would raise a FATAL error if such slots
+ * are included.
+ */
+static void
+get_old_cluster_logical_slot_infos(DbInfo *dbinfo)
+{
+	PGconn	   *conn;
+	PGresult   *res;
+	LogicalSlotInfo *slotinfos = NULL;
+	int			num_slots;
+
+	/* Logical slots can be migrated since PG17. */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+		return;
+
+	conn = connectToServer(&old_cluster, dbinfo->db_name);
+
+	/*
+	 * The temporary slots are explicitly ignored while checking because such
+	 * slots cannot exist after the upgrade. During the upgrade, clusters are
+	 * started and stopped several times causing any temporary slots to be
+	 * removed.
+	 */
+	res = executeQueryOrDie(conn, "SELECT slot_name, plugin, two_phase, "
+							"(confirmed_flush_lsn = '%X/%X') as caught_up, conflicting as invalid "
+							"FROM pg_catalog.pg_replication_slots "
+							"WHERE slot_type = 'logical' AND "
+							"database = current_database() AND "
+							"temporary IS FALSE;",
+							LSN_FORMAT_ARGS(old_cluster.controldata.chkpnt_latest));
+
+	num_slots = PQntuples(res);
+
+	if (num_slots)
+	{
+		int			slotnum;
+		int			i_slotname;
+		int			i_plugin;
+		int			i_twophase;
+		int			i_caught_up;
+		int			i_invalid;
+
+		slotinfos = (LogicalSlotInfo *) pg_malloc(sizeof(LogicalSlotInfo) * num_slots);
+
+		i_slotname = PQfnumber(res, "slot_name");
+		i_plugin = PQfnumber(res, "plugin");
+		i_twophase = PQfnumber(res, "two_phase");
+		i_caught_up = PQfnumber(res, "caught_up");
+		i_invalid = PQfnumber(res, "invalid");
+
+		for (slotnum = 0; slotnum < num_slots; slotnum++)
+		{
+			LogicalSlotInfo *curr = &slotinfos[slotnum];
+
+			curr->slotname = pg_strdup(PQgetvalue(res, slotnum, i_slotname));
+			curr->plugin = pg_strdup(PQgetvalue(res, slotnum, i_plugin));
+			curr->two_phase = (strcmp(PQgetvalue(res, slotnum, i_twophase), "t") == 0);
+			curr->caught_up = (strcmp(PQgetvalue(res, slotnum, i_caught_up), "t") == 0);
+			curr->invalid = (strcmp(PQgetvalue(res, slotnum, i_invalid), "t") == 0);
+		}
+	}
+
+	PQclear(res);
+	PQfinish(conn);
+
+	dbinfo->slot_arr.slots = slotinfos;
+	dbinfo->slot_arr.nslots = num_slots;
+}
+
+
+/*
+ * count_old_cluster_logical_slots()
+ *
+ * Returns the number of logical replication slots for all databases.
+ *
+ * Note: this function always returns 0 if the old_cluster is PG16 and prior
+ * because we gather slot information only for cluster versions greater than or
+ * equal to PG17. See get_old_cluster_logical_slot_infos().
+ */
+int
+count_old_cluster_logical_slots(void)
+{
+	int			dbnum;
+	int			slot_count = 0;
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+		slot_count += old_cluster.dbarr.dbs[dbnum].slot_arr.nslots;
+
+	return slot_count;
+}
 
 static void
 free_db_and_rel_infos(DbInfoArr *db_arr)
@@ -642,8 +755,11 @@ print_db_infos(DbInfoArr *db_arr)
 
 	for (dbnum = 0; dbnum < db_arr->ndbs; dbnum++)
 	{
-		pg_log(PG_VERBOSE, "Database: \"%s\"", db_arr->dbs[dbnum].db_name);
-		print_rel_infos(&db_arr->dbs[dbnum].rel_arr);
+		DbInfo	   *pDbInfo = &db_arr->dbs[dbnum];
+
+		pg_log(PG_VERBOSE, "Database: \"%s\"", pDbInfo->db_name);
+		print_rel_infos(&pDbInfo->rel_arr);
+		print_slot_infos(&pDbInfo->slot_arr);
 	}
 }
 
@@ -660,3 +776,25 @@ print_rel_infos(RelInfoArr *rel_arr)
 			   rel_arr->rels[relnum].reloid,
 			   rel_arr->rels[relnum].tablespace);
 }
+
+static void
+print_slot_infos(LogicalSlotInfoArr *slot_arr)
+{
+	int			slotnum;
+
+	/* Quick return if there are no logical slots. */
+	if (slot_arr->nslots == 0)
+		return;
+
+	pg_log(PG_VERBOSE, "Logical replication slots within the database:");
+
+	for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+	{
+		LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+
+		pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\", two_phase: %d",
+			   slot_info->slotname,
+			   slot_info->plugin,
+			   slot_info->two_phase);
+	}
+}
diff --git a/src/bin/pg_upgrade/meson.build b/src/bin/pg_upgrade/meson.build
index 12a97f84e2..228f29b688 100644
--- a/src/bin/pg_upgrade/meson.build
+++ b/src/bin/pg_upgrade/meson.build
@@ -42,6 +42,7 @@ tests += {
     'tests': [
       't/001_basic.pl',
       't/002_pg_upgrade.pl',
+      't/003_logical_replication_slots.pl',
     ],
     'test_kwargs': {'priority': 40}, # pg_upgrade tests are slow
   },
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 96bfb67167..6be236dc9a 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -59,6 +59,8 @@ static void copy_xact_xlog_xid(void);
 static void set_frozenxids(bool minmxid_only);
 static void make_outputdirs(char *pgdata);
 static void setup(char *argv0, bool *live_check);
+static void create_logical_replication_slots(void);
+static void setup_new_cluster(void);
 
 ClusterInfo old_cluster,
 			new_cluster;
@@ -188,6 +190,8 @@ main(int argc, char **argv)
 			  new_cluster.pgdata);
 	check_ok();
 
+	setup_new_cluster();
+
 	if (user_opts.do_sync)
 	{
 		prep_status("Sync data directory to disk");
@@ -201,8 +205,6 @@ main(int argc, char **argv)
 
 	create_script_for_old_cluster_deletion(&deletion_script_file_name);
 
-	issue_warnings_and_set_wal_level();
-
 	pg_log(PG_REPORT,
 		   "\n"
 		   "Upgrade Complete\n"
@@ -593,7 +595,7 @@ create_new_objects(void)
 		set_frozenxids(true);
 
 	/* update new_cluster info now that we have objects in the databases */
-	get_db_and_rel_infos(&new_cluster);
+	get_db_rel_and_slot_infos(&new_cluster);
 }
 
 /*
@@ -862,3 +864,102 @@ set_frozenxids(bool minmxid_only)
 
 	check_ok();
 }
+
+/*
+ * create_logical_replication_slots()
+ *
+ * Similar to create_new_objects() but only restores logical replication slots.
+ */
+static void
+create_logical_replication_slots(void)
+{
+	int			dbnum;
+
+	prep_status_progress("Restoring logical replication slots in the new cluster");
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		DbInfo	   *old_db = &old_cluster.dbarr.dbs[dbnum];
+		LogicalSlotInfoArr *slot_arr = &old_db->slot_arr;
+		PGconn	   *conn;
+		PQExpBuffer query;
+		int			slotnum;
+		char		log_file_name[MAXPGPATH];
+
+		/* Skip this database if there are no slots */
+		if (slot_arr->nslots == 0)
+			continue;
+
+		conn = connectToServer(&new_cluster, old_db->db_name);
+		query = createPQExpBuffer();
+
+		snprintf(log_file_name, sizeof(log_file_name),
+				 DB_DUMP_LOG_FILE_MASK, old_db->db_oid);
+
+		pg_log(PG_STATUS, "%s", old_db->db_name);
+
+		for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		{
+			LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+
+			/* Construct a query for creating a logical replication slot. */
+			appendPQExpBuffer(query, "SELECT pg_catalog.pg_create_logical_replication_slot(");
+			appendStringLiteralConn(query, slot_info->slotname, conn);
+			appendPQExpBuffer(query, ", ");
+			appendStringLiteralConn(query, slot_info->plugin, conn);
+			appendPQExpBuffer(query, ", false, %s);",
+							  slot_info->two_phase ? "true" : "false");
+
+			PQclear(executeQueryOrDie(conn, "%s", query->data));
+
+			resetPQExpBuffer(query);
+		}
+
+		PQfinish(conn);
+
+		destroyPQExpBuffer(query);
+	}
+
+	end_progress_output();
+	check_ok();
+}
+
+/*
+ *	setup_new_cluster()
+ *
+ * Starts a new cluster to update the wal_level in the control file, then
+ * does the final setup steps. Logical slots are also created here.
+ */
+static void
+setup_new_cluster(void)
+{
+	/*
+	 * We unconditionally start/stop the new server because pg_resetwal -o set
+	 * wal_level to 'minimum'.  If the user is upgrading standby servers using
+	 * the rsync instructions, they will need pg_upgrade to write its final
+	 * WAL record showing wal_level as 'replica'.
+	 */
+	start_postmaster(&new_cluster, true);
+
+	/*
+	 * If the old cluster has logical slots, migrate them to a new cluster.
+	 *
+	 * Note: This must be done after executing pg_resetwal command in the
+	 * caller because pg_resetwal would remove required WALs.
+	 */
+	if (count_old_cluster_logical_slots())
+		create_logical_replication_slots();
+
+	/*
+	 * Reindex hash indexes for old < 10.0. count_old_cluster_logical_slots()
+	 * can return non-zero only when the old_cluster is PG17 or later, so this
+	 * "else if" cannot skip reindexing for pre-PG10 clusters. See comments atop
+	 * count_old_cluster_logical_slots() and get_old_cluster_logical_slot_infos().
+	 */
+	else if (GET_MAJOR_VERSION(old_cluster.major_version) <= 906)
+		old_9_6_invalidate_hash_indexes(&new_cluster, false);
+
+	report_extension_updates(&new_cluster);
+
+	stop_postmaster(false);
+}
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 842f3b6cd3..f5ce6c3b4d 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -10,6 +10,7 @@
 #include <sys/stat.h>
 #include <sys/time.h>
 
+#include "access/xlogdefs.h"
 #include "common/relpath.h"
 #include "libpq-fe.h"
 
@@ -150,6 +151,25 @@ typedef struct
 	int			nrels;
 } RelInfoArr;
 
+/*
+ * Structure to store logical replication slot information
+ */
+typedef struct
+{
+	char	   *slotname;		/* slot name */
+	char	   *plugin;			/* plugin */
+	bool		two_phase;		/* can the slot decode 2PC? */
+	bool		caught_up;		/* Is confirmed_flush_lsn the same as latest
+								 * checkpoint LSN? */
+	bool		invalid;		/* If true, the slot is unusable. */
+} LogicalSlotInfo;
+
+typedef struct
+{
+	int			nslots;			/* number of logical slot infos */
+	LogicalSlotInfo *slots;		/* array of logical slot infos */
+} LogicalSlotInfoArr;
+
 /*
  * The following structure represents a relation mapping.
  */
@@ -176,6 +196,7 @@ typedef struct
 	char		db_tablespace[MAXPGPATH];	/* database default tablespace
 											 * path */
 	RelInfoArr	rel_arr;		/* array of all user relinfos */
+	LogicalSlotInfoArr slot_arr;	/* array of all LogicalSlotInfo */
 } DbInfo;
 
 /*
@@ -225,6 +246,7 @@ typedef struct
 	bool		date_is_int;
 	bool		float8_pass_by_value;
 	uint32		data_checksum_version;
+	XLogRecPtr	chkpnt_latest;
 } ControlData;
 
 /*
@@ -345,7 +367,6 @@ void		output_check_banner(bool live_check);
 void		check_and_dump_old_cluster(bool live_check);
 void		check_new_cluster(void);
 void		report_clusters_compatible(void);
-void		issue_warnings_and_set_wal_level(void);
 void		output_completion_banner(char *deletion_script_file_name);
 void		check_cluster_versions(void);
 void		check_cluster_compatibility(bool live_check);
@@ -400,7 +421,8 @@ void		check_loadable_libraries(void);
 FileNameMap *gen_db_file_maps(DbInfo *old_db,
 							  DbInfo *new_db, int *nmaps, const char *old_pgdata,
 							  const char *new_pgdata);
-void		get_db_and_rel_infos(ClusterInfo *cluster);
+void		get_db_rel_and_slot_infos(ClusterInfo *cluster);
+int			count_old_cluster_logical_slots(void);
 
 /* option.c */
 
diff --git a/src/bin/pg_upgrade/server.c b/src/bin/pg_upgrade/server.c
index 0bc3d2806b..18dff6829d 100644
--- a/src/bin/pg_upgrade/server.c
+++ b/src/bin/pg_upgrade/server.c
@@ -234,14 +234,21 @@ start_postmaster(ClusterInfo *cluster, bool report_and_exit_on_error)
 	 * we only modify the new cluster, so only use it there.  If there is a
 	 * crash, the new cluster has to be recreated anyway.  fsync=off is a big
 	 * win on ext4.
+	 *
+	 * Use max_slot_wal_keep_size as -1 to prevent the WAL removal by the
+	 * checkpointer process.  If WALs required by logical replication slots are
+	 * removed, the slots are unusable.  The setting ensures that such WAL
+	 * records have remained so that invalidation of slots would be avoided
+	 * during the upgrade.
 	 */
 	snprintf(cmd, sizeof(cmd),
-			 "\"%s/pg_ctl\" -w -l \"%s/%s\" -D \"%s\" -o \"-p %d -b%s %s%s\" start",
+			 "\"%s/pg_ctl\" -w -l \"%s/%s\" -D \"%s\" -o \"-p %d -b%s%s %s%s\" start",
 			 cluster->bindir,
 			 log_opts.logdir,
 			 SERVER_LOG_FILE, cluster->pgconfig, cluster->port,
-			 (cluster == &new_cluster) ?
-			 " -c synchronous_commit=off -c fsync=off -c full_page_writes=off" : "",
+			(cluster == &new_cluster) ?
+			" -c synchronous_commit=off -c fsync=off -c full_page_writes=off" : "",
+			" -c max_slot_wal_keep_size=-1",
 			 cluster->pgopts ? cluster->pgopts : "", socket_string);
 
 	/*
diff --git a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
new file mode 100644
index 0000000000..01cb04ca12
--- /dev/null
+++ b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
@@ -0,0 +1,214 @@
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+# Tests for upgrading replication slots
+
+use strict;
+use warnings;
+
+use File::Path qw(rmtree);
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Can be changed to test the other modes
+my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
+
+# Initialize old cluster
+my $old_publisher = PostgreSQL::Test::Cluster->new('old_publisher');
+$old_publisher->init(allows_streaming => 'logical');
+
+# Initialize new cluster
+my $new_publisher = PostgreSQL::Test::Cluster->new('new_publisher');
+$new_publisher->init(allows_streaming => 'replica');
+
+# Initialize subscriber cluster
+my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
+$subscriber->init(allows_streaming => 'logical');
+
+my $bindir = $new_publisher->config_data('--bindir');
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when new cluster wal_level is not 'logical'
+
+# Preparations for the subsequent test:
+# 1. Create a slot on the old cluster
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT pg_create_logical_replication_slot('test_slot1', 'test_decoding', false, true);"
+);
+$old_publisher->stop;
+
+# pg_upgrade will fail because the new cluster wal_level is 'replica'
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade where the new cluster has the wrong wal_level');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when max_replication_slots on a new cluster is
+#		too low
+
+# Preparations for the subsequent test:
+# 1. Create a second slot on the old cluster
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT pg_create_logical_replication_slot('test_slot2', 'test_decoding', false, true);"
+);
+
+# 2. Consume WAL records to avoid another type of upgrade failure. It will be
+#	 tested in subsequent cases.
+$old_publisher->safe_psql('postgres',
+	"SELECT count(*) FROM pg_logical_slot_get_changes('test_slot1', NULL, NULL);"
+);
+$old_publisher->stop;
+
+# 3. max_replication_slots is set to smaller than the number of slots (2)
+#	 present on the old cluster
+$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 1");
+
+# 4. wal_level is set correctly on the new cluster
+$new_publisher->append_conf('postgresql.conf', "wal_level = 'logical'");
+
+# pg_upgrade will fail because the new cluster has insufficient max_replication_slots
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade where the new cluster has insufficient max_replication_slots');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when the slot still has unconsumed WAL records
+
+# Preparations for the subsequent test:
+# 1. Remove the slot 'test_slot2', leaving only 1 slot remaining on the old
+#	 cluster, so the new cluster config  max_replication_slots=1 will now be
+#	 enough.
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT * FROM pg_drop_replication_slot('test_slot2');"
+);
+
+# 2. Generate extra WAL records. Because these WAL records do not get consumed
+#	 it will cause the upcoming pg_upgrade test to fail.
+$old_publisher->safe_psql('postgres',
+	"CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;"
+);
+$old_publisher->stop;
+
+# pg_upgrade will fail because the slot still has unconsumed WAL records
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade of old cluster with idle replication slots');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+# Remove the remaining slot
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT * FROM pg_drop_replication_slot('test_slot1');"
+);
+
+# ------------------------------
+# TEST: Successful upgrade
+
+# Preparations for the subsequent test:
+# 1. Setup logical replication
+my $old_connstr = $old_publisher->connstr . ' dbname=postgres';
+$old_publisher->safe_psql('postgres',
+	"CREATE PUBLICATION pub FOR ALL TABLES;"
+);
+$subscriber->start;
+$subscriber->safe_psql(
+	'postgres', qq[
+	CREATE TABLE tbl (a int);
+	CREATE SUBSCRIPTION sub CONNECTION '$old_connstr' PUBLICATION pub WITH (two_phase = 'true')
+]);
+$subscriber->wait_for_subscription_sync($old_publisher, 'sub');
+
+# 2. Temporarily disable the subscription
+$subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION sub DISABLE");
+$old_publisher->stop;
+
+# Actual run, successful upgrade is expected
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade of old cluster');
+ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ removed after pg_upgrade success");
+
+# Check that the slot 'sub' has migrated to the new cluster
+$new_publisher->start;
+my $result = $new_publisher->safe_psql('postgres',
+	"SELECT slot_name, two_phase FROM pg_replication_slots");
+is($result, qq(sub|t), 'check the slot exists on new cluster');
+
+# Update the connection
+my $new_connstr = $new_publisher->connstr . ' dbname=postgres';
+$subscriber->safe_psql(
+	'postgres', qq[
+	ALTER SUBSCRIPTION sub CONNECTION '$new_connstr';
+	ALTER SUBSCRIPTION sub ENABLE;
+]);
+
+# Check whether changes on the new publisher get replicated to the subscriber
+$new_publisher->safe_psql('postgres',
+	"INSERT INTO tbl VALUES (generate_series(11, 20))");
+$new_publisher->wait_for_catchup('sub');
+$result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
+is($result, qq(20), 'check changes are replicated to the subscriber');
+
+# Clean up
+$subscriber->stop();
+$new_publisher->stop();
+
+done_testing();
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index f2af84d7ca..98c01fa05f 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1502,7 +1502,10 @@ LogicalRepTupleData
 LogicalRepTyp
 LogicalRepWorker
 LogicalRepWorkerType
+LogicalReplicationSlotInfo
 LogicalRewriteMappingData
+LogicalSlotInfo
+LogicalSlotInfoArr
 LogicalTape
 LogicalTapeSet
 LsnReadQueue
-- 
2.27.0

#227Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Peter Smith (#221)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Peter,

Thank you for reviewing!

=====

Commit message

1.
Note that the pg_resetwal command would remove WAL files, which are required
as
restart_lsn. If WALs required by logical replication slots are removed, they are
unusable. Therefore, during the upgrade, slot restoration is done
after the final
pg_resetwal command. The workflow ensures that required WALs are remained.

~

SUGGESTION (minor wording and /required as/required for/ and
/remained/retained/)
Note that the pg_resetwal command would remove WAL files, which are
required for restart_lsn. If WALs required by logical replication
slots are removed, the slots are unusable. Therefore, during the
upgrade, slot restoration is done after the final pg_resetwal command.
The workflow ensures that required WALs are retained.

Fixed.

doc/src/sgml/ref/pgupgrade.sgml

2.
The SGML is mal-formed so I am unable to build PG DOCS. Please try
building the docs before posting the patch.

ref/pgupgrade.sgml:446: parser error : Opening and ending tag
mismatch: itemizedlist line 410 and listitem
</listitem>
^

Fixed. Sorry for the noise.

3.
+     <listitem>
+      <para>
+       The new cluster must not have permanent logical replication slots, i.e.,
+       there are no slots whose
+       <link
linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>t
emporary</structfield>
+       is <literal>false</literal>.
+      </para>
+     </listitem>

/there are no slots whose/there must be no slots where/

Fixed.

4.
or take a file system backup as the standbys are still synchronized
-       with the primary.)  Replication slots are not copied and must
-       be recreated.
+       with the primary.) Only logical slots on the primary are migrated to the
+       new standby, and other slots on the old standby must be recreated as
+       they are not copied.
</para>

Mixing the terms "migrated" and "copied" seems to complicate this.
Does the following suggestion work better instead?

SUGGESTION (??)
Only logical slots on the primary are migrated to the new standby. Any
other slots present on the old standby must be recreated.

Hmm, I preferred to use "copied". What do you think?

src/backend/replication/slot.c

5. InvalidatePossiblyObsoleteSlot

+ /*
+ * The logical replication slots shouldn't be invalidated as
+ * max_slot_wal_keep_size is set to -1 during the upgrade.
+ */
+ if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade)
+ elog(ERROR, "Replication slots must not be invalidated during the upgrade.");
+

I felt the comment could have another sentence like "The following is
just a sanity check."

Added.

src/bin/pg_upgrade/function.c

6. get_loadable_libraries

+ array_size = totaltups + count_old_cluster_logical_slots();
+ os_info.libraries = (LibraryInfo *) pg_malloc(sizeof(LibraryInfo) *
(array_size));
totaltups = 0;

6a.
Maybe something like 'n_libinfos' would be a more meaningful name than
'array_size'?

Fixed.

6b.
+ os_info.libraries = (LibraryInfo *) pg_malloc(sizeof(LibraryInfo) *
(array_size));

Those extra parentheses around "(array_size)" seem overkill.

Removed.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#228Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Zhijie Hou (Fujitsu) (#222)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Hou,

Thank you for reviewing!

1.

#include "access/transam.h"
#include "catalog/pg_language_d.h"
+#include "fe_utils/string_utils.h"
#include "pg_upgrade.h"

It seems we don't need this header file anymore.

Removed.

2.
+		if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade)
+			elog(ERROR, "Replication slots must not be invalidated
during the upgrade.");

I think normally the first letter is lowercase, and we can avoid the period.

Right, fixed. The period is also removed per the same rule; unlike other, more
detailed messages, this one just reports what happened.

```
        if (nslots_on_old > max_replication_slots)
                pg_fatal("max_replication_slots (%d) must be greater than or equal to the number of "
-                                "logical replication slots (%d) on the old cluster.",
+                                "logical replication slots (%d) on the old cluster",
                                 max_replication_slots, nslots_on_old);
```

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#229Peter Smith
smithpb2250@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#226)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

Hi Kuroda-san. Here are my review comments for patch v36-0002.

======
doc/src/sgml/ref/pgupgrade.sgml

1.
        Configure the servers for log shipping.  (You do not need to run
        <function>pg_backup_start()</function> and
<function>pg_backup_stop()</function>
        or take a file system backup as the standbys are still synchronized
-       with the primary.)  Replication slots are not copied and must
-       be recreated.
+       with the primary.) Only logical slots on the primary are copied to the
+       new standby, and other other slots on the old standby must be recreated
+       as they are not copied.
       </para>

IMO this text still needs some minor changes like those shown below. Anyway,
there is a typo: /other other/

SUGGESTION
Only logical slots on the primary are copied to the new standby, but
other slots on the old standby are not copied so must be recreated
manually.

======
src/bin/pg_upgrade/server.c

2.
+ *
+ * Use max_slot_wal_keep_size as -1 to prevent the WAL removal by the
+ * checkpointer process.  If WALs required by logical replication slots are
+ * removed, the slots are unusable.  The setting ensures that such WAL
+ * records have remained so that invalidation of slots would be avoided
+ * during the upgrade.

The comment already explained that the reason for the setting is to prevent
removing the needed WAL records, so I felt there is no need for the
last sentence to repeat the same information.

BEFORE
The setting ensures that such WAL records have remained so that
invalidation of slots would be avoided during the upgrade.

SUGGESTION
This setting prevents the invalidation of slots during the upgrade.

------
Kind Regards,
Peter Smith.
Fujitsu Australia

#230Amit Kapila
amit.kapila16@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#228)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Tue, Sep 12, 2023 at 5:20 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

Few comments:
=============
1. One thing to note is that if the user checks whether the old cluster is
upgradable with the --check option and then tries to upgrade, that will also
fail, because during the --check run there would be at least one
additional shutdown checkpoint WAL record, and then in the next run the slots'
positions won't match. Note, I am saying this in the context of using the
--check option with a not-running old cluster. Won't that be surprising
to users? One possibility is that we document such behaviour, and the
other is that we go back to the WAL reading design where we can ignore
known WAL records like shutdown checkpoint, XLOG_RUNNING_XACTS, etc.

2.
+ /*
+ * Store the names of output plugins as well. There is a possibility
+ * that duplicated plugins are set, but the consumer function
+ * check_loadable_libraries() will avoid checking the same library, so
+ * we do not have to consider their uniqueness here.
+ */
+ for (slotno = 0; slotno < slot_arr->nslots; slotno++)
+ {
+ os_info.libraries[totaltups].name = pg_strdup(slot_arr->slots[slotno].plugin);

Here, we should ignore invalid slots.
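
Something along these lines is what I have in mind (a rough sketch only, not
actual patch code; the variable names follow the hunk quoted above):

```
for (slotno = 0; slotno < slot_arr->nslots; slotno++)
{
	LogicalSlotInfo *slot = &slot_arr->slots[slotno];

	/* Invalid slots are not migrated, so their plugin need not be checked. */
	if (slot->invalid)
		continue;

	os_info.libraries[totaltups].name = pg_strdup(slot->plugin);
	totaltups++;
}
```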

3.
+ if (!live_check && !slot->caught_up)
+ {
+ if (script == NULL &&
+ (script = fopen_priv(output_path, "w")) == NULL)
+ pg_fatal("could not open file \"%s\": %s",
+ output_path, strerror(errno));
+
+ fprintf(script,
+ "The slot \"%s\" has not consumed the WAL yet\n",
+ slot->slotname);

Is it possible to print the LSN locations of slot and last checkpoint?
I think that will aid in debugging the problems if any and could be
helpful to users as well.
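
For example, something like this (sketch only; it assumes LogicalSlotInfo also
carries the slot's confirmed_flush LSN, and it uses the chkpnt_latest field the
patch adds to ControlData):

```
/* Sketch: report both LSNs so a mismatch is easy to diagnose. */
fprintf(script,
		"The slot \"%s\" has not consumed the WAL yet "
		"(confirmed_flush_lsn %X/%X, latest checkpoint %X/%X)\n",
		slot->slotname,
		LSN_FORMAT_ARGS(slot->confirmed_flush),
		LSN_FORMAT_ARGS(old_cluster.controldata.chkpnt_latest));
```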

--
With Regards,
Amit Kapila.

#231Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Amit Kapila (#230)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Amit,

Thank you for reviewing! Before making a patch, let me reply to the important point.

1. One thing to note is that if user checks whether the old cluster is
upgradable with --check option and then try to upgrade, that will also
fail. Because during the --check run there would at least one
additional shutdown checkpoint WAL and then in the next run the slots
position won't match. Note, I am saying this in context of using
--check option with not-running old cluster. Won't that be surprising
to users? One possibility is that we document such a behaviour and
other is that we go back to WAL reading design where we can ignore
known WAL records like shutdown checkpoint, XLOG_RUNNING_XACTS, etc.

Good catch, we had never considered the case where --check is executed on a
stopped cluster. You are right: the old cluster is turned on/off during the
check and that generates a SHUTDOWN_CHECKPOINT record. As a result,
confirmed_flush ends up behind the latest checkpoint lsn.

Here are other approaches we came up with:

1. Add a WARNING message when --check is executed and slots are checked.
We can say something like:

```
...
Checking for valid logical replication slots
WARNING: this check generated WALs
Next pg_upgrade would fail.
Please ensure again that all WALs are replicated.
...
```

2. Add a hint message to the FATAL error raised when confirmed_flush is not the
same as the latest checkpoint:

```
...
Checking for valid logical replication slots fatal

Your installation contains invalid logical replication slots.
These slots can't be copied, so this cluster cannot be upgraded.
Consider removing such slots or consuming the pending WAL if any,
and then restart the upgrade.
If you ran pg_upgrade --check before this run, it may be the cause.
Please start the clusters and confirm again that all changes have been
replicated.
A list of invalid logical replication slots is in the file:
```

3. Request users to run pg_upgrade --check on a backup database if the old cluster
has logical slots. They normally save a copy of the whole cluster before doing
pg_upgrade anyway, so this may be acceptable. This requires no code changes.

What do others think?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#232Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Hayato Kuroda (Fujitsu) (#231)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

On Wednesday, September 13, 2023 9:52 PM Kuroda, Hayato/黒田 隼人 <kuroda.hayato@fujitsu.com> wrote:

Dear Amit,

Thank you for reviewing! Before making a patch I can reply the important point.

1. One thing to note is that if user checks whether the old cluster is
upgradable with --check option and then try to upgrade, that will also
fail. Because during the --check run there would at least one
additional shutdown checkpoint WAL and then in the next run the slots
position won't match. Note, I am saying this in context of using
--check option with not-running old cluster. Won't that be surprising
to users? One possibility is that we document such a behaviour and
other is that we go back to WAL reading design where we can ignore
known WAL records like shutdown checkpoint, XLOG_RUNNING_XACTS, etc.

Good catch, we have never considered the case that --check is executed for
stopped cluster. You are right, the old cluster is turned on/off during the
check and it generates SHUTDOWN_CHECKPOINT. This leads that
confirmed_flush is
behind the latest checkpoint lsn.

Here are other approaches we came up with:

1. adds WARNING message when the --check is executed and slots are
checked.
We can say like:

```
...
Checking for valid logical replication slots
WARNING: this check generated WALs
Next pg_uprade would fail.
Please ensure again that all WALs are replicated.
...
```

2. adds hint message in the FATAL error when the confirmed_flush is not same
as
the latest checkpoint:

```
...
Checking for valid logical replication slots fatal

Your installation contains invalid logical replication slots.
These slots can't be copied, so this cluster cannot be upgraded.
Consider removing such slots or consuming the pending WAL if any,
and then restart the upgrade.
If you did pg_upgrade --check before this run, it may be a cause.
Please start clusters and confirm again that all changes are
replicated.
A list of invalid logical replication slots is in the file:
```

3. requests users to do pg_upgrade --check on backup database, if old cluster
has logical slots. Basically they save a whole of cluster before doing
pg_uprade,
so it may be acceptable. This is not a modification of codes.

Here are some more ideas about the issue for reference.

1) Extending the controlfile.

We can add a new field (e.g. non_upgrade_checkPoint) to record the last checkpoint
ptr that happened in non-upgrade mode. The new field won't be updated due to
"pg_upgrade --check", so pg_upgrade can use this LSN to compare with the slot's
confirmed_flush_lsn.

Pros: User can smoothly upgrade the cluster even if they run "pg_upgrade
--check" in advance.

Cons: Not sure if this is a good enough reason to introduce a new field in the
controlfile.

-----------

2) Advance the slot's confirmed_flush_lsn in pg_upgrade if the check passes.

Introducing an upgrade support SQL function
(binary_upgrade_advance_logical_slot_lsn()) to set a
flag (catch_confirmed_lsn_up) on the server side. On the server side, when trying to
flush the slot in the shutdown checkpoint (CheckPointReplicationSlots), we update
the slot's confirmed_flush_lsn to the lsn of the current checkpoint if
catch_confirmed_lsn_up is set.

Pros: User can smoothly upgrade the cluster even if they run "pg_upgrade
--check" in advance.

Cons: Although we have some examples of using functions
(binary_upgrade_set_next_pg_enum_oid ...) to set some variables during upgrade,
it's not clear whether it is standard behavior to change the slot's lsn during
upgrade.

-----------

3) Introduce a new pg_upgrade option (e.g. skip_slot_check), and suggest that if the
user already did the upgrade check for a stopped server, they can use this option
when trying to upgrade later.

Pros: Can save the user the effort of advancing each slot's lsn.

Cons: I didn't see similar options in pg_upgrade; this might need some agreement.

Best Regards,
Hou zj

#233Dilip Kumar
dilipbalaut@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#231)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Wed, Sep 13, 2023 at 7:22 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

Dear Amit,

Thank you for reviewing! Before making a patch I can reply the important point.

1. One thing to note is that if user checks whether the old cluster is
upgradable with --check option and then try to upgrade, that will also
fail. Because during the --check run there would at least one
additional shutdown checkpoint WAL and then in the next run the slots
position won't match. Note, I am saying this in context of using
--check option with not-running old cluster. Won't that be surprising
to users? One possibility is that we document such a behaviour and
other is that we go back to WAL reading design where we can ignore
known WAL records like shutdown checkpoint, XLOG_RUNNING_XACTS, etc.

Good catch, we have never considered the case that --check is executed for
stopped cluster. You are right, the old cluster is turned on/off during the
check and it generates SHUTDOWN_CHECKPOINT. This leads that confirmed_flush is
behind the latest checkpoint lsn.

Good catch.

Here are other approaches we came up with:

1. adds WARNING message when the --check is executed and slots are checked.
We can say like:

```
...
Checking for valid logical replication slots
WARNING: this check generated WALs
Next pg_uprade would fail.
Please ensure again that all WALs are replicated.
...

IMHO --check is a very common command users execute before the
actual upgrade. So issuing such a WARNING might not be good, because
then what option does the user have? Do they need to restart the cluster
again in order to stream the new WAL and then shut it down again? I don't
think that is really an acceptable idea. Maybe, as discussed in the past, we
can provide an option to skip the slot checking, and during the --check
command we can give a WARNING suggesting that it is better to use
--skip-slot-checking for the main upgrade as we have already checked.
This could still be okay for the user.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#234Dilip Kumar
dilipbalaut@gmail.com
In reply to: Zhijie Hou (Fujitsu) (#232)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Thu, Sep 14, 2023 at 8:40 AM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

Here are some more ideas about the issue for reference.

1) Extending the controlfile.

We can dd a new field (e.g. non_upgrade_checkPoint) to record the last check point
ptr happened in non-upgrade mode. The new field won't be updated due to
"pg_upgrade --check", so pg_upgrade can use this LSN to compare with the slot's
confirmed_flush_lsn.

Pros: User can smoothly upgrade the cluster even if they run "pg_upgrade
--check" in advance.

Cons: Not sure if this is a enough reason to introduce new field in
controlfile.

Yeah, this could be an option but I am not sure either that adding a
new option for this purpose is the best way.

-----------

2) Advance the slot's confirmed_flush_lsn in pg_upgrade if the check passes.

Introducing an upgrade support SQL function
(binary_upgrade_advance_logical_slot_lsn()) to set a
flag(catch_confirmed_lsn_up) on server side. On server side, when trying to
flush the slot in shutdown checkpoint(CheckPointReplicationSlots), we update
the slot's confirmed_flush_lsn to the lsn of the current checkpoint if
catch_confirmed_lsn_up is set.

Pros: User can smoothly upgrade the cluster even if they run "pg_upgrade
--check" in advance.

Cons: Although we have some examples for using functions
(binary_upgrade_set_next_pg_enum_oid ...) to set some variables during upgrade
, but not sure if it's a standard behavior to change the slot's lsn during
upgrade.

I feel this seems like a good option.

-----------

3) Introduce a new pg_upgrade option(e.g. skip_slot_check), and suggest if user
already did the upgrade check for stopped server, they can use this option
when trying to upgrade later.

Pros: Can save some efforts for user to advance each slot's lsn.

Cons: I didn't see similar options in pg_upgrade, might need some agreement.

Yeah right, in fact during the --check command we can give that
suggestion as well.

I feel option 2 looks best to me unless there is some design issue with
it; as of now I do not see any issue with it, though. Let's see
what others think.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#235Amit Kapila
amit.kapila16@gmail.com
In reply to: Dilip Kumar (#234)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Thu, Sep 14, 2023 at 9:21 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Thu, Sep 14, 2023 at 8:40 AM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

Here are some more ideas about the issue for reference.

1) Extending the controlfile.

We can dd a new field (e.g. non_upgrade_checkPoint) to record the last check point
ptr happened in non-upgrade mode. The new field won't be updated due to
"pg_upgrade --check", so pg_upgrade can use this LSN to compare with the slot's
confirmed_flush_lsn.

Pros: User can smoothly upgrade the cluster even if they run "pg_upgrade
--check" in advance.

Cons: Not sure if this is a enough reason to introduce new field in
controlfile.

Yeah, this could be an option but I am not sure either that adding a
new option for this purpose is the best way.

I also think so. It seems this could work but adding upgrade-specific
information to other data structures doesn't sound like a clean
solution.

-----------

2) Advance the slot's confirmed_flush_lsn in pg_upgrade if the check passes.

Introducing an upgrade support SQL function
(binary_upgrade_advance_logical_slot_lsn()) to set a
flag(catch_confirmed_lsn_up) on server side. On server side, when trying to
flush the slot in shutdown checkpoint(CheckPointReplicationSlots), we update
the slot's confirmed_flush_lsn to the lsn of the current checkpoint if
catch_confirmed_lsn_up is set.

Pros: User can smoothly upgrade the cluster even if they run "pg_upgrade
--check" in advance.

Cons: Although we have some examples for using functions
(binary_upgrade_set_next_pg_enum_oid ...) to set some variables during upgrade
, but not sure if it's a standard behavior to change the slot's lsn during
upgrade.

I feel this seems like a good option.

In this idea, if the user decides not to proceed after the upgrade
--check, then we would have incremented the confirmed_flush location
of all slots without the subscriber's acknowledgment. It may not be
the usual scenario but in theory, it may violate our basic principle
of incrementing confirmed_flush location. Another thing to consider is
we have to do this for all logical slots under the assumption that all
are already caught up as pg_upgrade would have ensured that. So,
ideally, the server should have some knowledge that the slots are
already caught up to the latest location which again doesn't seem like
a clean idea.

-----------

3) Introduce a new pg_upgrade option(e.g. skip_slot_check), and suggest if user
already did the upgrade check for stopped server, they can use this option
when trying to upgrade later.

Pros: Can save some efforts for user to advance each slot's lsn.

Cons: I didn't see similar options in pg_upgrade, might need some agreement.

Yeah right, in fact during the --check command we can give that
suggestion as well.

Hmm, we can't mandate that users skip checking slots, because checking slots is
the whole point of --check here.

I feel option 2 looks best to me unless there is some design issue to
that, as of now I do not see any issue with that though. Let's see
what others think.

By the way, did you consider the previous approach this patch was
using? Basically, instead of getting the last checkpoint location from
the control file, we will read the WAL file starting from the
confirmed_flush location of a slot and if we find any WAL other than
expected WALs like shutdown checkpoint, running_xacts, etc. then we
will error out.

--
With Regards,
Amit Kapila.

#236Dilip Kumar
dilipbalaut@gmail.com
In reply to: Amit Kapila (#235)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Thu, Sep 14, 2023 at 10:00 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Sep 14, 2023 at 9:21 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

Cons: Although we have some examples for using functions
(binary_upgrade_set_next_pg_enum_oid ...) to set some variables during upgrade
, but not sure if it's a standard behavior to change the slot's lsn during
upgrade.

I feel this seems like a good option.

In this idea, if the user decides not to proceed after the upgrade
--check, then we would have incremented the confirmed_flush location
of all slots without the subscriber's acknowledgment.

Yeah, that's a problem.

-----------

3) Introduce a new pg_upgrade option(e.g. skip_slot_check), and suggest if user
already did the upgrade check for stopped server, they can use this option
when trying to upgrade later.

Pros: Can save some efforts for user to advance each slot's lsn.

Cons: I didn't see similar options in pg_upgrade, might need some agreement.

Yeah right, in fact during the --check command we can give that
suggestion as well.

Hmm, we can't mandate users to skip checking slots because that is the
whole point of --check slots.

I mean not to mandate skipping in the --check command. But once the
check command has already checked the slots, we can issue a
suggestion to the user that the slots are already checked, so that
during the actual upgrade they can skip checking the slots. So a
user who has already run the check command and is now following up with
an upgrade can skip slot checking, if we provide such an option.

I feel option 2 looks best to me unless there is some design issue to
that, as of now I do not see any issue with that though. Let's see
what others think.

By the way, did you consider the previous approach this patch was
using? Basically, instead of getting the last checkpoint location from
the control file, we will read the WAL file starting from the
confirmed_flush location of a slot and if we find any WAL other than
expected WALs like shutdown checkpoint, running_xacts, etc. then we
will error out.

So basically, while scanning from confirmed_flush we must ensure that
the first record we find is a SHUTDOWN CHECKPOINT record at that same LSN,
and after that we should not get any WAL other than, as you said,
shutdown checkpoint, running_xacts, etc. That way we ensure both
aspects: that the confirmed_flush LSN is at the shutdown checkpoint, and
that after it there is no real activity in the system. To me,
this seems like the best available option so far.
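
A minimal sketch of that scan could look like the following (this is not the
patch's code; it assumes an XLogReaderState already set up over the old
cluster's WAL, and a hypothetical helper is_ignorable_record() that accepts
only the record types mentioned above; reader setup is intentionally omitted):

```
#include "postgres.h"

#include "access/rmgr.h"
#include "access/xlogreader.h"
#include "access/xlogrecord.h"
#include "catalog/pg_control.h"

/* Hypothetical: true for shutdown checkpoint, running_xacts, etc. */
extern bool is_ignorable_record(XLogReaderState *reader);

/* Hypothetical name; returns true if the slot has nothing left to decode. */
static bool
slot_has_caught_up(XLogReaderState *reader, XLogRecPtr confirmed_flush)
{
	char	   *errormsg;

	/* Start decoding at the slot's confirmed_flush location. */
	XLogBeginRead(reader, confirmed_flush);

	/* The first record must be the shutdown checkpoint at this exact LSN. */
	if (XLogReadRecord(reader, &errormsg) == NULL ||
		reader->ReadRecPtr != confirmed_flush ||
		XLogRecGetRmid(reader) != RM_XLOG_ID ||
		(XLogRecGetInfo(reader) & ~XLR_INFO_MASK) != XLOG_CHECKPOINT_SHUTDOWN)
		return false;

	/* After that, nothing but "harmless" records may appear. */
	while (XLogReadRecord(reader, &errormsg) != NULL)
	{
		if (!is_ignorable_record(reader))
			return false;
	}

	return true;
}
```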

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#237Amit Kapila
amit.kapila16@gmail.com
In reply to: Dilip Kumar (#236)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Thu, Sep 14, 2023 at 10:37 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Thu, Sep 14, 2023 at 10:00 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Sep 14, 2023 at 9:21 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

-----------

3) Introduce a new pg_upgrade option(e.g. skip_slot_check), and suggest if user
already did the upgrade check for stopped server, they can use this option
when trying to upgrade later.

Pros: Can save some efforts for user to advance each slot's lsn.

Cons: I didn't see similar options in pg_upgrade, might need some agreement.

Yeah right, in fact during the --check command we can give that
suggestion as well.

Hmm, we can't mandate users to skip checking slots because that is the
whole point of --check slots.

I mean not to mandate skipping in the --check command. But once the
check command has already checked the slot then we can issue a
suggestion to the user that the slots are already checked so that
during the actual upgrade we can --skip checking the slots. So for
user who has already run the check command and is now following with
an upgrade can skip slot checking if we can provide such an option.

Oh, okay, we can document this and request that the user follow it as you
suggest, but I guess it will be more work for the user and also less
intuitive.

I feel option 2 looks best to me unless there is some design issue to
that, as of now I do not see any issue with that though. Let's see
what others think.

By the way, did you consider the previous approach this patch was
using? Basically, instead of getting the last checkpoint location from
the control file, we will read the WAL file starting from the
confirmed_flush location of a slot and if we find any WAL other than
expected WALs like shutdown checkpoint, running_xacts, etc. then we
will error out.

So basically, while scanning from confirmed_flush we must ensure that
we find a first record as SHUTDOWN CHECKPOINT record at the same LSN,
and after that, we should not get any other WAL other than like you
said shutdown checkpoint, running_xacts. That way we will ensure both
aspect that the confirmed flush LSN is at the shutdown checkpoint and
after that there is no real activity in the system.

Right.

I think to me,
this seems like the best available option so far.

Yeah, let's see if someone else has a different opinion or has a better idea.

--
With Regards,
Amit Kapila.

#238Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Amit Kapila (#237)
2 attachment(s)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear hackers,

So basically, while scanning from confirmed_flush we must ensure that
we find a first record as SHUTDOWN CHECKPOINT record at the same LSN,
and after that, we should not get any other WAL other than like you
said shutdown checkpoint, running_xacts. That way we will ensure both
aspect that the confirmed flush LSN is at the shutdown checkpoint and
after that there is no real activity in the system.

Right.

I think to me,
this seems like the best available option so far.

Yeah, let's see if someone else has a different opinion or has a better idea.

Based on the recent discussion, I made a prototype that reads all WAL records
and verifies their types. A new upgrade function binary_upgrade_validate_wal_record_types_after_lsn()
does that. This function reads WAL records starting from start_lsn (confirmed_flush) and returns
true if all of them can be ignored. The types of ignored records are listed in [1]/messages/by-id/TYAPR01MB58660273EACEFC5BF256B133F50DA@TYAPR01MB5866.jpnprd01.prod.outlook.com.

Hou kindly found that XLOG_HEAP2_PRUNE may be generated during pg_upgrade
--check, so it was added to the acceptable types.

[1]: /messages/by-id/TYAPR01MB58660273EACEFC5BF256B133F50DA@TYAPR01MB5866.jpnprd01.prod.outlook.com
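
For illustration, the core of such an acceptance check could look roughly like
this (a sketch only, not the attached patch; the helper name and the exact set
of acceptable types are assumptions based on this discussion):

```
#include "postgres.h"

#include "access/heapam_xlog.h"		/* XLOG_HEAP2_PRUNE */
#include "access/rmgr.h"
#include "access/xlogreader.h"
#include "access/xlogrecord.h"
#include "catalog/pg_control.h"		/* XLOG_CHECKPOINT_SHUTDOWN */
#include "storage/standbydefs.h"	/* XLOG_RUNNING_XACTS */

/* Hypothetical helper: can the record just read by "reader" be ignored? */
bool
is_ignorable_record(XLogReaderState *reader)
{
	RmgrId		rmid = XLogRecGetRmid(reader);
	uint8		info = XLogRecGetInfo(reader) & ~XLR_INFO_MASK;

	/* A shutdown checkpoint does not carry decodable changes. */
	if (rmid == RM_XLOG_ID && info == XLOG_CHECKPOINT_SHUTDOWN)
		return true;

	/* Periodic running-xacts records are likewise harmless. */
	if (rmid == RM_STANDBY_ID && info == XLOG_RUNNING_XACTS)
		return true;

	/* May be generated during "pg_upgrade --check" as mentioned above. */
	if (rmid == RM_HEAP2_ID && info == XLOG_HEAP2_PRUNE)
		return true;

	return false;
}
```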

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:

v37-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patchapplication/octet-stream; name=v37-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patchDownload
From f0e23055180ebb76ee4fd6a5e7c7613f6ceb3c67 Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Tue, 4 Apr 2023 05:49:34 +0000
Subject: [PATCH v37 1/2] pg_upgrade: Allow to replicate logical replication
 slots to new node

This commit allows nodes with logical replication slots to be upgraded. While
reading information from the old cluster, a list of logical replication slots is
fetched. At the later part of upgrading, pg_upgrade revisits the list and
restores slots by executing pg_create_logical_replication_slot() on the new
cluster. Migration of logical replication slots is only supported when the old
cluster is version 17.0 or later.

If the old node has slots with the status 'lost' or with unconsumed WAL records,
the pg_upgrade fails. These checks are needed to prevent data loss.

Note that the pg_resetwal command would remove WAL files, which are required for
restart_lsn. If WALs required by logical replication slots are removed, the
slots are unusable. Therefore, during the upgrade, slot restoration is done
after the final pg_resetwal command. The workflow ensures that required WALs are
retained.

The significant advantage of this commit is that it makes it easy to continue
logical replication even after upgrading the publisher node. Previously,
pg_upgrade allowed copying publications to a new node. With this new commit,
adjusting the connection string to the new publisher will cause the apply
worker on the subscriber to connect to the new publisher automatically. This
enables seamless continuation of logical replication, even after an upgrade.

Author: Hayato Kuroda
Co-authored-by: Hou Zhijie
Reviewed-by: Peter Smith, Julien Rouhaud, Vignesh C, Wang Wei, Masahiko Sawada,
             Dilip Kumar
---
 doc/src/sgml/ref/pgupgrade.sgml               |  79 ++++++-
 src/backend/replication/slot.c                |  12 +
 src/bin/pg_upgrade/Makefile                   |   3 +
 src/bin/pg_upgrade/check.c                    | 202 +++++++++++++++--
 src/bin/pg_upgrade/controldata.c              |  39 ++++
 src/bin/pg_upgrade/function.c                 |  31 ++-
 src/bin/pg_upgrade/info.c                     | 148 +++++++++++-
 src/bin/pg_upgrade/meson.build                |   1 +
 src/bin/pg_upgrade/pg_upgrade.c               | 107 ++++++++-
 src/bin/pg_upgrade/pg_upgrade.h               |  26 ++-
 src/bin/pg_upgrade/server.c                   |  12 +-
 .../t/003_logical_replication_slots.pl        | 214 ++++++++++++++++++
 src/tools/pgindent/typedefs.list              |   3 +
 13 files changed, 836 insertions(+), 41 deletions(-)
 create mode 100644 src/bin/pg_upgrade/t/003_logical_replication_slots.pl

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index bea0d1b93f..4e2281bae4 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -383,6 +383,80 @@ make prefix=/usr/local/pgsql.new install
     </para>
    </step>
 
+   <step>
+    <title>Prepare for publisher upgrades</title>
+
+    <para>
+     <application>pg_upgrade</application> attempts to migrate logical
+     replication slots. This helps avoid the need for manually defining the
+     same replication slots on the new publisher. Migration of logical
+     replication slots is only supported when the old cluster is version 17.0
+     or later.
+    </para>
+
+    <para>
+     Before you start upgrading the publisher cluster, ensure that the
+     subscription is temporarily disabled, by executing
+     <link linkend="sql-altersubscription"><command>ALTER SUBSCRIPTION ... DISABLE</command></link>.
+     Re-enable the subscription after the upgrade.
+    </para>
+
+    <para>
+     There are some prerequisites for <application>pg_upgrade</application> to
+     be able to upgrade the replication slots. If these are not met an error
+     will be reported.
+    </para>
+
+    <itemizedlist>
+     <listitem>
+      <para>
+       All slots on the old cluster must be usable, i.e., there must be no slots
+       whose
+       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>conflicting</structfield>
+       is <literal>true</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>confirmed_flush_lsn</structfield>
+       of all slots on the old cluster must be the same as the latest
+       checkpoint location. This ensures that all the data has been replicated
+       before the upgrade.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The output plugins referenced by the slots on the old cluster must be
+       installed in the new PostgreSQL executable directory.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must not have permanent logical replication slots, i.e.,
+       there must be no slots where
+       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>temporary</structfield>
+       is <literal>false</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must have
+       <link linkend="guc-wal-level"><varname>wal_level</varname></link> as
+       <literal>logical</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must have
+       <link linkend="guc-max-replication-slots"><varname>max_replication_slots</varname></link>
+       configured to a value greater than or equal to the number of slots
+       present in the old cluster.
+      </para>
+     </listitem>
+    </itemizedlist>
+
+   </step>
+
    <step>
     <title>Stop both servers</title>
 
@@ -652,8 +726,9 @@ rsync --archive --delete --hard-links --size-only --no-inc-recursive /vol1/pg_tb
        Configure the servers for log shipping.  (You do not need to run
        <function>pg_backup_start()</function> and <function>pg_backup_stop()</function>
        or take a file system backup as the standbys are still synchronized
-       with the primary.)  Replication slots are not copied and must
-       be recreated.
+       with the primary.)  Only logical slots on the primary are copied to the
+       new standby, but other slots on the old standby are not copied so must
+       be recreated manually.
       </para>
      </step>
 
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 3ded3c1473..a91d412f91 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1423,6 +1423,18 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause,
 
 		SpinLockRelease(&s->mutex);
 
+		/*
+		 * The logical replication slots shouldn't be invalidated as
+		 * max_slot_wal_keep_size GUC is set to -1 during the upgrade.
+		 *
+		 * The following is just a sanity check.
+		 */
+		if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade)
+		{
+			Assert(max_slot_wal_keep_size_mb == -1);
+			elog(ERROR, "replication slots must not be invalidated during the upgrade");
+		}
+
 		if (active_pid != 0)
 		{
 			/*
diff --git a/src/bin/pg_upgrade/Makefile b/src/bin/pg_upgrade/Makefile
index 5834513add..815d1a7ca1 100644
--- a/src/bin/pg_upgrade/Makefile
+++ b/src/bin/pg_upgrade/Makefile
@@ -3,6 +3,9 @@
 PGFILEDESC = "pg_upgrade - an in-place binary upgrade utility"
 PGAPPICON = win32
 
+# required for 003_logical_replication_slots.pl
+EXTRA_INSTALL=contrib/test_decoding
+
 subdir = src/bin/pg_upgrade
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 56e313f562..b1424fdf9c 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -30,6 +30,8 @@ static void check_for_jsonb_9_4_usage(ClusterInfo *cluster);
 static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(void);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
+static void check_new_cluster_logical_replication_slots(void);
+static void check_old_cluster_for_valid_slots(bool live_check);
 
 
 /*
@@ -86,8 +88,11 @@ check_and_dump_old_cluster(bool live_check)
 	if (!live_check)
 		start_postmaster(&old_cluster, true);
 
-	/* Extract a list of databases and tables from the old cluster */
-	get_db_and_rel_infos(&old_cluster);
+	/*
+	 * Extract a list of databases, tables, and logical replication slots from
+	 * the old cluster.
+	 */
+	get_db_rel_and_slot_infos(&old_cluster);
 
 	init_tablespaces();
 
@@ -104,6 +109,13 @@ check_and_dump_old_cluster(bool live_check)
 	check_for_reg_data_type_usage(&old_cluster);
 	check_for_isn_and_int8_passing_mismatch(&old_cluster);
 
+	/*
+	 * Logical replication slots can be migrated since PG17. See comments atop
+	 * get_old_cluster_logical_slot_infos().
+	 */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
+		check_old_cluster_for_valid_slots(live_check);
+
 	/*
 	 * PG 16 increased the size of the 'aclitem' type, which breaks the
 	 * on-disk format for existing data.
@@ -187,7 +199,7 @@ check_and_dump_old_cluster(bool live_check)
 void
 check_new_cluster(void)
 {
-	get_db_and_rel_infos(&new_cluster);
+	get_db_rel_and_slot_infos(&new_cluster);
 
 	check_new_cluster_is_empty();
 
@@ -210,6 +222,8 @@ check_new_cluster(void)
 	check_for_prepared_transactions(&new_cluster);
 
 	check_for_new_tablespace_dir();
+
+	check_new_cluster_logical_replication_slots();
 }
 
 
@@ -232,27 +246,6 @@ report_clusters_compatible(void)
 }
 
 
-void
-issue_warnings_and_set_wal_level(void)
-{
-	/*
-	 * We unconditionally start/stop the new server because pg_resetwal -o set
-	 * wal_level to 'minimum'.  If the user is upgrading standby servers using
-	 * the rsync instructions, they will need pg_upgrade to write its final
-	 * WAL record showing wal_level as 'replica'.
-	 */
-	start_postmaster(&new_cluster, true);
-
-	/* Reindex hash indexes for old < 10.0 */
-	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 906)
-		old_9_6_invalidate_hash_indexes(&new_cluster, false);
-
-	report_extension_updates(&new_cluster);
-
-	stop_postmaster(false);
-}
-
-
 void
 output_completion_banner(char *deletion_script_file_name)
 {
@@ -1402,3 +1395,164 @@ check_for_user_defined_encoding_conversions(ClusterInfo *cluster)
 	else
 		check_ok();
 }
+
+/*
+ * check_new_cluster_logical_replication_slots()
+ *
+ * Make sure there are no logical replication slots on the new cluster and that
+ * the parameter settings necessary for creating slots are sufficient.
+ */
+static void
+check_new_cluster_logical_replication_slots(void)
+{
+	PGresult   *res;
+	PGconn	   *conn;
+	int			nslots_on_old;
+	int			nslots_on_new;
+	int			max_replication_slots;
+	char	   *wal_level;
+
+	/* Logical slots can be migrated since PG17. */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+		return;
+
+	nslots_on_old = count_old_cluster_logical_slots();
+
+	/* Quick return if there are no logical slots to be migrated. */
+	if (nslots_on_old == 0)
+		return;
+
+	conn = connectToServer(&new_cluster, "template1");
+
+	prep_status("Checking for new cluster logical replication slots");
+
+	res = executeQueryOrDie(conn, "SELECT count(*) "
+							"FROM pg_catalog.pg_replication_slots "
+							"WHERE slot_type = 'logical' AND "
+							"temporary IS FALSE;");
+
+	if (PQntuples(res) != 1)
+		pg_fatal("could not count the number of logical replication slots");
+
+	nslots_on_new = atoi(PQgetvalue(res, 0, 0));
+
+	if (nslots_on_new)
+		pg_fatal("expected 0 logical replication slots but found %d",
+				 nslots_on_new);
+
+	PQclear(res);
+
+	res = executeQueryOrDie(conn, "SHOW wal_level;");
+
+	if (PQntuples(res) != 1)
+		pg_fatal("could not determine wal_level");
+
+	wal_level = PQgetvalue(res, 0, 0);
+
+	if (strcmp(wal_level, "logical") != 0)
+		pg_fatal("wal_level must be \"logical\", but is set to \"%s\"",
+				 wal_level);
+
+	PQclear(res);
+
+	res = executeQueryOrDie(conn, "SHOW max_replication_slots;");
+
+	if (PQntuples(res) != 1)
+		pg_fatal("could not determine max_replication_slots");
+
+	max_replication_slots = atoi(PQgetvalue(res, 0, 0));
+
+	if (nslots_on_old > max_replication_slots)
+		pg_fatal("max_replication_slots (%d) must be greater than or equal to the number of "
+				 "logical replication slots (%d) on the old cluster",
+				 max_replication_slots, nslots_on_old);
+
+	PQclear(res);
+	PQfinish(conn);
+
+	check_ok();
+}
+
+/*
+ * check_old_cluster_for_valid_slots()
+ *
+ * Make sure logical replication slots can be migrated to new cluster.
+ * Following points are checked:
+ *
+ *	- All logical replication slots are usable.
+ *	- All logical replication slots consumed all WALs, except a
+ *	  CHECKPOINT_SHUTDOWN record.
+ */
+static void
+check_old_cluster_for_valid_slots(bool live_check)
+{
+	int			dbnum;
+	char		output_path[MAXPGPATH];
+	FILE	   *script = NULL;
+
+	prep_status("Checking for valid logical replication slots");
+
+	snprintf(output_path, sizeof(output_path), "%s/%s",
+			 log_opts.basedir,
+			 "invalid_logical_replication_slots.txt");
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		int			slotnum;
+		LogicalSlotInfoArr *slot_arr = &old_cluster.dbarr.dbs[dbnum].slot_arr;
+
+		for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		{
+			LogicalSlotInfo *slot = &slot_arr->slots[slotnum];
+
+			/* Is the slot usable? */
+			if (slot->invalid)
+			{
+				if (script == NULL &&
+					(script = fopen_priv(output_path, "w")) == NULL)
+					pg_fatal("could not open file \"%s\": %s",
+							 output_path, strerror(errno));
+
+				fprintf(script, "The slot \"%s\" is invalid\n",
+						slot->slotname);
+
+				/* No need to check this slot further; move on to the next one */
+				continue;
+			}
+
+			/*
+			 * Do additional checks to ensure that confirmed_flush LSN of all
+			 * the slots is the same as the latest checkpoint location.
+			 *
+			 * Note: This can be satisfied only when the old cluster has been
+			 * shut down, so we skip this for live checks.
+			 */
+			if (!live_check && !slot->caught_up)
+			{
+				if (script == NULL &&
+					(script = fopen_priv(output_path, "w")) == NULL)
+					pg_fatal("could not open file \"%s\": %s",
+							 output_path, strerror(errno));
+
+				fprintf(script,
+						"The slot \"%s\" has not consumed the WAL yet\n",
+						slot->slotname);
+			}
+		}
+	}
+
+	if (script)
+	{
+		fclose(script);
+
+		pg_log(PG_REPORT, "fatal");
+		pg_fatal("Your installation contains invalid logical replication slots.\n"
+				 "These slots can't be copied, so this cluster cannot be upgraded.\n"
+				 "Consider removing such slots or consuming the pending WAL if any,\n"
+				 "and then restart the upgrade.\n"
+				 "A list of invalid logical replication slots is in the file:\n"
+				 "    %s", output_path);
+	}
+
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c
index 4beb65ab22..f8f823e2be 100644
--- a/src/bin/pg_upgrade/controldata.c
+++ b/src/bin/pg_upgrade/controldata.c
@@ -169,6 +169,45 @@ get_control_data(ClusterInfo *cluster, bool live_check)
 				}
 				got_cluster_state = true;
 			}
+
+			else if ((p = strstr(bufin, "Latest checkpoint location:")) != NULL)
+			{
+				/*
+				 * Read the latest checkpoint location if the cluster is PG17
+				 * or later. This is used for upgrading logical replication
+				 * slots. Currently, we need it only for the old cluster but
+				 * for simplicity chose not to have additional checks.
+				 */
+				if (GET_MAJOR_VERSION(cluster->major_version) >= 1700)
+				{
+					char	   *slash = NULL;
+					uint32		upper_lsn,
+								lower_lsn;
+
+					p = strchr(p, ':');
+
+					if (p == NULL || strlen(p) <= 1)
+						pg_fatal("%d: controldata retrieval problem", __LINE__);
+
+					p++;		/* remove ':' char */
+
+					p = strpbrk(p, "01234567890ABCDEF");
+
+					if (p == NULL || strlen(p) <= 1)
+						pg_fatal("%d: controldata retrieval problem", __LINE__);
+
+					/*
+					 * The upper and lower part of LSN must be read separately
+					 * because it is stored as in %X/%X format.
+					 */
+					upper_lsn = strtoul(p, &slash, 16);
+					lower_lsn = strtoul(++slash, NULL, 16);
+
+					/* And combine them */
+					cluster->controldata.chkpnt_latest =
+						((uint64) upper_lsn << 32) | lower_lsn;
+				}
+			}
 		}
 
 		rc = pclose(output);
diff --git a/src/bin/pg_upgrade/function.c b/src/bin/pg_upgrade/function.c
index dc8800c7cd..c0f5e58fa2 100644
--- a/src/bin/pg_upgrade/function.c
+++ b/src/bin/pg_upgrade/function.c
@@ -46,7 +46,9 @@ library_name_compare(const void *p1, const void *p2)
 /*
  * get_loadable_libraries()
  *
- *	Fetch the names of all old libraries containing C-language functions.
+ *	Fetch the names of all old libraries that either contain C-language functions
+ *	or correspond to logical replication output plugins.
+ *
  *	We will later check that they all exist in the new installation.
  */
 void
@@ -55,6 +57,7 @@ get_loadable_libraries(void)
 	PGresult  **ress;
 	int			totaltups;
 	int			dbnum;
+	int			n_libinfos;
 
 	ress = (PGresult **) pg_malloc(old_cluster.dbarr.ndbs * sizeof(PGresult *));
 	totaltups = 0;
@@ -81,7 +84,12 @@ get_loadable_libraries(void)
 		PQfinish(conn);
 	}
 
-	os_info.libraries = (LibraryInfo *) pg_malloc(totaltups * sizeof(LibraryInfo));
+	/*
+	 * Allocate memory for required libraries and logical replication output
+	 * plugins.
+	 */
+	n_libinfos = totaltups + count_old_cluster_logical_slots();
+	os_info.libraries = (LibraryInfo *) pg_malloc(sizeof(LibraryInfo) * n_libinfos);
 	totaltups = 0;
 
 	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
@@ -89,6 +97,8 @@ get_loadable_libraries(void)
 		PGresult   *res = ress[dbnum];
 		int			ntups;
 		int			rowno;
+		int			slotno;
+		LogicalSlotInfoArr *slot_arr = &old_cluster.dbarr.dbs[dbnum].slot_arr;
 
 		ntups = PQntuples(res);
 		for (rowno = 0; rowno < ntups; rowno++)
@@ -101,6 +111,23 @@ get_loadable_libraries(void)
 			totaltups++;
 		}
 		PQclear(res);
+
+		/*
+		 * Store the names of output plugins as well. There is a possibility
+		 * that duplicated plugins are set, but the consumer function
+		 * check_loadable_libraries() will avoid checking the same library, so
+		 * we do not have to consider their uniqueness here.
+		 */
+		for (slotno = 0; slotno < slot_arr->nslots; slotno++)
+		{
+			if (slot_arr->slots[slotno].invalid)
+				continue;
+
+			os_info.libraries[totaltups].name = pg_strdup(slot_arr->slots[slotno].plugin);
+			os_info.libraries[totaltups].dbnum = dbnum;
+
+			totaltups++;
+		}
 	}
 
 	pg_free(ress);
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index aa5faca4d6..f7b0deca87 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -26,6 +26,8 @@ static void get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo);
 static void free_rel_infos(RelInfoArr *rel_arr);
 static void print_db_infos(DbInfoArr *db_arr);
 static void print_rel_infos(RelInfoArr *rel_arr);
+static void print_slot_infos(LogicalSlotInfoArr *slot_arr);
+static void get_old_cluster_logical_slot_infos(DbInfo *dbinfo);
 
 
 /*
@@ -266,13 +268,13 @@ report_unmatched_relation(const RelInfo *rel, const DbInfo *db, bool is_new_db)
 }
 
 /*
- * get_db_and_rel_infos()
+ * get_db_rel_and_slot_infos()
  *
  * higher level routine to generate dbinfos for the database running
  * on the given "port". Assumes that server is already running.
  */
 void
-get_db_and_rel_infos(ClusterInfo *cluster)
+get_db_rel_and_slot_infos(ClusterInfo *cluster)
 {
 	int			dbnum;
 
@@ -283,7 +285,17 @@ get_db_and_rel_infos(ClusterInfo *cluster)
 	get_db_infos(cluster);
 
 	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
-		get_rel_infos(cluster, &cluster->dbarr.dbs[dbnum]);
+	{
+		DbInfo	   *pDbInfo = &cluster->dbarr.dbs[dbnum];
+
+		get_rel_infos(cluster, pDbInfo);
+
+		/*
+		 * Retrieve the logical replication slots infos for the old cluster.
+		 */
+		if (cluster == &old_cluster)
+			get_old_cluster_logical_slot_infos(pDbInfo);
+	}
 
 	if (cluster == &old_cluster)
 		pg_log(PG_VERBOSE, "\nsource databases:");
@@ -600,6 +612,107 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 	dbinfo->rel_arr.nrels = num_rels;
 }
 
+/*
+ * get_old_cluster_logical_slot_infos()
+ *
+ * gets the LogicalSlotInfos for all the logical replication slots of the
+ * database referred to by "dbinfo".
+ *
+ * Note: This function will not do anything if the old cluster is pre-PG17.
+ * This is because before that the logical slots are not saved at shutdown, so
+ * there is no guarantee that the latest confirmed_flush_lsn is saved to disk
+ * which can lead to data loss. It is still not guaranteed for manually created
+ * slots in PG17, so subsequent checks done in
+ * check_old_cluster_for_valid_slots() would raise a FATAL error if such slots
+ * are included.
+ */
+static void
+get_old_cluster_logical_slot_infos(DbInfo *dbinfo)
+{
+	PGconn	   *conn;
+	PGresult   *res;
+	LogicalSlotInfo *slotinfos = NULL;
+	int			num_slots;
+
+	/* Logical slots can be migrated since PG17. */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+		return;
+
+	conn = connectToServer(&old_cluster, dbinfo->db_name);
+
+	/*
+	 * The temporary slots are explicitly ignored while checking because such
+	 * slots cannot exist after the upgrade. During the upgrade, clusters are
+	 * started and stopped several times causing any temporary slots to be
+	 * removed.
+	 */
+	res = executeQueryOrDie(conn, "SELECT slot_name, plugin, two_phase, "
+							"(confirmed_flush_lsn = '%X/%X') as caught_up, conflicting as invalid "
+							"FROM pg_catalog.pg_replication_slots "
+							"WHERE slot_type = 'logical' AND "
+							"database = current_database() AND "
+							"temporary IS FALSE;",
+							LSN_FORMAT_ARGS(old_cluster.controldata.chkpnt_latest));
+
+	num_slots = PQntuples(res);
+
+	if (num_slots)
+	{
+		int			slotnum;
+		int			i_slotname;
+		int			i_plugin;
+		int			i_twophase;
+		int			i_caught_up;
+		int			i_invalid;
+
+		slotinfos = (LogicalSlotInfo *) pg_malloc(sizeof(LogicalSlotInfo) * num_slots);
+
+		i_slotname = PQfnumber(res, "slot_name");
+		i_plugin = PQfnumber(res, "plugin");
+		i_twophase = PQfnumber(res, "two_phase");
+		i_caught_up = PQfnumber(res, "caught_up");
+		i_invalid = PQfnumber(res, "invalid");
+
+		for (slotnum = 0; slotnum < num_slots; slotnum++)
+		{
+			LogicalSlotInfo *curr = &slotinfos[slotnum];
+
+			curr->slotname = pg_strdup(PQgetvalue(res, slotnum, i_slotname));
+			curr->plugin = pg_strdup(PQgetvalue(res, slotnum, i_plugin));
+			curr->two_phase = (strcmp(PQgetvalue(res, slotnum, i_twophase), "t") == 0);
+			curr->caught_up = (strcmp(PQgetvalue(res, slotnum, i_caught_up), "t") == 0);
+			curr->invalid = (strcmp(PQgetvalue(res, slotnum, i_invalid), "t") == 0);
+		}
+	}
+
+	PQclear(res);
+	PQfinish(conn);
+
+	dbinfo->slot_arr.slots = slotinfos;
+	dbinfo->slot_arr.nslots = num_slots;
+}
+
+
+/*
+ * count_old_cluster_logical_slots()
+ *
+ * Returns the number of logical replication slots for all databases.
+ *
+ * Note: this function always returns 0 if the old_cluster is PG16 and prior
+ * because we gather slot information only for cluster versions greater than or
+ * equal to PG17. See get_old_cluster_logical_slot_infos().
+ */
+int
+count_old_cluster_logical_slots(void)
+{
+	int			dbnum;
+	int			slot_count = 0;
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+		slot_count += old_cluster.dbarr.dbs[dbnum].slot_arr.nslots;
+
+	return slot_count;
+}
 
 static void
 free_db_and_rel_infos(DbInfoArr *db_arr)
@@ -642,8 +755,11 @@ print_db_infos(DbInfoArr *db_arr)
 
 	for (dbnum = 0; dbnum < db_arr->ndbs; dbnum++)
 	{
-		pg_log(PG_VERBOSE, "Database: \"%s\"", db_arr->dbs[dbnum].db_name);
-		print_rel_infos(&db_arr->dbs[dbnum].rel_arr);
+		DbInfo	   *pDbInfo = &db_arr->dbs[dbnum];
+
+		pg_log(PG_VERBOSE, "Database: \"%s\"", pDbInfo->db_name);
+		print_rel_infos(&pDbInfo->rel_arr);
+		print_slot_infos(&pDbInfo->slot_arr);
 	}
 }
 
@@ -660,3 +776,25 @@ print_rel_infos(RelInfoArr *rel_arr)
 			   rel_arr->rels[relnum].reloid,
 			   rel_arr->rels[relnum].tablespace);
 }
+
+static void
+print_slot_infos(LogicalSlotInfoArr *slot_arr)
+{
+	int			slotnum;
+
+	/* Quick return if there are no logical slots. */
+	if (slot_arr->nslots == 0)
+		return;
+
+	pg_log(PG_VERBOSE, "Logical replication slots within the database:");
+
+	for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+	{
+		LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+
+		pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\", two_phase: %d",
+			   slot_info->slotname,
+			   slot_info->plugin,
+			   slot_info->two_phase);
+	}
+}
diff --git a/src/bin/pg_upgrade/meson.build b/src/bin/pg_upgrade/meson.build
index 12a97f84e2..228f29b688 100644
--- a/src/bin/pg_upgrade/meson.build
+++ b/src/bin/pg_upgrade/meson.build
@@ -42,6 +42,7 @@ tests += {
     'tests': [
       't/001_basic.pl',
       't/002_pg_upgrade.pl',
+      't/003_logical_replication_slots.pl',
     ],
     'test_kwargs': {'priority': 40}, # pg_upgrade tests are slow
   },
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 96bfb67167..6be236dc9a 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -59,6 +59,8 @@ static void copy_xact_xlog_xid(void);
 static void set_frozenxids(bool minmxid_only);
 static void make_outputdirs(char *pgdata);
 static void setup(char *argv0, bool *live_check);
+static void create_logical_replication_slots(void);
+static void setup_new_cluster(void);
 
 ClusterInfo old_cluster,
 			new_cluster;
@@ -188,6 +190,8 @@ main(int argc, char **argv)
 			  new_cluster.pgdata);
 	check_ok();
 
+	setup_new_cluster();
+
 	if (user_opts.do_sync)
 	{
 		prep_status("Sync data directory to disk");
@@ -201,8 +205,6 @@ main(int argc, char **argv)
 
 	create_script_for_old_cluster_deletion(&deletion_script_file_name);
 
-	issue_warnings_and_set_wal_level();
-
 	pg_log(PG_REPORT,
 		   "\n"
 		   "Upgrade Complete\n"
@@ -593,7 +595,7 @@ create_new_objects(void)
 		set_frozenxids(true);
 
 	/* update new_cluster info now that we have objects in the databases */
-	get_db_and_rel_infos(&new_cluster);
+	get_db_rel_and_slot_infos(&new_cluster);
 }
 
 /*
@@ -862,3 +864,102 @@ set_frozenxids(bool minmxid_only)
 
 	check_ok();
 }
+
+/*
+ * create_logical_replication_slots()
+ *
+ * Similar to create_new_objects() but only restores logical replication slots.
+ */
+static void
+create_logical_replication_slots(void)
+{
+	int			dbnum;
+
+	prep_status_progress("Restoring logical replication slots in the new cluster");
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		DbInfo	   *old_db = &old_cluster.dbarr.dbs[dbnum];
+		LogicalSlotInfoArr *slot_arr = &old_db->slot_arr;
+		PGconn	   *conn;
+		PQExpBuffer query;
+		int			slotnum;
+		char		log_file_name[MAXPGPATH];
+
+		/* Skip this database if there are no slots */
+		if (slot_arr->nslots == 0)
+			continue;
+
+		conn = connectToServer(&new_cluster, old_db->db_name);
+		query = createPQExpBuffer();
+
+		snprintf(log_file_name, sizeof(log_file_name),
+				 DB_DUMP_LOG_FILE_MASK, old_db->db_oid);
+
+		pg_log(PG_STATUS, "%s", old_db->db_name);
+
+		for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		{
+			LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+
+			/* Constructs a query for creating logical replication slots. */
+			appendPQExpBuffer(query, "SELECT pg_catalog.pg_create_logical_replication_slot(");
+			appendStringLiteralConn(query, slot_info->slotname, conn);
+			appendPQExpBuffer(query, ", ");
+			appendStringLiteralConn(query, slot_info->plugin, conn);
+			appendPQExpBuffer(query, ", false, %s);",
+							  slot_info->two_phase ? "true" : "false");
+
+			PQclear(executeQueryOrDie(conn, "%s", query->data));
+
+			resetPQExpBuffer(query);
+		}
+
+		PQfinish(conn);
+
+		destroyPQExpBuffer(query);
+	}
+
+	end_progress_output();
+	check_ok();
+}
+
+/*
+ *	setup_new_cluster()
+ *
+ * Starts a new cluster for updating the wal_level in the control file, then
+ * does final setups. Logical slots are also created here.
+ */
+static void
+setup_new_cluster(void)
+{
+	/*
+	 * We unconditionally start/stop the new server because pg_resetwal -o set
+	 * wal_level to 'minimum'.  If the user is upgrading standby servers using
+	 * the rsync instructions, they will need pg_upgrade to write its final
+	 * WAL record showing wal_level as 'replica'.
+	 */
+	start_postmaster(&new_cluster, true);
+
+	/*
+	 * If the old cluster has logical slots, migrate them to a new cluster.
+	 *
+	 * Note: This must be done after executing pg_resetwal command in the
+	 * caller because pg_resetwal would remove required WALs.
+	 */
+	if (count_old_cluster_logical_slots())
+		create_logical_replication_slots();
+
+	/*
+	 * Reindex hash indexes for old < 10.0. count_old_cluster_logical_slots()
+	 * can return non-zero only when the old_cluster is PG17 or later, so it's OK
+	 * use "else if" here. See comments atop count_old_cluster_logical_slots()
+	 * and get_old_cluster_logical_slot_infos().
+	 */
+	else if (GET_MAJOR_VERSION(old_cluster.major_version) <= 906)
+		old_9_6_invalidate_hash_indexes(&new_cluster, false);
+
+	report_extension_updates(&new_cluster);
+
+	stop_postmaster(false);
+}
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 842f3b6cd3..f5ce6c3b4d 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -10,6 +10,7 @@
 #include <sys/stat.h>
 #include <sys/time.h>
 
+#include "access/xlogdefs.h"
 #include "common/relpath.h"
 #include "libpq-fe.h"
 
@@ -150,6 +151,25 @@ typedef struct
 	int			nrels;
 } RelInfoArr;
 
+/*
+ * Structure to store logical replication slot information
+ */
+typedef struct
+{
+	char	   *slotname;		/* slot name */
+	char	   *plugin;			/* plugin */
+	bool		two_phase;		/* can the slot decode 2PC? */
+	bool		caught_up;		/* Is confirmed_flush_lsn the same as latest
+								 * checkpoint LSN? */
+	bool		invalid;		/* If true, the slot is unusable. */
+} LogicalSlotInfo;
+
+typedef struct
+{
+	int			nslots;			/* number of logical slot infos */
+	LogicalSlotInfo *slots;		/* array of logical slot infos */
+} LogicalSlotInfoArr;
+
 /*
  * The following structure represents a relation mapping.
  */
@@ -176,6 +196,7 @@ typedef struct
 	char		db_tablespace[MAXPGPATH];	/* database default tablespace
 											 * path */
 	RelInfoArr	rel_arr;		/* array of all user relinfos */
+	LogicalSlotInfoArr slot_arr;	/* array of all LogicalSlotInfo */
 } DbInfo;
 
 /*
@@ -225,6 +246,7 @@ typedef struct
 	bool		date_is_int;
 	bool		float8_pass_by_value;
 	uint32		data_checksum_version;
+	XLogRecPtr	chkpnt_latest;
 } ControlData;
 
 /*
@@ -345,7 +367,6 @@ void		output_check_banner(bool live_check);
 void		check_and_dump_old_cluster(bool live_check);
 void		check_new_cluster(void);
 void		report_clusters_compatible(void);
-void		issue_warnings_and_set_wal_level(void);
 void		output_completion_banner(char *deletion_script_file_name);
 void		check_cluster_versions(void);
 void		check_cluster_compatibility(bool live_check);
@@ -400,7 +421,8 @@ void		check_loadable_libraries(void);
 FileNameMap *gen_db_file_maps(DbInfo *old_db,
 							  DbInfo *new_db, int *nmaps, const char *old_pgdata,
 							  const char *new_pgdata);
-void		get_db_and_rel_infos(ClusterInfo *cluster);
+void		get_db_rel_and_slot_infos(ClusterInfo *cluster);
+int			count_old_cluster_logical_slots(void);
 
 /* option.c */
 
diff --git a/src/bin/pg_upgrade/server.c b/src/bin/pg_upgrade/server.c
index 0bc3d2806b..20589e8c43 100644
--- a/src/bin/pg_upgrade/server.c
+++ b/src/bin/pg_upgrade/server.c
@@ -234,14 +234,20 @@ start_postmaster(ClusterInfo *cluster, bool report_and_exit_on_error)
 	 * we only modify the new cluster, so only use it there.  If there is a
 	 * crash, the new cluster has to be recreated anyway.  fsync=off is a big
 	 * win on ext4.
+	 *
+	 * Set max_slot_wal_keep_size to -1 to prevent WAL removal by the
+	 * checkpointer process.  If WALs required by logical replication slots are
+	 * removed, the slots are unusable.  This setting prevents the invalidation
+	 * of slots during the upgrade.
 	 */
 	snprintf(cmd, sizeof(cmd),
-			 "\"%s/pg_ctl\" -w -l \"%s/%s\" -D \"%s\" -o \"-p %d -b%s %s%s\" start",
+			 "\"%s/pg_ctl\" -w -l \"%s/%s\" -D \"%s\" -o \"-p %d -b%s%s %s%s\" start",
 			 cluster->bindir,
 			 log_opts.logdir,
 			 SERVER_LOG_FILE, cluster->pgconfig, cluster->port,
-			 (cluster == &new_cluster) ?
-			 " -c synchronous_commit=off -c fsync=off -c full_page_writes=off" : "",
+			(cluster == &new_cluster) ?
+			" -c synchronous_commit=off -c fsync=off -c full_page_writes=off" : "",
+			" -c max_slot_wal_keep_size=-1",
 			 cluster->pgopts ? cluster->pgopts : "", socket_string);
 
 	/*
diff --git a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
new file mode 100644
index 0000000000..01cb04ca12
--- /dev/null
+++ b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
@@ -0,0 +1,214 @@
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+# Tests for upgrading replication slots
+
+use strict;
+use warnings;
+
+use File::Path qw(rmtree);
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Can be changed to test the other modes
+my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
+
+# Initialize old cluster
+my $old_publisher = PostgreSQL::Test::Cluster->new('old_publisher');
+$old_publisher->init(allows_streaming => 'logical');
+
+# Initialize new cluster
+my $new_publisher = PostgreSQL::Test::Cluster->new('new_publisher');
+$new_publisher->init(allows_streaming => 'replica');
+
+# Initialize subscriber cluster
+my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
+$subscriber->init(allows_streaming => 'logical');
+
+my $bindir = $new_publisher->config_data('--bindir');
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when new cluster wal_level is not 'logical'
+
+# Preparations for the subsequent test:
+# 1. Create a slot on the old cluster
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT pg_create_logical_replication_slot('test_slot1', 'test_decoding', false, true);"
+);
+$old_publisher->stop;
+
+# pg_upgrade will fail because the new cluster wal_level is 'replica'
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade where the new cluster has the wrong wal_level');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when max_replication_slots on a new cluster is
+#		too low
+
+# Preparations for the subsequent test:
+# 1. Create a second slot on the old cluster
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT pg_create_logical_replication_slot('test_slot2', 'test_decoding', false, true);"
+);
+
+# 2. Consume WAL records to avoid another type of upgrade failure. It will be
+#	 tested in subsequent cases.
+$old_publisher->safe_psql('postgres',
+	"SELECT count(*) FROM pg_logical_slot_get_changes('test_slot1', NULL, NULL);"
+);
+$old_publisher->stop;
+
+# 3. max_replication_slots is set to smaller than the number of slots (2)
+#	 present on the old cluster
+$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 1");
+
+# 4. wal_level is set correctly on the new cluster
+$new_publisher->append_conf('postgresql.conf', "wal_level = 'logical'");
+
+# pg_upgrade will fail because the new cluster has insufficient max_replication_slots
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade where the new cluster has insufficient max_replication_slots');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when the slot still has unconsumed WAL records
+
+# Preparations for the subsequent test:
+# 1. Remove the slot 'test_slot2', leaving only 1 slot remaining on the old
+#	 cluster, so the new cluster config max_replication_slots=1 will now be
+#	 enough.
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT * FROM pg_drop_replication_slot('test_slot2');"
+);
+
+# 2. Generate extra WAL records. Because these WAL records do not get consumed,
+#	 the upcoming pg_upgrade test will fail.
+$old_publisher->safe_psql('postgres',
+	"CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;"
+);
+$old_publisher->stop;
+
+# pg_upgrade will fail because the slot still has unconsumed WAL records
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade of old cluster with idle replication slots');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+# Remove the remaining slot
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT * FROM pg_drop_replication_slot('test_slot1');"
+);
+
+# ------------------------------
+# TEST: Successful upgrade
+
+# Preparations for the subsequent test:
+# 1. Setup logical replication
+my $old_connstr = $old_publisher->connstr . ' dbname=postgres';
+$old_publisher->safe_psql('postgres',
+	"CREATE PUBLICATION pub FOR ALL TABLES;"
+);
+$subscriber->start;
+$subscriber->safe_psql(
+	'postgres', qq[
+	CREATE TABLE tbl (a int);
+	CREATE SUBSCRIPTION sub CONNECTION '$old_connstr' PUBLICATION pub WITH (two_phase = 'true')
+]);
+$subscriber->wait_for_subscription_sync($old_publisher, 'sub');
+
+# 2. Temporarily disable the subscription
+$subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION sub DISABLE");
+$old_publisher->stop;
+
+# Actual run, successful upgrade is expected
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade of old cluster');
+ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ removed after pg_upgrade success");
+
+# Check that the slot 'sub' has migrated to the new cluster
+$new_publisher->start;
+my $result = $new_publisher->safe_psql('postgres',
+	"SELECT slot_name, two_phase FROM pg_replication_slots");
+is($result, qq(sub|t), 'check the slot exists on new cluster');
+
+# Update the connection
+my $new_connstr = $new_publisher->connstr . ' dbname=postgres';
+$subscriber->safe_psql(
+	'postgres', qq[
+	ALTER SUBSCRIPTION sub CONNECTION '$new_connstr';
+	ALTER SUBSCRIPTION sub ENABLE;
+]);
+
+# Check whether changes on the new publisher get replicated to the subscriber
+$new_publisher->safe_psql('postgres',
+	"INSERT INTO tbl VALUES (generate_series(11, 20))");
+$new_publisher->wait_for_catchup('sub');
+$result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
+is($result, qq(20), 'check changes are replicated to the subscriber');
+
+# Clean up
+$subscriber->stop();
+$new_publisher->stop();
+
+done_testing();
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index f3d8a2a855..8e5ff87dff 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1502,7 +1502,10 @@ LogicalRepTupleData
 LogicalRepTyp
 LogicalRepWorker
 LogicalRepWorkerType
+LogicalReplicationSlotInfo
 LogicalRewriteMappingData
+LogicalSlotInfo
+LogicalSlotInfoArr
 LogicalTape
 LogicalTapeSet
 LsnReadQueue
-- 
2.27.0

v37-0002-Reads-all-WAL-records-ahead-confirmed_flush_lsn.patchapplication/octet-stream; name=v37-0002-Reads-all-WAL-records-ahead-confirmed_flush_lsn.patchDownload
From 8a8a0c7da5d4c36a71ff868ba5531c1e9a1a0b27 Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Thu, 14 Sep 2023 06:01:40 +0000
Subject: [PATCH v37 2/2] Reads all WAL records ahead confirmed_flush_lsn

---
 contrib/pg_walinspect/pg_walinspect.c         | 94 -------------------
 doc/src/sgml/ref/pgupgrade.sgml               |  5 +-
 src/backend/access/transam/xlogutils.c        | 92 ++++++++++++++++++
 src/backend/utils/adt/pg_upgrade_support.c    | 87 +++++++++++++++++
 src/bin/pg_upgrade/check.c                    |  8 +-
 src/bin/pg_upgrade/controldata.c              | 39 --------
 src/bin/pg_upgrade/info.c                     |  6 +-
 src/bin/pg_upgrade/pg_upgrade.h               |  1 -
 .../t/003_logical_replication_slots.pl        | 20 ++++
 src/include/access/xlogutils.h                |  3 +
 src/include/catalog/pg_proc.dat               |  6 ++
 11 files changed, 216 insertions(+), 145 deletions(-)

diff --git a/contrib/pg_walinspect/pg_walinspect.c b/contrib/pg_walinspect/pg_walinspect.c
index 796a74f322..49f4f92e98 100644
--- a/contrib/pg_walinspect/pg_walinspect.c
+++ b/contrib/pg_walinspect/pg_walinspect.c
@@ -40,8 +40,6 @@ PG_FUNCTION_INFO_V1(pg_get_wal_stats_till_end_of_wal);
 
 static void ValidateInputLSNs(XLogRecPtr start_lsn, XLogRecPtr *end_lsn);
 static XLogRecPtr GetCurrentLSN(void);
-static XLogReaderState *InitXLogReaderState(XLogRecPtr lsn);
-static XLogRecord *ReadNextXLogRecord(XLogReaderState *xlogreader);
 static void GetWALRecordInfo(XLogReaderState *record, Datum *values,
 							 bool *nulls, uint32 ncols);
 static void GetWALRecordsInfo(FunctionCallInfo fcinfo,
@@ -84,98 +82,6 @@ GetCurrentLSN(void)
 	return curr_lsn;
 }
 
-/*
- * Initialize WAL reader and identify first valid LSN.
- */
-static XLogReaderState *
-InitXLogReaderState(XLogRecPtr lsn)
-{
-	XLogReaderState *xlogreader;
-	ReadLocalXLogPageNoWaitPrivate *private_data;
-	XLogRecPtr	first_valid_record;
-
-	/*
-	 * Reading WAL below the first page of the first segments isn't allowed.
-	 * This is a bootstrap WAL page and the page_read callback fails to read
-	 * it.
-	 */
-	if (lsn < XLOG_BLCKSZ)
-		ereport(ERROR,
-				(errmsg("could not read WAL at LSN %X/%X",
-						LSN_FORMAT_ARGS(lsn))));
-
-	private_data = (ReadLocalXLogPageNoWaitPrivate *)
-		palloc0(sizeof(ReadLocalXLogPageNoWaitPrivate));
-
-	xlogreader = XLogReaderAllocate(wal_segment_size, NULL,
-									XL_ROUTINE(.page_read = &read_local_xlog_page_no_wait,
-											   .segment_open = &wal_segment_open,
-											   .segment_close = &wal_segment_close),
-									private_data);
-
-	if (xlogreader == NULL)
-		ereport(ERROR,
-				(errcode(ERRCODE_OUT_OF_MEMORY),
-				 errmsg("out of memory"),
-				 errdetail("Failed while allocating a WAL reading processor.")));
-
-	/* first find a valid recptr to start from */
-	first_valid_record = XLogFindNextRecord(xlogreader, lsn);
-
-	if (XLogRecPtrIsInvalid(first_valid_record))
-		ereport(ERROR,
-				(errmsg("could not find a valid record after %X/%X",
-						LSN_FORMAT_ARGS(lsn))));
-
-	return xlogreader;
-}
-
-/*
- * Read next WAL record.
- *
- * By design, to be less intrusive in a running system, no slot is allocated
- * to reserve the WAL we're about to read. Therefore this function can
- * encounter read errors for historical WAL.
- *
- * We guard against ordinary errors trying to read WAL that hasn't been
- * written yet by limiting end_lsn to the flushed WAL, but that can also
- * encounter errors if the flush pointer falls in the middle of a record. In
- * that case we'll return NULL.
- */
-static XLogRecord *
-ReadNextXLogRecord(XLogReaderState *xlogreader)
-{
-	XLogRecord *record;
-	char	   *errormsg;
-
-	record = XLogReadRecord(xlogreader, &errormsg);
-
-	if (record == NULL)
-	{
-		ReadLocalXLogPageNoWaitPrivate *private_data;
-
-		/* return NULL, if end of WAL is reached */
-		private_data = (ReadLocalXLogPageNoWaitPrivate *)
-			xlogreader->private_data;
-
-		if (private_data->end_of_wal)
-			return NULL;
-
-		if (errormsg)
-			ereport(ERROR,
-					(errcode_for_file_access(),
-					 errmsg("could not read WAL at %X/%X: %s",
-							LSN_FORMAT_ARGS(xlogreader->EndRecPtr), errormsg)));
-		else
-			ereport(ERROR,
-					(errcode_for_file_access(),
-					 errmsg("could not read WAL at %X/%X",
-							LSN_FORMAT_ARGS(xlogreader->EndRecPtr))));
-	}
-
-	return record;
-}
-
 /*
  * Output values that make up a row describing caller's WAL record.
  *
diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 4e2281bae4..2588d6d7b8 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -418,10 +418,7 @@ make prefix=/usr/local/pgsql.new install
      </listitem>
      <listitem>
       <para>
-       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>confirmed_flush_lsn</structfield>
-       of all slots on the old cluster must be the same as the latest
-       checkpoint location. This ensures that all the data has been replicated
-       before the upgrade.
+       The old cluster has replicated all its changes to subscribers.
       </para>
      </listitem>
      <listitem>
diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c
index 43f7b31205..e2cabfef32 100644
--- a/src/backend/access/transam/xlogutils.c
+++ b/src/backend/access/transam/xlogutils.c
@@ -1048,3 +1048,95 @@ WALReadRaiseError(WALReadError *errinfo)
 						errinfo->wre_req)));
 	}
 }
+
+/*
+ * Initialize WAL reader and identify first valid LSN.
+ */
+XLogReaderState *
+InitXLogReaderState(XLogRecPtr lsn)
+{
+	XLogReaderState *xlogreader;
+	ReadLocalXLogPageNoWaitPrivate *private_data;
+	XLogRecPtr	first_valid_record;
+
+	/*
+	 * Reading WAL below the first page of the first segments isn't allowed.
+	 * This is a bootstrap WAL page and the page_read callback fails to read
+	 * it.
+	 */
+	if (lsn < XLOG_BLCKSZ)
+		ereport(ERROR,
+				(errmsg("could not read WAL at LSN %X/%X",
+						LSN_FORMAT_ARGS(lsn))));
+
+	private_data = (ReadLocalXLogPageNoWaitPrivate *)
+		palloc0(sizeof(ReadLocalXLogPageNoWaitPrivate));
+
+	xlogreader = XLogReaderAllocate(wal_segment_size, NULL,
+									XL_ROUTINE(.page_read = &read_local_xlog_page_no_wait,
+											   .segment_open = &wal_segment_open,
+											   .segment_close = &wal_segment_close),
+									private_data);
+
+	if (xlogreader == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("out of memory"),
+				 errdetail("Failed while allocating a WAL reading processor.")));
+
+	/* first find a valid recptr to start from */
+	first_valid_record = XLogFindNextRecord(xlogreader, lsn);
+
+	if (XLogRecPtrIsInvalid(first_valid_record))
+		ereport(ERROR,
+				(errmsg("could not find a valid record after %X/%X",
+						LSN_FORMAT_ARGS(lsn))));
+
+	return xlogreader;
+}
+
+/*
+ * Read next WAL record.
+ *
+ * By design, to be less intrusive in a running system, no slot is allocated
+ * to reserve the WAL we're about to read. Therefore this function can
+ * encounter read errors for historical WAL.
+ *
+ * We guard against ordinary errors trying to read WAL that hasn't been
+ * written yet by limiting end_lsn to the flushed WAL, but that can also
+ * encounter errors if the flush pointer falls in the middle of a record. In
+ * that case we'll return NULL.
+ */
+XLogRecord *
+ReadNextXLogRecord(XLogReaderState *xlogreader)
+{
+	XLogRecord *record;
+	char	   *errormsg;
+
+	record = XLogReadRecord(xlogreader, &errormsg);
+
+	if (record == NULL)
+	{
+		ReadLocalXLogPageNoWaitPrivate *private_data;
+
+		/* return NULL, if end of WAL is reached */
+		private_data = (ReadLocalXLogPageNoWaitPrivate *)
+			xlogreader->private_data;
+
+		if (private_data->end_of_wal)
+			return NULL;
+
+		if (errormsg)
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not read WAL at %X/%X: %s",
+							LSN_FORMAT_ARGS(xlogreader->EndRecPtr), errormsg)));
+		else
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not read WAL at %X/%X",
+							LSN_FORMAT_ARGS(xlogreader->EndRecPtr))));
+	}
+
+	return record;
+}
diff --git a/src/backend/utils/adt/pg_upgrade_support.c b/src/backend/utils/adt/pg_upgrade_support.c
index 0186636d9f..02078920f3 100644
--- a/src/backend/utils/adt/pg_upgrade_support.c
+++ b/src/backend/utils/adt/pg_upgrade_support.c
@@ -11,14 +11,22 @@
 
 #include "postgres.h"
 
+#include "access/heapam_xlog.h"
+#include "access/rmgr.h"
+#include "access/xlog.h"
+#include "access/xlog_internal.h"
+#include "access/xlogutils.h"
 #include "catalog/binary_upgrade.h"
 #include "catalog/heap.h"
 #include "catalog/namespace.h"
+#include "catalog/pg_control.h"
 #include "catalog/pg_type.h"
 #include "commands/extension.h"
 #include "miscadmin.h"
+#include "storage/standbydefs.h"
 #include "utils/array.h"
 #include "utils/builtins.h"
+#include "utils/pg_lsn.h"
 
 
 #define CHECK_IS_BINARY_UPGRADE									\
@@ -29,6 +37,9 @@ do {															\
 				 errmsg("function can only be called when server is in binary upgrade mode"))); \
 } while (0)
 
+#define CHECK_WAL_RECORD(rmgrid, info, expected_rmgrid, expected_info) \
+	(rmgrid == expected_rmgrid && info == expected_info)
+
 Datum
 binary_upgrade_set_next_pg_tablespace_oid(PG_FUNCTION_ARGS)
 {
@@ -261,3 +272,79 @@ binary_upgrade_set_missing_value(PG_FUNCTION_ARGS)
 
 	PG_RETURN_VOID();
 }
+
+/*
+ * Get WAL records from start LSN and check their type.
+ *
+ * This function is used to verify that there are no WAL records (except some
+ * types) after confirmed_flush_lsn of logical slots, which means all the
+ * changes were replicated to the subscriber. There is a possibility that some
+ * WALs are inserted after logical walsenders exit, so such types would be
+ * ignored.
+ *
+ * Currently the following types of WAL records are ignored:
+ *		- XLOG_CHECKPOINT_SHUTDOWN
+ *		- XLOG_CHECKPOINT_ONLINE
+ *		- XLOG_RUNNING_XACTS
+ *		- XLOG_FPI_FOR_HINT
+ *		- XLOG_HEAP2_PRUNE
+ */
+Datum
+binary_upgrade_validate_wal_record_types_after_lsn(PG_FUNCTION_ARGS)
+{
+	XLogRecPtr	  start_lsn = PG_GETARG_LSN(0);
+	XLogRecPtr	  curr_lsn = GetFlushRecPtr(NULL);
+	XLogReaderState *xlogreader;
+	bool			initial_record = true;
+	bool			result = true;
+
+	CHECK_IS_BINARY_UPGRADE;
+
+	/* Quick exit if the given lsn is larger than current one */
+	if (start_lsn >= curr_lsn)
+		PG_RETURN_BOOL(true);
+
+	xlogreader = InitXLogReaderState(start_lsn);
+
+	/* Read records till end of WAL */
+	while (result && ReadNextXLogRecord(xlogreader))
+	{
+		RmgrIds		   rmid;
+		uint8		   info;
+
+		/*
+		 * XXX: check the type of WAL. Currently XLOG info is directly
+		 * extracted, but it may be better to use the descriptor instead.
+		 */
+		rmid = XLogRecGetRmid(xlogreader);
+		info = XLogRecGetInfo(xlogreader) & ~XLR_INFO_MASK;
+
+		if (initial_record)
+		{
+			/* Initial record must be XLOG_CHECKPOINT_SHUTDOWN */
+			if (!CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID,
+								  XLOG_CHECKPOINT_SHUTDOWN))
+				result = false;
+
+			initial_record = false;
+			continue;
+		}
+
+		/*
+		 * XXX: There is a possibility that following records may be
+		 * generated during the upgrade.
+		 */
+		if (!CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID, XLOG_CHECKPOINT_SHUTDOWN) &&
+			!CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID, XLOG_CHECKPOINT_ONLINE) &&
+			!CHECK_WAL_RECORD(rmid, info, RM_STANDBY_ID, XLOG_RUNNING_XACTS) &&
+			!CHECK_WAL_RECORD(rmid, info, RM_HEAP2_ID, XLOG_HEAP2_PRUNE))
+			result = false;
+
+		CHECK_FOR_INTERRUPTS();
+	}
+
+	pfree(xlogreader->private_data);
+	XLogReaderFree(xlogreader);
+
+	PG_RETURN_BOOL(result);
+}
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index b1424fdf9c..df1ce67fc0 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -1480,8 +1480,8 @@ check_new_cluster_logical_replication_slots(void)
  * Following points are checked:
  *
  *	- All logical replication slots are usable.
- *	- All logical replication slots consumed all WALs, except a
- *	  CHECKPOINT_SHUTDOWN record.
+ *	- All logical replication slots consumed all WALs, except some acceptable
+ *	  types.
  */
 static void
 check_old_cluster_for_valid_slots(bool live_check)
@@ -1521,8 +1521,8 @@ check_old_cluster_for_valid_slots(bool live_check)
 			}
 
 			/*
-			 * Do additional checks to ensure that confirmed_flush LSN of all
-			 * the slots is the same as the latest checkpoint location.
+			 * Do additional checks to ensure that all logical replication
+			 * slots have reached the current WAL position.
 			 *
 			 * Note: This can be satisfied only when the old cluster has been
 			 * shut down, so we skip this for live checks.
diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c
index f8f823e2be..4beb65ab22 100644
--- a/src/bin/pg_upgrade/controldata.c
+++ b/src/bin/pg_upgrade/controldata.c
@@ -169,45 +169,6 @@ get_control_data(ClusterInfo *cluster, bool live_check)
 				}
 				got_cluster_state = true;
 			}
-
-			else if ((p = strstr(bufin, "Latest checkpoint location:")) != NULL)
-			{
-				/*
-				 * Read the latest checkpoint location if the cluster is PG17
-				 * or later. This is used for upgrading logical replication
-				 * slots. Currently, we need it only for the old cluster but
-				 * for simplicity chose not to have additional checks.
-				 */
-				if (GET_MAJOR_VERSION(cluster->major_version) >= 1700)
-				{
-					char	   *slash = NULL;
-					uint32		upper_lsn,
-								lower_lsn;
-
-					p = strchr(p, ':');
-
-					if (p == NULL || strlen(p) <= 1)
-						pg_fatal("%d: controldata retrieval problem", __LINE__);
-
-					p++;		/* remove ':' char */
-
-					p = strpbrk(p, "01234567890ABCDEF");
-
-					if (p == NULL || strlen(p) <= 1)
-						pg_fatal("%d: controldata retrieval problem", __LINE__);
-
-					/*
-					 * The upper and lower part of LSN must be read separately
-					 * because it is stored as in %X/%X format.
-					 */
-					upper_lsn = strtoul(p, &slash, 16);
-					lower_lsn = strtoul(++slash, NULL, 16);
-
-					/* And combine them */
-					cluster->controldata.chkpnt_latest =
-						((uint64) upper_lsn << 32) | lower_lsn;
-				}
-			}
 		}
 
 		rc = pclose(output);
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index f7b0deca87..5d25d1604e 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -647,12 +647,12 @@ get_old_cluster_logical_slot_infos(DbInfo *dbinfo)
 	 * removed.
 	 */
 	res = executeQueryOrDie(conn, "SELECT slot_name, plugin, two_phase, "
-							"(confirmed_flush_lsn = '%X/%X') as caught_up, conflicting as invalid "
+							"pg_catalog.binary_upgrade_validate_wal_record_types_after_lsn(confirmed_flush_lsn) as caught_up, "
+							"conflicting as invalid "
 							"FROM pg_catalog.pg_replication_slots "
 							"WHERE slot_type = 'logical' AND "
 							"database = current_database() AND "
-							"temporary IS FALSE;",
-							LSN_FORMAT_ARGS(old_cluster.controldata.chkpnt_latest));
+							"temporary IS FALSE;");
 
 	num_slots = PQntuples(res);
 
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index f5ce6c3b4d..8a7f56831e 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -246,7 +246,6 @@ typedef struct
 	bool		date_is_int;
 	bool		float8_pass_by_value;
 	uint32		data_checksum_version;
-	XLogRecPtr	chkpnt_latest;
 } ControlData;
 
 /*
diff --git a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
index 01cb04ca12..b91fb2f88f 100644
--- a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
+++ b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
@@ -169,6 +169,26 @@ $subscriber->wait_for_subscription_sync($old_publisher, 'sub');
 $subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION sub DISABLE");
 $old_publisher->stop;
 
+# Dry run, successful check is expected. This is not a live check, so a shutdown
+# checkpoint record would be inserted. We want to test that
+# binary_upgrade_validate_wal_record_types_after_lsn() skips that record and that
+# the upcoming pg_upgrade succeeds.
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,        '--check'
+	],
+	'run of pg_upgrade of old cluster');
+ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ removed after pg_upgrade success");
+
 # Actual run, successful upgrade is expected
 command_ok(
 	[
diff --git a/src/include/access/xlogutils.h b/src/include/access/xlogutils.h
index 5b77b11f50..1cf31aa24f 100644
--- a/src/include/access/xlogutils.h
+++ b/src/include/access/xlogutils.h
@@ -115,4 +115,7 @@ extern void XLogReadDetermineTimeline(XLogReaderState *state,
 
 extern void WALReadRaiseError(WALReadError *errinfo);
 
+extern XLogReaderState *InitXLogReaderState(XLogRecPtr lsn);
+extern XLogRecord *ReadNextXLogRecord(XLogReaderState *xlogreader);
+
 #endif
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 9805bc6118..f3d843222b 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11370,6 +11370,12 @@
   proname => 'binary_upgrade_set_next_pg_tablespace_oid', provolatile => 'v',
   proparallel => 'u', prorettype => 'void', proargtypes => 'oid',
   prosrc => 'binary_upgrade_set_next_pg_tablespace_oid' },
+{ oid => '8046', descr => 'for use by pg_upgrade',
+  proname => 'binary_upgrade_validate_wal_record_types_after_lsn',
+  prorows => '10', proretset => 't', provolatile => 's', prorettype => 'bool',
+  proargtypes => 'pg_lsn', proallargtypes => '{pg_lsn,bool}',
+  proargmodes => '{i,o}', proargnames => '{start_lsn,is_ok}',
+  prosrc => 'binary_upgrade_validate_wal_record_types_after_lsn' },
 
 # conversion functions
 { oid => '4302',
-- 
2.27.0

#239Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Amit Kapila (#230)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Amit,

Again, thank you for reviewing! New patch is available in [1]/messages/by-id/TYAPR01MB5866D63A6460059DC661BF62F5F6A@TYAPR01MB5866.jpnprd01.prod.outlook.com.

2.
+ /*
+ * Store the names of output plugins as well. There is a possibility
+ * that duplicated plugins are set, but the consumer function
+ * check_loadable_libraries() will avoid checking the same library, so
+ * we do not have to consider their uniqueness here.
+ */
+ for (slotno = 0; slotno < slot_arr->nslots; slotno++)
+ {
+ os_info.libraries[totaltups].name = pg_strdup(slot_arr->slots[slotno].plugin);

Here, we should ignore invalid slots.

"continue" was added.

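For reference, the loop now has the shape below (essentially the hunk quoted in the attached patch above); invalid slots are skipped before their plugin name is recorded:

```
for (slotno = 0; slotno < slot_arr->nslots; slotno++)
{
	/* Invalid slots are not migrated, so their plugins need not be checked */
	if (slot_arr->slots[slotno].invalid)
		continue;

	os_info.libraries[totaltups].name = pg_strdup(slot_arr->slots[slotno].plugin);
	os_info.libraries[totaltups].dbnum = dbnum;

	totaltups++;
}
```
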
3.
+ if (!live_check && !slot->caught_up)
+ {
+ if (script == NULL &&
+ (script = fopen_priv(output_path, "w")) == NULL)
+ pg_fatal("could not open file \"%s\": %s",
+ output_path, strerror(errno));
+
+ fprintf(script,
+ "The slot \"%s\" has not consumed the WAL yet\n",
+ slot->slotname);

Is it possible to print the LSN locations of slot and last checkpoint?
I think that will aid in debugging the problems if any and could be
helpful to users as well.

Based on recent discussion, I'm not sure we should output the actual LSN here.
(We do not check the latest checkpoint location anymore.)
If you still think it should be output, please tell me again.

[1]: /messages/by-id/TYAPR01MB5866D63A6460059DC661BF62F5F6A@TYAPR01MB5866.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#240Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Peter Smith (#229)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Peter,

Thank you for reviewing! New patch is available in [1]/messages/by-id/TYAPR01MB5866D63A6460059DC661BF62F5F6A@TYAPR01MB5866.jpnprd01.prod.outlook.com.

1.
Configure the servers for log shipping.  (You do not need to run
<function>pg_backup_start()</function> and
<function>pg_backup_stop()</function>
or take a file system backup as the standbys are still synchronized
-       with the primary.)  Replication slots are not copied and must
-       be recreated.
+       with the primary.) Only logical slots on the primary are copied to the
+       new standby, and other other slots on the old standby must be recreated
+       as they are not copied.
</para>

IMO this text still needs some minor changes like shown below. Anyway,
there is a typo: /other other/

SUGGESTION
Only logical slots on the primary are copied to the new standby, but
other slots on the old standby are not copied so must be recreated
manually.

Fixed.

======
src/bin/pg_upgrade/server.c

2.
+ *
+ * Use max_slot_wal_keep_size as -1 to prevent the WAL removal by the
+ * checkpointer process.  If WALs required by logical replication slots are
+ * removed, the slots are unusable.  The setting ensures that such WAL
+ * records have remained so that invalidation of slots would be avoided
+ * during the upgrade.

The comment already explained the reason for the setting is to prevent
removing the needed WAL records, so I felt there is no need for the
last sentence to repeat the same information.

BEFORE
The setting ensures that such WAL records have remained so that
invalidation of slots would be avoided during the upgrade.

SUGGESTION
This setting prevents the invalidation of slots during the upgrade.

Fixed.

[1]: /messages/by-id/TYAPR01MB5866D63A6460059DC661BF62F5F6A@TYAPR01MB5866.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#241Amit Kapila
amit.kapila16@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#238)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Fri, Sep 15, 2023 at 8:43 AM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

Few comments:
1. Why is the FPI record (XLOG_FPI_FOR_HINT) not considered a record
to be ignored? This can be generated during reading system tables.

2.
+binary_upgrade_validate_wal_record_types_after_lsn(PG_FUNCTION_ARGS)
{
...
+ if (initial_record)
+ {
+ /* Initial record must be XLOG_CHECKPOINT_SHUTDOWN */
+ if (!CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID,
+   XLOG_CHECKPOINT_SHUTDOWN))
+ result = false;
...
+ if (!CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID, XLOG_CHECKPOINT_SHUTDOWN) &&
+ !CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID, XLOG_CHECKPOINT_ONLINE) &&
+ !CHECK_WAL_RECORD(rmid, info, RM_STANDBY_ID, XLOG_RUNNING_XACTS) &&
+ !CHECK_WAL_RECORD(rmid, info, RM_HEAP2_ID, XLOG_HEAP2_PRUNE))
+ result = false;
...
}

Isn't it better to immediately return false if any unexpected WAL is
found? This will avoid reading unnecessary WAL

3.
+Datum
+binary_upgrade_validate_wal_record_types_after_lsn(PG_FUNCTION_ARGS)
+{
...
+
+ CHECK_IS_BINARY_UPGRADE;
+
+ /* Quick exit if the given lsn is larger than current one */
+ if (start_lsn >= curr_lsn)
+ PG_RETURN_BOOL(true);

Why do you return true here? My understanding was if the first record
is not a shutdown checkpoint record then it should fail, if that is
not true then I think we need to explain the same in comments.

--
With Regards,
Amit Kapila.

#242Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Amit Kapila (#241)
3 attachment(s)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Amit,

Thank you for reviewing! PSA new version patch set.

Few comments:
1. Why is the FPI record (XLOG_FPI_FOR_HINT) not considered a record
to be ignored? This can be generated during reading system tables.

Oh, I just missed. Written in comments atop the function, but not added here.
Added to white-list.
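
The condition should now look roughly like the sketch below (the exact hunk is in
the attached v38 patches). XLOG_FPI_FOR_HINT belongs to the XLOG resource manager,
so it is checked with RM_XLOG_ID:

```
if (!CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID, XLOG_CHECKPOINT_SHUTDOWN) &&
	!CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID, XLOG_CHECKPOINT_ONLINE) &&
	!CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID, XLOG_FPI_FOR_HINT) &&
	!CHECK_WAL_RECORD(rmid, info, RM_STANDBY_ID, XLOG_RUNNING_XACTS) &&
	!CHECK_WAL_RECORD(rmid, info, RM_HEAP2_ID, XLOG_HEAP2_PRUNE))
	result = false;
```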

2.
+binary_upgrade_validate_wal_record_types_after_lsn(PG_FUNCTION_ARGS)
{
...
+ if (initial_record)
+ {
+ /* Initial record must be XLOG_CHECKPOINT_SHUTDOWN */
+ if (!CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID,
+   XLOG_CHECKPOINT_SHUTDOWN))
+ result = false;
...
+ if (!CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID,
XLOG_CHECKPOINT_SHUTDOWN) &&
+ !CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID,
XLOG_CHECKPOINT_ONLINE) &&
+ !CHECK_WAL_RECORD(rmid, info, RM_STANDBY_ID,
XLOG_RUNNING_XACTS) &&
+ !CHECK_WAL_RECORD(rmid, info, RM_HEAP2_ID, XLOG_HEAP2_PRUNE))
+ result = false;
...
}

Isn't it better to immediately return false if any unexpected WAL is
found? This will avoid reading unnecessary WAL

IIUC we can exit the loop if result == false, so we do not have to read
unnecessary WALs. See the condition below. I used this approach because
private_data and xlogreader should be pfree()'d as cleanup.

```
/* Loop until all WALs are read, or unexpected record is found */
while (result && ReadNextXLogRecord(xlogreader))
{
```
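
For comparison, an immediate return on the first unexpected record would look
roughly like the sketch below (hypothetical, not what the patch does); the reader
cleanup would then have to be repeated before every early exit, which is why I
kept a single cleanup path after the loop:

```
if (!CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID, XLOG_CHECKPOINT_SHUTDOWN) &&
	!CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID, XLOG_CHECKPOINT_ONLINE) &&
	!CHECK_WAL_RECORD(rmid, info, RM_STANDBY_ID, XLOG_RUNNING_XACTS) &&
	!CHECK_WAL_RECORD(rmid, info, RM_HEAP2_ID, XLOG_HEAP2_PRUNE))
{
	/* free the reader before returning early */
	pfree(xlogreader->private_data);
	XLogReaderFree(xlogreader);
	PG_RETURN_BOOL(false);
}
```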

3.
+Datum
+binary_upgrade_validate_wal_record_types_after_lsn(PG_FUNCTION_ARGS)
+{
...
+
+ CHECK_IS_BINARY_UPGRADE;
+
+ /* Quick exit if the given lsn is larger than current one */
+ if (start_lsn >= curr_lsn)
+ PG_RETURN_BOOL(true);

Why do you return true here? My understanding was if the first record
is not a shutdown checkpoint record then it should fail, if that is
not true then I think we need to explain the same in comments.

I wondered what the behavior should be because it is unexpected input for us (note that
this function can be used only for the upgrade purpose). But yes, the initially read WAL
record must be XLOG_CHECKPOINT_SHUTDOWN, so I changed it as you said.

Also, I did a self-reviewing again and reworded comments.

BTW, the 0002 patch ports some functions from pg_walinspect, which may not be
elegant; the coupling between core and extensions should also be kept low. So I
made another patch which does not port anything and instead implements similar
functionality. I called the patch 0003, but it can be applied atop 0001 (not 0002).
To keep cfbot happy, it is attached as a txt file.
Could you please tell me which you prefer, 0002 or 0003?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:

v38-0003-Another-one-Reads-all-WAL-records-ahead-confirme.txttext/plain; name=v38-0003-Another-one-Reads-all-WAL-records-ahead-confirme.txtDownload
From d3fa36f3bc7f8ef4c0c541742ac8ad6d9eee5f09 Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Thu, 14 Sep 2023 06:01:40 +0000
Subject: [PATCH v38] Another one: Reads all WAL records ahead
 confirmed_flush_lsn

---
 doc/src/sgml/ref/pgupgrade.sgml               |   5 +-
 src/backend/utils/adt/pg_upgrade_support.c    | 130 ++++++++++++++++++
 src/bin/pg_upgrade/check.c                    |   8 +-
 src/bin/pg_upgrade/controldata.c              |  39 ------
 src/bin/pg_upgrade/info.c                     |   6 +-
 src/bin/pg_upgrade/pg_upgrade.h               |   1 -
 .../t/003_logical_replication_slots.pl        |  20 +++
 src/include/catalog/pg_proc.dat               |   6 +
 8 files changed, 164 insertions(+), 51 deletions(-)

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 4e2281bae4..2588d6d7b8 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -418,10 +418,7 @@ make prefix=/usr/local/pgsql.new install
      </listitem>
      <listitem>
       <para>
-       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>confirmed_flush_lsn</structfield>
-       of all slots on the old cluster must be the same as the latest
-       checkpoint location. This ensures that all the data has been replicated
-       before the upgrade.
+       The old cluster has replicated all its changes to subscribers.
       </para>
      </listitem>
      <listitem>
diff --git a/src/backend/utils/adt/pg_upgrade_support.c b/src/backend/utils/adt/pg_upgrade_support.c
index 0186636d9f..340dc180be 100644
--- a/src/backend/utils/adt/pg_upgrade_support.c
+++ b/src/backend/utils/adt/pg_upgrade_support.c
@@ -11,14 +11,22 @@
 
 #include "postgres.h"
 
+#include "access/heapam_xlog.h"
+#include "access/rmgr.h"
+#include "access/xlog.h"
+#include "access/xlog_internal.h"
+#include "access/xlogutils.h"
 #include "catalog/binary_upgrade.h"
 #include "catalog/heap.h"
 #include "catalog/namespace.h"
+#include "catalog/pg_control.h"
 #include "catalog/pg_type.h"
 #include "commands/extension.h"
 #include "miscadmin.h"
+#include "storage/standbydefs.h"
 #include "utils/array.h"
 #include "utils/builtins.h"
+#include "utils/pg_lsn.h"
 
 
 #define CHECK_IS_BINARY_UPGRADE									\
@@ -29,6 +37,9 @@ do {															\
 				 errmsg("function can only be called when server is in binary upgrade mode"))); \
 } while (0)
 
+#define CHECK_WAL_RECORD(rmgrid, info, expected_rmgrid, expected_info) \
+	(rmgrid == expected_rmgrid && info == expected_info)
+
 Datum
 binary_upgrade_set_next_pg_tablespace_oid(PG_FUNCTION_ARGS)
 {
@@ -261,3 +272,122 @@ binary_upgrade_set_missing_value(PG_FUNCTION_ARGS)
 
 	PG_RETURN_VOID();
 }
+
+/*
+ * Return true if we didn't find any unexpected WAL record, false otherwise.
+ *
+ * This function is used to verify that there are no WAL records (except some
+ * types) after confirmed_flush_lsn of logical slots, which means all the
+ * changes were replicated to the subscriber. There is a possibility that some
+ * WALs are inserted after logical waslenders exit, so such types would be
+ * ignored.
+ *
+ * XLOG_CHECKPOINT_SHUTDOWN is ignored because it would be inserted after the
+ * walsender exits. Moreover, the following types of records could be
+ * generated during pg_upgrade --check, so they are ignored too.
+ *
+ *		- XLOG_CHECKPOINT_ONLINE
+ *		- XLOG_RUNNING_XACTS
+ *		- XLOG_FPI_FOR_HINT
+ *		- XLOG_HEAP2_PRUNE
+ */
+Datum
+binary_upgrade_validate_wal_record_types_after_lsn(PG_FUNCTION_ARGS)
+{
+	XLogRecPtr	  start_lsn = PG_GETARG_LSN(0);
+	XLogReaderState *xlogreader;
+	bool			initial_record = true;
+	bool			result = true;
+	ReadLocalXLogPageNoWaitPrivate *private_data;
+
+	CHECK_IS_BINARY_UPGRADE;
+
+	/* Quick exit if the given lsn is larger than current one */
+	if (start_lsn >= GetFlushRecPtr(NULL))
+		PG_RETURN_BOOL(false);
+
+	private_data = (ReadLocalXLogPageNoWaitPrivate *)
+		palloc0(sizeof(ReadLocalXLogPageNoWaitPrivate));
+
+	xlogreader = XLogReaderAllocate(wal_segment_size, NULL,
+									XL_ROUTINE(.page_read = &read_local_xlog_page_no_wait,
+											   .segment_open = &wal_segment_open,
+											   .segment_close = &wal_segment_close),
+									private_data);
+
+	if (xlogreader == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("out of memory"),
+				 errdetail("Failed while allocating a WAL reading processor.")));
+
+	XLogBeginRead(xlogreader, start_lsn);
+
+	/* Loop until all WALs are read, or unexpected record is found */
+	while (result)
+	{
+		RmgrIds		   rmid;
+		uint8		   info;
+		char	   *errormsg;
+		XLogRecord *record;
+
+		CHECK_FOR_INTERRUPTS();
+
+		record = XLogReadRecord(xlogreader, &errormsg);
+
+		if (record == NULL)
+		{
+			ReadLocalXLogPageNoWaitPrivate *check_data;
+
+			/* Exit the loop if end of WAL is reached */
+			check_data = (ReadLocalXLogPageNoWaitPrivate *)
+				xlogreader->private_data;
+
+			if (check_data->end_of_wal)
+				break;
+
+			if (errormsg)
+				ereport(ERROR,
+						(errcode_for_file_access(),
+						errmsg("could not read WAL at %X/%X: %s",
+								LSN_FORMAT_ARGS(xlogreader->EndRecPtr), errormsg)));
+			else
+				ereport(ERROR,
+						(errcode_for_file_access(),
+						errmsg("could not read WAL at %X/%X",
+								LSN_FORMAT_ARGS(xlogreader->EndRecPtr))));
+		}
+
+		/* Check the type of WAL */
+		rmid = XLogRecGetRmid(xlogreader);
+		info = XLogRecGetInfo(xlogreader) & ~XLR_INFO_MASK;
+
+		if (initial_record)
+		{
+			/* Initial record must be XLOG_CHECKPOINT_SHUTDOWN */
+			if (!CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID,
+								  XLOG_CHECKPOINT_SHUTDOWN))
+				result = false;
+
+			initial_record = false;
+
+			continue;
+		}
+
+		/*
+		 * XXX: There is a possibility that following records may be
+		 * generated during the upgrade.
+		 */
+		if (!CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID, XLOG_CHECKPOINT_SHUTDOWN) &&
+			!CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID, XLOG_CHECKPOINT_ONLINE) &&
+			!CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID, XLOG_FPI_FOR_HINT) &&
+			!CHECK_WAL_RECORD(rmid, info, RM_STANDBY_ID, XLOG_RUNNING_XACTS) &&
+			!CHECK_WAL_RECORD(rmid, info, RM_HEAP2_ID, XLOG_HEAP2_PRUNE))
+				result = false;
+	}
+
+	pfree(xlogreader->private_data);
+	XLogReaderFree(xlogreader);
+
+	PG_RETURN_BOOL(result);
+}
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index b1424fdf9c..df1ce67fc0 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -1480,8 +1480,8 @@ check_new_cluster_logical_replication_slots(void)
  * Following points are checked:
  *
  *	- All logical replication slots are usable.
- *	- All logical replication slots consumed all WALs, except a
- *	  CHECKPOINT_SHUTDOWN record.
+ *	- All logical replication slots consumed all WALs, except some acceptable
+ *	  types.
  */
 static void
 check_old_cluster_for_valid_slots(bool live_check)
@@ -1521,8 +1521,8 @@ check_old_cluster_for_valid_slots(bool live_check)
 			}
 
 			/*
-			 * Do additional checks to ensure that confirmed_flush LSN of all
-			 * the slots is the same as the latest checkpoint location.
+			 * Do additional checks to ensure that all logical replication
+			 * slots have reached the current WAL position.
 			 *
 			 * Note: This can be satisfied only when the old cluster has been
 			 * shut down, so we skip this for live checks.
diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c
index f8f823e2be..4beb65ab22 100644
--- a/src/bin/pg_upgrade/controldata.c
+++ b/src/bin/pg_upgrade/controldata.c
@@ -169,45 +169,6 @@ get_control_data(ClusterInfo *cluster, bool live_check)
 				}
 				got_cluster_state = true;
 			}
-
-			else if ((p = strstr(bufin, "Latest checkpoint location:")) != NULL)
-			{
-				/*
-				 * Read the latest checkpoint location if the cluster is PG17
-				 * or later. This is used for upgrading logical replication
-				 * slots. Currently, we need it only for the old cluster but
-				 * for simplicity chose not to have additional checks.
-				 */
-				if (GET_MAJOR_VERSION(cluster->major_version) >= 1700)
-				{
-					char	   *slash = NULL;
-					uint32		upper_lsn,
-								lower_lsn;
-
-					p = strchr(p, ':');
-
-					if (p == NULL || strlen(p) <= 1)
-						pg_fatal("%d: controldata retrieval problem", __LINE__);
-
-					p++;		/* remove ':' char */
-
-					p = strpbrk(p, "01234567890ABCDEF");
-
-					if (p == NULL || strlen(p) <= 1)
-						pg_fatal("%d: controldata retrieval problem", __LINE__);
-
-					/*
-					 * The upper and lower part of LSN must be read separately
-					 * because it is stored as in %X/%X format.
-					 */
-					upper_lsn = strtoul(p, &slash, 16);
-					lower_lsn = strtoul(++slash, NULL, 16);
-
-					/* And combine them */
-					cluster->controldata.chkpnt_latest =
-						((uint64) upper_lsn << 32) | lower_lsn;
-				}
-			}
 		}
 
 		rc = pclose(output);
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index f7b0deca87..5d25d1604e 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -647,12 +647,12 @@ get_old_cluster_logical_slot_infos(DbInfo *dbinfo)
 	 * removed.
 	 */
 	res = executeQueryOrDie(conn, "SELECT slot_name, plugin, two_phase, "
-							"(confirmed_flush_lsn = '%X/%X') as caught_up, conflicting as invalid "
+							"pg_catalog.binary_upgrade_validate_wal_record_types_after_lsn(confirmed_flush_lsn) as caught_up, "
+							"conflicting as invalid "
 							"FROM pg_catalog.pg_replication_slots "
 							"WHERE slot_type = 'logical' AND "
 							"database = current_database() AND "
-							"temporary IS FALSE;",
-							LSN_FORMAT_ARGS(old_cluster.controldata.chkpnt_latest));
+							"temporary IS FALSE;");
 
 	num_slots = PQntuples(res);
 
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index f5ce6c3b4d..8a7f56831e 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -246,7 +246,6 @@ typedef struct
 	bool		date_is_int;
 	bool		float8_pass_by_value;
 	uint32		data_checksum_version;
-	XLogRecPtr	chkpnt_latest;
 } ControlData;
 
 /*
diff --git a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
index 01cb04ca12..b91fb2f88f 100644
--- a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
+++ b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
@@ -169,6 +169,26 @@ $subscriber->wait_for_subscription_sync($old_publisher, 'sub');
 $subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION sub DISABLE");
 $old_publisher->stop;
 
+# Dry run, a successful check is expected. This is not a live check, so a
+# shutdown checkpoint record will be inserted. We want to test that
+# binary_upgrade_validate_wal_record_types_after_lsn() skips that WAL so the
+# upcoming pg_upgrade will succeed.
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,        '--check'
+	],
+	'run of pg_upgrade of old cluster');
+ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ removed after pg_upgrade success");
+
 # Actual run, successful upgrade is expected
 command_ok(
 	[
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 9805bc6118..f3d843222b 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11370,6 +11370,12 @@
   proname => 'binary_upgrade_set_next_pg_tablespace_oid', provolatile => 'v',
   proparallel => 'u', prorettype => 'void', proargtypes => 'oid',
   prosrc => 'binary_upgrade_set_next_pg_tablespace_oid' },
+{ oid => '8046', descr => 'for use by pg_upgrade',
+  proname => 'binary_upgrade_validate_wal_record_types_after_lsn',
+  prorows => '10', proretset => 't', provolatile => 's', prorettype => 'bool',
+  proargtypes => 'pg_lsn', proallargtypes => '{pg_lsn,bool}',
+  proargmodes => '{i,o}', proargnames => '{start_lsn,is_ok}',
+  prosrc => 'binary_upgrade_validate_wal_record_types_after_lsn' },
 
 # conversion functions
 { oid => '4302',
-- 
2.27.0

v38-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patchapplication/octet-stream; name=v38-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patchDownload
From 705cbb3787f4950cd7ec894ff267301935820d37 Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Tue, 4 Apr 2023 05:49:34 +0000
Subject: [PATCH v38 1/2] pg_upgrade: Allow to replicate logical replication
 slots to new node

This commit allows nodes with logical replication slots to be upgraded. While
reading information from the old cluster, a list of logical replication slots is
fetched. At a later part of the upgrade, pg_upgrade revisits the list and
restores the slots by executing pg_create_logical_replication_slot() on the new
cluster. Migration of logical replication slots is only supported when the old
cluster is version 17.0 or later.

If the old node has slots with the status 'lost' or with unconsumed WAL records,
the pg_upgrade fails. These checks are needed to prevent data loss.

Note that the pg_resetwal command would remove WAL files, which are required for
restart_lsn. If WALs required by logical replication slots are removed, the
slots are unusable. Therefore, during the upgrade, slot restoration is done
after the final pg_resetwal command. The workflow ensures that required WALs are
retained.

The significant advantage of this commit is that it makes it easy to continue
logical replication even after upgrading the publisher node. Previously,
pg_upgrade allowed copying publications to a new node. With this new commit,
adjusting the connection string to the new publisher will cause the apply
worker on the subscriber to connect to the new publisher automatically. This
enables seamless continuation of logical replication, even after an upgrade.

Author: Hayato Kuroda
Co-authored-by: Hou Zhijie
Reviewed-by: Peter Smith, Julien Rouhaud, Vignesh C, Wang Wei, Masahiko Sawada,
             Dilip Kumar
---
 doc/src/sgml/ref/pgupgrade.sgml               |  79 ++++++-
 src/backend/replication/slot.c                |  12 +
 src/bin/pg_upgrade/Makefile                   |   3 +
 src/bin/pg_upgrade/check.c                    | 202 +++++++++++++++--
 src/bin/pg_upgrade/controldata.c              |  39 ++++
 src/bin/pg_upgrade/function.c                 |  31 ++-
 src/bin/pg_upgrade/info.c                     | 148 +++++++++++-
 src/bin/pg_upgrade/meson.build                |   1 +
 src/bin/pg_upgrade/pg_upgrade.c               | 107 ++++++++-
 src/bin/pg_upgrade/pg_upgrade.h               |  26 ++-
 src/bin/pg_upgrade/server.c                   |  12 +-
 .../t/003_logical_replication_slots.pl        | 214 ++++++++++++++++++
 src/tools/pgindent/typedefs.list              |   3 +
 13 files changed, 836 insertions(+), 41 deletions(-)
 create mode 100644 src/bin/pg_upgrade/t/003_logical_replication_slots.pl

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index bea0d1b93f..4e2281bae4 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -383,6 +383,80 @@ make prefix=/usr/local/pgsql.new install
     </para>
    </step>
 
+   <step>
+    <title>Prepare for publisher upgrades</title>
+
+    <para>
+     <application>pg_upgrade</application> attempts to migrate logical
+     replication slots. This helps avoid the need for manually defining the
+     same replication slots on the new publisher. Migration of logical
+     replication slots is only supported when the old cluster is version 17.0
+     or later.
+    </para>
+
+    <para>
+     Before you start upgrading the publisher cluster, ensure that the
+     subscription is temporarily disabled, by executing
+     <link linkend="sql-altersubscription"><command>ALTER SUBSCRIPTION ... DISABLE</command></link>.
+     Re-enable the subscription after the upgrade.
+    </para>
+
+    <para>
+     There are some prerequisites for <application>pg_upgrade</application> to
+     be able to upgrade the replication slots. If these are not met an error
+     will be reported.
+    </para>
+
+    <itemizedlist>
+     <listitem>
+      <para>
+       All slots on the old cluster must be usable, i.e., there are no slots
+       whose
+       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>conflicting</structfield>
+       is <literal>true</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>confirmed_flush_lsn</structfield>
+       of all slots on the old cluster must be the same as the latest
+       checkpoint location. This ensures that all the data has been replicated
+       before the upgrade.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The output plugins referenced by the slots on the old cluster must be
+       installed in the new PostgreSQL executable directory.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must not have permanent logical replication slots, i.e.,
+       there must be no slots where
+       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>temporary</structfield>
+       is <literal>false</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must have
+       <link linkend="guc-wal-level"><varname>wal_level</varname></link> as
+       <literal>logical</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must have
+       <link linkend="guc-max-replication-slots"><varname>max_replication_slots</varname></link>
+       configured to a value greater than or equal to the number of slots
+       present in the old cluster.
+      </para>
+     </listitem>
+    </itemizedlist>
+
+   </step>
+
    <step>
     <title>Stop both servers</title>
 
@@ -652,8 +726,9 @@ rsync --archive --delete --hard-links --size-only --no-inc-recursive /vol1/pg_tb
        Configure the servers for log shipping.  (You do not need to run
        <function>pg_backup_start()</function> and <function>pg_backup_stop()</function>
        or take a file system backup as the standbys are still synchronized
-       with the primary.)  Replication slots are not copied and must
-       be recreated.
+       with the primary.)  Only logical slots on the primary are copied to the
+       new standby, but other slots on the old standby are not copied so must
+       be recreated manually.
       </para>
      </step>
 
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 3ded3c1473..a91d412f91 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1423,6 +1423,18 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause,
 
 		SpinLockRelease(&s->mutex);
 
+		/*
+		 * The logical replication slots shouldn't be invalidated as
+		 * max_slot_wal_keep_size GUC is set to -1 during the upgrade.
+		 *
+		 * The following is just a sanity check.
+		 */
+		if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade)
+		{
+			Assert(max_slot_wal_keep_size_mb == -1);
+			elog(ERROR, "replication slots must not be invalidated during the upgrade");
+		}
+
 		if (active_pid != 0)
 		{
 			/*
diff --git a/src/bin/pg_upgrade/Makefile b/src/bin/pg_upgrade/Makefile
index 5834513add..815d1a7ca1 100644
--- a/src/bin/pg_upgrade/Makefile
+++ b/src/bin/pg_upgrade/Makefile
@@ -3,6 +3,9 @@
 PGFILEDESC = "pg_upgrade - an in-place binary upgrade utility"
 PGAPPICON = win32
 
+# required for 003_logical_replication_slots.pl
+EXTRA_INSTALL=contrib/test_decoding
+
 subdir = src/bin/pg_upgrade
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 56e313f562..b1424fdf9c 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -30,6 +30,8 @@ static void check_for_jsonb_9_4_usage(ClusterInfo *cluster);
 static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(void);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
+static void check_new_cluster_logical_replication_slots(void);
+static void check_old_cluster_for_valid_slots(bool live_check);
 
 
 /*
@@ -86,8 +88,11 @@ check_and_dump_old_cluster(bool live_check)
 	if (!live_check)
 		start_postmaster(&old_cluster, true);
 
-	/* Extract a list of databases and tables from the old cluster */
-	get_db_and_rel_infos(&old_cluster);
+	/*
+	 * Extract a list of databases, tables, and logical replication slots from
+	 * the old cluster.
+	 */
+	get_db_rel_and_slot_infos(&old_cluster);
 
 	init_tablespaces();
 
@@ -104,6 +109,13 @@ check_and_dump_old_cluster(bool live_check)
 	check_for_reg_data_type_usage(&old_cluster);
 	check_for_isn_and_int8_passing_mismatch(&old_cluster);
 
+	/*
+	 * Logical replication slots can be migrated since PG17. See comments atop
+	 * get_old_cluster_logical_slot_infos().
+	 */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
+		check_old_cluster_for_valid_slots(live_check);
+
 	/*
 	 * PG 16 increased the size of the 'aclitem' type, which breaks the
 	 * on-disk format for existing data.
@@ -187,7 +199,7 @@ check_and_dump_old_cluster(bool live_check)
 void
 check_new_cluster(void)
 {
-	get_db_and_rel_infos(&new_cluster);
+	get_db_rel_and_slot_infos(&new_cluster);
 
 	check_new_cluster_is_empty();
 
@@ -210,6 +222,8 @@ check_new_cluster(void)
 	check_for_prepared_transactions(&new_cluster);
 
 	check_for_new_tablespace_dir();
+
+	check_new_cluster_logical_replication_slots();
 }
 
 
@@ -232,27 +246,6 @@ report_clusters_compatible(void)
 }
 
 
-void
-issue_warnings_and_set_wal_level(void)
-{
-	/*
-	 * We unconditionally start/stop the new server because pg_resetwal -o set
-	 * wal_level to 'minimum'.  If the user is upgrading standby servers using
-	 * the rsync instructions, they will need pg_upgrade to write its final
-	 * WAL record showing wal_level as 'replica'.
-	 */
-	start_postmaster(&new_cluster, true);
-
-	/* Reindex hash indexes for old < 10.0 */
-	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 906)
-		old_9_6_invalidate_hash_indexes(&new_cluster, false);
-
-	report_extension_updates(&new_cluster);
-
-	stop_postmaster(false);
-}
-
-
 void
 output_completion_banner(char *deletion_script_file_name)
 {
@@ -1402,3 +1395,164 @@ check_for_user_defined_encoding_conversions(ClusterInfo *cluster)
 	else
 		check_ok();
 }
+
+/*
+ * check_new_cluster_logical_replication_slots()
+ *
+ * Make sure there are no logical replication slots on the new cluster and that
+ * the parameter settings necessary for creating slots are sufficient.
+ */
+static void
+check_new_cluster_logical_replication_slots(void)
+{
+	PGresult   *res;
+	PGconn	   *conn;
+	int			nslots_on_old;
+	int			nslots_on_new;
+	int			max_replication_slots;
+	char	   *wal_level;
+
+	/* Logical slots can be migrated since PG17. */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+		return;
+
+	nslots_on_old = count_old_cluster_logical_slots();
+
+	/* Quick return if there are no logical slots to be migrated. */
+	if (nslots_on_old == 0)
+		return;
+
+	conn = connectToServer(&new_cluster, "template1");
+
+	prep_status("Checking for new cluster logical replication slots");
+
+	res = executeQueryOrDie(conn, "SELECT count(*) "
+							"FROM pg_catalog.pg_replication_slots "
+							"WHERE slot_type = 'logical' AND "
+							"temporary IS FALSE;");
+
+	if (PQntuples(res) != 1)
+		pg_fatal("could not count the number of logical replication slots");
+
+	nslots_on_new = atoi(PQgetvalue(res, 0, 0));
+
+	if (nslots_on_new)
+		pg_fatal("expected 0 logical replication slots but found %d",
+				 nslots_on_new);
+
+	PQclear(res);
+
+	res = executeQueryOrDie(conn, "SHOW wal_level;");
+
+	if (PQntuples(res) != 1)
+		pg_fatal("could not determine wal_level");
+
+	wal_level = PQgetvalue(res, 0, 0);
+
+	if (strcmp(wal_level, "logical") != 0)
+		pg_fatal("wal_level must be \"logical\", but is set to \"%s\"",
+				 wal_level);
+
+	PQclear(res);
+
+	res = executeQueryOrDie(conn, "SHOW max_replication_slots;");
+
+	if (PQntuples(res) != 1)
+		pg_fatal("could not determine max_replication_slots");
+
+	max_replication_slots = atoi(PQgetvalue(res, 0, 0));
+
+	if (nslots_on_old > max_replication_slots)
+		pg_fatal("max_replication_slots (%d) must be greater than or equal to the number of "
+				 "logical replication slots (%d) on the old cluster",
+				 max_replication_slots, nslots_on_old);
+
+	PQclear(res);
+	PQfinish(conn);
+
+	check_ok();
+}
+
+/*
+ * check_old_cluster_for_valid_slots()
+ *
+ * Make sure logical replication slots can be migrated to new cluster.
+ * Following points are checked:
+ *
+ *	- All logical replication slots are usable.
+ *	- All logical replication slots consumed all WALs, except a
+ *	  CHECKPOINT_SHUTDOWN record.
+ */
+static void
+check_old_cluster_for_valid_slots(bool live_check)
+{
+	int			dbnum;
+	char		output_path[MAXPGPATH];
+	FILE	   *script = NULL;
+
+	prep_status("Checking for valid logical replication slots");
+
+	snprintf(output_path, sizeof(output_path), "%s/%s",
+			 log_opts.basedir,
+			 "invalid_logical_replication_slots.txt");
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		int			slotnum;
+		LogicalSlotInfoArr *slot_arr = &old_cluster.dbarr.dbs[dbnum].slot_arr;
+
+		for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		{
+			LogicalSlotInfo *slot = &slot_arr->slots[slotnum];
+
+			/* Is the slot usable? */
+			if (slot->invalid)
+			{
+				if (script == NULL &&
+					(script = fopen_priv(output_path, "w")) == NULL)
+					pg_fatal("could not open file \"%s\": %s",
+							 output_path, strerror(errno));
+
+				fprintf(script, "The slot \"%s\" is invalid\n",
+						slot->slotname);
+
+				/* No need to check this slot, seek to new one */
+				continue;
+			}
+
+			/*
+			 * Do additional checks to ensure that confirmed_flush LSN of all
+			 * the slots is the same as the latest checkpoint location.
+			 *
+			 * Note: This can be satisfied only when the old cluster has been
+			 * shut down, so we skip this for live checks.
+			 */
+			if (!live_check && !slot->caught_up)
+			{
+				if (script == NULL &&
+					(script = fopen_priv(output_path, "w")) == NULL)
+					pg_fatal("could not open file \"%s\": %s",
+							 output_path, strerror(errno));
+
+				fprintf(script,
+						"The slot \"%s\" has not consumed the WAL yet\n",
+						slot->slotname);
+			}
+		}
+	}
+
+	if (script)
+	{
+		fclose(script);
+
+		pg_log(PG_REPORT, "fatal");
+		pg_fatal("Your installation contains invalid logical replication slots.\n"
+				 "These slots can't be copied, so this cluster cannot be upgraded.\n"
+				 "Consider removing such slots or consuming the pending WAL if any,\n"
+				 "and then restart the upgrade.\n"
+				 "A list of invalid logical replication slots is in the file:\n"
+				 "    %s", output_path);
+	}
+
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c
index 4beb65ab22..f8f823e2be 100644
--- a/src/bin/pg_upgrade/controldata.c
+++ b/src/bin/pg_upgrade/controldata.c
@@ -169,6 +169,45 @@ get_control_data(ClusterInfo *cluster, bool live_check)
 				}
 				got_cluster_state = true;
 			}
+
+			else if ((p = strstr(bufin, "Latest checkpoint location:")) != NULL)
+			{
+				/*
+				 * Read the latest checkpoint location if the cluster is PG17
+				 * or later. This is used for upgrading logical replication
+				 * slots. Currently, we need it only for the old cluster but
+				 * for simplicity chose not to have additional checks.
+				 */
+				if (GET_MAJOR_VERSION(cluster->major_version) >= 1700)
+				{
+					char	   *slash = NULL;
+					uint32		upper_lsn,
+								lower_lsn;
+
+					p = strchr(p, ':');
+
+					if (p == NULL || strlen(p) <= 1)
+						pg_fatal("%d: controldata retrieval problem", __LINE__);
+
+					p++;		/* remove ':' char */
+
+					p = strpbrk(p, "01234567890ABCDEF");
+
+					if (p == NULL || strlen(p) <= 1)
+						pg_fatal("%d: controldata retrieval problem", __LINE__);
+
+					/*
+					 * The upper and lower part of LSN must be read separately
+					 * because it is stored as in %X/%X format.
+					 */
+					upper_lsn = strtoul(p, &slash, 16);
+					lower_lsn = strtoul(++slash, NULL, 16);
+
+					/* And combine them */
+					cluster->controldata.chkpnt_latest =
+						((uint64) upper_lsn << 32) | lower_lsn;
+				}
+			}
 		}
 
 		rc = pclose(output);
diff --git a/src/bin/pg_upgrade/function.c b/src/bin/pg_upgrade/function.c
index dc8800c7cd..c0f5e58fa2 100644
--- a/src/bin/pg_upgrade/function.c
+++ b/src/bin/pg_upgrade/function.c
@@ -46,7 +46,9 @@ library_name_compare(const void *p1, const void *p2)
 /*
  * get_loadable_libraries()
  *
- *	Fetch the names of all old libraries containing C-language functions.
+ *	Fetch the names of all old libraries that contain C-language functions or
+ *	that correspond to logical replication output plugins.
+ *
  *	We will later check that they all exist in the new installation.
  */
 void
@@ -55,6 +57,7 @@ get_loadable_libraries(void)
 	PGresult  **ress;
 	int			totaltups;
 	int			dbnum;
+	int			n_libinfos;
 
 	ress = (PGresult **) pg_malloc(old_cluster.dbarr.ndbs * sizeof(PGresult *));
 	totaltups = 0;
@@ -81,7 +84,12 @@ get_loadable_libraries(void)
 		PQfinish(conn);
 	}
 
-	os_info.libraries = (LibraryInfo *) pg_malloc(totaltups * sizeof(LibraryInfo));
+	/*
+	 * Allocate memory for required libraries and logical replication output
+	 * plugins.
+	 */
+	n_libinfos = totaltups + count_old_cluster_logical_slots();
+	os_info.libraries = (LibraryInfo *) pg_malloc(sizeof(LibraryInfo) * n_libinfos);
 	totaltups = 0;
 
 	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
@@ -89,6 +97,8 @@ get_loadable_libraries(void)
 		PGresult   *res = ress[dbnum];
 		int			ntups;
 		int			rowno;
+		int			slotno;
+		LogicalSlotInfoArr *slot_arr = &old_cluster.dbarr.dbs[dbnum].slot_arr;
 
 		ntups = PQntuples(res);
 		for (rowno = 0; rowno < ntups; rowno++)
@@ -101,6 +111,23 @@ get_loadable_libraries(void)
 			totaltups++;
 		}
 		PQclear(res);
+
+		/*
+		 * Store the names of output plugins as well. There is a possibility
+		 * that duplicated plugins are set, but the consumer function
+		 * check_loadable_libraries() will avoid checking the same library, so
+		 * we do not have to consider their uniqueness here.
+		 */
+		for (slotno = 0; slotno < slot_arr->nslots; slotno++)
+		{
+			if (slot_arr->slots[slotno].invalid)
+				continue;
+
+			os_info.libraries[totaltups].name = pg_strdup(slot_arr->slots[slotno].plugin);
+			os_info.libraries[totaltups].dbnum = dbnum;
+
+			totaltups++;
+		}
 	}
 
 	pg_free(ress);
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index aa5faca4d6..f7b0deca87 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -26,6 +26,8 @@ static void get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo);
 static void free_rel_infos(RelInfoArr *rel_arr);
 static void print_db_infos(DbInfoArr *db_arr);
 static void print_rel_infos(RelInfoArr *rel_arr);
+static void print_slot_infos(LogicalSlotInfoArr *slot_arr);
+static void get_old_cluster_logical_slot_infos(DbInfo *dbinfo);
 
 
 /*
@@ -266,13 +268,13 @@ report_unmatched_relation(const RelInfo *rel, const DbInfo *db, bool is_new_db)
 }
 
 /*
- * get_db_and_rel_infos()
+ * get_db_rel_and_slot_infos()
  *
  * higher level routine to generate dbinfos for the database running
  * on the given "port". Assumes that server is already running.
  */
 void
-get_db_and_rel_infos(ClusterInfo *cluster)
+get_db_rel_and_slot_infos(ClusterInfo *cluster)
 {
 	int			dbnum;
 
@@ -283,7 +285,17 @@ get_db_and_rel_infos(ClusterInfo *cluster)
 	get_db_infos(cluster);
 
 	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
-		get_rel_infos(cluster, &cluster->dbarr.dbs[dbnum]);
+	{
+		DbInfo	   *pDbInfo = &cluster->dbarr.dbs[dbnum];
+
+		get_rel_infos(cluster, pDbInfo);
+
+		/*
+		 * Retrieve the logical replication slots infos for the old cluster.
+		 */
+		if (cluster == &old_cluster)
+			get_old_cluster_logical_slot_infos(pDbInfo);
+	}
 
 	if (cluster == &old_cluster)
 		pg_log(PG_VERBOSE, "\nsource databases:");
@@ -600,6 +612,107 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 	dbinfo->rel_arr.nrels = num_rels;
 }
 
+/*
+ * get_old_cluster_logical_slot_infos()
+ *
+ * gets the LogicalSlotInfos for all the logical replication slots of the
+ * database referred to by "dbinfo".
+ *
+ * Note: This function will not do anything if the old cluster is pre-PG17.
+ * This is because before that the logical slots are not saved at shutdown, so
+ * there is no guarantee that the latest confirmed_flush_lsn is saved to disk
+ * which can lead to data loss. It is still not guaranteed for manually created
+ * slots in PG17, so subsequent checks done in
+ * check_old_cluster_for_valid_slots() would raise a FATAL error if such slots
+ * are included.
+ */
+static void
+get_old_cluster_logical_slot_infos(DbInfo *dbinfo)
+{
+	PGconn	   *conn;
+	PGresult   *res;
+	LogicalSlotInfo *slotinfos = NULL;
+	int			num_slots;
+
+	/* Logical slots can be migrated since PG17. */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+		return;
+
+	conn = connectToServer(&old_cluster, dbinfo->db_name);
+
+	/*
+	 * The temporary slots are explicitly ignored while checking because such
+	 * slots cannot exist after the upgrade. During the upgrade, clusters are
+	 * started and stopped several times causing any temporary slots to be
+	 * removed.
+	 */
+	res = executeQueryOrDie(conn, "SELECT slot_name, plugin, two_phase, "
+							"(confirmed_flush_lsn = '%X/%X') as caught_up, conflicting as invalid "
+							"FROM pg_catalog.pg_replication_slots "
+							"WHERE slot_type = 'logical' AND "
+							"database = current_database() AND "
+							"temporary IS FALSE;",
+							LSN_FORMAT_ARGS(old_cluster.controldata.chkpnt_latest));
+
+	num_slots = PQntuples(res);
+
+	if (num_slots)
+	{
+		int			slotnum;
+		int			i_slotname;
+		int			i_plugin;
+		int			i_twophase;
+		int			i_caught_up;
+		int			i_invalid;
+
+		slotinfos = (LogicalSlotInfo *) pg_malloc(sizeof(LogicalSlotInfo) * num_slots);
+
+		i_slotname = PQfnumber(res, "slot_name");
+		i_plugin = PQfnumber(res, "plugin");
+		i_twophase = PQfnumber(res, "two_phase");
+		i_caught_up = PQfnumber(res, "caught_up");
+		i_invalid = PQfnumber(res, "invalid");
+
+		for (slotnum = 0; slotnum < num_slots; slotnum++)
+		{
+			LogicalSlotInfo *curr = &slotinfos[slotnum];
+
+			curr->slotname = pg_strdup(PQgetvalue(res, slotnum, i_slotname));
+			curr->plugin = pg_strdup(PQgetvalue(res, slotnum, i_plugin));
+			curr->two_phase = (strcmp(PQgetvalue(res, slotnum, i_twophase), "t") == 0);
+			curr->caught_up = (strcmp(PQgetvalue(res, slotnum, i_caught_up), "t") == 0);
+			curr->invalid = (strcmp(PQgetvalue(res, slotnum, i_invalid), "t") == 0);
+		}
+	}
+
+	PQclear(res);
+	PQfinish(conn);
+
+	dbinfo->slot_arr.slots = slotinfos;
+	dbinfo->slot_arr.nslots = num_slots;
+}
+
+
+/*
+ * count_old_cluster_logical_slots()
+ *
+ * Returns the number of logical replication slots for all databases.
+ *
+ * Note: this function always returns 0 if the old_cluster is PG16 and prior
+ * because we gather slot information only for cluster versions greater than or
+ * equal to PG17. See get_old_cluster_logical_slot_infos().
+ */
+int
+count_old_cluster_logical_slots(void)
+{
+	int			dbnum;
+	int			slot_count = 0;
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+		slot_count += old_cluster.dbarr.dbs[dbnum].slot_arr.nslots;
+
+	return slot_count;
+}
 
 static void
 free_db_and_rel_infos(DbInfoArr *db_arr)
@@ -642,8 +755,11 @@ print_db_infos(DbInfoArr *db_arr)
 
 	for (dbnum = 0; dbnum < db_arr->ndbs; dbnum++)
 	{
-		pg_log(PG_VERBOSE, "Database: \"%s\"", db_arr->dbs[dbnum].db_name);
-		print_rel_infos(&db_arr->dbs[dbnum].rel_arr);
+		DbInfo	   *pDbInfo = &db_arr->dbs[dbnum];
+
+		pg_log(PG_VERBOSE, "Database: \"%s\"", pDbInfo->db_name);
+		print_rel_infos(&pDbInfo->rel_arr);
+		print_slot_infos(&pDbInfo->slot_arr);
 	}
 }
 
@@ -660,3 +776,25 @@ print_rel_infos(RelInfoArr *rel_arr)
 			   rel_arr->rels[relnum].reloid,
 			   rel_arr->rels[relnum].tablespace);
 }
+
+static void
+print_slot_infos(LogicalSlotInfoArr *slot_arr)
+{
+	int			slotnum;
+
+	/* Quick return if there are no logical slots. */
+	if (slot_arr->nslots == 0)
+		return;
+
+	pg_log(PG_VERBOSE, "Logical replication slots within the database:");
+
+	for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+	{
+		LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+
+		pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\", two_phase: %d",
+			   slot_info->slotname,
+			   slot_info->plugin,
+			   slot_info->two_phase);
+	}
+}
diff --git a/src/bin/pg_upgrade/meson.build b/src/bin/pg_upgrade/meson.build
index 12a97f84e2..228f29b688 100644
--- a/src/bin/pg_upgrade/meson.build
+++ b/src/bin/pg_upgrade/meson.build
@@ -42,6 +42,7 @@ tests += {
     'tests': [
       't/001_basic.pl',
       't/002_pg_upgrade.pl',
+      't/003_logical_replication_slots.pl',
     ],
     'test_kwargs': {'priority': 40}, # pg_upgrade tests are slow
   },
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 96bfb67167..6be236dc9a 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -59,6 +59,8 @@ static void copy_xact_xlog_xid(void);
 static void set_frozenxids(bool minmxid_only);
 static void make_outputdirs(char *pgdata);
 static void setup(char *argv0, bool *live_check);
+static void create_logical_replication_slots(void);
+static void setup_new_cluster(void);
 
 ClusterInfo old_cluster,
 			new_cluster;
@@ -188,6 +190,8 @@ main(int argc, char **argv)
 			  new_cluster.pgdata);
 	check_ok();
 
+	setup_new_cluster();
+
 	if (user_opts.do_sync)
 	{
 		prep_status("Sync data directory to disk");
@@ -201,8 +205,6 @@ main(int argc, char **argv)
 
 	create_script_for_old_cluster_deletion(&deletion_script_file_name);
 
-	issue_warnings_and_set_wal_level();
-
 	pg_log(PG_REPORT,
 		   "\n"
 		   "Upgrade Complete\n"
@@ -593,7 +595,7 @@ create_new_objects(void)
 		set_frozenxids(true);
 
 	/* update new_cluster info now that we have objects in the databases */
-	get_db_and_rel_infos(&new_cluster);
+	get_db_rel_and_slot_infos(&new_cluster);
 }
 
 /*
@@ -862,3 +864,102 @@ set_frozenxids(bool minmxid_only)
 
 	check_ok();
 }
+
+/*
+ * create_logical_replication_slots()
+ *
+ * Similar to create_new_objects() but only restores logical replication slots.
+ */
+static void
+create_logical_replication_slots(void)
+{
+	int			dbnum;
+
+	prep_status_progress("Restoring logical replication slots in the new cluster");
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		DbInfo	   *old_db = &old_cluster.dbarr.dbs[dbnum];
+		LogicalSlotInfoArr *slot_arr = &old_db->slot_arr;
+		PGconn	   *conn;
+		PQExpBuffer query;
+		int			slotnum;
+		char		log_file_name[MAXPGPATH];
+
+		/* Skip this database if there are no slots */
+		if (slot_arr->nslots == 0)
+			continue;
+
+		conn = connectToServer(&new_cluster, old_db->db_name);
+		query = createPQExpBuffer();
+
+		snprintf(log_file_name, sizeof(log_file_name),
+				 DB_DUMP_LOG_FILE_MASK, old_db->db_oid);
+
+		pg_log(PG_STATUS, "%s", old_db->db_name);
+
+		for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		{
+			LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+
+			/* Constructs a query for creating logical replication slots. */
+			appendPQExpBuffer(query, "SELECT pg_catalog.pg_create_logical_replication_slot(");
+			appendStringLiteralConn(query, slot_info->slotname, conn);
+			appendPQExpBuffer(query, ", ");
+			appendStringLiteralConn(query, slot_info->plugin, conn);
+			appendPQExpBuffer(query, ", false, %s);",
+							  slot_info->two_phase ? "true" : "false");
+
+			PQclear(executeQueryOrDie(conn, "%s", query->data));
+
+			resetPQExpBuffer(query);
+		}
+
+		PQfinish(conn);
+
+		destroyPQExpBuffer(query);
+	}
+
+	end_progress_output();
+	check_ok();
+}
+
+/*
+ *	setup_new_cluster()
+ *
+ * Starts a new cluster for updating the wal_level in the control file, then
+ * does final setups. Logical slots are also created here.
+ */
+static void
+setup_new_cluster(void)
+{
+	/*
+	 * We unconditionally start/stop the new server because pg_resetwal -o set
+	 * wal_level to 'minimum'.  If the user is upgrading standby servers using
+	 * the rsync instructions, they will need pg_upgrade to write its final
+	 * WAL record showing wal_level as 'replica'.
+	 */
+	start_postmaster(&new_cluster, true);
+
+	/*
+	 * If the old cluster has logical slots, migrate them to a new cluster.
+	 *
+	 * Note: This must be done after executing pg_resetwal command in the
+	 * caller because pg_resetwal would remove required WALs.
+	 */
+	if (count_old_cluster_logical_slots())
+		create_logical_replication_slots();
+
+	/*
+	 * Reindex hash indexes for old < 10.0. count_old_cluster_logical_slots()
+	 * returns non-zero when the old_cluster is PG17 and later, so it's OK to
+	 * use "else if" here. See comments atop count_old_cluster_logical_slots()
+	 * and get_old_cluster_logical_slot_infos().
+	 */
+	else if (GET_MAJOR_VERSION(old_cluster.major_version) <= 906)
+		old_9_6_invalidate_hash_indexes(&new_cluster, false);
+
+	report_extension_updates(&new_cluster);
+
+	stop_postmaster(false);
+}
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 842f3b6cd3..f5ce6c3b4d 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -10,6 +10,7 @@
 #include <sys/stat.h>
 #include <sys/time.h>
 
+#include "access/xlogdefs.h"
 #include "common/relpath.h"
 #include "libpq-fe.h"
 
@@ -150,6 +151,25 @@ typedef struct
 	int			nrels;
 } RelInfoArr;
 
+/*
+ * Structure to store logical replication slot information
+ */
+typedef struct
+{
+	char	   *slotname;		/* slot name */
+	char	   *plugin;			/* plugin */
+	bool		two_phase;		/* can the slot decode 2PC? */
+	bool		caught_up;		/* Is confirmed_flush_lsn the same as latest
+								 * checkpoint LSN? */
+	bool		invalid;		/* If true, the slot is unusable. */
+} LogicalSlotInfo;
+
+typedef struct
+{
+	int			nslots;			/* number of logical slot infos */
+	LogicalSlotInfo *slots;		/* array of logical slot infos */
+} LogicalSlotInfoArr;
+
 /*
  * The following structure represents a relation mapping.
  */
@@ -176,6 +196,7 @@ typedef struct
 	char		db_tablespace[MAXPGPATH];	/* database default tablespace
 											 * path */
 	RelInfoArr	rel_arr;		/* array of all user relinfos */
+	LogicalSlotInfoArr slot_arr;	/* array of all LogicalSlotInfo */
 } DbInfo;
 
 /*
@@ -225,6 +246,7 @@ typedef struct
 	bool		date_is_int;
 	bool		float8_pass_by_value;
 	uint32		data_checksum_version;
+	XLogRecPtr	chkpnt_latest;
 } ControlData;
 
 /*
@@ -345,7 +367,6 @@ void		output_check_banner(bool live_check);
 void		check_and_dump_old_cluster(bool live_check);
 void		check_new_cluster(void);
 void		report_clusters_compatible(void);
-void		issue_warnings_and_set_wal_level(void);
 void		output_completion_banner(char *deletion_script_file_name);
 void		check_cluster_versions(void);
 void		check_cluster_compatibility(bool live_check);
@@ -400,7 +421,8 @@ void		check_loadable_libraries(void);
 FileNameMap *gen_db_file_maps(DbInfo *old_db,
 							  DbInfo *new_db, int *nmaps, const char *old_pgdata,
 							  const char *new_pgdata);
-void		get_db_and_rel_infos(ClusterInfo *cluster);
+void		get_db_rel_and_slot_infos(ClusterInfo *cluster);
+int			count_old_cluster_logical_slots(void);
 
 /* option.c */
 
diff --git a/src/bin/pg_upgrade/server.c b/src/bin/pg_upgrade/server.c
index 0bc3d2806b..20589e8c43 100644
--- a/src/bin/pg_upgrade/server.c
+++ b/src/bin/pg_upgrade/server.c
@@ -234,14 +234,20 @@ start_postmaster(ClusterInfo *cluster, bool report_and_exit_on_error)
 	 * we only modify the new cluster, so only use it there.  If there is a
 	 * crash, the new cluster has to be recreated anyway.  fsync=off is a big
 	 * win on ext4.
+	 *
+	 * Use max_slot_wal_keep_size as -1 to prevent the WAL removal by the
+	 * checkpointer process.  If WALs required by logical replication slots are
+	 * removed, the slots are unusable.  This setting prevents the invalidation
+	 * of slots during the upgrade.
 	 */
 	snprintf(cmd, sizeof(cmd),
-			 "\"%s/pg_ctl\" -w -l \"%s/%s\" -D \"%s\" -o \"-p %d -b%s %s%s\" start",
+			 "\"%s/pg_ctl\" -w -l \"%s/%s\" -D \"%s\" -o \"-p %d -b%s%s %s%s\" start",
 			 cluster->bindir,
 			 log_opts.logdir,
 			 SERVER_LOG_FILE, cluster->pgconfig, cluster->port,
-			 (cluster == &new_cluster) ?
-			 " -c synchronous_commit=off -c fsync=off -c full_page_writes=off" : "",
+			(cluster == &new_cluster) ?
+			" -c synchronous_commit=off -c fsync=off -c full_page_writes=off" : "",
+			" -c max_slot_wal_keep_size=-1",
 			 cluster->pgopts ? cluster->pgopts : "", socket_string);
 
 	/*
diff --git a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
new file mode 100644
index 0000000000..01cb04ca12
--- /dev/null
+++ b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
@@ -0,0 +1,214 @@
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+# Tests for upgrading replication slots
+
+use strict;
+use warnings;
+
+use File::Path qw(rmtree);
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Can be changed to test the other modes
+my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
+
+# Initialize old cluster
+my $old_publisher = PostgreSQL::Test::Cluster->new('old_publisher');
+$old_publisher->init(allows_streaming => 'logical');
+
+# Initialize new cluster
+my $new_publisher = PostgreSQL::Test::Cluster->new('new_publisher');
+$new_publisher->init(allows_streaming => 'replica');
+
+# Initialize subscriber cluster
+my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
+$subscriber->init(allows_streaming => 'logical');
+
+my $bindir = $new_publisher->config_data('--bindir');
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when new cluster wal_level is not 'logical'
+
+# Preparations for the subsequent test:
+# 1. Create a slot on the old cluster
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT pg_create_logical_replication_slot('test_slot1', 'test_decoding', false, true);"
+);
+$old_publisher->stop;
+
+# pg_upgrade will fail because the new cluster wal_level is 'replica'
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade where the new cluster has the wrong wal_level');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when max_replication_slots on a new cluster is
+#		too low
+
+# Preparations for the subsequent test:
+# 1. Create a second slot on the old cluster
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT pg_create_logical_replication_slot('test_slot2', 'test_decoding', false, true);"
+);
+
+# 2. Consume WAL records to avoid another type of upgrade failure. It will be
+#	 tested in subsequent cases.
+$old_publisher->safe_psql('postgres',
+	"SELECT count(*) FROM pg_logical_slot_get_changes('test_slot1', NULL, NULL);"
+);
+$old_publisher->stop;
+
+# 3. max_replication_slots is set to smaller than the number of slots (2)
+#	 present on the old cluster
+$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 1");
+
+# 4. wal_level is set correctly on the new cluster
+$new_publisher->append_conf('postgresql.conf', "wal_level = 'logical'");
+
+# pg_upgrade will fail because the new cluster has insufficient max_replication_slots
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade where the new cluster has insufficient max_replication_slots');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when the slot still has unconsumed WAL records
+
+# Preparations for the subsequent test:
+# 1. Remove the slot 'test_slot2', leaving only 1 slot remaining on the old
+#	 cluster, so the new cluster config  max_replication_slots=1 will now be
+#	 enough.
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT * FROM pg_drop_replication_slot('test_slot2');"
+);
+
+# 2. Generate extra WAL records. Because these WAL records do not get consumed
+#	 it will cause the upcoming pg_upgrade test to fail.
+$old_publisher->safe_psql('postgres',
+	"CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;"
+);
+$old_publisher->stop;
+
+# pg_upgrade will fail because the slot still has unconsumed WAL records
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade of old cluster with idle replication slots');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+# Remove the remaining slot
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT * FROM pg_drop_replication_slot('test_slot1');"
+);
+
+# ------------------------------
+# TEST: Successful upgrade
+
+# Preparations for the subsequent test:
+# 1. Setup logical replication
+my $old_connstr = $old_publisher->connstr . ' dbname=postgres';
+$old_publisher->safe_psql('postgres',
+	"CREATE PUBLICATION pub FOR ALL TABLES;"
+);
+$subscriber->start;
+$subscriber->safe_psql(
+	'postgres', qq[
+	CREATE TABLE tbl (a int);
+	CREATE SUBSCRIPTION sub CONNECTION '$old_connstr' PUBLICATION pub WITH (two_phase = 'true')
+]);
+$subscriber->wait_for_subscription_sync($old_publisher, 'sub');
+
+# 2. Temporarily disable the subscription
+$subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION sub DISABLE");
+$old_publisher->stop;
+
+# Actual run, successful upgrade is expected
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade of old cluster');
+ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ removed after pg_upgrade success");
+
+# Check that the slot 'sub' has migrated to the new cluster
+$new_publisher->start;
+my $result = $new_publisher->safe_psql('postgres',
+	"SELECT slot_name, two_phase FROM pg_replication_slots");
+is($result, qq(sub|t), 'check the slot exists on new cluster');
+
+# Update the connection
+my $new_connstr = $new_publisher->connstr . ' dbname=postgres';
+$subscriber->safe_psql(
+	'postgres', qq[
+	ALTER SUBSCRIPTION sub CONNECTION '$new_connstr';
+	ALTER SUBSCRIPTION sub ENABLE;
+]);
+
+# Check whether changes on the new publisher get replicated to the subscriber
+$new_publisher->safe_psql('postgres',
+	"INSERT INTO tbl VALUES (generate_series(11, 20))");
+$new_publisher->wait_for_catchup('sub');
+$result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
+is($result, qq(20), 'check changes are replicated to the subscriber');
+
+# Clean up
+$subscriber->stop();
+$new_publisher->stop();
+
+done_testing();
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index f3d8a2a855..8e5ff87dff 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1502,7 +1502,10 @@ LogicalRepTupleData
 LogicalRepTyp
 LogicalRepWorker
 LogicalRepWorkerType
+LogicalReplicationSlotInfo
 LogicalRewriteMappingData
+LogicalSlotInfo
+LogicalSlotInfoArr
 LogicalTape
 LogicalTapeSet
 LsnReadQueue
-- 
2.27.0

v38-0002-Use-binary_upgrade_validate_wal_record_types_aft.patchapplication/octet-stream; name=v38-0002-Use-binary_upgrade_validate_wal_record_types_aft.patchDownload
From 0adfe1819a1fc30c04210d31e7d780192a40732a Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Thu, 14 Sep 2023 06:01:40 +0000
Subject: [PATCH v38 2/2] Use
 binary_upgrade_validate_wal_record_types_after_lsn

---
 contrib/pg_walinspect/pg_walinspect.c         | 94 -------------------
 doc/src/sgml/ref/pgupgrade.sgml               |  5 +-
 src/backend/access/transam/xlogutils.c        | 92 ++++++++++++++++++
 src/backend/utils/adt/pg_upgrade_support.c    | 86 +++++++++++++++++
 src/bin/pg_upgrade/check.c                    |  8 +-
 src/bin/pg_upgrade/controldata.c              | 39 --------
 src/bin/pg_upgrade/info.c                     |  6 +-
 src/bin/pg_upgrade/pg_upgrade.h               |  1 -
 .../t/003_logical_replication_slots.pl        | 20 ++++
 src/include/access/xlogutils.h                |  3 +
 src/include/catalog/pg_proc.dat               |  6 ++
 11 files changed, 215 insertions(+), 145 deletions(-)

diff --git a/contrib/pg_walinspect/pg_walinspect.c b/contrib/pg_walinspect/pg_walinspect.c
index 796a74f322..49f4f92e98 100644
--- a/contrib/pg_walinspect/pg_walinspect.c
+++ b/contrib/pg_walinspect/pg_walinspect.c
@@ -40,8 +40,6 @@ PG_FUNCTION_INFO_V1(pg_get_wal_stats_till_end_of_wal);
 
 static void ValidateInputLSNs(XLogRecPtr start_lsn, XLogRecPtr *end_lsn);
 static XLogRecPtr GetCurrentLSN(void);
-static XLogReaderState *InitXLogReaderState(XLogRecPtr lsn);
-static XLogRecord *ReadNextXLogRecord(XLogReaderState *xlogreader);
 static void GetWALRecordInfo(XLogReaderState *record, Datum *values,
 							 bool *nulls, uint32 ncols);
 static void GetWALRecordsInfo(FunctionCallInfo fcinfo,
@@ -84,98 +82,6 @@ GetCurrentLSN(void)
 	return curr_lsn;
 }
 
-/*
- * Initialize WAL reader and identify first valid LSN.
- */
-static XLogReaderState *
-InitXLogReaderState(XLogRecPtr lsn)
-{
-	XLogReaderState *xlogreader;
-	ReadLocalXLogPageNoWaitPrivate *private_data;
-	XLogRecPtr	first_valid_record;
-
-	/*
-	 * Reading WAL below the first page of the first segments isn't allowed.
-	 * This is a bootstrap WAL page and the page_read callback fails to read
-	 * it.
-	 */
-	if (lsn < XLOG_BLCKSZ)
-		ereport(ERROR,
-				(errmsg("could not read WAL at LSN %X/%X",
-						LSN_FORMAT_ARGS(lsn))));
-
-	private_data = (ReadLocalXLogPageNoWaitPrivate *)
-		palloc0(sizeof(ReadLocalXLogPageNoWaitPrivate));
-
-	xlogreader = XLogReaderAllocate(wal_segment_size, NULL,
-									XL_ROUTINE(.page_read = &read_local_xlog_page_no_wait,
-											   .segment_open = &wal_segment_open,
-											   .segment_close = &wal_segment_close),
-									private_data);
-
-	if (xlogreader == NULL)
-		ereport(ERROR,
-				(errcode(ERRCODE_OUT_OF_MEMORY),
-				 errmsg("out of memory"),
-				 errdetail("Failed while allocating a WAL reading processor.")));
-
-	/* first find a valid recptr to start from */
-	first_valid_record = XLogFindNextRecord(xlogreader, lsn);
-
-	if (XLogRecPtrIsInvalid(first_valid_record))
-		ereport(ERROR,
-				(errmsg("could not find a valid record after %X/%X",
-						LSN_FORMAT_ARGS(lsn))));
-
-	return xlogreader;
-}
-
-/*
- * Read next WAL record.
- *
- * By design, to be less intrusive in a running system, no slot is allocated
- * to reserve the WAL we're about to read. Therefore this function can
- * encounter read errors for historical WAL.
- *
- * We guard against ordinary errors trying to read WAL that hasn't been
- * written yet by limiting end_lsn to the flushed WAL, but that can also
- * encounter errors if the flush pointer falls in the middle of a record. In
- * that case we'll return NULL.
- */
-static XLogRecord *
-ReadNextXLogRecord(XLogReaderState *xlogreader)
-{
-	XLogRecord *record;
-	char	   *errormsg;
-
-	record = XLogReadRecord(xlogreader, &errormsg);
-
-	if (record == NULL)
-	{
-		ReadLocalXLogPageNoWaitPrivate *private_data;
-
-		/* return NULL, if end of WAL is reached */
-		private_data = (ReadLocalXLogPageNoWaitPrivate *)
-			xlogreader->private_data;
-
-		if (private_data->end_of_wal)
-			return NULL;
-
-		if (errormsg)
-			ereport(ERROR,
-					(errcode_for_file_access(),
-					 errmsg("could not read WAL at %X/%X: %s",
-							LSN_FORMAT_ARGS(xlogreader->EndRecPtr), errormsg)));
-		else
-			ereport(ERROR,
-					(errcode_for_file_access(),
-					 errmsg("could not read WAL at %X/%X",
-							LSN_FORMAT_ARGS(xlogreader->EndRecPtr))));
-	}
-
-	return record;
-}
-
 /*
  * Output values that make up a row describing caller's WAL record.
  *
diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 4e2281bae4..2588d6d7b8 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -418,10 +418,7 @@ make prefix=/usr/local/pgsql.new install
      </listitem>
      <listitem>
       <para>
-       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>confirmed_flush_lsn</structfield>
-       of all slots on the old cluster must be the same as the latest
-       checkpoint location. This ensures that all the data has been replicated
-       before the upgrade.
+       The old cluster has replicated all its changes to the subscribers.
       </para>
      </listitem>
      <listitem>
diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c
index 43f7b31205..e2cabfef32 100644
--- a/src/backend/access/transam/xlogutils.c
+++ b/src/backend/access/transam/xlogutils.c
@@ -1048,3 +1048,95 @@ WALReadRaiseError(WALReadError *errinfo)
 						errinfo->wre_req)));
 	}
 }
+
+/*
+ * Initialize WAL reader and identify first valid LSN.
+ */
+XLogReaderState *
+InitXLogReaderState(XLogRecPtr lsn)
+{
+	XLogReaderState *xlogreader;
+	ReadLocalXLogPageNoWaitPrivate *private_data;
+	XLogRecPtr	first_valid_record;
+
+	/*
+	 * Reading WAL below the first page of the first segments isn't allowed.
+	 * This is a bootstrap WAL page and the page_read callback fails to read
+	 * it.
+	 */
+	if (lsn < XLOG_BLCKSZ)
+		ereport(ERROR,
+				(errmsg("could not read WAL at LSN %X/%X",
+						LSN_FORMAT_ARGS(lsn))));
+
+	private_data = (ReadLocalXLogPageNoWaitPrivate *)
+		palloc0(sizeof(ReadLocalXLogPageNoWaitPrivate));
+
+	xlogreader = XLogReaderAllocate(wal_segment_size, NULL,
+									XL_ROUTINE(.page_read = &read_local_xlog_page_no_wait,
+											   .segment_open = &wal_segment_open,
+											   .segment_close = &wal_segment_close),
+									private_data);
+
+	if (xlogreader == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("out of memory"),
+				 errdetail("Failed while allocating a WAL reading processor.")));
+
+	/* first find a valid recptr to start from */
+	first_valid_record = XLogFindNextRecord(xlogreader, lsn);
+
+	if (XLogRecPtrIsInvalid(first_valid_record))
+		ereport(ERROR,
+				(errmsg("could not find a valid record after %X/%X",
+						LSN_FORMAT_ARGS(lsn))));
+
+	return xlogreader;
+}
+
+/*
+ * Read next WAL record.
+ *
+ * By design, to be less intrusive in a running system, no slot is allocated
+ * to reserve the WAL we're about to read. Therefore this function can
+ * encounter read errors for historical WAL.
+ *
+ * We guard against ordinary errors trying to read WAL that hasn't been
+ * written yet by limiting end_lsn to the flushed WAL, but that can also
+ * encounter errors if the flush pointer falls in the middle of a record. In
+ * that case we'll return NULL.
+ */
+XLogRecord *
+ReadNextXLogRecord(XLogReaderState *xlogreader)
+{
+	XLogRecord *record;
+	char	   *errormsg;
+
+	record = XLogReadRecord(xlogreader, &errormsg);
+
+	if (record == NULL)
+	{
+		ReadLocalXLogPageNoWaitPrivate *private_data;
+
+		/* return NULL, if end of WAL is reached */
+		private_data = (ReadLocalXLogPageNoWaitPrivate *)
+			xlogreader->private_data;
+
+		if (private_data->end_of_wal)
+			return NULL;
+
+		if (errormsg)
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not read WAL at %X/%X: %s",
+							LSN_FORMAT_ARGS(xlogreader->EndRecPtr), errormsg)));
+		else
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not read WAL at %X/%X",
+							LSN_FORMAT_ARGS(xlogreader->EndRecPtr))));
+	}
+
+	return record;
+}
diff --git a/src/backend/utils/adt/pg_upgrade_support.c b/src/backend/utils/adt/pg_upgrade_support.c
index 0186636d9f..c4b2980a8a 100644
--- a/src/backend/utils/adt/pg_upgrade_support.c
+++ b/src/backend/utils/adt/pg_upgrade_support.c
@@ -11,14 +11,22 @@
 
 #include "postgres.h"
 
+#include "access/heapam_xlog.h"
+#include "access/rmgr.h"
+#include "access/xlog.h"
+#include "access/xlog_internal.h"
+#include "access/xlogutils.h"
 #include "catalog/binary_upgrade.h"
 #include "catalog/heap.h"
 #include "catalog/namespace.h"
+#include "catalog/pg_control.h"
 #include "catalog/pg_type.h"
 #include "commands/extension.h"
 #include "miscadmin.h"
+#include "storage/standbydefs.h"
 #include "utils/array.h"
 #include "utils/builtins.h"
+#include "utils/pg_lsn.h"
 
 
 #define CHECK_IS_BINARY_UPGRADE									\
@@ -29,6 +37,9 @@ do {															\
 				 errmsg("function can only be called when server is in binary upgrade mode"))); \
 } while (0)
 
+#define CHECK_WAL_RECORD(rmgrid, info, expected_rmgrid, expected_info) \
+	(rmgrid == expected_rmgrid && info == expected_info)
+
 Datum
 binary_upgrade_set_next_pg_tablespace_oid(PG_FUNCTION_ARGS)
 {
@@ -261,3 +272,78 @@ binary_upgrade_set_missing_value(PG_FUNCTION_ARGS)
 
 	PG_RETURN_VOID();
 }
+
+/*
+ * Return true if we didn't find any unexpected WAL record, false otherwise.
+ *
+ * This function is used to verify that there are no WAL records (other than
+ * some expected types) after the confirmed_flush_lsn of logical slots, which
+ * means all the changes were replicated to the subscriber. There is a
+ * possibility that some WAL records are inserted after the logical walsenders
+ * exit, so such record types are ignored.
+ *
+ * XLOG_CHECKPOINT_SHUTDOWN is ignored because it would be inserted after the
+ * walsender exits. Moreover, the following types of records could be
+ * generated during the pg_upgrade --check run, so they are ignored too.
+ *
+ *             - XLOG_CHECKPOINT_ONLINE
+ *             - XLOG_RUNNING_XACTS
+ *             - XLOG_FPI_FOR_HINT
+ *             - XLOG_HEAP2_PRUNE
+ */
+Datum
+binary_upgrade_validate_wal_record_types_after_lsn(PG_FUNCTION_ARGS)
+{
+	XLogRecPtr	  start_lsn = PG_GETARG_LSN(0);
+	XLogReaderState *xlogreader;
+	bool			initial_record = true;
+	bool			result = true;
+
+	CHECK_IS_BINARY_UPGRADE;
+
+	/* Quick exit if the given lsn is larger than current one */
+	if (start_lsn >= GetFlushRecPtr(NULL))
+		PG_RETURN_BOOL(false);
+
+	xlogreader = InitXLogReaderState(start_lsn);
+
+	/* Loop until all WALs are read, or unexpected record is found */
+	while (result && ReadNextXLogRecord(xlogreader))
+	{
+		RmgrIds		   rmid;
+		uint8		   info;
+
+		/* Check the type of WAL */
+		rmid = XLogRecGetRmid(xlogreader);
+		info = XLogRecGetInfo(xlogreader) & ~XLR_INFO_MASK;
+
+		if (initial_record)
+		{
+			/* Initial record must be XLOG_CHECKPOINT_SHUTDOWN */
+			if (!CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID,
+								  XLOG_CHECKPOINT_SHUTDOWN))
+				result = false;
+
+			initial_record = false;
+			continue;
+		}
+
+		/*
+		 * XXX: There is a possibility that the following records may be
+		 * generated during the upgrade.
+		 */
+		if (!CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID, XLOG_CHECKPOINT_SHUTDOWN) &&
+			!CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID, XLOG_CHECKPOINT_ONLINE) &&
+			!CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID, XLOG_FPI_FOR_HINT) &&
+			!CHECK_WAL_RECORD(rmid, info, RM_STANDBY_ID, XLOG_RUNNING_XACTS) &&
+			!CHECK_WAL_RECORD(rmid, info, RM_HEAP2_ID, XLOG_HEAP2_PRUNE))
+			result = false;
+
+		CHECK_FOR_INTERRUPTS();
+	}
+
+	pfree(xlogreader->private_data);
+	XLogReaderFree(xlogreader);
+
+	PG_RETURN_BOOL(result);
+}
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index b1424fdf9c..df1ce67fc0 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -1480,8 +1480,8 @@ check_new_cluster_logical_replication_slots(void)
  * Following points are checked:
  *
  *	- All logical replication slots are usable.
- *	- All logical replication slots consumed all WALs, except a
- *	  CHECKPOINT_SHUTDOWN record.
+ *	- All logical replication slots have consumed all WAL records, except
+ *	  for some acceptable record types.
  */
 static void
 check_old_cluster_for_valid_slots(bool live_check)
@@ -1521,8 +1521,8 @@ check_old_cluster_for_valid_slots(bool live_check)
 			}
 
 			/*
-			 * Do additional checks to ensure that confirmed_flush LSN of all
-			 * the slots is the same as the latest checkpoint location.
+			 * Do additional checks to ensure that all logical replication
+			 * slots have reached the current WAL position.
 			 *
 			 * Note: This can be satisfied only when the old cluster has been
 			 * shut down, so we skip this for live checks.
diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c
index f8f823e2be..4beb65ab22 100644
--- a/src/bin/pg_upgrade/controldata.c
+++ b/src/bin/pg_upgrade/controldata.c
@@ -169,45 +169,6 @@ get_control_data(ClusterInfo *cluster, bool live_check)
 				}
 				got_cluster_state = true;
 			}
-
-			else if ((p = strstr(bufin, "Latest checkpoint location:")) != NULL)
-			{
-				/*
-				 * Read the latest checkpoint location if the cluster is PG17
-				 * or later. This is used for upgrading logical replication
-				 * slots. Currently, we need it only for the old cluster but
-				 * for simplicity chose not to have additional checks.
-				 */
-				if (GET_MAJOR_VERSION(cluster->major_version) >= 1700)
-				{
-					char	   *slash = NULL;
-					uint32		upper_lsn,
-								lower_lsn;
-
-					p = strchr(p, ':');
-
-					if (p == NULL || strlen(p) <= 1)
-						pg_fatal("%d: controldata retrieval problem", __LINE__);
-
-					p++;		/* remove ':' char */
-
-					p = strpbrk(p, "01234567890ABCDEF");
-
-					if (p == NULL || strlen(p) <= 1)
-						pg_fatal("%d: controldata retrieval problem", __LINE__);
-
-					/*
-					 * The upper and lower part of LSN must be read separately
-					 * because it is stored as in %X/%X format.
-					 */
-					upper_lsn = strtoul(p, &slash, 16);
-					lower_lsn = strtoul(++slash, NULL, 16);
-
-					/* And combine them */
-					cluster->controldata.chkpnt_latest =
-						((uint64) upper_lsn << 32) | lower_lsn;
-				}
-			}
 		}
 
 		rc = pclose(output);
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index f7b0deca87..5d25d1604e 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -647,12 +647,12 @@ get_old_cluster_logical_slot_infos(DbInfo *dbinfo)
 	 * removed.
 	 */
 	res = executeQueryOrDie(conn, "SELECT slot_name, plugin, two_phase, "
-							"(confirmed_flush_lsn = '%X/%X') as caught_up, conflicting as invalid "
+							"pg_catalog.binary_upgrade_validate_wal_record_types_after_lsn(confirmed_flush_lsn) as caught_up, "
+							"conflicting as invalid "
 							"FROM pg_catalog.pg_replication_slots "
 							"WHERE slot_type = 'logical' AND "
 							"database = current_database() AND "
-							"temporary IS FALSE;",
-							LSN_FORMAT_ARGS(old_cluster.controldata.chkpnt_latest));
+							"temporary IS FALSE;");
 
 	num_slots = PQntuples(res);
 
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index f5ce6c3b4d..8a7f56831e 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -246,7 +246,6 @@ typedef struct
 	bool		date_is_int;
 	bool		float8_pass_by_value;
 	uint32		data_checksum_version;
-	XLogRecPtr	chkpnt_latest;
 } ControlData;
 
 /*
diff --git a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
index 01cb04ca12..b91fb2f88f 100644
--- a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
+++ b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
@@ -169,6 +169,26 @@ $subscriber->wait_for_subscription_sync($old_publisher, 'sub');
 $subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION sub DISABLE");
 $old_publisher->stop;
 
+# Dry run; a successful check is expected. This is not a live check, so a
+# shutdown checkpoint record will have been inserted. We want to test that
+# binary_upgrade_validate_wal_record_types_after_lsn() skips that record so
+# that the upcoming pg_upgrade succeeds.
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,        '--check'
+	],
+	'run of pg_upgrade of old cluster');
+ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ removed after pg_upgrade success");
+
 # Actual run, successful upgrade is expected
 command_ok(
 	[
diff --git a/src/include/access/xlogutils.h b/src/include/access/xlogutils.h
index 5b77b11f50..1cf31aa24f 100644
--- a/src/include/access/xlogutils.h
+++ b/src/include/access/xlogutils.h
@@ -115,4 +115,7 @@ extern void XLogReadDetermineTimeline(XLogReaderState *state,
 
 extern void WALReadRaiseError(WALReadError *errinfo);
 
+extern XLogReaderState *InitXLogReaderState(XLogRecPtr lsn);
+extern XLogRecord *ReadNextXLogRecord(XLogReaderState *xlogreader);
+
 #endif
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 9805bc6118..f3d843222b 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11370,6 +11370,12 @@
   proname => 'binary_upgrade_set_next_pg_tablespace_oid', provolatile => 'v',
   proparallel => 'u', prorettype => 'void', proargtypes => 'oid',
   prosrc => 'binary_upgrade_set_next_pg_tablespace_oid' },
+{ oid => '8046', descr => 'for use by pg_upgrade',
+  proname => 'binary_upgrade_validate_wal_record_types_after_lsn',
+  prorows => '10', proretset => 't', provolatile => 's', prorettype => 'bool',
+  proargtypes => 'pg_lsn', proallargtypes => '{pg_lsn,bool}',
+  proargmodes => '{i,o}', proargnames => '{start_lsn,is_ok}',
+  prosrc => 'binary_upgrade_validate_wal_record_types_after_lsn' },
 
 # conversion functions
 { oid => '4302',
-- 
2.27.0

#243Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Hayato Kuroda (Fujitsu) (#242)
3 attachment(s)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Thank you for reviewing! PSA new version patch set.

Sorry, wrong patch attached. PSA the correct ones.
There is a possibility that XLOG_PARAMETER_CHANGE may be generated when GUC
parameters are changed just before doing the upgrade. It has been added to the
list of acceptable record types.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:

v38-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patchapplication/octet-stream; name=v38-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patchDownload
From 705cbb3787f4950cd7ec894ff267301935820d37 Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Tue, 4 Apr 2023 05:49:34 +0000
Subject: [PATCH v38 1/2] pg_upgrade: Allow to replicate logical replication
 slots to new node

This commit allows nodes with logical replication slots to be upgraded. While
reading information from the old cluster, a list of logical replication slots is
fetched. Later in the upgrade, pg_upgrade revisits the list and restores the
slots by executing pg_create_logical_replication_slot() on the new cluster.
Migration of logical replication slots is only supported when the old cluster
is version 17.0 or later.
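
For each slot found on the old cluster, the restore step issues roughly the
following on the new cluster (the slot name, plugin, and two_phase value here
are examples only; the temporary argument is always false):

    SELECT pg_catalog.pg_create_logical_replication_slot('sub', 'pgoutput',
                                                          false, false);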

If the old node has slots with the status 'lost' or with unconsumed WAL records,
pg_upgrade fails. These checks are needed to prevent data loss.

Note that the pg_resetwal command would remove WAL files, which are required
for restart_lsn. If the WAL files required by logical replication slots are
removed, the slots become unusable. Therefore, during the upgrade, slot
restoration is done after the final pg_resetwal command. This workflow ensures
that the required WAL files are retained.

The significant advantage of this commit is that it makes it easy to continue
logical replication even after upgrading the publisher node. Previously,
pg_upgrade allowed copying publications to a new node. With this new commit,
adjusting the connection string to the new publisher will cause the apply
worker on the subscriber to connect to the new publisher automatically. This
enables seamless continuation of logical replication, even after an upgrade.
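
For example, on the subscriber (the connection string below is a placeholder):

    ALTER SUBSCRIPTION sub CONNECTION 'host=new_host port=5432 dbname=postgres';
    ALTER SUBSCRIPTION sub ENABLE;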

Author: Hayato Kuroda
Co-authored-by: Hou Zhijie
Reviewed-by: Peter Smith, Julien Rouhaud, Vignesh C, Wang Wei, Masahiko Sawada,
             Dilip Kumar
---
 doc/src/sgml/ref/pgupgrade.sgml               |  79 ++++++-
 src/backend/replication/slot.c                |  12 +
 src/bin/pg_upgrade/Makefile                   |   3 +
 src/bin/pg_upgrade/check.c                    | 202 +++++++++++++++--
 src/bin/pg_upgrade/controldata.c              |  39 ++++
 src/bin/pg_upgrade/function.c                 |  31 ++-
 src/bin/pg_upgrade/info.c                     | 148 +++++++++++-
 src/bin/pg_upgrade/meson.build                |   1 +
 src/bin/pg_upgrade/pg_upgrade.c               | 107 ++++++++-
 src/bin/pg_upgrade/pg_upgrade.h               |  26 ++-
 src/bin/pg_upgrade/server.c                   |  12 +-
 .../t/003_logical_replication_slots.pl        | 214 ++++++++++++++++++
 src/tools/pgindent/typedefs.list              |   3 +
 13 files changed, 836 insertions(+), 41 deletions(-)
 create mode 100644 src/bin/pg_upgrade/t/003_logical_replication_slots.pl

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index bea0d1b93f..4e2281bae4 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -383,6 +383,80 @@ make prefix=/usr/local/pgsql.new install
     </para>
    </step>
 
+   <step>
+    <title>Prepare for publisher upgrades</title>
+
+    <para>
+     <application>pg_upgrade</application> attempts to migrate logical
+     replication slots. This helps avoid the need for manually defining the
+     same replication slots on the new publisher. Migration of logical
+     replication slots is only supported when the old cluster is version 17.0
+     or later.
+    </para>
+
+    <para>
+     Before you start upgrading the publisher cluster, ensure that the
+     subscription is temporarily disabled, by executing
+     subscription is temporarily disabled by executing
+     Re-enable the subscription after the upgrade.
+    </para>
+
+    <para>
+     There are some prerequisites for <application>pg_upgrade</application> to
+     be able to upgrade the replication slots. If these are not met, an error
+     will be reported.
+    </para>
+
+    <itemizedlist>
+     <listitem>
+      <para>
+       All slots on the old cluster must be usable, i.e., there are no slots
+       whose
+       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>conflicting</structfield>
+       is <literal>true</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>confirmed_flush_lsn</structfield>
+       of all slots on the old cluster must be the same as the latest
+       checkpoint location. This ensures that all the data has been replicated
+       before the upgrade.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The output plugins referenced by the slots on the old cluster must be
+       installed in the new PostgreSQL executable directory.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must not have permanent logical replication slots, i.e.,
+       there must be no slots where
+       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>temporary</structfield>
+       is <literal>false</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must have
+       <link linkend="guc-wal-level"><varname>wal_level</varname></link> as
+       <literal>logical</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must have
+       <link linkend="guc-max-replication-slots"><varname>max_replication_slots</varname></link>
+       configured to a value greater than or equal to the number of slots
+       present in the old cluster.
+      </para>
+     </listitem>
+    </itemizedlist>
+
+   </step>
+
    <step>
     <title>Stop both servers</title>
 
@@ -652,8 +726,9 @@ rsync --archive --delete --hard-links --size-only --no-inc-recursive /vol1/pg_tb
        Configure the servers for log shipping.  (You do not need to run
        <function>pg_backup_start()</function> and <function>pg_backup_stop()</function>
        or take a file system backup as the standbys are still synchronized
-       with the primary.)  Replication slots are not copied and must
-       be recreated.
+       with the primary.)  Only logical slots on the primary are copied to the
+       new standby, but other slots on the old standby are not copied, so they
+       must be recreated manually.
       </para>
      </step>
 
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 3ded3c1473..a91d412f91 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1423,6 +1423,18 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause,
 
 		SpinLockRelease(&s->mutex);
 
+		/*
+		 * The logical replication slots shouldn't be invalidated as
+		 * max_slot_wal_keep_size GUC is set to -1 during the upgrade.
+		 *
+		 * The following is just a sanity check.
+		 */
+		if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade)
+		{
+			Assert(max_slot_wal_keep_size_mb == -1);
+			elog(ERROR, "replication slots must not be invalidated during the upgrade");
+		}
+
 		if (active_pid != 0)
 		{
 			/*
diff --git a/src/bin/pg_upgrade/Makefile b/src/bin/pg_upgrade/Makefile
index 5834513add..815d1a7ca1 100644
--- a/src/bin/pg_upgrade/Makefile
+++ b/src/bin/pg_upgrade/Makefile
@@ -3,6 +3,9 @@
 PGFILEDESC = "pg_upgrade - an in-place binary upgrade utility"
 PGAPPICON = win32
 
+# required for 003_logical_replication_slots.pl
+EXTRA_INSTALL=contrib/test_decoding
+
 subdir = src/bin/pg_upgrade
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 56e313f562..b1424fdf9c 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -30,6 +30,8 @@ static void check_for_jsonb_9_4_usage(ClusterInfo *cluster);
 static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(void);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
+static void check_new_cluster_logical_replication_slots(void);
+static void check_old_cluster_for_valid_slots(bool live_check);
 
 
 /*
@@ -86,8 +88,11 @@ check_and_dump_old_cluster(bool live_check)
 	if (!live_check)
 		start_postmaster(&old_cluster, true);
 
-	/* Extract a list of databases and tables from the old cluster */
-	get_db_and_rel_infos(&old_cluster);
+	/*
+	 * Extract a list of databases, tables, and logical replication slots from
+	 * the old cluster.
+	 */
+	get_db_rel_and_slot_infos(&old_cluster);
 
 	init_tablespaces();
 
@@ -104,6 +109,13 @@ check_and_dump_old_cluster(bool live_check)
 	check_for_reg_data_type_usage(&old_cluster);
 	check_for_isn_and_int8_passing_mismatch(&old_cluster);
 
+	/*
+	 * Logical replication slots can be migrated since PG17. See comments atop
+	 * get_old_cluster_logical_slot_infos().
+	 */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
+		check_old_cluster_for_valid_slots(live_check);
+
 	/*
 	 * PG 16 increased the size of the 'aclitem' type, which breaks the
 	 * on-disk format for existing data.
@@ -187,7 +199,7 @@ check_and_dump_old_cluster(bool live_check)
 void
 check_new_cluster(void)
 {
-	get_db_and_rel_infos(&new_cluster);
+	get_db_rel_and_slot_infos(&new_cluster);
 
 	check_new_cluster_is_empty();
 
@@ -210,6 +222,8 @@ check_new_cluster(void)
 	check_for_prepared_transactions(&new_cluster);
 
 	check_for_new_tablespace_dir();
+
+	check_new_cluster_logical_replication_slots();
 }
 
 
@@ -232,27 +246,6 @@ report_clusters_compatible(void)
 }
 
 
-void
-issue_warnings_and_set_wal_level(void)
-{
-	/*
-	 * We unconditionally start/stop the new server because pg_resetwal -o set
-	 * wal_level to 'minimum'.  If the user is upgrading standby servers using
-	 * the rsync instructions, they will need pg_upgrade to write its final
-	 * WAL record showing wal_level as 'replica'.
-	 */
-	start_postmaster(&new_cluster, true);
-
-	/* Reindex hash indexes for old < 10.0 */
-	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 906)
-		old_9_6_invalidate_hash_indexes(&new_cluster, false);
-
-	report_extension_updates(&new_cluster);
-
-	stop_postmaster(false);
-}
-
-
 void
 output_completion_banner(char *deletion_script_file_name)
 {
@@ -1402,3 +1395,164 @@ check_for_user_defined_encoding_conversions(ClusterInfo *cluster)
 	else
 		check_ok();
 }
+
+/*
+ * check_new_cluster_logical_replication_slots()
+ *
+ * Make sure there are no logical replication slots on the new cluster and that
+ * the parameter settings necessary for creating slots are sufficient.
+ */
+static void
+check_new_cluster_logical_replication_slots(void)
+{
+	PGresult   *res;
+	PGconn	   *conn;
+	int			nslots_on_old;
+	int			nslots_on_new;
+	int			max_replication_slots;
+	char	   *wal_level;
+
+	/* Logical slots can be migrated since PG17. */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+		return;
+
+	nslots_on_old = count_old_cluster_logical_slots();
+
+	/* Quick return if there are no logical slots to be migrated. */
+	if (nslots_on_old == 0)
+		return;
+
+	conn = connectToServer(&new_cluster, "template1");
+
+	prep_status("Checking for new cluster logical replication slots");
+
+	res = executeQueryOrDie(conn, "SELECT count(*) "
+							"FROM pg_catalog.pg_replication_slots "
+							"WHERE slot_type = 'logical' AND "
+							"temporary IS FALSE;");
+
+	if (PQntuples(res) != 1)
+		pg_fatal("could not count the number of logical replication slots");
+
+	nslots_on_new = atoi(PQgetvalue(res, 0, 0));
+
+	if (nslots_on_new)
+		pg_fatal("expected 0 logical replication slots but found %d",
+				 nslots_on_new);
+
+	PQclear(res);
+
+	res = executeQueryOrDie(conn, "SHOW wal_level;");
+
+	if (PQntuples(res) != 1)
+		pg_fatal("could not determine wal_level");
+
+	wal_level = PQgetvalue(res, 0, 0);
+
+	if (strcmp(wal_level, "logical") != 0)
+		pg_fatal("wal_level must be \"logical\", but is set to \"%s\"",
+				 wal_level);
+
+	PQclear(res);
+
+	res = executeQueryOrDie(conn, "SHOW max_replication_slots;");
+
+	if (PQntuples(res) != 1)
+		pg_fatal("could not determine max_replication_slots");
+
+	max_replication_slots = atoi(PQgetvalue(res, 0, 0));
+
+	if (nslots_on_old > max_replication_slots)
+		pg_fatal("max_replication_slots (%d) must be greater than or equal to the number of "
+				 "logical replication slots (%d) on the old cluster",
+				 max_replication_slots, nslots_on_old);
+
+	PQclear(res);
+	PQfinish(conn);
+
+	check_ok();
+}
+
+/*
+ * check_old_cluster_for_valid_slots()
+ *
+ * Make sure logical replication slots can be migrated to the new cluster.
+ * Following points are checked:
+ *
+ *	- All logical replication slots are usable.
+ *	- All logical replication slots consumed all WALs, except a
+ *	  CHECKPOINT_SHUTDOWN record.
+ */
+static void
+check_old_cluster_for_valid_slots(bool live_check)
+{
+	int			dbnum;
+	char		output_path[MAXPGPATH];
+	FILE	   *script = NULL;
+
+	prep_status("Checking for valid logical replication slots");
+
+	snprintf(output_path, sizeof(output_path), "%s/%s",
+			 log_opts.basedir,
+			 "invalid_logical_replication_slots.txt");
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		int			slotnum;
+		LogicalSlotInfoArr *slot_arr = &old_cluster.dbarr.dbs[dbnum].slot_arr;
+
+		for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		{
+			LogicalSlotInfo *slot = &slot_arr->slots[slotnum];
+
+			/* Is the slot usable? */
+			if (slot->invalid)
+			{
+				if (script == NULL &&
+					(script = fopen_priv(output_path, "w")) == NULL)
+					pg_fatal("could not open file \"%s\": %s",
+							 output_path, strerror(errno));
+
+				fprintf(script, "The slot \"%s\" is invalid\n",
+						slot->slotname);
+
+				/* No need to check this slot, seek to new one */
+				continue;
+			}
+
+			/*
+			 * Do additional checks to ensure that confirmed_flush LSN of all
+			 * the slots is the same as the latest checkpoint location.
+			 *
+			 * Note: This can be satisfied only when the old cluster has been
+			 * shut down, so we skip this for live checks.
+			 */
+			if (!live_check && !slot->caught_up)
+			{
+				if (script == NULL &&
+					(script = fopen_priv(output_path, "w")) == NULL)
+					pg_fatal("could not open file \"%s\": %s",
+							 output_path, strerror(errno));
+
+				fprintf(script,
+						"The slot \"%s\" has not consumed the WAL yet\n",
+						slot->slotname);
+			}
+		}
+	}
+
+	if (script)
+	{
+		fclose(script);
+
+		pg_log(PG_REPORT, "fatal");
+		pg_fatal("Your installation contains invalid logical replication slots.\n"
+				 "These slots can't be copied, so this cluster cannot be upgraded.\n"
+				 "Consider removing such slots or consuming the pending WAL if any,\n"
+				 "and then restart the upgrade.\n"
+				 "A list of invalid logical replication slots is in the file:\n"
+				 "    %s", output_path);
+	}
+
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c
index 4beb65ab22..f8f823e2be 100644
--- a/src/bin/pg_upgrade/controldata.c
+++ b/src/bin/pg_upgrade/controldata.c
@@ -169,6 +169,45 @@ get_control_data(ClusterInfo *cluster, bool live_check)
 				}
 				got_cluster_state = true;
 			}
+
+			else if ((p = strstr(bufin, "Latest checkpoint location:")) != NULL)
+			{
+				/*
+				 * Read the latest checkpoint location if the cluster is PG17
+				 * or later. This is used for upgrading logical replication
+				 * slots. Currently, we need it only for the old cluster but
+				 * for simplicity chose not to have additional checks.
+				 */
+				if (GET_MAJOR_VERSION(cluster->major_version) >= 1700)
+				{
+					char	   *slash = NULL;
+					uint32		upper_lsn,
+								lower_lsn;
+
+					p = strchr(p, ':');
+
+					if (p == NULL || strlen(p) <= 1)
+						pg_fatal("%d: controldata retrieval problem", __LINE__);
+
+					p++;		/* remove ':' char */
+
+					p = strpbrk(p, "01234567890ABCDEF");
+
+					if (p == NULL || strlen(p) <= 1)
+						pg_fatal("%d: controldata retrieval problem", __LINE__);
+
+					/*
+					 * The upper and lower part of LSN must be read separately
+					 * because it is stored as in %X/%X format.
+					 */
+					upper_lsn = strtoul(p, &slash, 16);
+					lower_lsn = strtoul(++slash, NULL, 16);
+
+					/* And combine them */
+					cluster->controldata.chkpnt_latest =
+						((uint64) upper_lsn << 32) | lower_lsn;
+				}
+			}
 		}
 
 		rc = pclose(output);
diff --git a/src/bin/pg_upgrade/function.c b/src/bin/pg_upgrade/function.c
index dc8800c7cd..c0f5e58fa2 100644
--- a/src/bin/pg_upgrade/function.c
+++ b/src/bin/pg_upgrade/function.c
@@ -46,7 +46,9 @@ library_name_compare(const void *p1, const void *p2)
 /*
  * get_loadable_libraries()
  *
- *	Fetch the names of all old libraries containing C-language functions.
+ *	Fetch the names of all old libraries that either contain C-language
+ *	functions or are referenced as logical replication output plugins.
+ *
  *	We will later check that they all exist in the new installation.
  */
 void
@@ -55,6 +57,7 @@ get_loadable_libraries(void)
 	PGresult  **ress;
 	int			totaltups;
 	int			dbnum;
+	int			n_libinfos;
 
 	ress = (PGresult **) pg_malloc(old_cluster.dbarr.ndbs * sizeof(PGresult *));
 	totaltups = 0;
@@ -81,7 +84,12 @@ get_loadable_libraries(void)
 		PQfinish(conn);
 	}
 
-	os_info.libraries = (LibraryInfo *) pg_malloc(totaltups * sizeof(LibraryInfo));
+	/*
+	 * Allocate memory for required libraries and logical replication output
+	 * plugins.
+	 */
+	n_libinfos = totaltups + count_old_cluster_logical_slots();
+	os_info.libraries = (LibraryInfo *) pg_malloc(sizeof(LibraryInfo) * n_libinfos);
 	totaltups = 0;
 
 	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
@@ -89,6 +97,8 @@ get_loadable_libraries(void)
 		PGresult   *res = ress[dbnum];
 		int			ntups;
 		int			rowno;
+		int			slotno;
+		LogicalSlotInfoArr *slot_arr = &old_cluster.dbarr.dbs[dbnum].slot_arr;
 
 		ntups = PQntuples(res);
 		for (rowno = 0; rowno < ntups; rowno++)
@@ -101,6 +111,23 @@ get_loadable_libraries(void)
 			totaltups++;
 		}
 		PQclear(res);
+
+		/*
+		 * Store the names of output plugins as well. There is a possibility
+		 * that the same plugin is listed more than once, but the consumer
+		 * function check_loadable_libraries() will avoid checking the same
+		 * library twice, so we do not have to consider uniqueness here.
+		 */
+		for (slotno = 0; slotno < slot_arr->nslots; slotno++)
+		{
+			if (slot_arr->slots[slotno].invalid)
+				continue;
+
+			os_info.libraries[totaltups].name = pg_strdup(slot_arr->slots[slotno].plugin);
+			os_info.libraries[totaltups].dbnum = dbnum;
+
+			totaltups++;
+		}
 	}
 
 	pg_free(ress);
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index aa5faca4d6..f7b0deca87 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -26,6 +26,8 @@ static void get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo);
 static void free_rel_infos(RelInfoArr *rel_arr);
 static void print_db_infos(DbInfoArr *db_arr);
 static void print_rel_infos(RelInfoArr *rel_arr);
+static void print_slot_infos(LogicalSlotInfoArr *slot_arr);
+static void get_old_cluster_logical_slot_infos(DbInfo *dbinfo);
 
 
 /*
@@ -266,13 +268,13 @@ report_unmatched_relation(const RelInfo *rel, const DbInfo *db, bool is_new_db)
 }
 
 /*
- * get_db_and_rel_infos()
+ * get_db_rel_and_slot_infos()
  *
  * higher level routine to generate dbinfos for the database running
  * on the given "port". Assumes that server is already running.
  */
 void
-get_db_and_rel_infos(ClusterInfo *cluster)
+get_db_rel_and_slot_infos(ClusterInfo *cluster)
 {
 	int			dbnum;
 
@@ -283,7 +285,17 @@ get_db_and_rel_infos(ClusterInfo *cluster)
 	get_db_infos(cluster);
 
 	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
-		get_rel_infos(cluster, &cluster->dbarr.dbs[dbnum]);
+	{
+		DbInfo	   *pDbInfo = &cluster->dbarr.dbs[dbnum];
+
+		get_rel_infos(cluster, pDbInfo);
+
+		/*
+		 * Retrieve the logical replication slots infos for the old cluster.
+		 */
+		if (cluster == &old_cluster)
+			get_old_cluster_logical_slot_infos(pDbInfo);
+	}
 
 	if (cluster == &old_cluster)
 		pg_log(PG_VERBOSE, "\nsource databases:");
@@ -600,6 +612,107 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 	dbinfo->rel_arr.nrels = num_rels;
 }
 
+/*
+ * get_old_cluster_logical_slot_infos()
+ *
+ * gets the LogicalSlotInfos for all the logical replication slots of the
+ * database referred to by "dbinfo".
+ *
+ * Note: This function will not do anything if the old cluster is pre-PG17.
+ * This is because before that the logical slots are not saved at shutdown, so
+ * there is no guarantee that the latest confirmed_flush_lsn is saved to disk
+ * which can lead to data loss. It is still not guaranteed for manually created
+ * slots in PG17, so subsequent checks done in
+ * check_old_cluster_for_valid_slots() would raise a FATAL error if such slots
+ * are included.
+ */
+static void
+get_old_cluster_logical_slot_infos(DbInfo *dbinfo)
+{
+	PGconn	   *conn;
+	PGresult   *res;
+	LogicalSlotInfo *slotinfos = NULL;
+	int			num_slots;
+
+	/* Logical slots can be migrated since PG17. */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+		return;
+
+	conn = connectToServer(&old_cluster, dbinfo->db_name);
+
+	/*
+	 * The temporary slots are explicitly ignored while checking because such
+	 * slots cannot exist after the upgrade. During the upgrade, clusters are
+	 * started and stopped several times causing any temporary slots to be
+	 * removed.
+	 */
+	res = executeQueryOrDie(conn, "SELECT slot_name, plugin, two_phase, "
+							"(confirmed_flush_lsn = '%X/%X') as caught_up, conflicting as invalid "
+							"FROM pg_catalog.pg_replication_slots "
+							"WHERE slot_type = 'logical' AND "
+							"database = current_database() AND "
+							"temporary IS FALSE;",
+							LSN_FORMAT_ARGS(old_cluster.controldata.chkpnt_latest));
+
+	num_slots = PQntuples(res);
+
+	if (num_slots)
+	{
+		int			slotnum;
+		int			i_slotname;
+		int			i_plugin;
+		int			i_twophase;
+		int			i_caught_up;
+		int			i_invalid;
+
+		slotinfos = (LogicalSlotInfo *) pg_malloc(sizeof(LogicalSlotInfo) * num_slots);
+
+		i_slotname = PQfnumber(res, "slot_name");
+		i_plugin = PQfnumber(res, "plugin");
+		i_twophase = PQfnumber(res, "two_phase");
+		i_caught_up = PQfnumber(res, "caught_up");
+		i_invalid = PQfnumber(res, "invalid");
+
+		for (slotnum = 0; slotnum < num_slots; slotnum++)
+		{
+			LogicalSlotInfo *curr = &slotinfos[slotnum];
+
+			curr->slotname = pg_strdup(PQgetvalue(res, slotnum, i_slotname));
+			curr->plugin = pg_strdup(PQgetvalue(res, slotnum, i_plugin));
+			curr->two_phase = (strcmp(PQgetvalue(res, slotnum, i_twophase), "t") == 0);
+			curr->caught_up = (strcmp(PQgetvalue(res, slotnum, i_caught_up), "t") == 0);
+			curr->invalid = (strcmp(PQgetvalue(res, slotnum, i_invalid), "t") == 0);
+		}
+	}
+
+	PQclear(res);
+	PQfinish(conn);
+
+	dbinfo->slot_arr.slots = slotinfos;
+	dbinfo->slot_arr.nslots = num_slots;
+}
+
+
+/*
+ * count_old_cluster_logical_slots()
+ *
+ * Returns the number of logical replication slots for all databases.
+ *
+ * Note: this function always returns 0 if the old_cluster is PG16 and prior
+ * because we gather slot information only for cluster versions greater than or
+ * equal to PG17. See get_old_cluster_logical_slot_infos().
+ */
+int
+count_old_cluster_logical_slots(void)
+{
+	int			dbnum;
+	int			slot_count = 0;
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+		slot_count += old_cluster.dbarr.dbs[dbnum].slot_arr.nslots;
+
+	return slot_count;
+}
 
 static void
 free_db_and_rel_infos(DbInfoArr *db_arr)
@@ -642,8 +755,11 @@ print_db_infos(DbInfoArr *db_arr)
 
 	for (dbnum = 0; dbnum < db_arr->ndbs; dbnum++)
 	{
-		pg_log(PG_VERBOSE, "Database: \"%s\"", db_arr->dbs[dbnum].db_name);
-		print_rel_infos(&db_arr->dbs[dbnum].rel_arr);
+		DbInfo	   *pDbInfo = &db_arr->dbs[dbnum];
+
+		pg_log(PG_VERBOSE, "Database: \"%s\"", pDbInfo->db_name);
+		print_rel_infos(&pDbInfo->rel_arr);
+		print_slot_infos(&pDbInfo->slot_arr);
 	}
 }
 
@@ -660,3 +776,25 @@ print_rel_infos(RelInfoArr *rel_arr)
 			   rel_arr->rels[relnum].reloid,
 			   rel_arr->rels[relnum].tablespace);
 }
+
+static void
+print_slot_infos(LogicalSlotInfoArr *slot_arr)
+{
+	int			slotnum;
+
+	/* Quick return if there are no logical slots. */
+	if (slot_arr->nslots == 0)
+		return;
+
+	pg_log(PG_VERBOSE, "Logical replication slots within the database:");
+
+	for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+	{
+		LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+
+		pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\", two_phase: %d",
+			   slot_info->slotname,
+			   slot_info->plugin,
+			   slot_info->two_phase);
+	}
+}
diff --git a/src/bin/pg_upgrade/meson.build b/src/bin/pg_upgrade/meson.build
index 12a97f84e2..228f29b688 100644
--- a/src/bin/pg_upgrade/meson.build
+++ b/src/bin/pg_upgrade/meson.build
@@ -42,6 +42,7 @@ tests += {
     'tests': [
       't/001_basic.pl',
       't/002_pg_upgrade.pl',
+      't/003_logical_replication_slots.pl',
     ],
     'test_kwargs': {'priority': 40}, # pg_upgrade tests are slow
   },
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 96bfb67167..6be236dc9a 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -59,6 +59,8 @@ static void copy_xact_xlog_xid(void);
 static void set_frozenxids(bool minmxid_only);
 static void make_outputdirs(char *pgdata);
 static void setup(char *argv0, bool *live_check);
+static void create_logical_replication_slots(void);
+static void setup_new_cluster(void);
 
 ClusterInfo old_cluster,
 			new_cluster;
@@ -188,6 +190,8 @@ main(int argc, char **argv)
 			  new_cluster.pgdata);
 	check_ok();
 
+	setup_new_cluster();
+
 	if (user_opts.do_sync)
 	{
 		prep_status("Sync data directory to disk");
@@ -201,8 +205,6 @@ main(int argc, char **argv)
 
 	create_script_for_old_cluster_deletion(&deletion_script_file_name);
 
-	issue_warnings_and_set_wal_level();
-
 	pg_log(PG_REPORT,
 		   "\n"
 		   "Upgrade Complete\n"
@@ -593,7 +595,7 @@ create_new_objects(void)
 		set_frozenxids(true);
 
 	/* update new_cluster info now that we have objects in the databases */
-	get_db_and_rel_infos(&new_cluster);
+	get_db_rel_and_slot_infos(&new_cluster);
 }
 
 /*
@@ -862,3 +864,102 @@ set_frozenxids(bool minmxid_only)
 
 	check_ok();
 }
+
+/*
+ * create_logical_replication_slots()
+ *
+ * Similar to create_new_objects() but only restores logical replication slots.
+ */
+static void
+create_logical_replication_slots(void)
+{
+	int			dbnum;
+
+	prep_status_progress("Restoring logical replication slots in the new cluster");
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		DbInfo	   *old_db = &old_cluster.dbarr.dbs[dbnum];
+		LogicalSlotInfoArr *slot_arr = &old_db->slot_arr;
+		PGconn	   *conn;
+		PQExpBuffer query;
+		int			slotnum;
+		char		log_file_name[MAXPGPATH];
+
+		/* Skip this database if there are no slots */
+		if (slot_arr->nslots == 0)
+			continue;
+
+		conn = connectToServer(&new_cluster, old_db->db_name);
+		query = createPQExpBuffer();
+
+		snprintf(log_file_name, sizeof(log_file_name),
+				 DB_DUMP_LOG_FILE_MASK, old_db->db_oid);
+
+		pg_log(PG_STATUS, "%s", old_db->db_name);
+
+		for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		{
+			LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+
+			/* Constructs a query for creating logical replication slots. */
+			appendPQExpBuffer(query, "SELECT pg_catalog.pg_create_logical_replication_slot(");
+			appendStringLiteralConn(query, slot_info->slotname, conn);
+			appendPQExpBuffer(query, ", ");
+			appendStringLiteralConn(query, slot_info->plugin, conn);
+			appendPQExpBuffer(query, ", false, %s);",
+							  slot_info->two_phase ? "true" : "false");
+
+			PQclear(executeQueryOrDie(conn, "%s", query->data));
+
+			resetPQExpBuffer(query);
+		}
+
+		PQfinish(conn);
+
+		destroyPQExpBuffer(query);
+	}
+
+	end_progress_output();
+	check_ok();
+}
+
+/*
+ *	setup_new_cluster()
+ *
+ * Starts the new cluster to update wal_level in the control file, and then
+ * performs the remaining setup steps. Logical slots are also created here.
+ */
+static void
+setup_new_cluster(void)
+{
+	/*
+	 * We unconditionally start/stop the new server because pg_resetwal -o set
+	 * wal_level to 'minimum'.  If the user is upgrading standby servers using
+	 * the rsync instructions, they will need pg_upgrade to write its final
+	 * WAL record showing wal_level as 'replica'.
+	 */
+	start_postmaster(&new_cluster, true);
+
+	/*
+	 * If the old cluster has logical slots, migrate them to a new cluster.
+	 *
+	 * Note: This must be done after the caller has executed the pg_resetwal
+	 * command, because pg_resetwal would remove the required WAL files.
+	 */
+	if (count_old_cluster_logical_slots())
+		create_logical_replication_slots();
+
+	/*
+	 * Reindex hash indexes for old < 10.0. count_old_cluster_logical_slots()
+	 * can return non-zero only when the old cluster is PG17 or later, so it
+	 * is OK to use "else if" here. See comments atop that function and
+	 * get_old_cluster_logical_slot_infos().
+	 */
+	else if (GET_MAJOR_VERSION(old_cluster.major_version) <= 906)
+		old_9_6_invalidate_hash_indexes(&new_cluster, false);
+
+	report_extension_updates(&new_cluster);
+
+	stop_postmaster(false);
+}
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 842f3b6cd3..f5ce6c3b4d 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -10,6 +10,7 @@
 #include <sys/stat.h>
 #include <sys/time.h>
 
+#include "access/xlogdefs.h"
 #include "common/relpath.h"
 #include "libpq-fe.h"
 
@@ -150,6 +151,25 @@ typedef struct
 	int			nrels;
 } RelInfoArr;
 
+/*
+ * Structure to store logical replication slot information
+ */
+typedef struct
+{
+	char	   *slotname;		/* slot name */
+	char	   *plugin;			/* plugin */
+	bool		two_phase;		/* can the slot decode 2PC? */
+	bool		caught_up;		/* Is confirmed_flush_lsn the same as latest
+								 * checkpoint LSN? */
+	bool		invalid;		/* If true, the slot is unusable. */
+} LogicalSlotInfo;
+
+typedef struct
+{
+	int			nslots;			/* number of logical slot infos */
+	LogicalSlotInfo *slots;		/* array of logical slot infos */
+} LogicalSlotInfoArr;
+
 /*
  * The following structure represents a relation mapping.
  */
@@ -176,6 +196,7 @@ typedef struct
 	char		db_tablespace[MAXPGPATH];	/* database default tablespace
 											 * path */
 	RelInfoArr	rel_arr;		/* array of all user relinfos */
+	LogicalSlotInfoArr slot_arr;	/* array of all LogicalSlotInfo */
 } DbInfo;
 
 /*
@@ -225,6 +246,7 @@ typedef struct
 	bool		date_is_int;
 	bool		float8_pass_by_value;
 	uint32		data_checksum_version;
+	XLogRecPtr	chkpnt_latest;
 } ControlData;
 
 /*
@@ -345,7 +367,6 @@ void		output_check_banner(bool live_check);
 void		check_and_dump_old_cluster(bool live_check);
 void		check_new_cluster(void);
 void		report_clusters_compatible(void);
-void		issue_warnings_and_set_wal_level(void);
 void		output_completion_banner(char *deletion_script_file_name);
 void		check_cluster_versions(void);
 void		check_cluster_compatibility(bool live_check);
@@ -400,7 +421,8 @@ void		check_loadable_libraries(void);
 FileNameMap *gen_db_file_maps(DbInfo *old_db,
 							  DbInfo *new_db, int *nmaps, const char *old_pgdata,
 							  const char *new_pgdata);
-void		get_db_and_rel_infos(ClusterInfo *cluster);
+void		get_db_rel_and_slot_infos(ClusterInfo *cluster);
+int			count_old_cluster_logical_slots(void);
 
 /* option.c */
 
diff --git a/src/bin/pg_upgrade/server.c b/src/bin/pg_upgrade/server.c
index 0bc3d2806b..20589e8c43 100644
--- a/src/bin/pg_upgrade/server.c
+++ b/src/bin/pg_upgrade/server.c
@@ -234,14 +234,20 @@ start_postmaster(ClusterInfo *cluster, bool report_and_exit_on_error)
 	 * we only modify the new cluster, so only use it there.  If there is a
 	 * crash, the new cluster has to be recreated anyway.  fsync=off is a big
 	 * win on ext4.
+	 *
+	 * Use max_slot_wal_keep_size as -1 to prevent the WAL removal by the
+	 * checkpointer process.  If WALs required by logical replication slots are
+	 * removed, the slots are unusable.  This setting prevents the invalidation
+	 * of slots during the upgrade.
 	 */
 	snprintf(cmd, sizeof(cmd),
-			 "\"%s/pg_ctl\" -w -l \"%s/%s\" -D \"%s\" -o \"-p %d -b%s %s%s\" start",
+			 "\"%s/pg_ctl\" -w -l \"%s/%s\" -D \"%s\" -o \"-p %d -b%s%s %s%s\" start",
 			 cluster->bindir,
 			 log_opts.logdir,
 			 SERVER_LOG_FILE, cluster->pgconfig, cluster->port,
-			 (cluster == &new_cluster) ?
-			 " -c synchronous_commit=off -c fsync=off -c full_page_writes=off" : "",
+			(cluster == &new_cluster) ?
+			" -c synchronous_commit=off -c fsync=off -c full_page_writes=off" : "",
+			" -c max_slot_wal_keep_size=-1",
 			 cluster->pgopts ? cluster->pgopts : "", socket_string);
 
 	/*
diff --git a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
new file mode 100644
index 0000000000..01cb04ca12
--- /dev/null
+++ b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
@@ -0,0 +1,214 @@
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+# Tests for upgrading replication slots
+
+use strict;
+use warnings;
+
+use File::Path qw(rmtree);
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Can be changed to test the other modes
+my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
+
+# Initialize old cluster
+my $old_publisher = PostgreSQL::Test::Cluster->new('old_publisher');
+$old_publisher->init(allows_streaming => 'logical');
+
+# Initialize new cluster
+my $new_publisher = PostgreSQL::Test::Cluster->new('new_publisher');
+$new_publisher->init(allows_streaming => 'replica');
+
+# Initialize subscriber cluster
+my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
+$subscriber->init(allows_streaming => 'logical');
+
+my $bindir = $new_publisher->config_data('--bindir');
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when new cluster wal_level is not 'logical'
+
+# Preparations for the subsequent test:
+# 1. Create a slot on the old cluster
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT pg_create_logical_replication_slot('test_slot1', 'test_decoding', false, true);"
+);
+$old_publisher->stop;
+
+# pg_upgrade will fail because the new cluster wal_level is 'replica'
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade where the new cluster has the wrong wal_level');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when max_replication_slots on a new cluster is
+#		too low
+
+# Preparations for the subsequent test:
+# 1. Create a second slot on the old cluster
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT pg_create_logical_replication_slot('test_slot2', 'test_decoding', false, true);"
+);
+
+# 2. Consume WAL records to avoid another type of upgrade failure. It will be
+#	 tested in subsequent cases.
+$old_publisher->safe_psql('postgres',
+	"SELECT count(*) FROM pg_logical_slot_get_changes('test_slot1', NULL, NULL);"
+);
+$old_publisher->stop;
+
+# 3. max_replication_slots is set to smaller than the number of slots (2)
+#	 present on the old cluster
+$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 1");
+
+# 4. wal_level is set correctly on the new cluster
+$new_publisher->append_conf('postgresql.conf', "wal_level = 'logical'");
+
+# pg_upgrade will fail because the new cluster has insufficient max_replication_slots
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade where the new cluster has insufficient max_replication_slots');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when the slot still has unconsumed WAL records
+
+# Preparations for the subsequent test:
+# 1. Remove the slot 'test_slot2', leaving only 1 slot remaining on the old
+#	 cluster, so the new cluster config  max_replication_slots=1 will now be
+#	 enough.
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT * FROM pg_drop_replication_slot('test_slot2');"
+);
+
+# 2. Generate extra WAL records. Because these WAL records do not get consumed
+#	 it will cause the upcoming pg_upgrade test to fail.
+$old_publisher->safe_psql('postgres',
+	"CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;"
+);
+$old_publisher->stop;
+
+# pg_upgrade will fail because the slot still has unconsumed WAL records
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade of old cluster with idle replication slots');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+# Remove the remaining slot
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT * FROM pg_drop_replication_slot('test_slot1');"
+);
+
+# ------------------------------
+# TEST: Successful upgrade
+
+# Preparations for the subsequent test:
+# 1. Setup logical replication
+my $old_connstr = $old_publisher->connstr . ' dbname=postgres';
+$old_publisher->safe_psql('postgres',
+	"CREATE PUBLICATION pub FOR ALL TABLES;"
+);
+$subscriber->start;
+$subscriber->safe_psql(
+	'postgres', qq[
+	CREATE TABLE tbl (a int);
+	CREATE SUBSCRIPTION sub CONNECTION '$old_connstr' PUBLICATION pub WITH (two_phase = 'true')
+]);
+$subscriber->wait_for_subscription_sync($old_publisher, 'sub');
+
+# 2. Temporarily disable the subscription
+$subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION sub DISABLE");
+$old_publisher->stop;
+
+# Actual run, successful upgrade is expected
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade of old cluster');
+ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ removed after pg_upgrade success");
+
+# Check that the slot 'sub' has migrated to the new cluster
+$new_publisher->start;
+my $result = $new_publisher->safe_psql('postgres',
+	"SELECT slot_name, two_phase FROM pg_replication_slots");
+is($result, qq(sub|t), 'check the slot exists on new cluster');
+
+# Update the connection
+my $new_connstr = $new_publisher->connstr . ' dbname=postgres';
+$subscriber->safe_psql(
+	'postgres', qq[
+	ALTER SUBSCRIPTION sub CONNECTION '$new_connstr';
+	ALTER SUBSCRIPTION sub ENABLE;
+]);
+
+# Check whether changes on the new publisher get replicated to the subscriber
+$new_publisher->safe_psql('postgres',
+	"INSERT INTO tbl VALUES (generate_series(11, 20))");
+$new_publisher->wait_for_catchup('sub');
+$result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
+is($result, qq(20), 'check changes are replicated to the subscriber');
+
+# Clean up
+$subscriber->stop();
+$new_publisher->stop();
+
+done_testing();
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index f3d8a2a855..8e5ff87dff 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1502,7 +1502,10 @@ LogicalRepTupleData
 LogicalRepTyp
 LogicalRepWorker
 LogicalRepWorkerType
+LogicalReplicationSlotInfo
 LogicalRewriteMappingData
+LogicalSlotInfo
+LogicalSlotInfoArr
 LogicalTape
 LogicalTapeSet
 LsnReadQueue
-- 
2.27.0

v38-0002-Use-binary_upgrade_validate_wal_record_types_aft.patchapplication/octet-stream; name=v38-0002-Use-binary_upgrade_validate_wal_record_types_aft.patchDownload
From 2aa3b8c6e71f5c3a73c883f20dd5909b1b96228c Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Thu, 14 Sep 2023 06:01:40 +0000
Subject: [PATCH v38 2/2] Use
 binary_upgrade_validate_wal_record_types_after_lsn

---
 contrib/pg_walinspect/pg_walinspect.c         | 94 -------------------
 doc/src/sgml/ref/pgupgrade.sgml               |  5 +-
 src/backend/access/transam/xlogutils.c        | 92 ++++++++++++++++++
 src/backend/utils/adt/pg_upgrade_support.c    | 86 +++++++++++++++++
 src/bin/pg_upgrade/check.c                    |  8 +-
 src/bin/pg_upgrade/controldata.c              | 39 --------
 src/bin/pg_upgrade/info.c                     |  6 +-
 src/bin/pg_upgrade/pg_upgrade.h               |  1 -
 .../t/003_logical_replication_slots.pl        | 20 ++++
 src/include/access/xlogutils.h                |  3 +
 src/include/catalog/pg_proc.dat               |  6 ++
 11 files changed, 215 insertions(+), 145 deletions(-)

diff --git a/contrib/pg_walinspect/pg_walinspect.c b/contrib/pg_walinspect/pg_walinspect.c
index 796a74f322..49f4f92e98 100644
--- a/contrib/pg_walinspect/pg_walinspect.c
+++ b/contrib/pg_walinspect/pg_walinspect.c
@@ -40,8 +40,6 @@ PG_FUNCTION_INFO_V1(pg_get_wal_stats_till_end_of_wal);
 
 static void ValidateInputLSNs(XLogRecPtr start_lsn, XLogRecPtr *end_lsn);
 static XLogRecPtr GetCurrentLSN(void);
-static XLogReaderState *InitXLogReaderState(XLogRecPtr lsn);
-static XLogRecord *ReadNextXLogRecord(XLogReaderState *xlogreader);
 static void GetWALRecordInfo(XLogReaderState *record, Datum *values,
 							 bool *nulls, uint32 ncols);
 static void GetWALRecordsInfo(FunctionCallInfo fcinfo,
@@ -84,98 +82,6 @@ GetCurrentLSN(void)
 	return curr_lsn;
 }
 
-/*
- * Initialize WAL reader and identify first valid LSN.
- */
-static XLogReaderState *
-InitXLogReaderState(XLogRecPtr lsn)
-{
-	XLogReaderState *xlogreader;
-	ReadLocalXLogPageNoWaitPrivate *private_data;
-	XLogRecPtr	first_valid_record;
-
-	/*
-	 * Reading WAL below the first page of the first segments isn't allowed.
-	 * This is a bootstrap WAL page and the page_read callback fails to read
-	 * it.
-	 */
-	if (lsn < XLOG_BLCKSZ)
-		ereport(ERROR,
-				(errmsg("could not read WAL at LSN %X/%X",
-						LSN_FORMAT_ARGS(lsn))));
-
-	private_data = (ReadLocalXLogPageNoWaitPrivate *)
-		palloc0(sizeof(ReadLocalXLogPageNoWaitPrivate));
-
-	xlogreader = XLogReaderAllocate(wal_segment_size, NULL,
-									XL_ROUTINE(.page_read = &read_local_xlog_page_no_wait,
-											   .segment_open = &wal_segment_open,
-											   .segment_close = &wal_segment_close),
-									private_data);
-
-	if (xlogreader == NULL)
-		ereport(ERROR,
-				(errcode(ERRCODE_OUT_OF_MEMORY),
-				 errmsg("out of memory"),
-				 errdetail("Failed while allocating a WAL reading processor.")));
-
-	/* first find a valid recptr to start from */
-	first_valid_record = XLogFindNextRecord(xlogreader, lsn);
-
-	if (XLogRecPtrIsInvalid(first_valid_record))
-		ereport(ERROR,
-				(errmsg("could not find a valid record after %X/%X",
-						LSN_FORMAT_ARGS(lsn))));
-
-	return xlogreader;
-}
-
-/*
- * Read next WAL record.
- *
- * By design, to be less intrusive in a running system, no slot is allocated
- * to reserve the WAL we're about to read. Therefore this function can
- * encounter read errors for historical WAL.
- *
- * We guard against ordinary errors trying to read WAL that hasn't been
- * written yet by limiting end_lsn to the flushed WAL, but that can also
- * encounter errors if the flush pointer falls in the middle of a record. In
- * that case we'll return NULL.
- */
-static XLogRecord *
-ReadNextXLogRecord(XLogReaderState *xlogreader)
-{
-	XLogRecord *record;
-	char	   *errormsg;
-
-	record = XLogReadRecord(xlogreader, &errormsg);
-
-	if (record == NULL)
-	{
-		ReadLocalXLogPageNoWaitPrivate *private_data;
-
-		/* return NULL, if end of WAL is reached */
-		private_data = (ReadLocalXLogPageNoWaitPrivate *)
-			xlogreader->private_data;
-
-		if (private_data->end_of_wal)
-			return NULL;
-
-		if (errormsg)
-			ereport(ERROR,
-					(errcode_for_file_access(),
-					 errmsg("could not read WAL at %X/%X: %s",
-							LSN_FORMAT_ARGS(xlogreader->EndRecPtr), errormsg)));
-		else
-			ereport(ERROR,
-					(errcode_for_file_access(),
-					 errmsg("could not read WAL at %X/%X",
-							LSN_FORMAT_ARGS(xlogreader->EndRecPtr))));
-	}
-
-	return record;
-}
-
 /*
  * Output values that make up a row describing caller's WAL record.
  *
diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 4e2281bae4..2588d6d7b8 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -418,10 +418,7 @@ make prefix=/usr/local/pgsql.new install
      </listitem>
      <listitem>
       <para>
-       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>confirmed_flush_lsn</structfield>
-       of all slots on the old cluster must be the same as the latest
-       checkpoint location. This ensures that all the data has been replicated
-       before the upgrade.
+       Old cluster has replicated all the changes replicated to subscribers.
       </para>
      </listitem>
      <listitem>
diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c
index 43f7b31205..e2cabfef32 100644
--- a/src/backend/access/transam/xlogutils.c
+++ b/src/backend/access/transam/xlogutils.c
@@ -1048,3 +1048,95 @@ WALReadRaiseError(WALReadError *errinfo)
 						errinfo->wre_req)));
 	}
 }
+
+/*
+ * Initialize WAL reader and identify first valid LSN.
+ */
+XLogReaderState *
+InitXLogReaderState(XLogRecPtr lsn)
+{
+	XLogReaderState *xlogreader;
+	ReadLocalXLogPageNoWaitPrivate *private_data;
+	XLogRecPtr	first_valid_record;
+
+	/*
+	 * Reading WAL below the first page of the first segments isn't allowed.
+	 * This is a bootstrap WAL page and the page_read callback fails to read
+	 * it.
+	 */
+	if (lsn < XLOG_BLCKSZ)
+		ereport(ERROR,
+				(errmsg("could not read WAL at LSN %X/%X",
+						LSN_FORMAT_ARGS(lsn))));
+
+	private_data = (ReadLocalXLogPageNoWaitPrivate *)
+		palloc0(sizeof(ReadLocalXLogPageNoWaitPrivate));
+
+	xlogreader = XLogReaderAllocate(wal_segment_size, NULL,
+									XL_ROUTINE(.page_read = &read_local_xlog_page_no_wait,
+											   .segment_open = &wal_segment_open,
+											   .segment_close = &wal_segment_close),
+									private_data);
+
+	if (xlogreader == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("out of memory"),
+				 errdetail("Failed while allocating a WAL reading processor.")));
+
+	/* first find a valid recptr to start from */
+	first_valid_record = XLogFindNextRecord(xlogreader, lsn);
+
+	if (XLogRecPtrIsInvalid(first_valid_record))
+		ereport(ERROR,
+				(errmsg("could not find a valid record after %X/%X",
+						LSN_FORMAT_ARGS(lsn))));
+
+	return xlogreader;
+}
+
+/*
+ * Read next WAL record.
+ *
+ * By design, to be less intrusive in a running system, no slot is allocated
+ * to reserve the WAL we're about to read. Therefore this function can
+ * encounter read errors for historical WAL.
+ *
+ * We guard against ordinary errors trying to read WAL that hasn't been
+ * written yet by limiting end_lsn to the flushed WAL, but that can also
+ * encounter errors if the flush pointer falls in the middle of a record. In
+ * that case we'll return NULL.
+ */
+XLogRecord *
+ReadNextXLogRecord(XLogReaderState *xlogreader)
+{
+	XLogRecord *record;
+	char	   *errormsg;
+
+	record = XLogReadRecord(xlogreader, &errormsg);
+
+	if (record == NULL)
+	{
+		ReadLocalXLogPageNoWaitPrivate *private_data;
+
+		/* return NULL, if end of WAL is reached */
+		private_data = (ReadLocalXLogPageNoWaitPrivate *)
+			xlogreader->private_data;
+
+		if (private_data->end_of_wal)
+			return NULL;
+
+		if (errormsg)
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not read WAL at %X/%X: %s",
+							LSN_FORMAT_ARGS(xlogreader->EndRecPtr), errormsg)));
+		else
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not read WAL at %X/%X",
+							LSN_FORMAT_ARGS(xlogreader->EndRecPtr))));
+	}
+
+	return record;
+}
diff --git a/src/backend/utils/adt/pg_upgrade_support.c b/src/backend/utils/adt/pg_upgrade_support.c
index 0186636d9f..c4b2980a8a 100644
--- a/src/backend/utils/adt/pg_upgrade_support.c
+++ b/src/backend/utils/adt/pg_upgrade_support.c
@@ -11,14 +11,22 @@
 
 #include "postgres.h"
 
+#include "access/heapam_xlog.h"
+#include "access/rmgr.h"
+#include "access/xlog.h"
+#include "access/xlog_internal.h"
+#include "access/xlogutils.h"
 #include "catalog/binary_upgrade.h"
 #include "catalog/heap.h"
 #include "catalog/namespace.h"
+#include "catalog/pg_control.h"
 #include "catalog/pg_type.h"
 #include "commands/extension.h"
 #include "miscadmin.h"
+#include "storage/standbydefs.h"
 #include "utils/array.h"
 #include "utils/builtins.h"
+#include "utils/pg_lsn.h"
 
 
 #define CHECK_IS_BINARY_UPGRADE									\
@@ -29,6 +37,9 @@ do {															\
 				 errmsg("function can only be called when server is in binary upgrade mode"))); \
 } while (0)
 
+#define CHECK_WAL_RECORD(rmgrid, info, expected_rmgrid, expected_info) \
+	(rmgrid == expected_rmgrid && info == expected_info)
+
 Datum
 binary_upgrade_set_next_pg_tablespace_oid(PG_FUNCTION_ARGS)
 {
@@ -261,3 +272,78 @@ binary_upgrade_set_missing_value(PG_FUNCTION_ARGS)
 
 	PG_RETURN_VOID();
 }
+
+/*
+ * Return true if we didn't find any unexpected WAL record, false otherwise.
+ *
+ * This function is used to verify that there are no WAL records (except some
+ * types) after confirmed_flush_lsn of logical slots, which means all the
+ * changes were replicated to the subscriber. There is a possibility that some
+ * WALs are inserted after logical waslenders exit, so such types would be
+ * ignored.
+ *
+ * XLOG_CHECKPOINT_SHUTDOWN is ignored because it would be inserted after the
+ * waslender exits. Moreover, the following types of records would be during
+ * the pg_upgrade --check, so they are ignored too.
+ *
+ *             - XLOG_CHECKPOINT_ONLINE
+ *             - XLOG_RUNNING_XACTS
+ *             - XLOG_FPI_FOR_HINT
+ *             - XLOG_HEAP2_PRUNE
+ */
+Datum
+binary_upgrade_validate_wal_record_types_after_lsn(PG_FUNCTION_ARGS)
+{
+	XLogRecPtr	  start_lsn = PG_GETARG_LSN(0);
+	XLogReaderState *xlogreader;
+	bool			initial_record = true;
+	bool			result = true;
+
+	CHECK_IS_BINARY_UPGRADE;
+
+	/* Quick exit if the given lsn is larger than current one */
+	if (start_lsn >= GetFlushRecPtr(NULL))
+		PG_RETURN_BOOL(false);
+
+	xlogreader = InitXLogReaderState(start_lsn);
+
+	/* Loop until all WALs are read, or unexpected record is found */
+	while (result && ReadNextXLogRecord(xlogreader))
+	{
+		RmgrIds		   rmid;
+		uint8		   info;
+
+		/* Check the type of WAL */
+		rmid = XLogRecGetRmid(xlogreader);
+		info = XLogRecGetInfo(xlogreader) & ~XLR_INFO_MASK;
+
+		if (initial_record)
+		{
+			/* Initial record must be XLOG_CHECKPOINT_SHUTDOWN */
+			if (!CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID,
+								  XLOG_CHECKPOINT_SHUTDOWN))
+				result = false;
+
+			initial_record = false;
+			continue;
+		}
+
+		/*
+		 * XXX: There is a possibility that following records may be
+		 * generated during the upgrade.
+		 */
+		if (!CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID, XLOG_CHECKPOINT_SHUTDOWN) &&
+			!CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID, XLOG_CHECKPOINT_ONLINE) &&
+			!CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID, XLOG_FPI_FOR_HINT) &&
+			!CHECK_WAL_RECORD(rmid, info, RM_STANDBY_ID, XLOG_RUNNING_XACTS) &&
+			!CHECK_WAL_RECORD(rmid, info, RM_HEAP2_ID, XLOG_HEAP2_PRUNE))
+			result = false;
+
+		CHECK_FOR_INTERRUPTS();
+	}
+
+	pfree(xlogreader->private_data);
+	XLogReaderFree(xlogreader);
+
+	PG_RETURN_BOOL(result);
+}
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index b1424fdf9c..df1ce67fc0 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -1480,8 +1480,8 @@ check_new_cluster_logical_replication_slots(void)
  * Following points are checked:
  *
  *	- All logical replication slots are usable.
- *	- All logical replication slots consumed all WALs, except a
- *	  CHECKPOINT_SHUTDOWN record.
+ *	- All logical replication slots consumed all WALs, except some acceptable
+ *	  types.
  */
 static void
 check_old_cluster_for_valid_slots(bool live_check)
@@ -1521,8 +1521,8 @@ check_old_cluster_for_valid_slots(bool live_check)
 			}
 
 			/*
-			 * Do additional checks to ensure that confirmed_flush LSN of all
-			 * the slots is the same as the latest checkpoint location.
+			 * Do additional checks to ensure that all logical replication
+			 * slots have reached the current WAL position.
 			 *
 			 * Note: This can be satisfied only when the old cluster has been
 			 * shut down, so we skip this for live checks.
diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c
index f8f823e2be..4beb65ab22 100644
--- a/src/bin/pg_upgrade/controldata.c
+++ b/src/bin/pg_upgrade/controldata.c
@@ -169,45 +169,6 @@ get_control_data(ClusterInfo *cluster, bool live_check)
 				}
 				got_cluster_state = true;
 			}
-
-			else if ((p = strstr(bufin, "Latest checkpoint location:")) != NULL)
-			{
-				/*
-				 * Read the latest checkpoint location if the cluster is PG17
-				 * or later. This is used for upgrading logical replication
-				 * slots. Currently, we need it only for the old cluster but
-				 * for simplicity chose not to have additional checks.
-				 */
-				if (GET_MAJOR_VERSION(cluster->major_version) >= 1700)
-				{
-					char	   *slash = NULL;
-					uint32		upper_lsn,
-								lower_lsn;
-
-					p = strchr(p, ':');
-
-					if (p == NULL || strlen(p) <= 1)
-						pg_fatal("%d: controldata retrieval problem", __LINE__);
-
-					p++;		/* remove ':' char */
-
-					p = strpbrk(p, "01234567890ABCDEF");
-
-					if (p == NULL || strlen(p) <= 1)
-						pg_fatal("%d: controldata retrieval problem", __LINE__);
-
-					/*
-					 * The upper and lower part of LSN must be read separately
-					 * because it is stored as in %X/%X format.
-					 */
-					upper_lsn = strtoul(p, &slash, 16);
-					lower_lsn = strtoul(++slash, NULL, 16);
-
-					/* And combine them */
-					cluster->controldata.chkpnt_latest =
-						((uint64) upper_lsn << 32) | lower_lsn;
-				}
-			}
 		}
 
 		rc = pclose(output);
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index f7b0deca87..5d25d1604e 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -647,12 +647,12 @@ get_old_cluster_logical_slot_infos(DbInfo *dbinfo)
 	 * removed.
 	 */
 	res = executeQueryOrDie(conn, "SELECT slot_name, plugin, two_phase, "
-							"(confirmed_flush_lsn = '%X/%X') as caught_up, conflicting as invalid "
+							"pg_catalog.binary_upgrade_validate_wal_record_types_after_lsn(confirmed_flush_lsn) as caught_up, "
+							"conflicting as invalid "
 							"FROM pg_catalog.pg_replication_slots "
 							"WHERE slot_type = 'logical' AND "
 							"database = current_database() AND "
-							"temporary IS FALSE;",
-							LSN_FORMAT_ARGS(old_cluster.controldata.chkpnt_latest));
+							"temporary IS FALSE;");
 
 	num_slots = PQntuples(res);
 
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index f5ce6c3b4d..8a7f56831e 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -246,7 +246,6 @@ typedef struct
 	bool		date_is_int;
 	bool		float8_pass_by_value;
 	uint32		data_checksum_version;
-	XLogRecPtr	chkpnt_latest;
 } ControlData;
 
 /*
diff --git a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
index 01cb04ca12..b91fb2f88f 100644
--- a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
+++ b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
@@ -169,6 +169,26 @@ $subscriber->wait_for_subscription_sync($old_publisher, 'sub');
 $subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION sub DISABLE");
 $old_publisher->stop;
 
+# Dry run, successful check is expected. This is not live check, so shutdown
+# checkpoint record would be inserted. We want to test that
+# binary_upgrade_validate_wal_record_types_after_lsn() skips the WAL and then
+# upcoming pg_upgrade would succeed.
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,        '--check'
+	],
+	'run of pg_upgrade of old cluster');
+ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ removed after pg_upgrade success");
+
 # Actual run, successful upgrade is expected
 command_ok(
 	[
diff --git a/src/include/access/xlogutils.h b/src/include/access/xlogutils.h
index 5b77b11f50..1cf31aa24f 100644
--- a/src/include/access/xlogutils.h
+++ b/src/include/access/xlogutils.h
@@ -115,4 +115,7 @@ extern void XLogReadDetermineTimeline(XLogReaderState *state,
 
 extern void WALReadRaiseError(WALReadError *errinfo);
 
+extern XLogReaderState *InitXLogReaderState(XLogRecPtr lsn);
+extern XLogRecord *ReadNextXLogRecord(XLogReaderState *xlogreader);
+
 #endif
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 9805bc6118..f3d843222b 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11370,6 +11370,12 @@
   proname => 'binary_upgrade_set_next_pg_tablespace_oid', provolatile => 'v',
   proparallel => 'u', prorettype => 'void', proargtypes => 'oid',
   prosrc => 'binary_upgrade_set_next_pg_tablespace_oid' },
+{ oid => '8046', descr => 'for use by pg_upgrade',
+  proname => 'binary_upgrade_validate_wal_record_types_after_lsn',
+  prorows => '10', proretset => 't', provolatile => 's', prorettype => 'bool',
+  proargtypes => 'pg_lsn', proallargtypes => '{pg_lsn,bool}',
+  proargmodes => '{i,o}', proargnames => '{start_lsn,is_ok}',
+  prosrc => 'binary_upgrade_validate_wal_record_types_after_lsn' },
 
 # conversion functions
 { oid => '4302',
-- 
2.27.0

v38-0003-Another-one-Reads-all-WAL-records-ahead-confirme.txttext/plain; name=v38-0003-Another-one-Reads-all-WAL-records-ahead-confirme.txtDownload
From ff250bb4bd467aaf8b09a27ab9edc93a3d9bb9bc Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Thu, 14 Sep 2023 06:01:40 +0000
Subject: [PATCH v38] Another one: Reads all WAL records ahead
 confirmed_flush_lsn

---
 doc/src/sgml/ref/pgupgrade.sgml               |   5 +-
 src/backend/utils/adt/pg_upgrade_support.c    | 133 ++++++++++++++++++
 src/bin/pg_upgrade/check.c                    |   8 +-
 src/bin/pg_upgrade/controldata.c              |  39 -----
 src/bin/pg_upgrade/info.c                     |   6 +-
 src/bin/pg_upgrade/pg_upgrade.h               |   1 -
 .../t/003_logical_replication_slots.pl        |  20 +++
 src/include/catalog/pg_proc.dat               |   6 +
 8 files changed, 167 insertions(+), 51 deletions(-)

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 4e2281bae4..2588d6d7b8 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -418,10 +418,7 @@ make prefix=/usr/local/pgsql.new install
      </listitem>
      <listitem>
       <para>
-       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>confirmed_flush_lsn</structfield>
-       of all slots on the old cluster must be the same as the latest
-       checkpoint location. This ensures that all the data has been replicated
-       before the upgrade.
+       Old cluster has replicated all the changes replicated to subscribers.
       </para>
      </listitem>
      <listitem>
diff --git a/src/backend/utils/adt/pg_upgrade_support.c b/src/backend/utils/adt/pg_upgrade_support.c
index 0186636d9f..5b58769b5e 100644
--- a/src/backend/utils/adt/pg_upgrade_support.c
+++ b/src/backend/utils/adt/pg_upgrade_support.c
@@ -11,14 +11,22 @@
 
 #include "postgres.h"
 
+#include "access/heapam_xlog.h"
+#include "access/rmgr.h"
+#include "access/xlog.h"
+#include "access/xlog_internal.h"
+#include "access/xlogutils.h"
 #include "catalog/binary_upgrade.h"
 #include "catalog/heap.h"
 #include "catalog/namespace.h"
+#include "catalog/pg_control.h"
 #include "catalog/pg_type.h"
 #include "commands/extension.h"
 #include "miscadmin.h"
+#include "storage/standbydefs.h"
 #include "utils/array.h"
 #include "utils/builtins.h"
+#include "utils/pg_lsn.h"
 
 
 #define CHECK_IS_BINARY_UPGRADE									\
@@ -29,6 +37,9 @@ do {															\
 				 errmsg("function can only be called when server is in binary upgrade mode"))); \
 } while (0)
 
+#define CHECK_WAL_RECORD(rmgrid, info, expected_rmgrid, expected_info) \
+	(rmgrid == expected_rmgrid && info == expected_info)
+
 Datum
 binary_upgrade_set_next_pg_tablespace_oid(PG_FUNCTION_ARGS)
 {
@@ -261,3 +272,125 @@ binary_upgrade_set_missing_value(PG_FUNCTION_ARGS)
 
 	PG_RETURN_VOID();
 }
+
+/*
+ * Return true if we didn't find any unexpected WAL record, false otherwise.
+ *
+ * This function is used to verify that there are no WAL records (except some
+ * types) after confirmed_flush_lsn of logical slots, which means all the
+ * changes were replicated to the subscriber. There is a possibility that some
+ * WALs are inserted after logical waslenders exit, so such types would be
+ * ignored.
+ *
+ * XLOG_CHECKPOINT_SHUTDOWN is ignored because it would be inserted after the
+ * waslender exits. XLOG_PARAMETER_CHANGE is also ignored because it would be
+ * inserted when GUC parameters are changed just before doing the upgrade.
+ * Moreover, the following types of records would be during
+ * the pg_upgrade --check, so they are ignored too.
+ *
+ *		- XLOG_CHECKPOINT_ONLINE
+ *		- XLOG_RUNNING_XACTS
+ *		- XLOG_FPI_FOR_HINT
+ *		- XLOG_HEAP2_PRUNE
+ */
+Datum
+binary_upgrade_validate_wal_record_types_after_lsn(PG_FUNCTION_ARGS)
+{
+	XLogRecPtr	  start_lsn = PG_GETARG_LSN(0);
+	XLogReaderState *xlogreader;
+	bool			initial_record = true;
+	bool			result = true;
+	ReadLocalXLogPageNoWaitPrivate *private_data;
+
+	CHECK_IS_BINARY_UPGRADE;
+
+	/* Quick exit if the given lsn is larger than current one */
+	if (start_lsn >= GetFlushRecPtr(NULL))
+		PG_RETURN_BOOL(false);
+
+	private_data = (ReadLocalXLogPageNoWaitPrivate *)
+		palloc0(sizeof(ReadLocalXLogPageNoWaitPrivate));
+
+	xlogreader = XLogReaderAllocate(wal_segment_size, NULL,
+									XL_ROUTINE(.page_read = &read_local_xlog_page_no_wait,
+											   .segment_open = &wal_segment_open,
+											   .segment_close = &wal_segment_close),
+									private_data);
+
+	if (xlogreader == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("out of memory"),
+				 errdetail("Failed while allocating a WAL reading processor.")));
+
+	XLogBeginRead(xlogreader, start_lsn);
+
+	/* Loop until all WALs are read, or unexpected record is found */
+	while (result)
+	{
+		RmgrIds		   rmid;
+		uint8		   info;
+		char	   *errormsg;
+		XLogRecord *record;
+
+		CHECK_FOR_INTERRUPTS();
+
+		record = XLogReadRecord(xlogreader, &errormsg);
+
+		if (record == NULL)
+		{
+			ReadLocalXLogPageNoWaitPrivate *check_data;
+
+			/* return NULL, if end of WAL is reached */
+			check_data = (ReadLocalXLogPageNoWaitPrivate *)
+				xlogreader->private_data;
+
+			if (check_data->end_of_wal)
+				break;
+
+			if (errormsg)
+				ereport(ERROR,
+						(errcode_for_file_access(),
+						errmsg("could not read WAL at %X/%X: %s",
+								LSN_FORMAT_ARGS(xlogreader->EndRecPtr), errormsg)));
+			else
+				ereport(ERROR,
+						(errcode_for_file_access(),
+						errmsg("could not read WAL at %X/%X",
+								LSN_FORMAT_ARGS(xlogreader->EndRecPtr))));
+		}
+
+		/* Check the type of WAL */
+		rmid = XLogRecGetRmid(xlogreader);
+		info = XLogRecGetInfo(xlogreader) & ~XLR_INFO_MASK;
+
+		if (initial_record)
+		{
+			/* Initial record must be XLOG_CHECKPOINT_SHUTDOWN */
+			if (!CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID,
+								  XLOG_CHECKPOINT_SHUTDOWN))
+				result = false;
+
+			initial_record = false;
+
+			continue;
+		}
+
+		/*
+		 * XXX: There is a possibility that following records may be
+		 * generated during the upgrade.
+		 */
+		if (!CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID, XLOG_CHECKPOINT_SHUTDOWN) &&
+			!CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID, XLOG_CHECKPOINT_ONLINE) &&
+			!CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID, XLOG_PARAMETER_CHANGE) &&
+			!CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID, XLOG_FPI_FOR_HINT) &&
+			!CHECK_WAL_RECORD(rmid, info, RM_STANDBY_ID, XLOG_RUNNING_XACTS) &&
+			!CHECK_WAL_RECORD(rmid, info, RM_HEAP2_ID, XLOG_HEAP2_PRUNE))
+				result = false;
+	}
+
+	pfree(xlogreader->private_data);
+	XLogReaderFree(xlogreader);
+
+	PG_RETURN_BOOL(result);
+}
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index b1424fdf9c..df1ce67fc0 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -1480,8 +1480,8 @@ check_new_cluster_logical_replication_slots(void)
  * Following points are checked:
  *
  *	- All logical replication slots are usable.
- *	- All logical replication slots consumed all WALs, except a
- *	  CHECKPOINT_SHUTDOWN record.
+ *	- All logical replication slots consumed all WALs, except some acceptable
+ *	  types.
  */
 static void
 check_old_cluster_for_valid_slots(bool live_check)
@@ -1521,8 +1521,8 @@ check_old_cluster_for_valid_slots(bool live_check)
 			}
 
 			/*
-			 * Do additional checks to ensure that confirmed_flush LSN of all
-			 * the slots is the same as the latest checkpoint location.
+			 * Do additional checks to ensure that all logical replication
+			 * slots have reached the current WAL position.
 			 *
 			 * Note: This can be satisfied only when the old cluster has been
 			 * shut down, so we skip this for live checks.
diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c
index f8f823e2be..4beb65ab22 100644
--- a/src/bin/pg_upgrade/controldata.c
+++ b/src/bin/pg_upgrade/controldata.c
@@ -169,45 +169,6 @@ get_control_data(ClusterInfo *cluster, bool live_check)
 				}
 				got_cluster_state = true;
 			}
-
-			else if ((p = strstr(bufin, "Latest checkpoint location:")) != NULL)
-			{
-				/*
-				 * Read the latest checkpoint location if the cluster is PG17
-				 * or later. This is used for upgrading logical replication
-				 * slots. Currently, we need it only for the old cluster but
-				 * for simplicity chose not to have additional checks.
-				 */
-				if (GET_MAJOR_VERSION(cluster->major_version) >= 1700)
-				{
-					char	   *slash = NULL;
-					uint32		upper_lsn,
-								lower_lsn;
-
-					p = strchr(p, ':');
-
-					if (p == NULL || strlen(p) <= 1)
-						pg_fatal("%d: controldata retrieval problem", __LINE__);
-
-					p++;		/* remove ':' char */
-
-					p = strpbrk(p, "01234567890ABCDEF");
-
-					if (p == NULL || strlen(p) <= 1)
-						pg_fatal("%d: controldata retrieval problem", __LINE__);
-
-					/*
-					 * The upper and lower part of LSN must be read separately
-					 * because it is stored as in %X/%X format.
-					 */
-					upper_lsn = strtoul(p, &slash, 16);
-					lower_lsn = strtoul(++slash, NULL, 16);
-
-					/* And combine them */
-					cluster->controldata.chkpnt_latest =
-						((uint64) upper_lsn << 32) | lower_lsn;
-				}
-			}
 		}
 
 		rc = pclose(output);
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index f7b0deca87..5d25d1604e 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -647,12 +647,12 @@ get_old_cluster_logical_slot_infos(DbInfo *dbinfo)
 	 * removed.
 	 */
 	res = executeQueryOrDie(conn, "SELECT slot_name, plugin, two_phase, "
-							"(confirmed_flush_lsn = '%X/%X') as caught_up, conflicting as invalid "
+							"pg_catalog.binary_upgrade_validate_wal_record_types_after_lsn(confirmed_flush_lsn) as caught_up, "
+							"conflicting as invalid "
 							"FROM pg_catalog.pg_replication_slots "
 							"WHERE slot_type = 'logical' AND "
 							"database = current_database() AND "
-							"temporary IS FALSE;",
-							LSN_FORMAT_ARGS(old_cluster.controldata.chkpnt_latest));
+							"temporary IS FALSE;");
 
 	num_slots = PQntuples(res);
 
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index f5ce6c3b4d..8a7f56831e 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -246,7 +246,6 @@ typedef struct
 	bool		date_is_int;
 	bool		float8_pass_by_value;
 	uint32		data_checksum_version;
-	XLogRecPtr	chkpnt_latest;
 } ControlData;
 
 /*
diff --git a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
index 01cb04ca12..b91fb2f88f 100644
--- a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
+++ b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
@@ -169,6 +169,26 @@ $subscriber->wait_for_subscription_sync($old_publisher, 'sub');
 $subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION sub DISABLE");
 $old_publisher->stop;
 
+# Dry run, successful check is expected. This is not live check, so shutdown
+# checkpoint record would be inserted. We want to test that
+# binary_upgrade_validate_wal_record_types_after_lsn() skips the WAL and then
+# upcoming pg_upgrade would succeed.
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,        '--check'
+	],
+	'run of pg_upgrade of old cluster');
+ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ removed after pg_upgrade success");
+
 # Actual run, successful upgrade is expected
 command_ok(
 	[
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 9805bc6118..f3d843222b 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11370,6 +11370,12 @@
   proname => 'binary_upgrade_set_next_pg_tablespace_oid', provolatile => 'v',
   proparallel => 'u', prorettype => 'void', proargtypes => 'oid',
   prosrc => 'binary_upgrade_set_next_pg_tablespace_oid' },
+{ oid => '8046', descr => 'for use by pg_upgrade',
+  proname => 'binary_upgrade_validate_wal_record_types_after_lsn',
+  prorows => '10', proretset => 't', provolatile => 's', prorettype => 'bool',
+  proargtypes => 'pg_lsn', proallargtypes => '{pg_lsn,bool}',
+  proargmodes => '{i,o}', proargnames => '{start_lsn,is_ok}',
+  prosrc => 'binary_upgrade_validate_wal_record_types_after_lsn' },
 
 # conversion functions
 { oid => '4302',
-- 
2.27.0

#244Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Hayato Kuroda (Fujitsu) (#242)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

On Friday, September 15, 2023 8:33 PM Kuroda, Hayato/黒田 隼人 <kuroda.hayato@fujitsu.com> wrote:

Also, I did a self-reviewing again and reworded comments.

BTW, the 0002 ports some functions from pg_walinspect, it may be not
elegant.
Coupling degree between core/extensions should be also lower. So I made
another patch which does not port anything and implements similar
functionalities instead.
I called the patch 0003, but can be applied atop 0001 (not 0002). To make cfbot
happy, attached as txt file.
Could you please tell me which do you like 0002 or 0003?

I think it's basically OK to follow the same method as pg_walinspect to
read the WAL. The reasons are as follows:

There are currently two sets of APIs that are used to read WAL:
a) XLogReaderAllocate()/XLogReadRecord() -- used by pg_walinspect and the current patch
b) XLogReaderAllocate()/WALRead()

The first set of APIs is easier to use and is used by most WAL-reading code,
while the second set is used more in low-level places and is not as easy to
use. So I think it's better to use the first set of APIs.

Besides, our function needs to distinguish the failure and end-of-WAL cases
when XLogReadRecord() returns NULL, and to read WAL without waiting. The WAL
reader callbacks in pg_walinspect also meet this requirement, which is another
reason I think we can follow the same approach (a rough sketch of the read
pattern is below). I also checked the other public WAL reader callbacks, but
they either report an ERROR when XLogReadRecord() returns NULL or wait while
reading WAL.
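
To make that concrete, here is a minimal, hedged sketch of the read pattern,
based on the pg_walinspect-style callbacks quoted in the patches above. The
helper name scan_wal_from() is illustrative only and is not part of any
attached patch:

#include "postgres.h"

#include "access/xlog.h"
#include "access/xlog_internal.h"
#include "access/xlogreader.h"
#include "access/xlogutils.h"
#include "miscadmin.h"

/*
 * Illustrative only: walk the WAL from start_lsn with the no-wait local
 * page-read callback, treating XLogReadRecord() == NULL as "end of WAL"
 * only when the callback has flagged it; anything else is a read error.
 */
static void
scan_wal_from(XLogRecPtr start_lsn)
{
	ReadLocalXLogPageNoWaitPrivate *private_data;
	XLogReaderState *reader;

	private_data = (ReadLocalXLogPageNoWaitPrivate *)
		palloc0(sizeof(ReadLocalXLogPageNoWaitPrivate));

	reader = XLogReaderAllocate(wal_segment_size, NULL,
								XL_ROUTINE(.page_read = &read_local_xlog_page_no_wait,
										   .segment_open = &wal_segment_open,
										   .segment_close = &wal_segment_close),
								private_data);
	if (reader == NULL)
		elog(ERROR, "out of memory while allocating a WAL reading processor");

	XLogBeginRead(reader, start_lsn);

	for (;;)
	{
		char	   *errormsg;
		XLogRecord *record = XLogReadRecord(reader, &errormsg);

		CHECK_FOR_INTERRUPTS();

		if (record == NULL)
		{
			/* NULL plus end_of_wal set means we simply ran out of WAL */
			if (private_data->end_of_wal)
				break;

			elog(ERROR, "could not read WAL at %X/%X: %s",
				 LSN_FORMAT_ARGS(reader->EndRecPtr),
				 errormsg ? errormsg : "unknown failure");
		}

		/* per-record checks (rmgr id, info bits, etc.) would go here */
	}

	pfree(private_data);
	XLogReaderFree(reader);
}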

If we agree to follow the same method as pg_walinspect, the remaining question
is whether to port some functions into core, as 0002 does. I personally think
it's fine to make them common functions to avoid duplicated code.

Best Regards,
Hou zj

#245Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Hayato Kuroda (Fujitsu) (#243)
1 attachment(s)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

On Friday, September 15, 2023 9:02 PM Kuroda, Hayato/黒田 隼人 <kuroda.hayato@fujitsu.com> wrote:

Sorry, wrong patch attached. PSA the correct ones.
There is a possibility that XLOG_PARAMETER_CHANGE may be generated,
when GUC parameters are changed just before doing the upgrade. Added to
list.

I did some simple performance tests for the patch just to make sure it doesn't
introduce obvious overhead, and the results look good to me. I tested two cases:

1) The time for upgrade when the old db has 0, 10, 50, and 100 slots
0 slots(HEAD) : 0m5.585s
0 slots : 0m5.591s
10 slots : 0m5.602s
50 slots : 0m5.636s
100 slots : 0m5.778s

2) The time for upgrade after doing "upgrade --check" in advance, when
the old db has 0, 10, 50, and 100 slots.

0 slots(HEAD) : 0m5.588s
0 slots : 0m5.596s
10 slots : 0m5.605s
50 slots : 0m5.737s
100 slots : 0m5.783s

The data of the local machine I used is:
CPU(s): 40
Model name: Intel(R) Xeon(R) Silver 4210 CPU @ 2.20GHz
Core(s) per socket: 10
Socket(s): 2
memory: 125GB
disk: 6T HDD

The old database is empty except for the slots in both tests.

The test script is also attached for reference (run perf.sh after
adjusting other settings).

Best Regards,
Hou zj

Attachments:

perf_script.zipapplication/x-zip-compressed; name=perf_script.zipDownload
#246Amit Kapila
amit.kapila16@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#243)
1 attachment(s)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Fri, Sep 15, 2023 at 6:32 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

Thank you for reviewing! PSA new version patch set.

Sorry, wrong patch attached. PSA the correct ones.
There is a possibility that XLOG_PARAMETER_CHANGE may be generated, when GUC
parameters are changed just before doing the upgrade. Added to list.

You forgot to update the 0002 patch for XLOG_PARAMETER_CHANGE. I think it
is okay to move walinspect's functionality into a common place so that
it can be used by this patch, as suggested by Hou-San. The only reason
to keep it specific to walinspect would be if we wanted to enhance
those functions for walinspect, but if that happens we can evaluate
whether to do so by adding parameters or by creating something specific
for walinspect.

* +Datum
+binary_upgrade_validate_wal_record_types_after_lsn(PG_FUNCTION_ARGS)

How about naming it binary_upgrade_validate_wal_records()? I don't
see how making it so long is helpful.

Apart from this, I have made minor cosmetic changes in the attached file.
If these look okay to you, you can include them in the next version.

--
With Regards,
Amit Kapila.

Attachments:

changes_amit_1.txttext/plain; charset=US-ASCII; name=changes_amit_1.txtDownload
diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 2588d6d7b8..1a17572d14 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -418,7 +418,7 @@ make prefix=/usr/local/pgsql.new install
      </listitem>
      <listitem>
       <para>
-       Old cluster has replicated all the changes replicated to subscribers.
+       The old cluster has replicated all the changes to subscribers.
       </para>
      </listitem>
      <listitem>
diff --git a/src/backend/utils/adt/pg_upgrade_support.c b/src/backend/utils/adt/pg_upgrade_support.c
index c4b2980a8a..2914b2833e 100644
--- a/src/backend/utils/adt/pg_upgrade_support.c
+++ b/src/backend/utils/adt/pg_upgrade_support.c
@@ -12,9 +12,7 @@
 #include "postgres.h"
 
 #include "access/heapam_xlog.h"
-#include "access/rmgr.h"
 #include "access/xlog.h"
-#include "access/xlog_internal.h"
 #include "access/xlogutils.h"
 #include "catalog/binary_upgrade.h"
 #include "catalog/heap.h"
@@ -279,17 +277,13 @@ binary_upgrade_set_missing_value(PG_FUNCTION_ARGS)
  * This function is used to verify that there are no WAL records (except some
  * types) after confirmed_flush_lsn of logical slots, which means all the
  * changes were replicated to the subscriber. There is a possibility that some
- * WALs are inserted after logical waslenders exit, so such types would be
- * ignored.
+ * WALs are inserted during upgrade, so such types would be ignored.
  *
  * XLOG_CHECKPOINT_SHUTDOWN is ignored because it would be inserted after the
- * waslender exits. Moreover, the following types of records would be during
- * the pg_upgrade --check, so they are ignored too.
- *
- *             - XLOG_CHECKPOINT_ONLINE
- *             - XLOG_RUNNING_XACTS
- *             - XLOG_FPI_FOR_HINT
- *             - XLOG_HEAP2_PRUNE
+ * waslender exits. Moreover, the following types of records could be generated
+ * during the pg_upgrade --check, so they are ignored too:
+ * XLOG_CHECKPOINT_ONLINE, XLOG_RUNNING_XACTS, XLOG_FPI_FOR_HINT,
+ * XLOG_HEAP2_PRUNE, XLOG_PARAMETER_CHANGE.
  */
 Datum
 binary_upgrade_validate_wal_record_types_after_lsn(PG_FUNCTION_ARGS)
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index df1ce67fc0..37c75bd024 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -1399,8 +1399,8 @@ check_for_user_defined_encoding_conversions(ClusterInfo *cluster)
 /*
  * check_new_cluster_logical_replication_slots()
  *
- * Make sure there are no logical replication slots on the new cluster and that
- * the parameter settings necessary for creating slots are sufficient.
+ * Verify that there are no logical replication slots on the new cluster and
+ * that the parameter settings necessary for creating slots are sufficient.
  */
 static void
 check_new_cluster_logical_replication_slots(void)
@@ -1476,12 +1476,8 @@ check_new_cluster_logical_replication_slots(void)
 /*
  * check_old_cluster_for_valid_slots()
  *
- * Make sure logical replication slots can be migrated to new cluster.
- * Following points are checked:
- *
- *	- All logical replication slots are usable.
- *	- All logical replication slots consumed all WALs, except some acceptable
- *	  types.
+ * Verify that all the logical slots are usable and consumed all the WAL
+ * before shutdown.
  */
 static void
 check_old_cluster_for_valid_slots(bool live_check)
@@ -1521,8 +1517,8 @@ check_old_cluster_for_valid_slots(bool live_check)
 			}
 
 			/*
-			 * Do additional checks to ensure that all logical replication
-			 * slots have reached the current WAL position.
+			 * Do additional check to ensure that all logical replication slots
+			 * have consumed all the WAL before shutdown.
 			 *
 			 * Note: This can be satisfied only when the old cluster has been
 			 * shut down, so we skip this for live checks.
#247Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Amit Kapila (#246)
1 attachment(s)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Amit,

Thank you for reviewing! PSA new version!

Sorry, wrong patch attached. PSA the correct ones.
There is a possibility that XLOG_PARAMETER_CHANGE may be generated,

when GUC

parameters are changed just before doing the upgrade. Added to list.

You forgot to update 0002 patch for XLOG_PARAMETER_CHANGE.

Oh, I made a mistake in my local git operations. Sorry for the inconvenience.

I think it
is okay to move walinspect's functionality into common place so that
it can be used by this patch as suggested by Hou-San. The only reason
it is okay to keep it specific to walinspect is if we want to enhance
that functions for walinspect but I think if that happens then we can
evaluate whether to enhance it by having additional parameters or
creating something specific for walinspect.

OK, merged 0001 + 0002 into one.

* +Datum
+binary_upgrade_validate_wal_record_types_after_lsn(PG_FUNCTION_ARGS)

How about naming it as binary_upgrade_validate_wal_records()? I don't
see it is helpful to make it too long.

Agreed, fixed.

Apart from this, I have made minor cosmetic changes in the attached.
If these looks okay to you then you can include them in next version.

Seems better, included.

Apart from the above, I changed the code not to call binary_upgrade_validate_wal_records()
during a live check, because it raises an ERROR when the server is not in binary
upgrade mode. The result is used only when we are not in live-check mode, so it's OK
to skip it (a rough sketch of the idea is below). Also, some comments were slightly
reworded.
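
For illustration, here is a hedged sketch of how that skip could look in
pg_upgrade's get_old_cluster_logical_slot_infos(); the exact query text in v39
may differ, live_check is assumed to be passed down, and the helper name is
hypothetical:

#include "postgres_fe.h"

#include "pg_upgrade.h"

/*
 * Hypothetical helper, not from any attached patch: build the slot query so
 * that the WAL validation function is only invoked when the old server is in
 * binary upgrade mode (i.e. not a live check), where the value is actually
 * used; otherwise report a constant, which the caller ignores.
 */
static PGresult *
query_old_cluster_logical_slots(PGconn *conn, bool live_check)
{
	return executeQueryOrDie(conn,
							 "SELECT slot_name, plugin, two_phase, "
							 "%s as caught_up, conflicting as invalid "
							 "FROM pg_catalog.pg_replication_slots "
							 "WHERE slot_type = 'logical' AND "
							 "database = current_database() AND "
							 "temporary IS FALSE;",
							 live_check ? "false" :
							 "pg_catalog.binary_upgrade_validate_wal_records(confirmed_flush_lsn)");
}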

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:

v39-0001-pg_upgrade-Allow-to-replicate-logical-replicati.patchapplication/octet-stream; name=v39-0001-pg_upgrade-Allow-to-replicate-logical-replicati.patchDownload
From ef84e780a36b27c525ed13ce27b3facd6cfc27e2 Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Tue, 4 Apr 2023 05:49:34 +0000
Subject: [PATCH v390] pg_upgrade: Allow to replicate logical replication slots
 to new node

This commit allows nodes with logical replication slots to be upgraded. While
reading information from the old cluster, a list of logical replication slots is
fetched. At the later part of upgrading, pg_upgrade revisits the list and
restores slots by executing pg_create_logical_replication_slots() on the new
cluster. Migration of logical replication slots is only supported when the old
cluster is version 17.0 or later.

If the old node has slots with the status 'lost' or with unconsumed WAL records,
the pg_upgrade fails. These checks are needed to prevent data loss.

Note that the pg_resetwal command would remove WAL files, which are required for
restart_lsn. If WALs required by logical replication slots are removed, the
slots are unusable. Therefore, during the upgrade, slot restoration is done
after the final pg_resetwal command. The workflow ensures that required WALs are
retained.

The significant advantage of this commit is that it makes it easy to continue
logical replication even after upgrading the publisher node. Previously,
pg_upgrade allowed copying publications to a new node. With this new commit,
adjusting the connection string to the new publisher will cause the apply
worker on the subscriber to connect to the new publisher automatically. This
enables seamless continuation of logical replication, even after an upgrade.

Author: Hayato Kuroda
Co-authored-by: Hou Zhijie
Reviewed-by: Peter Smith, Julien Rouhaud, Vignesh C, Wang Wei, Masahiko Sawada,
             Dilip Kumar
---
 contrib/pg_walinspect/pg_walinspect.c         |  94 -------
 doc/src/sgml/ref/pgupgrade.sgml               |  76 +++++-
 src/backend/access/transam/xlogutils.c        |  92 +++++++
 src/backend/replication/slot.c                |  12 +
 src/backend/utils/adt/pg_upgrade_support.c    |  81 ++++++
 src/bin/pg_upgrade/Makefile                   |   3 +
 src/bin/pg_upgrade/check.c                    | 198 +++++++++++++--
 src/bin/pg_upgrade/function.c                 |  31 ++-
 src/bin/pg_upgrade/info.c                     | 156 +++++++++++-
 src/bin/pg_upgrade/meson.build                |   1 +
 src/bin/pg_upgrade/pg_upgrade.c               | 107 +++++++-
 src/bin/pg_upgrade/pg_upgrade.h               |  25 +-
 src/bin/pg_upgrade/server.c                   |  12 +-
 .../t/003_logical_replication_slots.pl        | 232 ++++++++++++++++++
 src/include/access/xlogutils.h                |   3 +
 src/include/catalog/pg_proc.dat               |   6 +
 src/tools/pgindent/typedefs.list              |   3 +
 17 files changed, 997 insertions(+), 135 deletions(-)
 create mode 100644 src/bin/pg_upgrade/t/003_logical_replication_slots.pl

diff --git a/contrib/pg_walinspect/pg_walinspect.c b/contrib/pg_walinspect/pg_walinspect.c
index 796a74f322..49f4f92e98 100644
--- a/contrib/pg_walinspect/pg_walinspect.c
+++ b/contrib/pg_walinspect/pg_walinspect.c
@@ -40,8 +40,6 @@ PG_FUNCTION_INFO_V1(pg_get_wal_stats_till_end_of_wal);
 
 static void ValidateInputLSNs(XLogRecPtr start_lsn, XLogRecPtr *end_lsn);
 static XLogRecPtr GetCurrentLSN(void);
-static XLogReaderState *InitXLogReaderState(XLogRecPtr lsn);
-static XLogRecord *ReadNextXLogRecord(XLogReaderState *xlogreader);
 static void GetWALRecordInfo(XLogReaderState *record, Datum *values,
 							 bool *nulls, uint32 ncols);
 static void GetWALRecordsInfo(FunctionCallInfo fcinfo,
@@ -84,98 +82,6 @@ GetCurrentLSN(void)
 	return curr_lsn;
 }
 
-/*
- * Initialize WAL reader and identify first valid LSN.
- */
-static XLogReaderState *
-InitXLogReaderState(XLogRecPtr lsn)
-{
-	XLogReaderState *xlogreader;
-	ReadLocalXLogPageNoWaitPrivate *private_data;
-	XLogRecPtr	first_valid_record;
-
-	/*
-	 * Reading WAL below the first page of the first segments isn't allowed.
-	 * This is a bootstrap WAL page and the page_read callback fails to read
-	 * it.
-	 */
-	if (lsn < XLOG_BLCKSZ)
-		ereport(ERROR,
-				(errmsg("could not read WAL at LSN %X/%X",
-						LSN_FORMAT_ARGS(lsn))));
-
-	private_data = (ReadLocalXLogPageNoWaitPrivate *)
-		palloc0(sizeof(ReadLocalXLogPageNoWaitPrivate));
-
-	xlogreader = XLogReaderAllocate(wal_segment_size, NULL,
-									XL_ROUTINE(.page_read = &read_local_xlog_page_no_wait,
-											   .segment_open = &wal_segment_open,
-											   .segment_close = &wal_segment_close),
-									private_data);
-
-	if (xlogreader == NULL)
-		ereport(ERROR,
-				(errcode(ERRCODE_OUT_OF_MEMORY),
-				 errmsg("out of memory"),
-				 errdetail("Failed while allocating a WAL reading processor.")));
-
-	/* first find a valid recptr to start from */
-	first_valid_record = XLogFindNextRecord(xlogreader, lsn);
-
-	if (XLogRecPtrIsInvalid(first_valid_record))
-		ereport(ERROR,
-				(errmsg("could not find a valid record after %X/%X",
-						LSN_FORMAT_ARGS(lsn))));
-
-	return xlogreader;
-}
-
-/*
- * Read next WAL record.
- *
- * By design, to be less intrusive in a running system, no slot is allocated
- * to reserve the WAL we're about to read. Therefore this function can
- * encounter read errors for historical WAL.
- *
- * We guard against ordinary errors trying to read WAL that hasn't been
- * written yet by limiting end_lsn to the flushed WAL, but that can also
- * encounter errors if the flush pointer falls in the middle of a record. In
- * that case we'll return NULL.
- */
-static XLogRecord *
-ReadNextXLogRecord(XLogReaderState *xlogreader)
-{
-	XLogRecord *record;
-	char	   *errormsg;
-
-	record = XLogReadRecord(xlogreader, &errormsg);
-
-	if (record == NULL)
-	{
-		ReadLocalXLogPageNoWaitPrivate *private_data;
-
-		/* return NULL, if end of WAL is reached */
-		private_data = (ReadLocalXLogPageNoWaitPrivate *)
-			xlogreader->private_data;
-
-		if (private_data->end_of_wal)
-			return NULL;
-
-		if (errormsg)
-			ereport(ERROR,
-					(errcode_for_file_access(),
-					 errmsg("could not read WAL at %X/%X: %s",
-							LSN_FORMAT_ARGS(xlogreader->EndRecPtr), errormsg)));
-		else
-			ereport(ERROR,
-					(errcode_for_file_access(),
-					 errmsg("could not read WAL at %X/%X",
-							LSN_FORMAT_ARGS(xlogreader->EndRecPtr))));
-	}
-
-	return record;
-}
-
 /*
  * Output values that make up a row describing caller's WAL record.
  *
diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index bea0d1b93f..1a17572d14 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -383,6 +383,77 @@ make prefix=/usr/local/pgsql.new install
     </para>
    </step>
 
+   <step>
+    <title>Prepare for publisher upgrades</title>
+
+    <para>
+     <application>pg_upgrade</application> attempts to migrate logical
+     replication slots. This helps avoid the need for manually defining the
+     same replication slots on the new publisher. Migration of logical
+     replication slots is only supported when the old cluster is version 17.0
+     or later.
+    </para>
+
+    <para>
+     Before you start upgrading the publisher cluster, ensure that the
+     subscription is temporarily disabled, by executing
+     <link linkend="sql-altersubscription"><command>ALTER SUBSCRIPTION ... DISABLE</command></link>.
+     Re-enable the subscription after the upgrade.
+    </para>
+
+    <para>
+     There are some prerequisites for <application>pg_upgrade</application> to
+     be able to upgrade the replication slots. If these are not met an error
+     will be reported.
+    </para>
+
+    <itemizedlist>
+     <listitem>
+      <para>
+       All slots on the old cluster must be usable, i.e., there are no slots
+       whose
+       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>conflicting</structfield>
+       is <literal>true</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The old cluster has replicated all the changes to subscribers.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The output plugins referenced by the slots on the old cluster must be
+       installed in the new PostgreSQL executable directory.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must not have permanent logical replication slots, i.e.,
+       there must be no slots where
+       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>temporary</structfield>
+       is <literal>false</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must have
+       <link linkend="guc-wal-level"><varname>wal_level</varname></link> as
+       <literal>logical</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must have
+       <link linkend="guc-max-replication-slots"><varname>max_replication_slots</varname></link>
+       configured to a value greater than or equal to the number of slots
+       present in the old cluster.
+      </para>
+     </listitem>
+    </itemizedlist>
+
+   </step>
+
    <step>
     <title>Stop both servers</title>
 
@@ -652,8 +723,9 @@ rsync --archive --delete --hard-links --size-only --no-inc-recursive /vol1/pg_tb
        Configure the servers for log shipping.  (You do not need to run
        <function>pg_backup_start()</function> and <function>pg_backup_stop()</function>
        or take a file system backup as the standbys are still synchronized
-       with the primary.)  Replication slots are not copied and must
-       be recreated.
+       with the primary.)  Only logical slots on the primary are copied to the
+       new standby, but other slots on the old standby are not copied so must
+       be recreated manually.
       </para>
      </step>
 
diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c
index 43f7b31205..e2cabfef32 100644
--- a/src/backend/access/transam/xlogutils.c
+++ b/src/backend/access/transam/xlogutils.c
@@ -1048,3 +1048,95 @@ WALReadRaiseError(WALReadError *errinfo)
 						errinfo->wre_req)));
 	}
 }
+
+/*
+ * Initialize WAL reader and identify first valid LSN.
+ */
+XLogReaderState *
+InitXLogReaderState(XLogRecPtr lsn)
+{
+	XLogReaderState *xlogreader;
+	ReadLocalXLogPageNoWaitPrivate *private_data;
+	XLogRecPtr	first_valid_record;
+
+	/*
+	 * Reading WAL below the first page of the first segments isn't allowed.
+	 * This is a bootstrap WAL page and the page_read callback fails to read
+	 * it.
+	 */
+	if (lsn < XLOG_BLCKSZ)
+		ereport(ERROR,
+				(errmsg("could not read WAL at LSN %X/%X",
+						LSN_FORMAT_ARGS(lsn))));
+
+	private_data = (ReadLocalXLogPageNoWaitPrivate *)
+		palloc0(sizeof(ReadLocalXLogPageNoWaitPrivate));
+
+	xlogreader = XLogReaderAllocate(wal_segment_size, NULL,
+									XL_ROUTINE(.page_read = &read_local_xlog_page_no_wait,
+											   .segment_open = &wal_segment_open,
+											   .segment_close = &wal_segment_close),
+									private_data);
+
+	if (xlogreader == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("out of memory"),
+				 errdetail("Failed while allocating a WAL reading processor.")));
+
+	/* first find a valid recptr to start from */
+	first_valid_record = XLogFindNextRecord(xlogreader, lsn);
+
+	if (XLogRecPtrIsInvalid(first_valid_record))
+		ereport(ERROR,
+				(errmsg("could not find a valid record after %X/%X",
+						LSN_FORMAT_ARGS(lsn))));
+
+	return xlogreader;
+}
+
+/*
+ * Read next WAL record.
+ *
+ * By design, to be less intrusive in a running system, no slot is allocated
+ * to reserve the WAL we're about to read. Therefore this function can
+ * encounter read errors for historical WAL.
+ *
+ * We guard against ordinary errors trying to read WAL that hasn't been
+ * written yet by limiting end_lsn to the flushed WAL, but that can also
+ * encounter errors if the flush pointer falls in the middle of a record. In
+ * that case we'll return NULL.
+ */
+XLogRecord *
+ReadNextXLogRecord(XLogReaderState *xlogreader)
+{
+	XLogRecord *record;
+	char	   *errormsg;
+
+	record = XLogReadRecord(xlogreader, &errormsg);
+
+	if (record == NULL)
+	{
+		ReadLocalXLogPageNoWaitPrivate *private_data;
+
+		/* return NULL, if end of WAL is reached */
+		private_data = (ReadLocalXLogPageNoWaitPrivate *)
+			xlogreader->private_data;
+
+		if (private_data->end_of_wal)
+			return NULL;
+
+		if (errormsg)
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not read WAL at %X/%X: %s",
+							LSN_FORMAT_ARGS(xlogreader->EndRecPtr), errormsg)));
+		else
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not read WAL at %X/%X",
+							LSN_FORMAT_ARGS(xlogreader->EndRecPtr))));
+	}
+
+	return record;
+}
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 3ded3c1473..a91d412f91 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1423,6 +1423,18 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause,
 
 		SpinLockRelease(&s->mutex);
 
+		/*
+		 * The logical replication slots shouldn't be invalidated as
+		 * max_slot_wal_keep_size GUC is set to -1 during the upgrade.
+		 *
+		 * The following is just a sanity check.
+		 */
+		if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade)
+		{
+			Assert(max_slot_wal_keep_size_mb == -1);
+			elog(ERROR, "replication slots must not be invalidated during the upgrade");
+		}
+
 		if (active_pid != 0)
 		{
 			/*
diff --git a/src/backend/utils/adt/pg_upgrade_support.c b/src/backend/utils/adt/pg_upgrade_support.c
index 0186636d9f..0ee5c8fdff 100644
--- a/src/backend/utils/adt/pg_upgrade_support.c
+++ b/src/backend/utils/adt/pg_upgrade_support.c
@@ -11,14 +11,20 @@
 
 #include "postgres.h"
 
+#include "access/heapam_xlog.h"
+#include "access/xlog.h"
+#include "access/xlogutils.h"
 #include "catalog/binary_upgrade.h"
 #include "catalog/heap.h"
 #include "catalog/namespace.h"
+#include "catalog/pg_control.h"
 #include "catalog/pg_type.h"
 #include "commands/extension.h"
 #include "miscadmin.h"
+#include "storage/standbydefs.h"
 #include "utils/array.h"
 #include "utils/builtins.h"
+#include "utils/pg_lsn.h"
 
 
 #define CHECK_IS_BINARY_UPGRADE									\
@@ -29,6 +35,9 @@ do {															\
 				 errmsg("function can only be called when server is in binary upgrade mode"))); \
 } while (0)
 
+#define CHECK_WAL_RECORD(rmgrid, info, expected_rmgrid, expected_info) \
+	(rmgrid == expected_rmgrid && info == expected_info)
+
 Datum
 binary_upgrade_set_next_pg_tablespace_oid(PG_FUNCTION_ARGS)
 {
@@ -261,3 +270,75 @@ binary_upgrade_set_missing_value(PG_FUNCTION_ARGS)
 
 	PG_RETURN_VOID();
 }
+
+/*
+ * Return false if we found unexpected WAL records, otherwise true.
+ *
+ * This function is used to verify that there are no WAL records (except some
+ * types) after confirmed_flush_lsn of logical slots, which means all the
+ * changes were replicated to the subscriber. There is a possibility that some
+ * WALs are inserted during upgrade, so such types would be ignored.
+ *
+ * XLOG_CHECKPOINT_SHUTDOWN is ignored because it would be inserted after the
+ * walsender exits. Moreover, the following types of records could be generated
+ * during the pg_upgrade --check, so they are ignored too:
+ * XLOG_CHECKPOINT_ONLINE, XLOG_RUNNING_XACTS, XLOG_FPI_FOR_HINT,
+ * XLOG_HEAP2_PRUNE, XLOG_PARAMETER_CHANGE.
+ */
+Datum
+binary_upgrade_validate_wal_records(PG_FUNCTION_ARGS)
+{
+	XLogRecPtr	  start_lsn = PG_GETARG_LSN(0);
+	XLogReaderState *xlogreader;
+	bool			initial_record = true;
+	bool			is_valid = true;
+
+	CHECK_IS_BINARY_UPGRADE;
+
+	/* Quick exit if the given lsn is larger than current one */
+	if (start_lsn >= GetFlushRecPtr(NULL))
+		PG_RETURN_BOOL(false);
+
+	xlogreader = InitXLogReaderState(start_lsn);
+
+	/* Loop until all WALs are read, or unexpected record is found */
+	while (is_valid && ReadNextXLogRecord(xlogreader))
+	{
+		RmgrIds		   rmid;
+		uint8		   info;
+
+		/* Check the type of WAL */
+		rmid = XLogRecGetRmid(xlogreader);
+		info = XLogRecGetInfo(xlogreader) & ~XLR_INFO_MASK;
+
+		if (initial_record)
+		{
+			/* Initial record must be XLOG_CHECKPOINT_SHUTDOWN */
+			if (!CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID,
+								  XLOG_CHECKPOINT_SHUTDOWN))
+				is_valid = false;
+
+			initial_record = false;
+			continue;
+		}
+
+		/*
+		 * XXX: There is a possibility that following records may be
+		 * generated during the upgrade.
+		 */
+		if (!CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID, XLOG_CHECKPOINT_SHUTDOWN) &&
+			!CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID, XLOG_CHECKPOINT_ONLINE) &&
+			!CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID, XLOG_FPI_FOR_HINT) &&
+			!CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID, XLOG_PARAMETER_CHANGE) &&
+			!CHECK_WAL_RECORD(rmid, info, RM_STANDBY_ID, XLOG_RUNNING_XACTS) &&
+			!CHECK_WAL_RECORD(rmid, info, RM_HEAP2_ID, XLOG_HEAP2_PRUNE))
+			is_valid = false;
+
+		CHECK_FOR_INTERRUPTS();
+	}
+
+	pfree(xlogreader->private_data);
+	XLogReaderFree(xlogreader);
+
+	PG_RETURN_BOOL(is_valid);
+}
diff --git a/src/bin/pg_upgrade/Makefile b/src/bin/pg_upgrade/Makefile
index 5834513add..815d1a7ca1 100644
--- a/src/bin/pg_upgrade/Makefile
+++ b/src/bin/pg_upgrade/Makefile
@@ -3,6 +3,9 @@
 PGFILEDESC = "pg_upgrade - an in-place binary upgrade utility"
 PGAPPICON = win32
 
+# required for 003_logical_replication_slots.pl
+EXTRA_INSTALL=contrib/test_decoding
+
 subdir = src/bin/pg_upgrade
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 56e313f562..c45d84dd1a 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -30,6 +30,8 @@ static void check_for_jsonb_9_4_usage(ClusterInfo *cluster);
 static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(void);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
+static void check_new_cluster_logical_replication_slots(void);
+static void check_old_cluster_for_valid_slots(bool live_check);
 
 
 /*
@@ -86,8 +88,11 @@ check_and_dump_old_cluster(bool live_check)
 	if (!live_check)
 		start_postmaster(&old_cluster, true);
 
-	/* Extract a list of databases and tables from the old cluster */
-	get_db_and_rel_infos(&old_cluster);
+	/*
+	 * Extract a list of databases, tables, and logical replication slots from
+	 * the old cluster.
+	 */
+	get_db_rel_and_slot_infos(&old_cluster, live_check);
 
 	init_tablespaces();
 
@@ -104,6 +109,13 @@ check_and_dump_old_cluster(bool live_check)
 	check_for_reg_data_type_usage(&old_cluster);
 	check_for_isn_and_int8_passing_mismatch(&old_cluster);
 
+	/*
+	 * Logical replication slots can be migrated since PG17. See comments atop
+	 * get_old_cluster_logical_slot_infos().
+	 */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
+		check_old_cluster_for_valid_slots(live_check);
+
 	/*
 	 * PG 16 increased the size of the 'aclitem' type, which breaks the
 	 * on-disk format for existing data.
@@ -187,7 +199,7 @@ check_and_dump_old_cluster(bool live_check)
 void
 check_new_cluster(void)
 {
-	get_db_and_rel_infos(&new_cluster);
+	get_db_rel_and_slot_infos(&new_cluster, false);
 
 	check_new_cluster_is_empty();
 
@@ -210,6 +222,8 @@ check_new_cluster(void)
 	check_for_prepared_transactions(&new_cluster);
 
 	check_for_new_tablespace_dir();
+
+	check_new_cluster_logical_replication_slots();
 }
 
 
@@ -232,27 +246,6 @@ report_clusters_compatible(void)
 }
 
 
-void
-issue_warnings_and_set_wal_level(void)
-{
-	/*
-	 * We unconditionally start/stop the new server because pg_resetwal -o set
-	 * wal_level to 'minimum'.  If the user is upgrading standby servers using
-	 * the rsync instructions, they will need pg_upgrade to write its final
-	 * WAL record showing wal_level as 'replica'.
-	 */
-	start_postmaster(&new_cluster, true);
-
-	/* Reindex hash indexes for old < 10.0 */
-	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 906)
-		old_9_6_invalidate_hash_indexes(&new_cluster, false);
-
-	report_extension_updates(&new_cluster);
-
-	stop_postmaster(false);
-}
-
-
 void
 output_completion_banner(char *deletion_script_file_name)
 {
@@ -1402,3 +1395,160 @@ check_for_user_defined_encoding_conversions(ClusterInfo *cluster)
 	else
 		check_ok();
 }
+
+/*
+ * check_new_cluster_logical_replication_slots()
+ *
+ * Verify that there are no logical replication slots on the new cluster and
+ * that the parameter settings necessary for creating slots are sufficient.
+ */
+static void
+check_new_cluster_logical_replication_slots(void)
+{
+	PGresult   *res;
+	PGconn	   *conn;
+	int			nslots_on_old;
+	int			nslots_on_new;
+	int			max_replication_slots;
+	char	   *wal_level;
+
+	/* Logical slots can be migrated since PG17. */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+		return;
+
+	nslots_on_old = count_old_cluster_logical_slots();
+
+	/* Quick return if there are no logical slots to be migrated. */
+	if (nslots_on_old == 0)
+		return;
+
+	conn = connectToServer(&new_cluster, "template1");
+
+	prep_status("Checking for new cluster logical replication slots");
+
+	res = executeQueryOrDie(conn, "SELECT count(*) "
+							"FROM pg_catalog.pg_replication_slots "
+							"WHERE slot_type = 'logical' AND "
+							"temporary IS FALSE;");
+
+	if (PQntuples(res) != 1)
+		pg_fatal("could not count the number of logical replication slots");
+
+	nslots_on_new = atoi(PQgetvalue(res, 0, 0));
+
+	if (nslots_on_new)
+		pg_fatal("expected 0 logical replication slots but found %d",
+				 nslots_on_new);
+
+	PQclear(res);
+
+	res = executeQueryOrDie(conn, "SHOW wal_level;");
+
+	if (PQntuples(res) != 1)
+		pg_fatal("could not determine wal_level");
+
+	wal_level = PQgetvalue(res, 0, 0);
+
+	if (strcmp(wal_level, "logical") != 0)
+		pg_fatal("wal_level must be \"logical\", but is set to \"%s\"",
+				 wal_level);
+
+	PQclear(res);
+
+	res = executeQueryOrDie(conn, "SHOW max_replication_slots;");
+
+	if (PQntuples(res) != 1)
+		pg_fatal("could not determine max_replication_slots");
+
+	max_replication_slots = atoi(PQgetvalue(res, 0, 0));
+
+	if (nslots_on_old > max_replication_slots)
+		pg_fatal("max_replication_slots (%d) must be greater than or equal to the number of "
+				 "logical replication slots (%d) on the old cluster",
+				 max_replication_slots, nslots_on_old);
+
+	PQclear(res);
+	PQfinish(conn);
+
+	check_ok();
+}
+
+/*
+ * check_old_cluster_for_valid_slots()
+ *
+ * Verify that all the logical slots are usable and have consumed all the WAL
+ * before shutdown.
+ */
+static void
+check_old_cluster_for_valid_slots(bool live_check)
+{
+	int			dbnum;
+	char		output_path[MAXPGPATH];
+	FILE	   *script = NULL;
+
+	prep_status("Checking for valid logical replication slots");
+
+	snprintf(output_path, sizeof(output_path), "%s/%s",
+			 log_opts.basedir,
+			 "invalid_logical_replication_slots.txt");
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		int			slotnum;
+		LogicalSlotInfoArr *slot_arr = &old_cluster.dbarr.dbs[dbnum].slot_arr;
+
+		for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		{
+			LogicalSlotInfo *slot = &slot_arr->slots[slotnum];
+
+			/* Is the slot usable? */
+			if (slot->invalid)
+			{
+				if (script == NULL &&
+					(script = fopen_priv(output_path, "w")) == NULL)
+					pg_fatal("could not open file \"%s\": %s",
+							 output_path, strerror(errno));
+
+				fprintf(script, "The slot \"%s\" is invalid\n",
+						slot->slotname);
+
+				/* No need to check this slot, seek to new one */
+				continue;
+			}
+
+			/*
+			 * Do additional check to ensure that all logical replication slots
+			 * have consumed all the WAL before shutdown.
+			 *
+			 * Note: This can be satisfied only when the old cluster has been
+			 * shut down, so we skip this for live checks.
+			 */
+			if (!live_check && !slot->caught_up)
+			{
+				if (script == NULL &&
+					(script = fopen_priv(output_path, "w")) == NULL)
+					pg_fatal("could not open file \"%s\": %s",
+							 output_path, strerror(errno));
+
+				fprintf(script,
+						"The slot \"%s\" has not consumed the WAL yet\n",
+						slot->slotname);
+			}
+		}
+	}
+
+	if (script)
+	{
+		fclose(script);
+
+		pg_log(PG_REPORT, "fatal");
+		pg_fatal("Your installation contains invalid logical replication slots.\n"
+				 "These slots can't be copied, so this cluster cannot be upgraded.\n"
+				 "Consider removing such slots or consuming the pending WAL if any,\n"
+				 "and then restart the upgrade.\n"
+				 "A list of invalid logical replication slots is in the file:\n"
+				 "    %s", output_path);
+	}
+
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/function.c b/src/bin/pg_upgrade/function.c
index dc8800c7cd..c0f5e58fa2 100644
--- a/src/bin/pg_upgrade/function.c
+++ b/src/bin/pg_upgrade/function.c
@@ -46,7 +46,9 @@ library_name_compare(const void *p1, const void *p2)
 /*
  * get_loadable_libraries()
  *
- *	Fetch the names of all old libraries containing C-language functions.
+ *	Fetch the names of all old libraries that either contain C-language
+ *	functions or correspond to logical replication output plugins.
+ *
  *	We will later check that they all exist in the new installation.
  */
 void
@@ -55,6 +57,7 @@ get_loadable_libraries(void)
 	PGresult  **ress;
 	int			totaltups;
 	int			dbnum;
+	int			n_libinfos;
 
 	ress = (PGresult **) pg_malloc(old_cluster.dbarr.ndbs * sizeof(PGresult *));
 	totaltups = 0;
@@ -81,7 +84,12 @@ get_loadable_libraries(void)
 		PQfinish(conn);
 	}
 
-	os_info.libraries = (LibraryInfo *) pg_malloc(totaltups * sizeof(LibraryInfo));
+	/*
+	 * Allocate memory for required libraries and logical replication output
+	 * plugins.
+	 */
+	n_libinfos = totaltups + count_old_cluster_logical_slots();
+	os_info.libraries = (LibraryInfo *) pg_malloc(sizeof(LibraryInfo) * n_libinfos);
 	totaltups = 0;
 
 	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
@@ -89,6 +97,8 @@ get_loadable_libraries(void)
 		PGresult   *res = ress[dbnum];
 		int			ntups;
 		int			rowno;
+		int			slotno;
+		LogicalSlotInfoArr *slot_arr = &old_cluster.dbarr.dbs[dbnum].slot_arr;
 
 		ntups = PQntuples(res);
 		for (rowno = 0; rowno < ntups; rowno++)
@@ -101,6 +111,23 @@ get_loadable_libraries(void)
 			totaltups++;
 		}
 		PQclear(res);
+
+		/*
+		 * Store the names of output plugins as well. There is a possibility
+		 * that duplicated plugins are set, but the consumer function
+		 * check_loadable_libraries() will avoid checking the same library, so
+		 * we do not have to consider their uniqueness here.
+		 */
+		for (slotno = 0; slotno < slot_arr->nslots; slotno++)
+		{
+			if (slot_arr->slots[slotno].invalid)
+				continue;
+
+			os_info.libraries[totaltups].name = pg_strdup(slot_arr->slots[slotno].plugin);
+			os_info.libraries[totaltups].dbnum = dbnum;
+
+			totaltups++;
+		}
 	}
 
 	pg_free(ress);
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index aa5faca4d6..9f264bedff 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -26,6 +26,8 @@ static void get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo);
 static void free_rel_infos(RelInfoArr *rel_arr);
 static void print_db_infos(DbInfoArr *db_arr);
 static void print_rel_infos(RelInfoArr *rel_arr);
+static void print_slot_infos(LogicalSlotInfoArr *slot_arr);
+static void get_old_cluster_logical_slot_infos(DbInfo *dbinfo, bool live_check);
 
 
 /*
@@ -266,13 +268,15 @@ report_unmatched_relation(const RelInfo *rel, const DbInfo *db, bool is_new_db)
 }
 
 /*
- * get_db_and_rel_infos()
+ * get_db_rel_and_slot_infos()
  *
  * higher level routine to generate dbinfos for the database running
  * on the given "port". Assumes that server is already running.
+ *
+ * live_check is used only when the target is the old cluster.
  */
 void
-get_db_and_rel_infos(ClusterInfo *cluster)
+get_db_rel_and_slot_infos(ClusterInfo *cluster, bool live_check)
 {
 	int			dbnum;
 
@@ -283,7 +287,17 @@ get_db_and_rel_infos(ClusterInfo *cluster)
 	get_db_infos(cluster);
 
 	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
-		get_rel_infos(cluster, &cluster->dbarr.dbs[dbnum]);
+	{
+		DbInfo	   *pDbInfo = &cluster->dbarr.dbs[dbnum];
+
+		get_rel_infos(cluster, pDbInfo);
+
+		/*
+		 * Retrieve the logical replication slot information for the old cluster.
+		 */
+		if (cluster == &old_cluster)
+			get_old_cluster_logical_slot_infos(pDbInfo, live_check);
+	}
 
 	if (cluster == &old_cluster)
 		pg_log(PG_VERBOSE, "\nsource databases:");
@@ -600,6 +614,113 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 	dbinfo->rel_arr.nrels = num_rels;
 }
 
+/*
+ * get_old_cluster_logical_slot_infos()
+ *
+ * gets the LogicalSlotInfos for all the logical replication slots of the
+ * database referred to by "dbinfo".
+ *
+ * Note: This function will not do anything if the old cluster is pre-PG17.
+ * This is because before that the logical slots are not saved at shutdown, so
+ * there is no guarantee that the latest confirmed_flush_lsn is saved to disk
+ * which can lead to data loss. It is still not guaranteed for manually created
+ * slots in PG17, so subsequent checks done in
+ * check_old_cluster_for_valid_slots() would raise a FATAL error if such slots
+ * are included.
+ */
+static void
+get_old_cluster_logical_slot_infos(DbInfo *dbinfo, bool live_check)
+{
+	PGconn	   *conn;
+	PGresult   *res;
+	LogicalSlotInfo *slotinfos = NULL;
+	int			num_slots;
+
+	/* Logical slots can be migrated since PG17. */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+		return;
+
+	conn = connectToServer(&old_cluster, dbinfo->db_name);
+
+	/*
+	 * Fetch the logical replication slot information. The slot is considered
+	 * caught up if all the WAL is consumed except for records that could be
+	 * generated during the upgrade. Note that we can't ensure whether the slot
+	 * is caught up during live_check as a new WAL could be generated.
+	 * is caught up during live_check as new WAL records could be generated.
+	 * The temporary slots are explicitly ignored while checking because such
+	 * slots cannot exist after the upgrade. During the upgrade, clusters are
+	 * started and stopped several times causing any temporary slots to be
+	 * removed.
+	 */
+	res = executeQueryOrDie(conn, "SELECT slot_name, plugin, two_phase, "
+							"%s as caught_up, conflicting as invalid "
+							"FROM pg_catalog.pg_replication_slots "
+							"WHERE slot_type = 'logical' AND "
+							"database = current_database() AND "
+							"temporary IS FALSE;",
+							live_check ? "FALSE" :
+							"pg_catalog.binary_upgrade_validate_wal_records(confirmed_flush_lsn)");
+
+	num_slots = PQntuples(res);
+
+	if (num_slots)
+	{
+		int			slotnum;
+		int			i_slotname;
+		int			i_plugin;
+		int			i_twophase;
+		int			i_caught_up;
+		int			i_invalid;
+
+		slotinfos = (LogicalSlotInfo *) pg_malloc(sizeof(LogicalSlotInfo) * num_slots);
+
+		i_slotname = PQfnumber(res, "slot_name");
+		i_plugin = PQfnumber(res, "plugin");
+		i_twophase = PQfnumber(res, "two_phase");
+		i_caught_up = PQfnumber(res, "caught_up");
+		i_invalid = PQfnumber(res, "invalid");
+
+		for (slotnum = 0; slotnum < num_slots; slotnum++)
+		{
+			LogicalSlotInfo *curr = &slotinfos[slotnum];
+
+			curr->slotname = pg_strdup(PQgetvalue(res, slotnum, i_slotname));
+			curr->plugin = pg_strdup(PQgetvalue(res, slotnum, i_plugin));
+			curr->two_phase = (strcmp(PQgetvalue(res, slotnum, i_twophase), "t") == 0);
+			curr->caught_up = (strcmp(PQgetvalue(res, slotnum, i_caught_up), "t") == 0);
+			curr->invalid = (strcmp(PQgetvalue(res, slotnum, i_invalid), "t") == 0);
+		}
+	}
+
+	PQclear(res);
+	PQfinish(conn);
+
+	dbinfo->slot_arr.slots = slotinfos;
+	dbinfo->slot_arr.nslots = num_slots;
+}
+
+
+/*
+ * count_old_cluster_logical_slots()
+ *
+ * Returns the number of logical replication slots for all databases.
+ *
+ * Note: this function always returns 0 if the old_cluster is PG16 and prior
+ * because we gather slot information only for cluster versions greater than or
+ * equal to PG17. See get_old_cluster_logical_slot_infos().
+ */
+int
+count_old_cluster_logical_slots(void)
+{
+	int			dbnum;
+	int			slot_count = 0;
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+		slot_count += old_cluster.dbarr.dbs[dbnum].slot_arr.nslots;
+
+	return slot_count;
+}
 
 static void
 free_db_and_rel_infos(DbInfoArr *db_arr)
@@ -642,8 +763,11 @@ print_db_infos(DbInfoArr *db_arr)
 
 	for (dbnum = 0; dbnum < db_arr->ndbs; dbnum++)
 	{
-		pg_log(PG_VERBOSE, "Database: \"%s\"", db_arr->dbs[dbnum].db_name);
-		print_rel_infos(&db_arr->dbs[dbnum].rel_arr);
+		DbInfo	   *pDbInfo = &db_arr->dbs[dbnum];
+
+		pg_log(PG_VERBOSE, "Database: \"%s\"", pDbInfo->db_name);
+		print_rel_infos(&pDbInfo->rel_arr);
+		print_slot_infos(&pDbInfo->slot_arr);
 	}
 }
 
@@ -660,3 +784,25 @@ print_rel_infos(RelInfoArr *rel_arr)
 			   rel_arr->rels[relnum].reloid,
 			   rel_arr->rels[relnum].tablespace);
 }
+
+static void
+print_slot_infos(LogicalSlotInfoArr *slot_arr)
+{
+	int			slotnum;
+
+	/* Quick return if there are no logical slots. */
+	if (slot_arr->nslots == 0)
+		return;
+
+	pg_log(PG_VERBOSE, "Logical replication slots within the database:");
+
+	for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+	{
+		LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+
+		pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\", two_phase: %d",
+			   slot_info->slotname,
+			   slot_info->plugin,
+			   slot_info->two_phase);
+	}
+}
diff --git a/src/bin/pg_upgrade/meson.build b/src/bin/pg_upgrade/meson.build
index 12a97f84e2..228f29b688 100644
--- a/src/bin/pg_upgrade/meson.build
+++ b/src/bin/pg_upgrade/meson.build
@@ -42,6 +42,7 @@ tests += {
     'tests': [
       't/001_basic.pl',
       't/002_pg_upgrade.pl',
+      't/003_logical_replication_slots.pl',
     ],
     'test_kwargs': {'priority': 40}, # pg_upgrade tests are slow
   },
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 96bfb67167..3ddfc31070 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -59,6 +59,8 @@ static void copy_xact_xlog_xid(void);
 static void set_frozenxids(bool minmxid_only);
 static void make_outputdirs(char *pgdata);
 static void setup(char *argv0, bool *live_check);
+static void create_logical_replication_slots(void);
+static void setup_new_cluster(void);
 
 ClusterInfo old_cluster,
 			new_cluster;
@@ -188,6 +190,8 @@ main(int argc, char **argv)
 			  new_cluster.pgdata);
 	check_ok();
 
+	setup_new_cluster();
+
 	if (user_opts.do_sync)
 	{
 		prep_status("Sync data directory to disk");
@@ -201,8 +205,6 @@ main(int argc, char **argv)
 
 	create_script_for_old_cluster_deletion(&deletion_script_file_name);
 
-	issue_warnings_and_set_wal_level();
-
 	pg_log(PG_REPORT,
 		   "\n"
 		   "Upgrade Complete\n"
@@ -593,7 +595,7 @@ create_new_objects(void)
 		set_frozenxids(true);
 
 	/* update new_cluster info now that we have objects in the databases */
-	get_db_and_rel_infos(&new_cluster);
+	get_db_rel_and_slot_infos(&new_cluster, false);
 }
 
 /*
@@ -862,3 +864,102 @@ set_frozenxids(bool minmxid_only)
 
 	check_ok();
 }
+
+/*
+ * create_logical_replication_slots()
+ *
+ * Similar to create_new_objects() but only restores logical replication slots.
+ */
+static void
+create_logical_replication_slots(void)
+{
+	int			dbnum;
+
+	prep_status_progress("Restoring logical replication slots in the new cluster");
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		DbInfo	   *old_db = &old_cluster.dbarr.dbs[dbnum];
+		LogicalSlotInfoArr *slot_arr = &old_db->slot_arr;
+		PGconn	   *conn;
+		PQExpBuffer query;
+		int			slotnum;
+		char		log_file_name[MAXPGPATH];
+
+		/* Skip this database if there are no slots */
+		if (slot_arr->nslots == 0)
+			continue;
+
+		conn = connectToServer(&new_cluster, old_db->db_name);
+		query = createPQExpBuffer();
+
+		snprintf(log_file_name, sizeof(log_file_name),
+				 DB_DUMP_LOG_FILE_MASK, old_db->db_oid);
+
+		pg_log(PG_STATUS, "%s", old_db->db_name);
+
+		for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		{
+			LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+
+			/* Constructs a query for creating logical replication slots. */
+			appendPQExpBuffer(query, "SELECT pg_catalog.pg_create_logical_replication_slot(");
+			appendStringLiteralConn(query, slot_info->slotname, conn);
+			appendPQExpBuffer(query, ", ");
+			appendStringLiteralConn(query, slot_info->plugin, conn);
+			appendPQExpBuffer(query, ", false, %s);",
+							  slot_info->two_phase ? "true" : "false");
+
+			PQclear(executeQueryOrDie(conn, "%s", query->data));
+
+			resetPQExpBuffer(query);
+		}
+
+		PQfinish(conn);
+
+		destroyPQExpBuffer(query);
+	}
+
+	end_progress_output();
+	check_ok();
+}
+
+/*
+ *	setup_new_cluster()
+ *
+ * Starts the new cluster to update wal_level in the control file, then
+ * performs the remaining setup steps. Logical replication slots are also created here.
+ */
+static void
+setup_new_cluster(void)
+{
+	/*
+	 * We unconditionally start/stop the new server because pg_resetwal -o set
+	 * wal_level to 'minimum'.  If the user is upgrading standby servers using
+	 * the rsync instructions, they will need pg_upgrade to write its final
+	 * WAL record showing wal_level as 'replica'.
+	 */
+	start_postmaster(&new_cluster, true);
+
+	/*
+	 * If the old cluster has logical slots, migrate them to the new cluster.
+	 *
+	 * Note: This must be done after the caller has run pg_resetwal, because
+	 * pg_resetwal would remove the WAL files required by the slots.
+	 */
+	if (count_old_cluster_logical_slots())
+		create_logical_replication_slots();
+
+	/*
+	 * Reindex hash indexes for old < 10.0. count_old_cluster_logical_slots()
+	 * can return non-zero only when the old cluster is PG17 or later, so it's
+	 * OK to use "else if" here. See comments atop count_old_cluster_logical_slots()
+	 * and get_old_cluster_logical_slot_infos().
+	 */
+	else if (GET_MAJOR_VERSION(old_cluster.major_version) <= 906)
+		old_9_6_invalidate_hash_indexes(&new_cluster, false);
+
+	report_extension_updates(&new_cluster);
+
+	stop_postmaster(false);
+}
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 842f3b6cd3..aac712d49b 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -10,6 +10,7 @@
 #include <sys/stat.h>
 #include <sys/time.h>
 
+#include "access/xlogdefs.h"
 #include "common/relpath.h"
 #include "libpq-fe.h"
 
@@ -150,6 +151,25 @@ typedef struct
 	int			nrels;
 } RelInfoArr;
 
+/*
+ * Structure to store logical replication slot information
+ */
+typedef struct
+{
+	char	   *slotname;		/* slot name */
+	char	   *plugin;			/* plugin */
+	bool		two_phase;		/* can the slot decode 2PC? */
+	bool		caught_up;		/* Is confirmed_flush_lsn the same as latest
+								 * checkpoint LSN? */
+	bool		invalid;		/* If true, the slot is unusable. */
+} LogicalSlotInfo;
+
+typedef struct
+{
+	int			nslots;			/* number of logical slot infos */
+	LogicalSlotInfo *slots;		/* array of logical slot infos */
+} LogicalSlotInfoArr;
+
 /*
  * The following structure represents a relation mapping.
  */
@@ -176,6 +196,7 @@ typedef struct
 	char		db_tablespace[MAXPGPATH];	/* database default tablespace
 											 * path */
 	RelInfoArr	rel_arr;		/* array of all user relinfos */
+	LogicalSlotInfoArr slot_arr;	/* array of all LogicalSlotInfo */
 } DbInfo;
 
 /*
@@ -345,7 +366,6 @@ void		output_check_banner(bool live_check);
 void		check_and_dump_old_cluster(bool live_check);
 void		check_new_cluster(void);
 void		report_clusters_compatible(void);
-void		issue_warnings_and_set_wal_level(void);
 void		output_completion_banner(char *deletion_script_file_name);
 void		check_cluster_versions(void);
 void		check_cluster_compatibility(bool live_check);
@@ -400,7 +420,8 @@ void		check_loadable_libraries(void);
 FileNameMap *gen_db_file_maps(DbInfo *old_db,
 							  DbInfo *new_db, int *nmaps, const char *old_pgdata,
 							  const char *new_pgdata);
-void		get_db_and_rel_infos(ClusterInfo *cluster);
+void		get_db_rel_and_slot_infos(ClusterInfo *cluster, bool live_check);
+int			count_old_cluster_logical_slots(void);
 
 /* option.c */
 
diff --git a/src/bin/pg_upgrade/server.c b/src/bin/pg_upgrade/server.c
index 0bc3d2806b..20589e8c43 100644
--- a/src/bin/pg_upgrade/server.c
+++ b/src/bin/pg_upgrade/server.c
@@ -234,14 +234,20 @@ start_postmaster(ClusterInfo *cluster, bool report_and_exit_on_error)
 	 * we only modify the new cluster, so only use it there.  If there is a
 	 * crash, the new cluster has to be recreated anyway.  fsync=off is a big
 	 * win on ext4.
+	 *
+	 * Set max_slot_wal_keep_size to -1 to prevent WAL removal by the
+	 * checkpointer process.  If the WAL files required by logical replication
+	 * slots were removed, the slots would become unusable.  This setting
+	 * prevents the invalidation of slots during the upgrade.
 	 */
 	snprintf(cmd, sizeof(cmd),
-			 "\"%s/pg_ctl\" -w -l \"%s/%s\" -D \"%s\" -o \"-p %d -b%s %s%s\" start",
+			 "\"%s/pg_ctl\" -w -l \"%s/%s\" -D \"%s\" -o \"-p %d -b%s%s %s%s\" start",
 			 cluster->bindir,
 			 log_opts.logdir,
 			 SERVER_LOG_FILE, cluster->pgconfig, cluster->port,
-			 (cluster == &new_cluster) ?
-			 " -c synchronous_commit=off -c fsync=off -c full_page_writes=off" : "",
+			(cluster == &new_cluster) ?
+			" -c synchronous_commit=off -c fsync=off -c full_page_writes=off" : "",
+			" -c max_slot_wal_keep_size=-1",
 			 cluster->pgopts ? cluster->pgopts : "", socket_string);
 
 	/*
diff --git a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
new file mode 100644
index 0000000000..13bcc344fd
--- /dev/null
+++ b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
@@ -0,0 +1,232 @@
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+# Tests for upgrading replication slots
+
+use strict;
+use warnings;
+
+use File::Path qw(rmtree);
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Can be changed to test the other modes
+my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
+
+# Initialize old cluster
+my $old_publisher = PostgreSQL::Test::Cluster->new('old_publisher');
+$old_publisher->init(allows_streaming => 'logical');
+
+# Initialize new cluster
+my $new_publisher = PostgreSQL::Test::Cluster->new('new_publisher');
+$new_publisher->init(allows_streaming => 'replica');
+
+# Initialize subscriber cluster
+my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
+$subscriber->init(allows_streaming => 'logical');
+
+my $bindir = $new_publisher->config_data('--bindir');
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when new cluster wal_level is not 'logical'
+
+# Preparations for the subsequent test:
+# 1. Create a slot on the old cluster
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT pg_create_logical_replication_slot('test_slot1', 'test_decoding', false, true);"
+);
+$old_publisher->stop;
+
+# pg_upgrade will fail because the new cluster wal_level is 'replica'
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade where the new cluster has the wrong wal_level');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when max_replication_slots on a new cluster is
+#		too low
+
+# Preparations for the subsequent test:
+# 1. Create a second slot on the old cluster
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT pg_create_logical_replication_slot('test_slot2', 'test_decoding', false, true);"
+);
+
+# 2. Consume WAL records to avoid another type of upgrade failure. It will be
+#	 tested in subsequent cases.
+$old_publisher->safe_psql('postgres',
+	"SELECT count(*) FROM pg_logical_slot_get_changes('test_slot1', NULL, NULL);"
+);
+$old_publisher->stop;
+
+# 3. max_replication_slots is set to smaller than the number of slots (2)
+#	 present on the old cluster
+$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 1");
+
+# 4. wal_level is set correctly on the new cluster
+$new_publisher->append_conf('postgresql.conf', "wal_level = 'logical'");
+
+# pg_upgrade will fail because the new cluster has insufficient max_replication_slots
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade where the new cluster has insufficient max_replication_slots');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when the slot still has unconsumed WAL records
+
+# Preparations for the subsequent test:
+# 1. Remove the slot 'test_slot2', leaving only 1 slot on the old cluster, so
+#    the new cluster setting max_replication_slots=1 will now be sufficient.
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT * FROM pg_drop_replication_slot('test_slot2');"
+);
+
+# 2. Generate extra WAL records. Because these WAL records are not consumed,
+#	 the upcoming pg_upgrade test will fail.
+$old_publisher->safe_psql('postgres',
+	"CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;"
+);
+$old_publisher->stop;
+
+# pg_upgrade will fail because the slot still has unconsumed WAL records
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade of old cluster with idle replication slots');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+# Remove the remaining slot
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT * FROM pg_drop_replication_slot('test_slot1');"
+);
+
+# ------------------------------
+# TEST: Successful upgrade
+
+# Preparations for the subsequent test:
+# 1. Setup logical replication
+my $old_connstr = $old_publisher->connstr . ' dbname=postgres';
+$old_publisher->safe_psql('postgres',
+	"CREATE PUBLICATION pub FOR ALL TABLES;"
+);
+$subscriber->start;
+$subscriber->safe_psql(
+	'postgres', qq[
+	CREATE TABLE tbl (a int);
+	CREATE SUBSCRIPTION regress_sub CONNECTION '$old_connstr' PUBLICATION pub WITH (two_phase = 'true')
+]);
+$subscriber->wait_for_subscription_sync($old_publisher, 'regress_sub');
+
+# 2. Temporarily disable the subscription
+$subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION regress_sub DISABLE");
+$old_publisher->stop;
+
+# Dry run, successful check is expected. This is not a live check, so a
+# shutdown checkpoint record would be inserted. We want to test that a
+# subsequent upgrade is successful by skipping such an expected WAL record.
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,        '--check'
+	],
+	'run of pg_upgrade of old cluster');
+ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ removed after pg_upgrade success");
+
+# Actual run, successful upgrade is expected
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade of old cluster');
+ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ removed after pg_upgrade success");
+
+# Check that the slot 'regress_sub' has migrated to the new cluster
+$new_publisher->start;
+my $result = $new_publisher->safe_psql('postgres',
+	"SELECT slot_name, two_phase FROM pg_replication_slots");
+is($result, qq(regress_sub|t), 'check the slot exists on new cluster');
+
+# Update the connection
+my $new_connstr = $new_publisher->connstr . ' dbname=postgres';
+$subscriber->safe_psql(
+	'postgres', qq[
+	ALTER SUBSCRIPTION regress_sub CONNECTION '$new_connstr';
+	ALTER SUBSCRIPTION regress_sub ENABLE;
+]);
+
+# Check whether changes on the new publisher get replicated to the subscriber
+$new_publisher->safe_psql('postgres',
+	"INSERT INTO tbl VALUES (generate_series(11, 20))");
+$new_publisher->wait_for_catchup('regress_sub');
+$result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
+is($result, qq(20), 'check changes are replicated to the subscriber');
+
+# Clean up
+$subscriber->stop();
+$new_publisher->stop();
+
+done_testing();
diff --git a/src/include/access/xlogutils.h b/src/include/access/xlogutils.h
index 5b77b11f50..1cf31aa24f 100644
--- a/src/include/access/xlogutils.h
+++ b/src/include/access/xlogutils.h
@@ -115,4 +115,7 @@ extern void XLogReadDetermineTimeline(XLogReaderState *state,
 
 extern void WALReadRaiseError(WALReadError *errinfo);
 
+extern XLogReaderState *InitXLogReaderState(XLogRecPtr lsn);
+extern XLogRecord *ReadNextXLogRecord(XLogReaderState *xlogreader);
+
 #endif
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 9805bc6118..8f15e97257 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11370,6 +11370,12 @@
   proname => 'binary_upgrade_set_next_pg_tablespace_oid', provolatile => 'v',
   proparallel => 'u', prorettype => 'void', proargtypes => 'oid',
   prosrc => 'binary_upgrade_set_next_pg_tablespace_oid' },
+{ oid => '8046', descr => 'for use by pg_upgrade',
+  proname => 'binary_upgrade_validate_wal_records',
+  prorows => '10', proretset => 't', provolatile => 's', prorettype => 'bool',
+  proargtypes => 'pg_lsn', proallargtypes => '{pg_lsn,bool}',
+  proargmodes => '{i,o}', proargnames => '{start_lsn,is_ok}',
+  prosrc => 'binary_upgrade_validate_wal_records' },
 
 # conversion functions
 { oid => '4302',
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index b5bbdd1608..ff6cd495a3 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1502,7 +1502,10 @@ LogicalRepTupleData
 LogicalRepTyp
 LogicalRepWorker
 LogicalRepWorkerType
+LogicalReplicationSlotInfo
 LogicalRewriteMappingData
+LogicalSlotInfo
+LogicalSlotInfoArr
 LogicalTape
 LogicalTapeSet
 LsnReadQueue
-- 
2.27.0
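
As a quick manual cross-check of what the patch automates, the prerequisite
checks boil down to plain catalog SQL. A rough sketch (the caught-up
verification itself is done by binary_upgrade_validate_wal_records(), which can
only run in binary-upgrade mode, so it has no direct psql equivalent):

-- On the old cluster: permanent logical slots that would be migrated;
-- any row with invalid = true makes the upgrade fail.
SELECT slot_name, plugin, two_phase, conflicting AS invalid
FROM pg_catalog.pg_replication_slots
WHERE slot_type = 'logical' AND temporary IS FALSE;

-- On the new cluster: the settings verified before the slots are restored.
SHOW wal_level;              -- must be 'logical'
SHOW max_replication_slots;  -- must be >= the number of slots reported above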

#248Amit Kapila
amit.kapila16@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#247)
1 attachment(s)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Tue, Sep 19, 2023 at 11:47 AM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

Dear Amit,

Thank you for reviewing! PSA new version!

*
+#include "access/xlogdefs.h"
#include "common/relpath.h"
#include "libpq-fe.h"

The above include is not required. I have removed that and made a few
cosmetic changes in the attached.

--
With Regards,
Amit Kapila.
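
For reference, the subscriber-side procedure that the attached patch's
documentation and TAP test exercise amounts to roughly the following (the
subscription name and connection string are placeholders only):

-- Before stopping and upgrading the old publisher:
ALTER SUBSCRIPTION regress_sub DISABLE;

-- After pg_upgrade has recreated the logical slots on the new publisher:
ALTER SUBSCRIPTION regress_sub CONNECTION 'host=new_publisher port=5432 dbname=postgres';
ALTER SUBSCRIPTION regress_sub ENABLE;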

Attachments:

v40-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patchapplication/octet-stream; name=v40-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patchDownload
From 73b203cfabb3959efb24bca309206aee4436a021 Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Tue, 4 Apr 2023 05:49:34 +0000
Subject: [PATCH v40] pg_upgrade: Allow to replicate logical replication slots
 to new node

This commit allows nodes with logical replication slots to be upgraded. While
reading information from the old cluster, a list of logical replication slots is
fetched. At a later stage of the upgrade, pg_upgrade revisits the list and
restores the slots by executing pg_create_logical_replication_slot() on the new
cluster. Migration of logical replication slots is only supported when the old
cluster is version 17.0 or later.

If the old node has slots with the status 'lost' or with unconsumed WAL records,
pg_upgrade fails. These checks are needed to prevent data loss.

Note that the pg_resetwal command removes WAL files, including those still
required by the slots' restart_lsn. If the WAL files required by logical
replication slots are removed, the slots become unusable. Therefore, during the
upgrade, slot restoration is done after the final pg_resetwal command. This
ordering ensures that the required WAL files are retained.

The significant advantage of this commit is that it makes it easy to continue
logical replication even after upgrading the publisher node. Previously,
pg_upgrade allowed copying publications to a new node. With this new commit,
adjusting the connection string to the new publisher will cause the apply
worker on the subscriber to connect to the new publisher automatically. This
enables seamless continuation of logical replication, even after an upgrade.

Author: Hayato Kuroda
Co-authored-by: Hou Zhijie
Reviewed-by: Peter Smith, Julien Rouhaud, Vignesh C, Wang Wei, Masahiko Sawada,
             Dilip Kumar
---
 contrib/pg_walinspect/pg_walinspect.c         |  94 -------
 doc/src/sgml/ref/pgupgrade.sgml               |  76 +++++-
 src/backend/access/transam/xlogutils.c        |  92 +++++++
 src/backend/replication/slot.c                |  12 +
 src/backend/utils/adt/pg_upgrade_support.c    |  81 ++++++
 src/bin/pg_upgrade/Makefile                   |   3 +
 src/bin/pg_upgrade/check.c                    | 198 +++++++++++++--
 src/bin/pg_upgrade/function.c                 |  31 ++-
 src/bin/pg_upgrade/info.c                     | 157 +++++++++++-
 src/bin/pg_upgrade/meson.build                |   1 +
 src/bin/pg_upgrade/pg_upgrade.c               | 107 +++++++-
 src/bin/pg_upgrade/pg_upgrade.h               |  24 +-
 src/bin/pg_upgrade/server.c                   |  12 +-
 .../t/003_logical_replication_slots.pl        | 232 ++++++++++++++++++
 src/include/access/xlogutils.h                |   3 +
 src/include/catalog/pg_proc.dat               |   6 +
 src/tools/pgindent/typedefs.list              |   3 +
 17 files changed, 997 insertions(+), 135 deletions(-)
 create mode 100644 src/bin/pg_upgrade/t/003_logical_replication_slots.pl

diff --git a/contrib/pg_walinspect/pg_walinspect.c b/contrib/pg_walinspect/pg_walinspect.c
index 796a74f322..49f4f92e98 100644
--- a/contrib/pg_walinspect/pg_walinspect.c
+++ b/contrib/pg_walinspect/pg_walinspect.c
@@ -40,8 +40,6 @@ PG_FUNCTION_INFO_V1(pg_get_wal_stats_till_end_of_wal);
 
 static void ValidateInputLSNs(XLogRecPtr start_lsn, XLogRecPtr *end_lsn);
 static XLogRecPtr GetCurrentLSN(void);
-static XLogReaderState *InitXLogReaderState(XLogRecPtr lsn);
-static XLogRecord *ReadNextXLogRecord(XLogReaderState *xlogreader);
 static void GetWALRecordInfo(XLogReaderState *record, Datum *values,
 							 bool *nulls, uint32 ncols);
 static void GetWALRecordsInfo(FunctionCallInfo fcinfo,
@@ -84,98 +82,6 @@ GetCurrentLSN(void)
 	return curr_lsn;
 }
 
-/*
- * Initialize WAL reader and identify first valid LSN.
- */
-static XLogReaderState *
-InitXLogReaderState(XLogRecPtr lsn)
-{
-	XLogReaderState *xlogreader;
-	ReadLocalXLogPageNoWaitPrivate *private_data;
-	XLogRecPtr	first_valid_record;
-
-	/*
-	 * Reading WAL below the first page of the first segments isn't allowed.
-	 * This is a bootstrap WAL page and the page_read callback fails to read
-	 * it.
-	 */
-	if (lsn < XLOG_BLCKSZ)
-		ereport(ERROR,
-				(errmsg("could not read WAL at LSN %X/%X",
-						LSN_FORMAT_ARGS(lsn))));
-
-	private_data = (ReadLocalXLogPageNoWaitPrivate *)
-		palloc0(sizeof(ReadLocalXLogPageNoWaitPrivate));
-
-	xlogreader = XLogReaderAllocate(wal_segment_size, NULL,
-									XL_ROUTINE(.page_read = &read_local_xlog_page_no_wait,
-											   .segment_open = &wal_segment_open,
-											   .segment_close = &wal_segment_close),
-									private_data);
-
-	if (xlogreader == NULL)
-		ereport(ERROR,
-				(errcode(ERRCODE_OUT_OF_MEMORY),
-				 errmsg("out of memory"),
-				 errdetail("Failed while allocating a WAL reading processor.")));
-
-	/* first find a valid recptr to start from */
-	first_valid_record = XLogFindNextRecord(xlogreader, lsn);
-
-	if (XLogRecPtrIsInvalid(first_valid_record))
-		ereport(ERROR,
-				(errmsg("could not find a valid record after %X/%X",
-						LSN_FORMAT_ARGS(lsn))));
-
-	return xlogreader;
-}
-
-/*
- * Read next WAL record.
- *
- * By design, to be less intrusive in a running system, no slot is allocated
- * to reserve the WAL we're about to read. Therefore this function can
- * encounter read errors for historical WAL.
- *
- * We guard against ordinary errors trying to read WAL that hasn't been
- * written yet by limiting end_lsn to the flushed WAL, but that can also
- * encounter errors if the flush pointer falls in the middle of a record. In
- * that case we'll return NULL.
- */
-static XLogRecord *
-ReadNextXLogRecord(XLogReaderState *xlogreader)
-{
-	XLogRecord *record;
-	char	   *errormsg;
-
-	record = XLogReadRecord(xlogreader, &errormsg);
-
-	if (record == NULL)
-	{
-		ReadLocalXLogPageNoWaitPrivate *private_data;
-
-		/* return NULL, if end of WAL is reached */
-		private_data = (ReadLocalXLogPageNoWaitPrivate *)
-			xlogreader->private_data;
-
-		if (private_data->end_of_wal)
-			return NULL;
-
-		if (errormsg)
-			ereport(ERROR,
-					(errcode_for_file_access(),
-					 errmsg("could not read WAL at %X/%X: %s",
-							LSN_FORMAT_ARGS(xlogreader->EndRecPtr), errormsg)));
-		else
-			ereport(ERROR,
-					(errcode_for_file_access(),
-					 errmsg("could not read WAL at %X/%X",
-							LSN_FORMAT_ARGS(xlogreader->EndRecPtr))));
-	}
-
-	return record;
-}
-
 /*
  * Output values that make up a row describing caller's WAL record.
  *
diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index bea0d1b93f..1a17572d14 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -383,6 +383,77 @@ make prefix=/usr/local/pgsql.new install
     </para>
    </step>
 
+   <step>
+    <title>Prepare for publisher upgrades</title>
+
+    <para>
+     <application>pg_upgrade</application> attempts to migrate logical
+     replication slots. This helps avoid the need for manually defining the
+     same replication slots on the new publisher. Migration of logical
+     replication slots is only supported when the old cluster is version 17.0
+     or later.
+    </para>
+
+    <para>
+     Before you start upgrading the publisher cluster, ensure that the
+     subscription is temporarily disabled, by executing
+     <link linkend="sql-altersubscription"><command>ALTER SUBSCRIPTION ... DISABLE</command></link>.
+     Re-enable the subscription after the upgrade.
+    </para>
+
+    <para>
+     There are some prerequisites for <application>pg_upgrade</application> to
+     be able to upgrade the replication slots. If these are not met, an error
+     will be reported.
+    </para>
+
+    <itemizedlist>
+     <listitem>
+      <para>
+       All slots on the old cluster must be usable, i.e., there are no slots
+       whose
+       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>conflicting</structfield>
+       is <literal>true</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The old cluster must have replicated all the changes to its subscribers.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The output plugins referenced by the slots on the old cluster must be
+       installed in the new PostgreSQL executable directory.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must not have permanent logical replication slots, i.e.,
+       there must be no slots where
+       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>temporary</structfield>
+       is <literal>false</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must have
+       <link linkend="guc-wal-level"><varname>wal_level</varname></link> as
+       <literal>logical</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must have
+       <link linkend="guc-max-replication-slots"><varname>max_replication_slots</varname></link>
+       configured to a value greater than or equal to the number of slots
+       present in the old cluster.
+      </para>
+     </listitem>
+    </itemizedlist>
+
+   </step>
+
    <step>
     <title>Stop both servers</title>
 
@@ -652,8 +723,9 @@ rsync --archive --delete --hard-links --size-only --no-inc-recursive /vol1/pg_tb
        Configure the servers for log shipping.  (You do not need to run
        <function>pg_backup_start()</function> and <function>pg_backup_stop()</function>
        or take a file system backup as the standbys are still synchronized
-       with the primary.)  Replication slots are not copied and must
-       be recreated.
+       with the primary.)  Only logical slots on the primary are copied to the
+       new standby, but other slots on the old standby are not copied and must
+       be recreated manually.
       </para>
      </step>
 
diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c
index 43f7b31205..e2cabfef32 100644
--- a/src/backend/access/transam/xlogutils.c
+++ b/src/backend/access/transam/xlogutils.c
@@ -1048,3 +1048,95 @@ WALReadRaiseError(WALReadError *errinfo)
 						errinfo->wre_req)));
 	}
 }
+
+/*
+ * Initialize WAL reader and identify first valid LSN.
+ */
+XLogReaderState *
+InitXLogReaderState(XLogRecPtr lsn)
+{
+	XLogReaderState *xlogreader;
+	ReadLocalXLogPageNoWaitPrivate *private_data;
+	XLogRecPtr	first_valid_record;
+
+	/*
+	 * Reading WAL below the first page of the first segments isn't allowed.
+	 * This is a bootstrap WAL page and the page_read callback fails to read
+	 * it.
+	 */
+	if (lsn < XLOG_BLCKSZ)
+		ereport(ERROR,
+				(errmsg("could not read WAL at LSN %X/%X",
+						LSN_FORMAT_ARGS(lsn))));
+
+	private_data = (ReadLocalXLogPageNoWaitPrivate *)
+		palloc0(sizeof(ReadLocalXLogPageNoWaitPrivate));
+
+	xlogreader = XLogReaderAllocate(wal_segment_size, NULL,
+									XL_ROUTINE(.page_read = &read_local_xlog_page_no_wait,
+											   .segment_open = &wal_segment_open,
+											   .segment_close = &wal_segment_close),
+									private_data);
+
+	if (xlogreader == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("out of memory"),
+				 errdetail("Failed while allocating a WAL reading processor.")));
+
+	/* first find a valid recptr to start from */
+	first_valid_record = XLogFindNextRecord(xlogreader, lsn);
+
+	if (XLogRecPtrIsInvalid(first_valid_record))
+		ereport(ERROR,
+				(errmsg("could not find a valid record after %X/%X",
+						LSN_FORMAT_ARGS(lsn))));
+
+	return xlogreader;
+}
+
+/*
+ * Read next WAL record.
+ *
+ * By design, to be less intrusive in a running system, no slot is allocated
+ * to reserve the WAL we're about to read. Therefore this function can
+ * encounter read errors for historical WAL.
+ *
+ * We guard against ordinary errors trying to read WAL that hasn't been
+ * written yet by limiting end_lsn to the flushed WAL, but that can also
+ * encounter errors if the flush pointer falls in the middle of a record. In
+ * that case we'll return NULL.
+ */
+XLogRecord *
+ReadNextXLogRecord(XLogReaderState *xlogreader)
+{
+	XLogRecord *record;
+	char	   *errormsg;
+
+	record = XLogReadRecord(xlogreader, &errormsg);
+
+	if (record == NULL)
+	{
+		ReadLocalXLogPageNoWaitPrivate *private_data;
+
+		/* return NULL, if end of WAL is reached */
+		private_data = (ReadLocalXLogPageNoWaitPrivate *)
+			xlogreader->private_data;
+
+		if (private_data->end_of_wal)
+			return NULL;
+
+		if (errormsg)
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not read WAL at %X/%X: %s",
+							LSN_FORMAT_ARGS(xlogreader->EndRecPtr), errormsg)));
+		else
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not read WAL at %X/%X",
+							LSN_FORMAT_ARGS(xlogreader->EndRecPtr))));
+	}
+
+	return record;
+}
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 3ded3c1473..a91d412f91 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1423,6 +1423,18 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause,
 
 		SpinLockRelease(&s->mutex);
 
+		/*
+		 * The logical replication slots shouldn't be invalidated because the
+		 * max_slot_wal_keep_size GUC is set to -1 during the upgrade.
+		 *
+		 * The following is just a sanity check.
+		 */
+		if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade)
+		{
+			Assert(max_slot_wal_keep_size_mb == -1);
+			elog(ERROR, "replication slots must not be invalidated during the upgrade");
+		}
+
 		if (active_pid != 0)
 		{
 			/*
diff --git a/src/backend/utils/adt/pg_upgrade_support.c b/src/backend/utils/adt/pg_upgrade_support.c
index 0186636d9f..5bd8441281 100644
--- a/src/backend/utils/adt/pg_upgrade_support.c
+++ b/src/backend/utils/adt/pg_upgrade_support.c
@@ -11,14 +11,20 @@
 
 #include "postgres.h"
 
+#include "access/heapam_xlog.h"
+#include "access/xlog.h"
+#include "access/xlogutils.h"
 #include "catalog/binary_upgrade.h"
 #include "catalog/heap.h"
 #include "catalog/namespace.h"
+#include "catalog/pg_control.h"
 #include "catalog/pg_type.h"
 #include "commands/extension.h"
 #include "miscadmin.h"
+#include "storage/standbydefs.h"
 #include "utils/array.h"
 #include "utils/builtins.h"
+#include "utils/pg_lsn.h"
 
 
 #define CHECK_IS_BINARY_UPGRADE									\
@@ -29,6 +35,9 @@ do {															\
 				 errmsg("function can only be called when server is in binary upgrade mode"))); \
 } while (0)
 
+#define CHECK_WAL_RECORD(rmgrid, info, expected_rmgrid, expected_info) \
+	(rmgrid == expected_rmgrid && info == expected_info)
+
 Datum
 binary_upgrade_set_next_pg_tablespace_oid(PG_FUNCTION_ARGS)
 {
@@ -261,3 +270,75 @@ binary_upgrade_set_missing_value(PG_FUNCTION_ARGS)
 
 	PG_RETURN_VOID();
 }
+
+/*
+ * Return false if we found unexpected WAL records, otherwise true.
+ *
+ * This function is used to verify that there are no WAL records (except some
+ * types) after confirmed_flush_lsn of logical slots, which means all the
+ * changes were replicated to the subscriber. There is a possibility that some
+ * WALs are inserted during upgrade, so such types would be ignored.
+ *
+ * XLOG_CHECKPOINT_SHUTDOWN is ignored because it would be inserted after the
+ * walsender exits. Moreover, the following types of records could be generated
+ * during the pg_upgrade --check, so they are ignored too:
+ * XLOG_CHECKPOINT_ONLINE, XLOG_RUNNING_XACTS, XLOG_FPI_FOR_HINT,
+ * XLOG_HEAP2_PRUNE, XLOG_PARAMETER_CHANGE.
+ */
+Datum
+binary_upgrade_validate_wal_records(PG_FUNCTION_ARGS)
+{
+	XLogRecPtr	  start_lsn = PG_GETARG_LSN(0);
+	XLogReaderState *xlogreader;
+	bool			initial_record = true;
+	bool			is_valid = true;
+
+	CHECK_IS_BINARY_UPGRADE;
+
+	/* Quick exit if the given lsn is larger than current one */
+	if (start_lsn >= GetFlushRecPtr(NULL))
+		PG_RETURN_BOOL(false);
+
+	xlogreader = InitXLogReaderState(start_lsn);
+
+	/* Loop until all WALs are read, or unexpected record is found */
+	while (is_valid && ReadNextXLogRecord(xlogreader))
+	{
+		RmgrIds		   rmid;
+		uint8		   info;
+
+		/* Check the type of WAL */
+		rmid = XLogRecGetRmid(xlogreader);
+		info = XLogRecGetInfo(xlogreader) & ~XLR_INFO_MASK;
+
+		if (initial_record)
+		{
+			/* Initial record must be XLOG_CHECKPOINT_SHUTDOWN */
+			if (!CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID,
+								  XLOG_CHECKPOINT_SHUTDOWN))
+				is_valid = false;
+
+			initial_record = false;
+			continue;
+		}
+
+		/*
+		 * There is a possibility that following records may be generated
+		 * during the upgrade.
+		 */
+		if (!CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID, XLOG_CHECKPOINT_SHUTDOWN) &&
+			!CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID, XLOG_CHECKPOINT_ONLINE) &&
+			!CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID, XLOG_FPI_FOR_HINT) &&
+			!CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID, XLOG_PARAMETER_CHANGE) &&
+			!CHECK_WAL_RECORD(rmid, info, RM_STANDBY_ID, XLOG_RUNNING_XACTS) &&
+			!CHECK_WAL_RECORD(rmid, info, RM_HEAP2_ID, XLOG_HEAP2_PRUNE))
+			is_valid = false;
+
+		CHECK_FOR_INTERRUPTS();
+	}
+
+	pfree(xlogreader->private_data);
+	XLogReaderFree(xlogreader);
+
+	PG_RETURN_BOOL(is_valid);
+}
diff --git a/src/bin/pg_upgrade/Makefile b/src/bin/pg_upgrade/Makefile
index 5834513add..815d1a7ca1 100644
--- a/src/bin/pg_upgrade/Makefile
+++ b/src/bin/pg_upgrade/Makefile
@@ -3,6 +3,9 @@
 PGFILEDESC = "pg_upgrade - an in-place binary upgrade utility"
 PGAPPICON = win32
 
+# required for 003_logical_replication_slots.pl
+EXTRA_INSTALL=contrib/test_decoding
+
 subdir = src/bin/pg_upgrade
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 56e313f562..c45d84dd1a 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -30,6 +30,8 @@ static void check_for_jsonb_9_4_usage(ClusterInfo *cluster);
 static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(void);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
+static void check_new_cluster_logical_replication_slots(void);
+static void check_old_cluster_for_valid_slots(bool live_check);
 
 
 /*
@@ -86,8 +88,11 @@ check_and_dump_old_cluster(bool live_check)
 	if (!live_check)
 		start_postmaster(&old_cluster, true);
 
-	/* Extract a list of databases and tables from the old cluster */
-	get_db_and_rel_infos(&old_cluster);
+	/*
+	 * Extract a list of databases, tables, and logical replication slots from
+	 * the old cluster.
+	 */
+	get_db_rel_and_slot_infos(&old_cluster, live_check);
 
 	init_tablespaces();
 
@@ -104,6 +109,13 @@ check_and_dump_old_cluster(bool live_check)
 	check_for_reg_data_type_usage(&old_cluster);
 	check_for_isn_and_int8_passing_mismatch(&old_cluster);
 
+	/*
+	 * Logical replication slots can be migrated since PG17. See comments atop
+	 * get_old_cluster_logical_slot_infos().
+	 */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
+		check_old_cluster_for_valid_slots(live_check);
+
 	/*
 	 * PG 16 increased the size of the 'aclitem' type, which breaks the
 	 * on-disk format for existing data.
@@ -187,7 +199,7 @@ check_and_dump_old_cluster(bool live_check)
 void
 check_new_cluster(void)
 {
-	get_db_and_rel_infos(&new_cluster);
+	get_db_rel_and_slot_infos(&new_cluster, false);
 
 	check_new_cluster_is_empty();
 
@@ -210,6 +222,8 @@ check_new_cluster(void)
 	check_for_prepared_transactions(&new_cluster);
 
 	check_for_new_tablespace_dir();
+
+	check_new_cluster_logical_replication_slots();
 }
 
 
@@ -232,27 +246,6 @@ report_clusters_compatible(void)
 }
 
 
-void
-issue_warnings_and_set_wal_level(void)
-{
-	/*
-	 * We unconditionally start/stop the new server because pg_resetwal -o set
-	 * wal_level to 'minimum'.  If the user is upgrading standby servers using
-	 * the rsync instructions, they will need pg_upgrade to write its final
-	 * WAL record showing wal_level as 'replica'.
-	 */
-	start_postmaster(&new_cluster, true);
-
-	/* Reindex hash indexes for old < 10.0 */
-	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 906)
-		old_9_6_invalidate_hash_indexes(&new_cluster, false);
-
-	report_extension_updates(&new_cluster);
-
-	stop_postmaster(false);
-}
-
-
 void
 output_completion_banner(char *deletion_script_file_name)
 {
@@ -1402,3 +1395,160 @@ check_for_user_defined_encoding_conversions(ClusterInfo *cluster)
 	else
 		check_ok();
 }
+
+/*
+ * check_new_cluster_logical_replication_slots()
+ *
+ * Verify that there are no logical replication slots on the new cluster and
+ * that the parameter settings necessary for creating slots are sufficient.
+ */
+static void
+check_new_cluster_logical_replication_slots(void)
+{
+	PGresult   *res;
+	PGconn	   *conn;
+	int			nslots_on_old;
+	int			nslots_on_new;
+	int			max_replication_slots;
+	char	   *wal_level;
+
+	/* Logical slots can be migrated since PG17. */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+		return;
+
+	nslots_on_old = count_old_cluster_logical_slots();
+
+	/* Quick return if there are no logical slots to be migrated. */
+	if (nslots_on_old == 0)
+		return;
+
+	conn = connectToServer(&new_cluster, "template1");
+
+	prep_status("Checking for new cluster logical replication slots");
+
+	res = executeQueryOrDie(conn, "SELECT count(*) "
+							"FROM pg_catalog.pg_replication_slots "
+							"WHERE slot_type = 'logical' AND "
+							"temporary IS FALSE;");
+
+	if (PQntuples(res) != 1)
+		pg_fatal("could not count the number of logical replication slots");
+
+	nslots_on_new = atoi(PQgetvalue(res, 0, 0));
+
+	if (nslots_on_new)
+		pg_fatal("expected 0 logical replication slots but found %d",
+				 nslots_on_new);
+
+	PQclear(res);
+
+	res = executeQueryOrDie(conn, "SHOW wal_level;");
+
+	if (PQntuples(res) != 1)
+		pg_fatal("could not determine wal_level");
+
+	wal_level = PQgetvalue(res, 0, 0);
+
+	if (strcmp(wal_level, "logical") != 0)
+		pg_fatal("wal_level must be \"logical\", but is set to \"%s\"",
+				 wal_level);
+
+	PQclear(res);
+
+	res = executeQueryOrDie(conn, "SHOW max_replication_slots;");
+
+	if (PQntuples(res) != 1)
+		pg_fatal("could not determine max_replication_slots");
+
+	max_replication_slots = atoi(PQgetvalue(res, 0, 0));
+
+	if (nslots_on_old > max_replication_slots)
+		pg_fatal("max_replication_slots (%d) must be greater than or equal to the number of "
+				 "logical replication slots (%d) on the old cluster",
+				 max_replication_slots, nslots_on_old);
+
+	PQclear(res);
+	PQfinish(conn);
+
+	check_ok();
+}
+
+/*
+ * check_old_cluster_for_valid_slots()
+ *
+ * Verify that all the logical slots are usable and have consumed all the WAL
+ * before shutdown.
+ */
+static void
+check_old_cluster_for_valid_slots(bool live_check)
+{
+	int			dbnum;
+	char		output_path[MAXPGPATH];
+	FILE	   *script = NULL;
+
+	prep_status("Checking for valid logical replication slots");
+
+	snprintf(output_path, sizeof(output_path), "%s/%s",
+			 log_opts.basedir,
+			 "invalid_logical_replication_slots.txt");
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		int			slotnum;
+		LogicalSlotInfoArr *slot_arr = &old_cluster.dbarr.dbs[dbnum].slot_arr;
+
+		for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		{
+			LogicalSlotInfo *slot = &slot_arr->slots[slotnum];
+
+			/* Is the slot usable? */
+			if (slot->invalid)
+			{
+				if (script == NULL &&
+					(script = fopen_priv(output_path, "w")) == NULL)
+					pg_fatal("could not open file \"%s\": %s",
+							 output_path, strerror(errno));
+
+				fprintf(script, "The slot \"%s\" is invalid\n",
+						slot->slotname);
+
+				/* No need to check this slot, seek to new one */
+				continue;
+			}
+
+			/*
+			 * Do additional check to ensure that all logical replication slots
+			 * have consumed all the WAL before shutdown.
+			 *
+			 * Note: This can be satisfied only when the old cluster has been
+			 * shut down, so we skip this for live checks.
+			 */
+			if (!live_check && !slot->caught_up)
+			{
+				if (script == NULL &&
+					(script = fopen_priv(output_path, "w")) == NULL)
+					pg_fatal("could not open file \"%s\": %s",
+							 output_path, strerror(errno));
+
+				fprintf(script,
+						"The slot \"%s\" has not consumed the WAL yet\n",
+						slot->slotname);
+			}
+		}
+	}
+
+	if (script)
+	{
+		fclose(script);
+
+		pg_log(PG_REPORT, "fatal");
+		pg_fatal("Your installation contains invalid logical replication slots.\n"
+				 "These slots can't be copied, so this cluster cannot be upgraded.\n"
+				 "Consider removing such slots or consuming the pending WAL if any,\n"
+				 "and then restart the upgrade.\n"
+				 "A list of invalid logical replication slots is in the file:\n"
+				 "    %s", output_path);
+	}
+
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/function.c b/src/bin/pg_upgrade/function.c
index dc8800c7cd..c0f5e58fa2 100644
--- a/src/bin/pg_upgrade/function.c
+++ b/src/bin/pg_upgrade/function.c
@@ -46,7 +46,9 @@ library_name_compare(const void *p1, const void *p2)
 /*
  * get_loadable_libraries()
  *
- *	Fetch the names of all old libraries containing C-language functions.
+ *	Fetch the names of all old libraries that contain C-language functions or
+ *	that correspond to logical replication output plugins.
+ *
  *	We will later check that they all exist in the new installation.
  */
 void
@@ -55,6 +57,7 @@ get_loadable_libraries(void)
 	PGresult  **ress;
 	int			totaltups;
 	int			dbnum;
+	int			n_libinfos;
 
 	ress = (PGresult **) pg_malloc(old_cluster.dbarr.ndbs * sizeof(PGresult *));
 	totaltups = 0;
@@ -81,7 +84,12 @@ get_loadable_libraries(void)
 		PQfinish(conn);
 	}
 
-	os_info.libraries = (LibraryInfo *) pg_malloc(totaltups * sizeof(LibraryInfo));
+	/*
+	 * Allocate memory for required libraries and logical replication output
+	 * plugins.
+	 */
+	n_libinfos = totaltups + count_old_cluster_logical_slots();
+	os_info.libraries = (LibraryInfo *) pg_malloc(sizeof(LibraryInfo) * n_libinfos);
 	totaltups = 0;
 
 	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
@@ -89,6 +97,8 @@ get_loadable_libraries(void)
 		PGresult   *res = ress[dbnum];
 		int			ntups;
 		int			rowno;
+		int			slotno;
+		LogicalSlotInfoArr *slot_arr = &old_cluster.dbarr.dbs[dbnum].slot_arr;
 
 		ntups = PQntuples(res);
 		for (rowno = 0; rowno < ntups; rowno++)
@@ -101,6 +111,23 @@ get_loadable_libraries(void)
 			totaltups++;
 		}
 		PQclear(res);
+
+		/*
+		 * Store the names of output plugins as well. There is a possibility
+		 * that duplicated plugins are set, but the consumer function
+		 * check_loadable_libraries() will avoid checking the same library, so
+		 * we do not have to consider their uniqueness here.
+		 */
+		for (slotno = 0; slotno < slot_arr->nslots; slotno++)
+		{
+			if (slot_arr->slots[slotno].invalid)
+				continue;
+
+			os_info.libraries[totaltups].name = pg_strdup(slot_arr->slots[slotno].plugin);
+			os_info.libraries[totaltups].dbnum = dbnum;
+
+			totaltups++;
+		}
 	}
 
 	pg_free(ress);
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index aa5faca4d6..82da3325a2 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -26,6 +26,8 @@ static void get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo);
 static void free_rel_infos(RelInfoArr *rel_arr);
 static void print_db_infos(DbInfoArr *db_arr);
 static void print_rel_infos(RelInfoArr *rel_arr);
+static void print_slot_infos(LogicalSlotInfoArr *slot_arr);
+static void get_old_cluster_logical_slot_infos(DbInfo *dbinfo, bool live_check);
 
 
 /*
@@ -266,13 +268,15 @@ report_unmatched_relation(const RelInfo *rel, const DbInfo *db, bool is_new_db)
 }
 
 /*
- * get_db_and_rel_infos()
+ * get_db_rel_and_slot_infos()
  *
  * higher level routine to generate dbinfos for the database running
  * on the given "port". Assumes that server is already running.
+ *
+ * live_check would be used only when the target is the old cluster.
  */
 void
-get_db_and_rel_infos(ClusterInfo *cluster)
+get_db_rel_and_slot_infos(ClusterInfo *cluster, bool live_check)
 {
 	int			dbnum;
 
@@ -283,7 +287,17 @@ get_db_and_rel_infos(ClusterInfo *cluster)
 	get_db_infos(cluster);
 
 	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
-		get_rel_infos(cluster, &cluster->dbarr.dbs[dbnum]);
+	{
+		DbInfo	   *pDbInfo = &cluster->dbarr.dbs[dbnum];
+
+		get_rel_infos(cluster, pDbInfo);
+
+		/*
+		 * Retrieve the logical replication slots infos for the old cluster.
+		 */
+		if (cluster == &old_cluster)
+			get_old_cluster_logical_slot_infos(pDbInfo, live_check);
+	}
 
 	if (cluster == &old_cluster)
 		pg_log(PG_VERBOSE, "\nsource databases:");
@@ -600,6 +614,114 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 	dbinfo->rel_arr.nrels = num_rels;
 }
 
+/*
+ * get_old_cluster_logical_slot_infos()
+ *
+ * gets the LogicalSlotInfos for all the logical replication slots of the
+ * database referred to by "dbinfo".
+ *
+ * Note: This function will not do anything if the old cluster is pre-PG17.
+ * This is because before that the logical slots are not saved at shutdown, so
+ * there is no guarantee that the latest confirmed_flush_lsn is saved to disk
+ * which can lead to data loss. It is still not guaranteed for manually created
+ * slots in PG17, so subsequent checks done in
+ * check_old_cluster_for_valid_slots() would raise a FATAL error if such slots
+ * are included.
+ */
+static void
+get_old_cluster_logical_slot_infos(DbInfo *dbinfo, bool live_check)
+{
+	PGconn	   *conn;
+	PGresult   *res;
+	LogicalSlotInfo *slotinfos = NULL;
+	int			num_slots;
+
+	/* Logical slots can be migrated since PG17. */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+		return;
+
+	conn = connectToServer(&old_cluster, dbinfo->db_name);
+
+	/*
+	 * Fetch the logical replication slot information. The slot is considered
+	 * caught up if all the WAL is consumed except for records that could be
+	 * generated during the upgrade. Note that we can't ensure whether the slot
+	 * is caught up during live_check as the new WAL records could be
+	 * generated.
+	 *
+	 * The temporary slots are explicitly ignored while checking because such
+	 * slots cannot exist after the upgrade. During the upgrade, clusters are
+	 * started and stopped several times causing any temporary slots to be
+	 * removed.
+	 */
+	res = executeQueryOrDie(conn, "SELECT slot_name, plugin, two_phase, "
+							"%s as caught_up, conflicting as invalid "
+							"FROM pg_catalog.pg_replication_slots "
+							"WHERE slot_type = 'logical' AND "
+							"database = current_database() AND "
+							"temporary IS FALSE;",
+							live_check ? "FALSE" :
+							"pg_catalog.binary_upgrade_validate_wal_records(confirmed_flush_lsn)");
+
+	num_slots = PQntuples(res);
+
+	if (num_slots)
+	{
+		int			slotnum;
+		int			i_slotname;
+		int			i_plugin;
+		int			i_twophase;
+		int			i_caught_up;
+		int			i_invalid;
+
+		slotinfos = (LogicalSlotInfo *) pg_malloc(sizeof(LogicalSlotInfo) * num_slots);
+
+		i_slotname = PQfnumber(res, "slot_name");
+		i_plugin = PQfnumber(res, "plugin");
+		i_twophase = PQfnumber(res, "two_phase");
+		i_caught_up = PQfnumber(res, "caught_up");
+		i_invalid = PQfnumber(res, "invalid");
+
+		for (slotnum = 0; slotnum < num_slots; slotnum++)
+		{
+			LogicalSlotInfo *curr = &slotinfos[slotnum];
+
+			curr->slotname = pg_strdup(PQgetvalue(res, slotnum, i_slotname));
+			curr->plugin = pg_strdup(PQgetvalue(res, slotnum, i_plugin));
+			curr->two_phase = (strcmp(PQgetvalue(res, slotnum, i_twophase), "t") == 0);
+			curr->caught_up = (strcmp(PQgetvalue(res, slotnum, i_caught_up), "t") == 0);
+			curr->invalid = (strcmp(PQgetvalue(res, slotnum, i_invalid), "t") == 0);
+		}
+	}
+
+	PQclear(res);
+	PQfinish(conn);
+
+	dbinfo->slot_arr.slots = slotinfos;
+	dbinfo->slot_arr.nslots = num_slots;
+}
+
+
+/*
+ * count_old_cluster_logical_slots()
+ *
+ * Returns the number of logical replication slots for all databases.
+ *
+ * Note: this function always returns 0 if the old_cluster is PG16 and prior
+ * because we gather slot information only for cluster versions greater than or
+ * equal to PG17. See get_old_cluster_logical_slot_infos().
+ */
+int
+count_old_cluster_logical_slots(void)
+{
+	int			dbnum;
+	int			slot_count = 0;
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+		slot_count += old_cluster.dbarr.dbs[dbnum].slot_arr.nslots;
+
+	return slot_count;
+}
 
 static void
 free_db_and_rel_infos(DbInfoArr *db_arr)
@@ -642,8 +764,11 @@ print_db_infos(DbInfoArr *db_arr)
 
 	for (dbnum = 0; dbnum < db_arr->ndbs; dbnum++)
 	{
-		pg_log(PG_VERBOSE, "Database: \"%s\"", db_arr->dbs[dbnum].db_name);
-		print_rel_infos(&db_arr->dbs[dbnum].rel_arr);
+		DbInfo	   *pDbInfo = &db_arr->dbs[dbnum];
+
+		pg_log(PG_VERBOSE, "Database: \"%s\"", pDbInfo->db_name);
+		print_rel_infos(&pDbInfo->rel_arr);
+		print_slot_infos(&pDbInfo->slot_arr);
 	}
 }
 
@@ -660,3 +785,25 @@ print_rel_infos(RelInfoArr *rel_arr)
 			   rel_arr->rels[relnum].reloid,
 			   rel_arr->rels[relnum].tablespace);
 }
+
+static void
+print_slot_infos(LogicalSlotInfoArr *slot_arr)
+{
+	int			slotnum;
+
+	/* Quick return if there are no logical slots. */
+	if (slot_arr->nslots == 0)
+		return;
+
+	pg_log(PG_VERBOSE, "Logical replication slots within the database:");
+
+	for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+	{
+		LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+
+		pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\", two_phase: %d",
+			   slot_info->slotname,
+			   slot_info->plugin,
+			   slot_info->two_phase);
+	}
+}
diff --git a/src/bin/pg_upgrade/meson.build b/src/bin/pg_upgrade/meson.build
index 12a97f84e2..228f29b688 100644
--- a/src/bin/pg_upgrade/meson.build
+++ b/src/bin/pg_upgrade/meson.build
@@ -42,6 +42,7 @@ tests += {
     'tests': [
       't/001_basic.pl',
       't/002_pg_upgrade.pl',
+      't/003_logical_replication_slots.pl',
     ],
     'test_kwargs': {'priority': 40}, # pg_upgrade tests are slow
   },
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 96bfb67167..3ddfc31070 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -59,6 +59,8 @@ static void copy_xact_xlog_xid(void);
 static void set_frozenxids(bool minmxid_only);
 static void make_outputdirs(char *pgdata);
 static void setup(char *argv0, bool *live_check);
+static void create_logical_replication_slots(void);
+static void setup_new_cluster(void);
 
 ClusterInfo old_cluster,
 			new_cluster;
@@ -188,6 +190,8 @@ main(int argc, char **argv)
 			  new_cluster.pgdata);
 	check_ok();
 
+	setup_new_cluster();
+
 	if (user_opts.do_sync)
 	{
 		prep_status("Sync data directory to disk");
@@ -201,8 +205,6 @@ main(int argc, char **argv)
 
 	create_script_for_old_cluster_deletion(&deletion_script_file_name);
 
-	issue_warnings_and_set_wal_level();
-
 	pg_log(PG_REPORT,
 		   "\n"
 		   "Upgrade Complete\n"
@@ -593,7 +595,7 @@ create_new_objects(void)
 		set_frozenxids(true);
 
 	/* update new_cluster info now that we have objects in the databases */
-	get_db_and_rel_infos(&new_cluster);
+	get_db_rel_and_slot_infos(&new_cluster, false);
 }
 
 /*
@@ -862,3 +864,102 @@ set_frozenxids(bool minmxid_only)
 
 	check_ok();
 }
+
+/*
+ * create_logical_replication_slots()
+ *
+ * Similar to create_new_objects() but only restores logical replication slots.
+ */
+static void
+create_logical_replication_slots(void)
+{
+	int			dbnum;
+
+	prep_status_progress("Restoring logical replication slots in the new cluster");
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		DbInfo	   *old_db = &old_cluster.dbarr.dbs[dbnum];
+		LogicalSlotInfoArr *slot_arr = &old_db->slot_arr;
+		PGconn	   *conn;
+		PQExpBuffer query;
+		int			slotnum;
+		char		log_file_name[MAXPGPATH];
+
+		/* Skip this database if there are no slots */
+		if (slot_arr->nslots == 0)
+			continue;
+
+		conn = connectToServer(&new_cluster, old_db->db_name);
+		query = createPQExpBuffer();
+
+		snprintf(log_file_name, sizeof(log_file_name),
+				 DB_DUMP_LOG_FILE_MASK, old_db->db_oid);
+
+		pg_log(PG_STATUS, "%s", old_db->db_name);
+
+		for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		{
+			LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+
+			/* Constructs a query for creating logical replication slots. */
+			appendPQExpBuffer(query, "SELECT pg_catalog.pg_create_logical_replication_slot(");
+			appendStringLiteralConn(query, slot_info->slotname, conn);
+			appendPQExpBuffer(query, ", ");
+			appendStringLiteralConn(query, slot_info->plugin, conn);
+			appendPQExpBuffer(query, ", false, %s);",
+							  slot_info->two_phase ? "true" : "false");
+
+			PQclear(executeQueryOrDie(conn, "%s", query->data));
+
+			resetPQExpBuffer(query);
+		}
+
+		PQfinish(conn);
+
+		destroyPQExpBuffer(query);
+	}
+
+	end_progress_output();
+	check_ok();
+}
+
+/*
+ *	setup_new_cluster()
+ *
+ * Starts a new cluster for updating the wal_level in the control file, then
+ * does final setups. Logical slots are also created here.
+ */
+static void
+setup_new_cluster(void)
+{
+	/*
+	 * We unconditionally start/stop the new server because pg_resetwal -o set
+	 * wal_level to 'minimum'.  If the user is upgrading standby servers using
+	 * the rsync instructions, they will need pg_upgrade to write its final
+	 * WAL record showing wal_level as 'replica'.
+	 */
+	start_postmaster(&new_cluster, true);
+
+	/*
+	 * If the old cluster has logical slots, migrate them to a new cluster.
+	 *
+	 * Note: This must be done after executing pg_resetwal command in the
+	 * caller because pg_resetwal would remove required WALs.
+	 */
+	if (count_old_cluster_logical_slots())
+		create_logical_replication_slots();
+
+	/*
+	 * Reindex hash indexes for old < 10.0. count_old_cluster_logical_slots()
+	 * returns non-zero when the old_cluster is PG17 and later, so it's OK to
+	 * use "else if" here. See comments atop count_old_cluster_logical_slots()
+	 * and get_old_cluster_logical_slot_infos().
+	 */
+	else if (GET_MAJOR_VERSION(old_cluster.major_version) <= 906)
+		old_9_6_invalidate_hash_indexes(&new_cluster, false);
+
+	report_extension_updates(&new_cluster);
+
+	stop_postmaster(false);
+}
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 842f3b6cd3..b2531862e4 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -150,6 +150,25 @@ typedef struct
 	int			nrels;
 } RelInfoArr;
 
+/*
+ * Structure to store logical replication slot information
+ */
+typedef struct
+{
+	char	   *slotname;		/* slot name */
+	char	   *plugin;			/* plugin */
+	bool		two_phase;		/* can the slot decode 2PC? */
+	bool		caught_up;		/* Is confirmed_flush_lsn the same as latest
+								 * checkpoint LSN? */
+	bool		invalid;		/* If true, the slot is unusable. */
+} LogicalSlotInfo;
+
+typedef struct
+{
+	int			nslots;			/* number of logical slot infos */
+	LogicalSlotInfo *slots;		/* array of logical slot infos */
+} LogicalSlotInfoArr;
+
 /*
  * The following structure represents a relation mapping.
  */
@@ -176,6 +195,7 @@ typedef struct
 	char		db_tablespace[MAXPGPATH];	/* database default tablespace
 											 * path */
 	RelInfoArr	rel_arr;		/* array of all user relinfos */
+	LogicalSlotInfoArr slot_arr;	/* array of all LogicalSlotInfo */
 } DbInfo;
 
 /*
@@ -345,7 +365,6 @@ void		output_check_banner(bool live_check);
 void		check_and_dump_old_cluster(bool live_check);
 void		check_new_cluster(void);
 void		report_clusters_compatible(void);
-void		issue_warnings_and_set_wal_level(void);
 void		output_completion_banner(char *deletion_script_file_name);
 void		check_cluster_versions(void);
 void		check_cluster_compatibility(bool live_check);
@@ -400,7 +419,8 @@ void		check_loadable_libraries(void);
 FileNameMap *gen_db_file_maps(DbInfo *old_db,
 							  DbInfo *new_db, int *nmaps, const char *old_pgdata,
 							  const char *new_pgdata);
-void		get_db_and_rel_infos(ClusterInfo *cluster);
+void		get_db_rel_and_slot_infos(ClusterInfo *cluster, bool live_check);
+int			count_old_cluster_logical_slots(void);
 
 /* option.c */
 
diff --git a/src/bin/pg_upgrade/server.c b/src/bin/pg_upgrade/server.c
index 0bc3d2806b..20589e8c43 100644
--- a/src/bin/pg_upgrade/server.c
+++ b/src/bin/pg_upgrade/server.c
@@ -234,14 +234,20 @@ start_postmaster(ClusterInfo *cluster, bool report_and_exit_on_error)
 	 * we only modify the new cluster, so only use it there.  If there is a
 	 * crash, the new cluster has to be recreated anyway.  fsync=off is a big
 	 * win on ext4.
+	 *
+	 * Use max_slot_wal_keep_size as -1 to prevent the WAL removal by the
+	 * checkpointer process.  If WALs required by logical replication slots are
+	 * removed, the slots are unusable.  This setting prevents the invalidation
+	 * of slots during the upgrade.
 	 */
 	snprintf(cmd, sizeof(cmd),
-			 "\"%s/pg_ctl\" -w -l \"%s/%s\" -D \"%s\" -o \"-p %d -b%s %s%s\" start",
+			 "\"%s/pg_ctl\" -w -l \"%s/%s\" -D \"%s\" -o \"-p %d -b%s%s %s%s\" start",
 			 cluster->bindir,
 			 log_opts.logdir,
 			 SERVER_LOG_FILE, cluster->pgconfig, cluster->port,
-			 (cluster == &new_cluster) ?
-			 " -c synchronous_commit=off -c fsync=off -c full_page_writes=off" : "",
+			(cluster == &new_cluster) ?
+			" -c synchronous_commit=off -c fsync=off -c full_page_writes=off" : "",
+			" -c max_slot_wal_keep_size=-1",
 			 cluster->pgopts ? cluster->pgopts : "", socket_string);
 
 	/*
diff --git a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
new file mode 100644
index 0000000000..13bcc344fd
--- /dev/null
+++ b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
@@ -0,0 +1,232 @@
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+# Tests for upgrading replication slots
+
+use strict;
+use warnings;
+
+use File::Path qw(rmtree);
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Can be changed to test the other modes
+my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
+
+# Initialize old cluster
+my $old_publisher = PostgreSQL::Test::Cluster->new('old_publisher');
+$old_publisher->init(allows_streaming => 'logical');
+
+# Initialize new cluster
+my $new_publisher = PostgreSQL::Test::Cluster->new('new_publisher');
+$new_publisher->init(allows_streaming => 'replica');
+
+# Initialize subscriber cluster
+my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
+$subscriber->init(allows_streaming => 'logical');
+
+my $bindir = $new_publisher->config_data('--bindir');
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when new cluster wal_level is not 'logical'
+
+# Preparations for the subsequent test:
+# 1. Create a slot on the old cluster
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT pg_create_logical_replication_slot('test_slot1', 'test_decoding', false, true);"
+);
+$old_publisher->stop;
+
+# pg_upgrade will fail because the new cluster wal_level is 'replica'
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade where the new cluster has the wrong wal_level');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when max_replication_slots on a new cluster is
+#		too low
+
+# Preparations for the subsequent test:
+# 1. Create a second slot on the old cluster
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT pg_create_logical_replication_slot('test_slot2', 'test_decoding', false, true);"
+);
+
+# 2. Consume WAL records to avoid another type of upgrade failure. It will be
+#	 tested in subsequent cases.
+$old_publisher->safe_psql('postgres',
+	"SELECT count(*) FROM pg_logical_slot_get_changes('test_slot1', NULL, NULL);"
+);
+$old_publisher->stop;
+
+# 3. max_replication_slots is set to smaller than the number of slots (2)
+#	 present on the old cluster
+$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 1");
+
+# 4. wal_level is set correctly on the new cluster
+$new_publisher->append_conf('postgresql.conf', "wal_level = 'logical'");
+
+# pg_upgrade will fail because the new cluster has insufficient max_replication_slots
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade where the new cluster has insufficient max_replication_slots');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when the slot still has unconsumed WAL records
+
+# Preparations for the subsequent test:
+# 1. Remove the slot 'test_slot2', leaving only 1 slot on the old cluster, so
+#    the new cluster config  max_replication_slots=1 will now be enough.
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT * FROM pg_drop_replication_slot('test_slot2');"
+);
+
+# 2. Generate extra WAL records. Because these WAL records do not get consumed
+#	 it will cause the upcoming pg_upgrade test to fail.
+$old_publisher->safe_psql('postgres',
+	"CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;"
+);
+$old_publisher->stop;
+
+# pg_upgrade will fail because the slot still has unconsumed WAL records
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade of old cluster with idle replication slots');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+# Remove the remaining slot
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT * FROM pg_drop_replication_slot('test_slot1');"
+);
+
+# ------------------------------
+# TEST: Successful upgrade
+
+# Preparations for the subsequent test:
+# 1. Setup logical replication
+my $old_connstr = $old_publisher->connstr . ' dbname=postgres';
+$old_publisher->safe_psql('postgres',
+	"CREATE PUBLICATION pub FOR ALL TABLES;"
+);
+$subscriber->start;
+$subscriber->safe_psql(
+	'postgres', qq[
+	CREATE TABLE tbl (a int);
+	CREATE SUBSCRIPTION regress_sub CONNECTION '$old_connstr' PUBLICATION pub WITH (two_phase = 'true')
+]);
+$subscriber->wait_for_subscription_sync($old_publisher, 'regress_sub');
+
+# 2. Temporarily disable the subscription
+$subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION regress_sub DISABLE");
+$old_publisher->stop;
+
+# Dry run, successful check is expected. This is not a live check, so a
+# shutdown checkpoint record would be inserted. We want to test that a
+# subsequent upgrade is successful by skipping such an expected WAL record.
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,        '--check'
+	],
+	'run of pg_upgrade of old cluster');
+ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ removed after pg_upgrade success");
+
+# Actual run, successful upgrade is expected
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade of old cluster');
+ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ removed after pg_upgrade success");
+
+# Check that the slot 'regress_sub' has migrated to the new cluster
+$new_publisher->start;
+my $result = $new_publisher->safe_psql('postgres',
+	"SELECT slot_name, two_phase FROM pg_replication_slots");
+is($result, qq(regress_sub|t), 'check the slot exists on new cluster');
+
+# Update the connection
+my $new_connstr = $new_publisher->connstr . ' dbname=postgres';
+$subscriber->safe_psql(
+	'postgres', qq[
+	ALTER SUBSCRIPTION regress_sub CONNECTION '$new_connstr';
+	ALTER SUBSCRIPTION regress_sub ENABLE;
+]);
+
+# Check whether changes on the new publisher get replicated to the subscriber
+$new_publisher->safe_psql('postgres',
+	"INSERT INTO tbl VALUES (generate_series(11, 20))");
+$new_publisher->wait_for_catchup('regress_sub');
+$result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
+is($result, qq(20), 'check changes are replicated to the subscriber');
+
+# Clean up
+$subscriber->stop();
+$new_publisher->stop();
+
+done_testing();
diff --git a/src/include/access/xlogutils.h b/src/include/access/xlogutils.h
index 5b77b11f50..1cf31aa24f 100644
--- a/src/include/access/xlogutils.h
+++ b/src/include/access/xlogutils.h
@@ -115,4 +115,7 @@ extern void XLogReadDetermineTimeline(XLogReaderState *state,
 
 extern void WALReadRaiseError(WALReadError *errinfo);
 
+extern XLogReaderState *InitXLogReaderState(XLogRecPtr lsn);
+extern XLogRecord *ReadNextXLogRecord(XLogReaderState *xlogreader);
+
 #endif
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 9805bc6118..8f15e97257 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11370,6 +11370,12 @@
   proname => 'binary_upgrade_set_next_pg_tablespace_oid', provolatile => 'v',
   proparallel => 'u', prorettype => 'void', proargtypes => 'oid',
   prosrc => 'binary_upgrade_set_next_pg_tablespace_oid' },
+{ oid => '8046', descr => 'for use by pg_upgrade',
+  proname => 'binary_upgrade_validate_wal_records',
+  prorows => '10', proretset => 't', provolatile => 's', prorettype => 'bool',
+  proargtypes => 'pg_lsn', proallargtypes => '{pg_lsn,bool}',
+  proargmodes => '{i,o}', proargnames => '{start_lsn,is_ok}',
+  prosrc => 'binary_upgrade_validate_wal_records' },
 
 # conversion functions
 { oid => '4302',
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index b5bbdd1608..ff6cd495a3 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1502,7 +1502,10 @@ LogicalRepTupleData
 LogicalRepTyp
 LogicalRepWorker
 LogicalRepWorkerType
+LogicalReplicationSlotInfo
 LogicalRewriteMappingData
+LogicalSlotInfo
+LogicalSlotInfoArr
 LogicalTape
 LogicalTapeSet
 LsnReadQueue
-- 
2.28.0.windows.1

#249Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Amit Kapila (#248)
1 attachment(s)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Amit,

Thank you for reviewing! PSA new version. In this version I ran pgindent again.

+#include "access/xlogdefs.h"
#include "common/relpath.h"
#include "libpq-fe.h"

The above include is not required. I have removed that and made a few
cosmetic changes in the attached.

Yes, it is not needed anymore. It was originally introduced to use the
XLogRecPtr datatype, but that usage was removed in a recent version.

Moreover, my colleague Hou found several problems in v40. Here is a fixed
version. The bullets below list the issues found.

* Fixed to allow XLOG_SWITCH records when reading the WAL, including as the initial record.
An XLOG_SWITCH record may be inserted after the walsender exits. This occurs when
archive_mode is set to on (or always).
* Fixed to set max_slot_wal_keep_size = -1 only when the cluster is PG17 or later.
max_slot_wal_keep_size was introduced in PG13, so the previous patch could not
upgrade from PG12 and prior.
The setting is only needed for upgrading logical slots, so it should be set only
for PG17 and later.
* Avoided calling binary_upgrade_validate_wal_records() when the slot is invalidated.
The function raises an ERROR if the WAL segment containing the given LSN has
already been removed. The output looks like:

```
ERROR: requested WAL segment pg_wal/000000010000000000000001 has already been removed
```

That is the usual behavior, but we do not want to error out here, so the call is
skipped for invalidated slots. The upgrade itself still fails correctly if there
are invalid slots. A rough sketch of the idea is shown below.
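
The sketch below is for illustration only; the exact CASE/sub-select shape is my
wording here and may differ from what the attached v41 actually does. The point
is that caught_up is computed without calling
binary_upgrade_validate_wal_records() for invalidated slots:

```
-- Illustrative sketch (assumes the server is running in binary upgrade mode,
-- as it is under pg_upgrade): compute caught_up only for slots that are still
-- valid, so the validation function never tries to read WAL that may already
-- have been removed for an invalidated slot.
SELECT slot_name, plugin, two_phase,
       CASE WHEN conflicting THEN false
            ELSE (SELECT pg_catalog.binary_upgrade_validate_wal_records(confirmed_flush_lsn))
       END AS caught_up,
       conflicting AS invalid
  FROM pg_catalog.pg_replication_slots
 WHERE slot_type = 'logical'
   AND database = current_database()
   AND temporary IS FALSE;
```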

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:

v41-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patchapplication/octet-stream; name=v41-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patchDownload
From c303e2475934b8989506d8f5e6b118f1815552e4 Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Tue, 4 Apr 2023 05:49:34 +0000
Subject: [PATCH v41] pg_upgrade: Allow to replicate logical replication slots
 to new node

This commit allows nodes with logical replication slots to be upgraded. While
reading information from the old cluster, a list of logical replication slots is
fetched. In a later part of the upgrade, pg_upgrade revisits the list and
restores slots by executing pg_create_logical_replication_slot() on the new
cluster. Migration of logical replication slots is only supported when the old
cluster is version 17.0 or later.

If the old node has slots with the status 'lost' or with unconsumed WAL records,
pg_upgrade fails. These checks are needed to prevent data loss.

Note that the pg_resetwal command would remove WAL files, which are required for
restart_lsn. If WALs required by logical replication slots are removed, the
slots are unusable. Therefore, during the upgrade, slot restoration is done
after the final pg_resetwal command. The workflow ensures that required WALs are
retained.

The significant advantage of this commit is that it makes it easy to continue
logical replication even after upgrading the publisher node. Previously,
pg_upgrade allowed copying publications to a new node. With this new commit,
adjusting the connection string to the new publisher will cause the apply
worker on the subscriber to connect to the new publisher automatically. This
enables seamless continuation of logical replication, even after an upgrade.

Author: Hayato Kuroda
Co-authored-by: Hou Zhijie
Reviewed-by: Peter Smith, Julien Rouhaud, Vignesh C, Wang Wei, Masahiko Sawada,
             Dilip Kumar
---
 contrib/pg_walinspect/pg_walinspect.c         |  94 -------
 doc/src/sgml/ref/pgupgrade.sgml               |  76 +++++-
 src/backend/access/transam/xlogutils.c        |  92 +++++++
 src/backend/replication/slot.c                |  12 +
 src/backend/utils/adt/pg_upgrade_support.c    |  85 +++++++
 src/bin/pg_upgrade/Makefile                   |   3 +
 src/bin/pg_upgrade/check.c                    | 198 +++++++++++++--
 src/bin/pg_upgrade/function.c                 |  31 ++-
 src/bin/pg_upgrade/info.c                     | 160 +++++++++++-
 src/bin/pg_upgrade/meson.build                |   1 +
 src/bin/pg_upgrade/pg_upgrade.c               | 107 +++++++-
 src/bin/pg_upgrade/pg_upgrade.h               |  24 +-
 src/bin/pg_upgrade/server.c                   |  25 +-
 .../t/003_logical_replication_slots.pl        | 232 ++++++++++++++++++
 src/include/access/xlogutils.h                |   3 +
 src/include/catalog/pg_proc.dat               |   6 +
 src/tools/pgindent/typedefs.list              |   3 +
 17 files changed, 1017 insertions(+), 135 deletions(-)
 create mode 100644 src/bin/pg_upgrade/t/003_logical_replication_slots.pl

diff --git a/contrib/pg_walinspect/pg_walinspect.c b/contrib/pg_walinspect/pg_walinspect.c
index 796a74f322..49f4f92e98 100644
--- a/contrib/pg_walinspect/pg_walinspect.c
+++ b/contrib/pg_walinspect/pg_walinspect.c
@@ -40,8 +40,6 @@ PG_FUNCTION_INFO_V1(pg_get_wal_stats_till_end_of_wal);
 
 static void ValidateInputLSNs(XLogRecPtr start_lsn, XLogRecPtr *end_lsn);
 static XLogRecPtr GetCurrentLSN(void);
-static XLogReaderState *InitXLogReaderState(XLogRecPtr lsn);
-static XLogRecord *ReadNextXLogRecord(XLogReaderState *xlogreader);
 static void GetWALRecordInfo(XLogReaderState *record, Datum *values,
 							 bool *nulls, uint32 ncols);
 static void GetWALRecordsInfo(FunctionCallInfo fcinfo,
@@ -84,98 +82,6 @@ GetCurrentLSN(void)
 	return curr_lsn;
 }
 
-/*
- * Initialize WAL reader and identify first valid LSN.
- */
-static XLogReaderState *
-InitXLogReaderState(XLogRecPtr lsn)
-{
-	XLogReaderState *xlogreader;
-	ReadLocalXLogPageNoWaitPrivate *private_data;
-	XLogRecPtr	first_valid_record;
-
-	/*
-	 * Reading WAL below the first page of the first segments isn't allowed.
-	 * This is a bootstrap WAL page and the page_read callback fails to read
-	 * it.
-	 */
-	if (lsn < XLOG_BLCKSZ)
-		ereport(ERROR,
-				(errmsg("could not read WAL at LSN %X/%X",
-						LSN_FORMAT_ARGS(lsn))));
-
-	private_data = (ReadLocalXLogPageNoWaitPrivate *)
-		palloc0(sizeof(ReadLocalXLogPageNoWaitPrivate));
-
-	xlogreader = XLogReaderAllocate(wal_segment_size, NULL,
-									XL_ROUTINE(.page_read = &read_local_xlog_page_no_wait,
-											   .segment_open = &wal_segment_open,
-											   .segment_close = &wal_segment_close),
-									private_data);
-
-	if (xlogreader == NULL)
-		ereport(ERROR,
-				(errcode(ERRCODE_OUT_OF_MEMORY),
-				 errmsg("out of memory"),
-				 errdetail("Failed while allocating a WAL reading processor.")));
-
-	/* first find a valid recptr to start from */
-	first_valid_record = XLogFindNextRecord(xlogreader, lsn);
-
-	if (XLogRecPtrIsInvalid(first_valid_record))
-		ereport(ERROR,
-				(errmsg("could not find a valid record after %X/%X",
-						LSN_FORMAT_ARGS(lsn))));
-
-	return xlogreader;
-}
-
-/*
- * Read next WAL record.
- *
- * By design, to be less intrusive in a running system, no slot is allocated
- * to reserve the WAL we're about to read. Therefore this function can
- * encounter read errors for historical WAL.
- *
- * We guard against ordinary errors trying to read WAL that hasn't been
- * written yet by limiting end_lsn to the flushed WAL, but that can also
- * encounter errors if the flush pointer falls in the middle of a record. In
- * that case we'll return NULL.
- */
-static XLogRecord *
-ReadNextXLogRecord(XLogReaderState *xlogreader)
-{
-	XLogRecord *record;
-	char	   *errormsg;
-
-	record = XLogReadRecord(xlogreader, &errormsg);
-
-	if (record == NULL)
-	{
-		ReadLocalXLogPageNoWaitPrivate *private_data;
-
-		/* return NULL, if end of WAL is reached */
-		private_data = (ReadLocalXLogPageNoWaitPrivate *)
-			xlogreader->private_data;
-
-		if (private_data->end_of_wal)
-			return NULL;
-
-		if (errormsg)
-			ereport(ERROR,
-					(errcode_for_file_access(),
-					 errmsg("could not read WAL at %X/%X: %s",
-							LSN_FORMAT_ARGS(xlogreader->EndRecPtr), errormsg)));
-		else
-			ereport(ERROR,
-					(errcode_for_file_access(),
-					 errmsg("could not read WAL at %X/%X",
-							LSN_FORMAT_ARGS(xlogreader->EndRecPtr))));
-	}
-
-	return record;
-}
-
 /*
  * Output values that make up a row describing caller's WAL record.
  *
diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index bea0d1b93f..1a17572d14 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -383,6 +383,77 @@ make prefix=/usr/local/pgsql.new install
     </para>
    </step>
 
+   <step>
+    <title>Prepare for publisher upgrades</title>
+
+    <para>
+     <application>pg_upgrade</application> attempts to migrate logical
+     replication slots. This helps avoid the need for manually defining the
+     same replication slots on the new publisher. Migration of logical
+     replication slots is only supported when the old cluster is version 17.0
+     or later.
+    </para>
+
+    <para>
+     Before you start upgrading the publisher cluster, ensure that the
+     subscription is temporarily disabled, by executing
+     <link linkend="sql-altersubscription"><command>ALTER SUBSCRIPTION ... DISABLE</command></link>.
+     Re-enable the subscription after the upgrade.
+    </para>
+
+    <para>
+     There are some prerequisites for <application>pg_upgrade</application> to
+     be able to upgrade the replication slots. If these are not met an error
+     will be reported.
+    </para>
+
+    <itemizedlist>
+     <listitem>
+      <para>
+       All slots on the old cluster must be usable, i.e., there are no slots
+       whose
+       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>conflicting</structfield>
+       is <literal>true</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The old cluster has replicated all the changes to subscribers.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The output plugins referenced by the slots on the old cluster must be
+       installed in the new PostgreSQL executable directory.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must not have permanent logical replication slots, i.e.,
+       there must be no slots where
+       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>temporary</structfield>
+       is <literal>false</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must have
+       <link linkend="guc-wal-level"><varname>wal_level</varname></link> as
+       <literal>logical</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must have
+       <link linkend="guc-max-replication-slots"><varname>max_replication_slots</varname></link>
+       configured to a value greater than or equal to the number of slots
+       present in the old cluster.
+      </para>
+     </listitem>
+    </itemizedlist>
+
+   </step>
+
    <step>
     <title>Stop both servers</title>
 
@@ -652,8 +723,9 @@ rsync --archive --delete --hard-links --size-only --no-inc-recursive /vol1/pg_tb
        Configure the servers for log shipping.  (You do not need to run
        <function>pg_backup_start()</function> and <function>pg_backup_stop()</function>
        or take a file system backup as the standbys are still synchronized
-       with the primary.)  Replication slots are not copied and must
-       be recreated.
+       with the primary.)  Only logical slots on the primary are copied to the
+       new standby, but other slots on the old standby are not copied so must
+       be recreated manually.
       </para>
      </step>
 
diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c
index 43f7b31205..e2cabfef32 100644
--- a/src/backend/access/transam/xlogutils.c
+++ b/src/backend/access/transam/xlogutils.c
@@ -1048,3 +1048,95 @@ WALReadRaiseError(WALReadError *errinfo)
 						errinfo->wre_req)));
 	}
 }
+
+/*
+ * Initialize WAL reader and identify first valid LSN.
+ */
+XLogReaderState *
+InitXLogReaderState(XLogRecPtr lsn)
+{
+	XLogReaderState *xlogreader;
+	ReadLocalXLogPageNoWaitPrivate *private_data;
+	XLogRecPtr	first_valid_record;
+
+	/*
+	 * Reading WAL below the first page of the first segments isn't allowed.
+	 * This is a bootstrap WAL page and the page_read callback fails to read
+	 * it.
+	 */
+	if (lsn < XLOG_BLCKSZ)
+		ereport(ERROR,
+				(errmsg("could not read WAL at LSN %X/%X",
+						LSN_FORMAT_ARGS(lsn))));
+
+	private_data = (ReadLocalXLogPageNoWaitPrivate *)
+		palloc0(sizeof(ReadLocalXLogPageNoWaitPrivate));
+
+	xlogreader = XLogReaderAllocate(wal_segment_size, NULL,
+									XL_ROUTINE(.page_read = &read_local_xlog_page_no_wait,
+											   .segment_open = &wal_segment_open,
+											   .segment_close = &wal_segment_close),
+									private_data);
+
+	if (xlogreader == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("out of memory"),
+				 errdetail("Failed while allocating a WAL reading processor.")));
+
+	/* first find a valid recptr to start from */
+	first_valid_record = XLogFindNextRecord(xlogreader, lsn);
+
+	if (XLogRecPtrIsInvalid(first_valid_record))
+		ereport(ERROR,
+				(errmsg("could not find a valid record after %X/%X",
+						LSN_FORMAT_ARGS(lsn))));
+
+	return xlogreader;
+}
+
+/*
+ * Read next WAL record.
+ *
+ * By design, to be less intrusive in a running system, no slot is allocated
+ * to reserve the WAL we're about to read. Therefore this function can
+ * encounter read errors for historical WAL.
+ *
+ * We guard against ordinary errors trying to read WAL that hasn't been
+ * written yet by limiting end_lsn to the flushed WAL, but that can also
+ * encounter errors if the flush pointer falls in the middle of a record. In
+ * that case we'll return NULL.
+ */
+XLogRecord *
+ReadNextXLogRecord(XLogReaderState *xlogreader)
+{
+	XLogRecord *record;
+	char	   *errormsg;
+
+	record = XLogReadRecord(xlogreader, &errormsg);
+
+	if (record == NULL)
+	{
+		ReadLocalXLogPageNoWaitPrivate *private_data;
+
+		/* return NULL, if end of WAL is reached */
+		private_data = (ReadLocalXLogPageNoWaitPrivate *)
+			xlogreader->private_data;
+
+		if (private_data->end_of_wal)
+			return NULL;
+
+		if (errormsg)
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not read WAL at %X/%X: %s",
+							LSN_FORMAT_ARGS(xlogreader->EndRecPtr), errormsg)));
+		else
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not read WAL at %X/%X",
+							LSN_FORMAT_ARGS(xlogreader->EndRecPtr))));
+	}
+
+	return record;
+}
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 3ded3c1473..a91d412f91 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1423,6 +1423,18 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause,
 
 		SpinLockRelease(&s->mutex);
 
+		/*
+		 * The logical replication slots shouldn't be invalidated as
+		 * max_slot_wal_keep_size GUC is set to -1 during the upgrade.
+		 *
+		 * The following is just a sanity check.
+		 */
+		if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade)
+		{
+			Assert(max_slot_wal_keep_size_mb == -1);
+			elog(ERROR, "replication slots must not be invalidated during the upgrade");
+		}
+
 		if (active_pid != 0)
 		{
 			/*
diff --git a/src/backend/utils/adt/pg_upgrade_support.c b/src/backend/utils/adt/pg_upgrade_support.c
index 0186636d9f..fcee9a76e5 100644
--- a/src/backend/utils/adt/pg_upgrade_support.c
+++ b/src/backend/utils/adt/pg_upgrade_support.c
@@ -11,14 +11,20 @@
 
 #include "postgres.h"
 
+#include "access/heapam_xlog.h"
+#include "access/xlog.h"
+#include "access/xlogutils.h"
 #include "catalog/binary_upgrade.h"
 #include "catalog/heap.h"
 #include "catalog/namespace.h"
+#include "catalog/pg_control.h"
 #include "catalog/pg_type.h"
 #include "commands/extension.h"
 #include "miscadmin.h"
+#include "storage/standbydefs.h"
 #include "utils/array.h"
 #include "utils/builtins.h"
+#include "utils/pg_lsn.h"
 
 
 #define CHECK_IS_BINARY_UPGRADE									\
@@ -29,6 +35,9 @@ do {															\
 				 errmsg("function can only be called when server is in binary upgrade mode"))); \
 } while (0)
 
+#define CHECK_WAL_RECORD(rmgrid, info, expected_rmgrid, expected_info) \
+	(rmgrid == expected_rmgrid && info == expected_info)
+
 Datum
 binary_upgrade_set_next_pg_tablespace_oid(PG_FUNCTION_ARGS)
 {
@@ -261,3 +270,79 @@ binary_upgrade_set_missing_value(PG_FUNCTION_ARGS)
 
 	PG_RETURN_VOID();
 }
+
+/*
+ * Return false if we found unexpected WAL records, otherwise true.
+ *
+ * This function is used to verify that there are no WAL records (except some
+ * types) after confirmed_flush_lsn of logical slots, which means all the
+ * changes were replicated to the subscriber. There is a possibility that some
+ * WALs are inserted during upgrade, so such types would be ignored.
+ *
+ * XLOG_CHECKPOINT_SHUTDOWN and XLOG_SWITCH are ignored because they would be
+ * inserted after the walsender exits. Moreover, the following types of records
+ * could be generated during the pg_upgrade --check, so they are ignored too:
+ * XLOG_CHECKPOINT_ONLINE, XLOG_RUNNING_XACTS, XLOG_FPI_FOR_HINT,
+ * XLOG_HEAP2_PRUNE, XLOG_PARAMETER_CHANGE.
+ */
+Datum
+binary_upgrade_validate_wal_records(PG_FUNCTION_ARGS)
+{
+	XLogRecPtr	start_lsn = PG_GETARG_LSN(0);
+	XLogReaderState *xlogreader;
+	bool		initial_record = true;
+	bool		is_valid = true;
+
+	CHECK_IS_BINARY_UPGRADE;
+
+	/* Quick exit if the given lsn is larger than current one */
+	if (start_lsn >= GetFlushRecPtr(NULL))
+		PG_RETURN_BOOL(false);
+
+	xlogreader = InitXLogReaderState(start_lsn);
+
+	/* Loop until all WALs are read, or unexpected record is found */
+	while (is_valid && ReadNextXLogRecord(xlogreader))
+	{
+		RmgrIds		rmid;
+		uint8		info;
+
+		/* Check the type of WAL */
+		rmid = XLogRecGetRmid(xlogreader);
+		info = XLogRecGetInfo(xlogreader) & ~XLR_INFO_MASK;
+
+		if (initial_record)
+		{
+			/*
+			 * Initial record must be either XLOG_CHECKPOINT_SHUTDOWN or
+			 * XLOG_SWITCH.
+			 */
+			if (!CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID, XLOG_CHECKPOINT_SHUTDOWN) &&
+				!CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID, XLOG_SWITCH))
+				is_valid = false;
+
+			initial_record = false;
+			continue;
+		}
+
+		/*
+		 * There is a possibility that following records may be generated
+		 * during the upgrade.
+		 */
+		if (!CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID, XLOG_CHECKPOINT_SHUTDOWN) &&
+			!CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID, XLOG_CHECKPOINT_ONLINE) &&
+			!CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID, XLOG_SWITCH) &&
+			!CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID, XLOG_FPI_FOR_HINT) &&
+			!CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID, XLOG_PARAMETER_CHANGE) &&
+			!CHECK_WAL_RECORD(rmid, info, RM_STANDBY_ID, XLOG_RUNNING_XACTS) &&
+			!CHECK_WAL_RECORD(rmid, info, RM_HEAP2_ID, XLOG_HEAP2_PRUNE))
+			is_valid = false;
+
+		CHECK_FOR_INTERRUPTS();
+	}
+
+	pfree(xlogreader->private_data);
+	XLogReaderFree(xlogreader);
+
+	PG_RETURN_BOOL(is_valid);
+}
diff --git a/src/bin/pg_upgrade/Makefile b/src/bin/pg_upgrade/Makefile
index 5834513add..815d1a7ca1 100644
--- a/src/bin/pg_upgrade/Makefile
+++ b/src/bin/pg_upgrade/Makefile
@@ -3,6 +3,9 @@
 PGFILEDESC = "pg_upgrade - an in-place binary upgrade utility"
 PGAPPICON = win32
 
+# required for 003_logical_replication_slots.pl
+EXTRA_INSTALL=contrib/test_decoding
+
 subdir = src/bin/pg_upgrade
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 56e313f562..1bb3873ace 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -30,6 +30,8 @@ static void check_for_jsonb_9_4_usage(ClusterInfo *cluster);
 static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(void);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
+static void check_new_cluster_logical_replication_slots(void);
+static void check_old_cluster_for_valid_slots(bool live_check);
 
 
 /*
@@ -86,8 +88,11 @@ check_and_dump_old_cluster(bool live_check)
 	if (!live_check)
 		start_postmaster(&old_cluster, true);
 
-	/* Extract a list of databases and tables from the old cluster */
-	get_db_and_rel_infos(&old_cluster);
+	/*
+	 * Extract a list of databases, tables, and logical replication slots from
+	 * the old cluster.
+	 */
+	get_db_rel_and_slot_infos(&old_cluster, live_check);
 
 	init_tablespaces();
 
@@ -104,6 +109,13 @@ check_and_dump_old_cluster(bool live_check)
 	check_for_reg_data_type_usage(&old_cluster);
 	check_for_isn_and_int8_passing_mismatch(&old_cluster);
 
+	/*
+	 * Logical replication slots can be migrated since PG17. See comments atop
+	 * get_old_cluster_logical_slot_infos().
+	 */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
+		check_old_cluster_for_valid_slots(live_check);
+
 	/*
 	 * PG 16 increased the size of the 'aclitem' type, which breaks the
 	 * on-disk format for existing data.
@@ -187,7 +199,7 @@ check_and_dump_old_cluster(bool live_check)
 void
 check_new_cluster(void)
 {
-	get_db_and_rel_infos(&new_cluster);
+	get_db_rel_and_slot_infos(&new_cluster, false);
 
 	check_new_cluster_is_empty();
 
@@ -210,6 +222,8 @@ check_new_cluster(void)
 	check_for_prepared_transactions(&new_cluster);
 
 	check_for_new_tablespace_dir();
+
+	check_new_cluster_logical_replication_slots();
 }
 
 
@@ -232,27 +246,6 @@ report_clusters_compatible(void)
 }
 
 
-void
-issue_warnings_and_set_wal_level(void)
-{
-	/*
-	 * We unconditionally start/stop the new server because pg_resetwal -o set
-	 * wal_level to 'minimum'.  If the user is upgrading standby servers using
-	 * the rsync instructions, they will need pg_upgrade to write its final
-	 * WAL record showing wal_level as 'replica'.
-	 */
-	start_postmaster(&new_cluster, true);
-
-	/* Reindex hash indexes for old < 10.0 */
-	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 906)
-		old_9_6_invalidate_hash_indexes(&new_cluster, false);
-
-	report_extension_updates(&new_cluster);
-
-	stop_postmaster(false);
-}
-
-
 void
 output_completion_banner(char *deletion_script_file_name)
 {
@@ -1402,3 +1395,160 @@ check_for_user_defined_encoding_conversions(ClusterInfo *cluster)
 	else
 		check_ok();
 }
+
+/*
+ * check_new_cluster_logical_replication_slots()
+ *
+ * Verify that there are no logical replication slots on the new cluster and
+ * that the parameter settings necessary for creating slots are sufficient.
+ */
+static void
+check_new_cluster_logical_replication_slots(void)
+{
+	PGresult   *res;
+	PGconn	   *conn;
+	int			nslots_on_old;
+	int			nslots_on_new;
+	int			max_replication_slots;
+	char	   *wal_level;
+
+	/* Logical slots can be migrated since PG17. */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+		return;
+
+	nslots_on_old = count_old_cluster_logical_slots();
+
+	/* Quick return if there are no logical slots to be migrated. */
+	if (nslots_on_old == 0)
+		return;
+
+	conn = connectToServer(&new_cluster, "template1");
+
+	prep_status("Checking for new cluster logical replication slots");
+
+	res = executeQueryOrDie(conn, "SELECT count(*) "
+							"FROM pg_catalog.pg_replication_slots "
+							"WHERE slot_type = 'logical' AND "
+							"temporary IS FALSE;");
+
+	if (PQntuples(res) != 1)
+		pg_fatal("could not count the number of logical replication slots");
+
+	nslots_on_new = atoi(PQgetvalue(res, 0, 0));
+
+	if (nslots_on_new)
+		pg_fatal("expected 0 logical replication slots but found %d",
+				 nslots_on_new);
+
+	PQclear(res);
+
+	res = executeQueryOrDie(conn, "SHOW wal_level;");
+
+	if (PQntuples(res) != 1)
+		pg_fatal("could not determine wal_level");
+
+	wal_level = PQgetvalue(res, 0, 0);
+
+	if (strcmp(wal_level, "logical") != 0)
+		pg_fatal("wal_level must be \"logical\", but is set to \"%s\"",
+				 wal_level);
+
+	PQclear(res);
+
+	res = executeQueryOrDie(conn, "SHOW max_replication_slots;");
+
+	if (PQntuples(res) != 1)
+		pg_fatal("could not determine max_replication_slots");
+
+	max_replication_slots = atoi(PQgetvalue(res, 0, 0));
+
+	if (nslots_on_old > max_replication_slots)
+		pg_fatal("max_replication_slots (%d) must be greater than or equal to the number of "
+				 "logical replication slots (%d) on the old cluster",
+				 max_replication_slots, nslots_on_old);
+
+	PQclear(res);
+	PQfinish(conn);
+
+	check_ok();
+}
+
+/*
+ * check_old_cluster_for_valid_slots()
+ *
+ * Verify that all the logical slots are usable and consumed all the WAL
+ * before shutdown.
+ */
+static void
+check_old_cluster_for_valid_slots(bool live_check)
+{
+	int			dbnum;
+	char		output_path[MAXPGPATH];
+	FILE	   *script = NULL;
+
+	prep_status("Checking for valid logical replication slots");
+
+	snprintf(output_path, sizeof(output_path), "%s/%s",
+			 log_opts.basedir,
+			 "invalid_logical_replication_slots.txt");
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		int			slotnum;
+		LogicalSlotInfoArr *slot_arr = &old_cluster.dbarr.dbs[dbnum].slot_arr;
+
+		for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		{
+			LogicalSlotInfo *slot = &slot_arr->slots[slotnum];
+
+			/* Is the slot usable? */
+			if (slot->invalid)
+			{
+				if (script == NULL &&
+					(script = fopen_priv(output_path, "w")) == NULL)
+					pg_fatal("could not open file \"%s\": %s",
+							 output_path, strerror(errno));
+
+				fprintf(script, "The slot \"%s\" is invalid\n",
+						slot->slotname);
+
+				/* No need to check this slot, seek to new one */
+				continue;
+			}
+
+			/*
+			 * Do additional check to ensure that all logical replication
+			 * slots have consumed all the WAL before shutdown.
+			 *
+			 * Note: This can be satisfied only when the old cluster has been
+			 * shut down, so we skip this for live checks.
+			 */
+			if (!live_check && !slot->caught_up)
+			{
+				if (script == NULL &&
+					(script = fopen_priv(output_path, "w")) == NULL)
+					pg_fatal("could not open file \"%s\": %s",
+							 output_path, strerror(errno));
+
+				fprintf(script,
+						"The slot \"%s\" has not consumed the WAL yet\n",
+						slot->slotname);
+			}
+		}
+	}
+
+	if (script)
+	{
+		fclose(script);
+
+		pg_log(PG_REPORT, "fatal");
+		pg_fatal("Your installation contains invalid logical replication slots.\n"
+				 "These slots can't be copied, so this cluster cannot be upgraded.\n"
+				 "Consider removing such slots or consuming the pending WAL if any,\n"
+				 "and then restart the upgrade.\n"
+				 "A list of invalid logical replication slots is in the file:\n"
+				 "    %s", output_path);
+	}
+
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/function.c b/src/bin/pg_upgrade/function.c
index dc8800c7cd..c0f5e58fa2 100644
--- a/src/bin/pg_upgrade/function.c
+++ b/src/bin/pg_upgrade/function.c
@@ -46,7 +46,9 @@ library_name_compare(const void *p1, const void *p2)
 /*
  * get_loadable_libraries()
  *
- *	Fetch the names of all old libraries containing C-language functions.
+ *	Fetch the names of all old libraries that contain C-language functions or
+ *	that correspond to logical replication output plugins.
+ *
  *	We will later check that they all exist in the new installation.
  */
 void
@@ -55,6 +57,7 @@ get_loadable_libraries(void)
 	PGresult  **ress;
 	int			totaltups;
 	int			dbnum;
+	int			n_libinfos;
 
 	ress = (PGresult **) pg_malloc(old_cluster.dbarr.ndbs * sizeof(PGresult *));
 	totaltups = 0;
@@ -81,7 +84,12 @@ get_loadable_libraries(void)
 		PQfinish(conn);
 	}
 
-	os_info.libraries = (LibraryInfo *) pg_malloc(totaltups * sizeof(LibraryInfo));
+	/*
+	 * Allocate memory for required libraries and logical replication output
+	 * plugins.
+	 */
+	n_libinfos = totaltups + count_old_cluster_logical_slots();
+	os_info.libraries = (LibraryInfo *) pg_malloc(sizeof(LibraryInfo) * n_libinfos);
 	totaltups = 0;
 
 	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
@@ -89,6 +97,8 @@ get_loadable_libraries(void)
 		PGresult   *res = ress[dbnum];
 		int			ntups;
 		int			rowno;
+		int			slotno;
+		LogicalSlotInfoArr *slot_arr = &old_cluster.dbarr.dbs[dbnum].slot_arr;
 
 		ntups = PQntuples(res);
 		for (rowno = 0; rowno < ntups; rowno++)
@@ -101,6 +111,23 @@ get_loadable_libraries(void)
 			totaltups++;
 		}
 		PQclear(res);
+
+		/*
+		 * Store the names of output plugins as well. The same plugin name may
+		 * be stored more than once, but the consumer function
+		 * check_loadable_libraries() will avoid checking the same library
+		 * twice, so we do not have to worry about uniqueness here.
+		 */
+		for (slotno = 0; slotno < slot_arr->nslots; slotno++)
+		{
+			if (slot_arr->slots[slotno].invalid)
+				continue;
+
+			os_info.libraries[totaltups].name = pg_strdup(slot_arr->slots[slotno].plugin);
+			os_info.libraries[totaltups].dbnum = dbnum;
+
+			totaltups++;
+		}
 	}
 
 	pg_free(ress);
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index aa5faca4d6..75f6dd4dc2 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -26,6 +26,8 @@ static void get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo);
 static void free_rel_infos(RelInfoArr *rel_arr);
 static void print_db_infos(DbInfoArr *db_arr);
 static void print_rel_infos(RelInfoArr *rel_arr);
+static void print_slot_infos(LogicalSlotInfoArr *slot_arr);
+static void get_old_cluster_logical_slot_infos(DbInfo *dbinfo, bool live_check);
 
 
 /*
@@ -266,13 +268,15 @@ report_unmatched_relation(const RelInfo *rel, const DbInfo *db, bool is_new_db)
 }
 
 /*
- * get_db_and_rel_infos()
+ * get_db_rel_and_slot_infos()
  *
  * higher level routine to generate dbinfos for the database running
  * on the given "port". Assumes that server is already running.
+ *
+ * live_check is used only when the target is the old cluster.
  */
 void
-get_db_and_rel_infos(ClusterInfo *cluster)
+get_db_rel_and_slot_infos(ClusterInfo *cluster, bool live_check)
 {
 	int			dbnum;
 
@@ -283,7 +287,17 @@ get_db_and_rel_infos(ClusterInfo *cluster)
 	get_db_infos(cluster);
 
 	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
-		get_rel_infos(cluster, &cluster->dbarr.dbs[dbnum]);
+	{
+		DbInfo	   *pDbInfo = &cluster->dbarr.dbs[dbnum];
+
+		get_rel_infos(cluster, pDbInfo);
+
+		/*
+		 * Retrieve the logical replication slot infos for the old cluster.
+		 */
+		if (cluster == &old_cluster)
+			get_old_cluster_logical_slot_infos(pDbInfo, live_check);
+	}
 
 	if (cluster == &old_cluster)
 		pg_log(PG_VERBOSE, "\nsource databases:");
@@ -600,6 +614,117 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 	dbinfo->rel_arr.nrels = num_rels;
 }
 
+/*
+ * get_old_cluster_logical_slot_infos()
+ *
+ * gets the LogicalSlotInfos for all the logical replication slots of the
+ * database referred to by "dbinfo".
+ *
+ * Note: This function will not do anything if the old cluster is pre-PG17.
+ * This is because before that the logical slots are not saved at shutdown, so
+ * there is no guarantee that the latest confirmed_flush_lsn is saved to disk
+ * which can lead to data loss. It is still not guaranteed for manually created
+ * slots in PG17, so subsequent checks done in
+ * check_old_cluster_for_valid_slots() would raise a FATAL error if such slots
+ * are included.
+ */
+static void
+get_old_cluster_logical_slot_infos(DbInfo *dbinfo, bool live_check)
+{
+	PGconn	   *conn;
+	PGresult   *res;
+	LogicalSlotInfo *slotinfos = NULL;
+	int			num_slots;
+
+	/* Logical slots can be migrated since PG17. */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+		return;
+
+	conn = connectToServer(&old_cluster, dbinfo->db_name);
+
+	/*
+	 * Fetch the logical replication slot information. The slot is considered
+	 * caught up if all the WAL is consumed except for records that could be
+	 * generated during the upgrade. Note that we can't ensure whether the
+	 * slot is caught up during live_check as the new WAL records could be
+	 * generated and we intentionally skip checking the WALs for invalidated
+	 * slots as the corresponding WALs could have been removed for such slots.
+	 *
+	 * The temporary slots are explicitly ignored while checking because such
+	 * slots cannot exist after the upgrade. During the upgrade, clusters are
+	 * started and stopped several times causing any temporary slots to be
+	 * removed.
+	 */
+	res = executeQueryOrDie(conn, "SELECT slot_name, plugin, two_phase, "
+							"%s as caught_up, conflicting as invalid "
+							"FROM pg_catalog.pg_replication_slots "
+							"WHERE slot_type = 'logical' AND "
+							"database = current_database() AND "
+							"temporary IS FALSE;",
+							live_check ? "FALSE" :
+							"(CASE WHEN conflicting THEN FALSE "
+							"ELSE (SELECT pg_catalog.binary_upgrade_validate_wal_records(confirmed_flush_lsn)) "
+							"END)");
+
+	num_slots = PQntuples(res);
+
+	if (num_slots)
+	{
+		int			slotnum;
+		int			i_slotname;
+		int			i_plugin;
+		int			i_twophase;
+		int			i_caught_up;
+		int			i_invalid;
+
+		slotinfos = (LogicalSlotInfo *) pg_malloc(sizeof(LogicalSlotInfo) * num_slots);
+
+		i_slotname = PQfnumber(res, "slot_name");
+		i_plugin = PQfnumber(res, "plugin");
+		i_twophase = PQfnumber(res, "two_phase");
+		i_caught_up = PQfnumber(res, "caught_up");
+		i_invalid = PQfnumber(res, "invalid");
+
+		for (slotnum = 0; slotnum < num_slots; slotnum++)
+		{
+			LogicalSlotInfo *curr = &slotinfos[slotnum];
+
+			curr->slotname = pg_strdup(PQgetvalue(res, slotnum, i_slotname));
+			curr->plugin = pg_strdup(PQgetvalue(res, slotnum, i_plugin));
+			curr->two_phase = (strcmp(PQgetvalue(res, slotnum, i_twophase), "t") == 0);
+			curr->caught_up = (strcmp(PQgetvalue(res, slotnum, i_caught_up), "t") == 0);
+			curr->invalid = (strcmp(PQgetvalue(res, slotnum, i_invalid), "t") == 0);
+		}
+	}
+
+	PQclear(res);
+	PQfinish(conn);
+
+	dbinfo->slot_arr.slots = slotinfos;
+	dbinfo->slot_arr.nslots = num_slots;
+}
+
+
+/*
+ * count_old_cluster_logical_slots()
+ *
+ * Returns the number of logical replication slots for all databases.
+ *
+ * Note: this function always returns 0 if the old_cluster is PG16 and prior
+ * because we gather slot information only for cluster versions greater than or
+ * equal to PG17. See get_old_cluster_logical_slot_infos().
+ */
+int
+count_old_cluster_logical_slots(void)
+{
+	int			dbnum;
+	int			slot_count = 0;
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+		slot_count += old_cluster.dbarr.dbs[dbnum].slot_arr.nslots;
+
+	return slot_count;
+}
 
 static void
 free_db_and_rel_infos(DbInfoArr *db_arr)
@@ -642,8 +767,11 @@ print_db_infos(DbInfoArr *db_arr)
 
 	for (dbnum = 0; dbnum < db_arr->ndbs; dbnum++)
 	{
-		pg_log(PG_VERBOSE, "Database: \"%s\"", db_arr->dbs[dbnum].db_name);
-		print_rel_infos(&db_arr->dbs[dbnum].rel_arr);
+		DbInfo	   *pDbInfo = &db_arr->dbs[dbnum];
+
+		pg_log(PG_VERBOSE, "Database: \"%s\"", pDbInfo->db_name);
+		print_rel_infos(&pDbInfo->rel_arr);
+		print_slot_infos(&pDbInfo->slot_arr);
 	}
 }
 
@@ -660,3 +788,25 @@ print_rel_infos(RelInfoArr *rel_arr)
 			   rel_arr->rels[relnum].reloid,
 			   rel_arr->rels[relnum].tablespace);
 }
+
+static void
+print_slot_infos(LogicalSlotInfoArr *slot_arr)
+{
+	int			slotnum;
+
+	/* Quick return if there are no logical slots. */
+	if (slot_arr->nslots == 0)
+		return;
+
+	pg_log(PG_VERBOSE, "Logical replication slots within the database:");
+
+	for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+	{
+		LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+
+		pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\", two_phase: %d",
+			   slot_info->slotname,
+			   slot_info->plugin,
+			   slot_info->two_phase);
+	}
+}
diff --git a/src/bin/pg_upgrade/meson.build b/src/bin/pg_upgrade/meson.build
index 12a97f84e2..228f29b688 100644
--- a/src/bin/pg_upgrade/meson.build
+++ b/src/bin/pg_upgrade/meson.build
@@ -42,6 +42,7 @@ tests += {
     'tests': [
       't/001_basic.pl',
       't/002_pg_upgrade.pl',
+      't/003_logical_replication_slots.pl',
     ],
     'test_kwargs': {'priority': 40}, # pg_upgrade tests are slow
   },
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 96bfb67167..3ddfc31070 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -59,6 +59,8 @@ static void copy_xact_xlog_xid(void);
 static void set_frozenxids(bool minmxid_only);
 static void make_outputdirs(char *pgdata);
 static void setup(char *argv0, bool *live_check);
+static void create_logical_replication_slots(void);
+static void setup_new_cluster(void);
 
 ClusterInfo old_cluster,
 			new_cluster;
@@ -188,6 +190,8 @@ main(int argc, char **argv)
 			  new_cluster.pgdata);
 	check_ok();
 
+	setup_new_cluster();
+
 	if (user_opts.do_sync)
 	{
 		prep_status("Sync data directory to disk");
@@ -201,8 +205,6 @@ main(int argc, char **argv)
 
 	create_script_for_old_cluster_deletion(&deletion_script_file_name);
 
-	issue_warnings_and_set_wal_level();
-
 	pg_log(PG_REPORT,
 		   "\n"
 		   "Upgrade Complete\n"
@@ -593,7 +595,7 @@ create_new_objects(void)
 		set_frozenxids(true);
 
 	/* update new_cluster info now that we have objects in the databases */
-	get_db_and_rel_infos(&new_cluster);
+	get_db_rel_and_slot_infos(&new_cluster, false);
 }
 
 /*
@@ -862,3 +864,102 @@ set_frozenxids(bool minmxid_only)
 
 	check_ok();
 }
+
+/*
+ * create_logical_replication_slots()
+ *
+ * Similar to create_new_objects() but only restores logical replication slots.
+ */
+static void
+create_logical_replication_slots(void)
+{
+	int			dbnum;
+
+	prep_status_progress("Restoring logical replication slots in the new cluster");
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		DbInfo	   *old_db = &old_cluster.dbarr.dbs[dbnum];
+		LogicalSlotInfoArr *slot_arr = &old_db->slot_arr;
+		PGconn	   *conn;
+		PQExpBuffer query;
+		int			slotnum;
+		char		log_file_name[MAXPGPATH];
+
+		/* Skip this database if there are no slots */
+		if (slot_arr->nslots == 0)
+			continue;
+
+		conn = connectToServer(&new_cluster, old_db->db_name);
+		query = createPQExpBuffer();
+
+		snprintf(log_file_name, sizeof(log_file_name),
+				 DB_DUMP_LOG_FILE_MASK, old_db->db_oid);
+
+		pg_log(PG_STATUS, "%s", old_db->db_name);
+
+		for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		{
+			LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+
+			/* Constructs a query for creating logical replication slots. */
+			appendPQExpBuffer(query, "SELECT pg_catalog.pg_create_logical_replication_slot(");
+			appendStringLiteralConn(query, slot_info->slotname, conn);
+			appendPQExpBuffer(query, ", ");
+			appendStringLiteralConn(query, slot_info->plugin, conn);
+			appendPQExpBuffer(query, ", false, %s);",
+							  slot_info->two_phase ? "true" : "false");
+
+			PQclear(executeQueryOrDie(conn, "%s", query->data));
+
+			resetPQExpBuffer(query);
+		}
+
+		PQfinish(conn);
+
+		destroyPQExpBuffer(query);
+	}
+
+	end_progress_output();
+	check_ok();
+}
+
+/*
+ *	setup_new_cluster()
+ *
+ * Starts the new cluster to update wal_level in the control file, then
+ * performs the remaining setup steps. Logical slots are also created here.
+ */
+static void
+setup_new_cluster(void)
+{
+	/*
+	 * We unconditionally start/stop the new server because pg_resetwal -o set
+	 * wal_level to 'minimum'.  If the user is upgrading standby servers using
+	 * the rsync instructions, they will need pg_upgrade to write its final
+	 * WAL record showing wal_level as 'replica'.
+	 */
+	start_postmaster(&new_cluster, true);
+
+	/*
+	 * If the old cluster has logical slots, migrate them to a new cluster.
+	 *
+	 * Note: This must be done after executing pg_resetwal command in the
+	 * caller because pg_resetwal would remove required WALs.
+	 */
+	if (count_old_cluster_logical_slots())
+		create_logical_replication_slots();
+
+	/*
+	 * Reindex hash indexes for old < 10.0. count_old_cluster_logical_slots()
+	 * can return non-zero only when the old cluster is PG17 or later, so it
+	 * is OK to use "else if" here. See comments atop
+	 * count_old_cluster_logical_slots() and get_old_cluster_logical_slot_infos().
+	 */
+	else if (GET_MAJOR_VERSION(old_cluster.major_version) <= 906)
+		old_9_6_invalidate_hash_indexes(&new_cluster, false);
+
+	report_extension_updates(&new_cluster);
+
+	stop_postmaster(false);
+}
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 842f3b6cd3..b2531862e4 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -150,6 +150,25 @@ typedef struct
 	int			nrels;
 } RelInfoArr;
 
+/*
+ * Structure to store logical replication slot information
+ */
+typedef struct
+{
+	char	   *slotname;		/* slot name */
+	char	   *plugin;			/* plugin */
+	bool		two_phase;		/* can the slot decode 2PC? */
+	bool		caught_up;		/* Is confirmed_flush_lsn the same as latest
+								 * checkpoint LSN? */
+	bool		invalid;		/* If true, the slot is unusable. */
+} LogicalSlotInfo;
+
+typedef struct
+{
+	int			nslots;			/* number of logical slot infos */
+	LogicalSlotInfo *slots;		/* array of logical slot infos */
+} LogicalSlotInfoArr;
+
 /*
  * The following structure represents a relation mapping.
  */
@@ -176,6 +195,7 @@ typedef struct
 	char		db_tablespace[MAXPGPATH];	/* database default tablespace
 											 * path */
 	RelInfoArr	rel_arr;		/* array of all user relinfos */
+	LogicalSlotInfoArr slot_arr;	/* array of all LogicalSlotInfo */
 } DbInfo;
 
 /*
@@ -345,7 +365,6 @@ void		output_check_banner(bool live_check);
 void		check_and_dump_old_cluster(bool live_check);
 void		check_new_cluster(void);
 void		report_clusters_compatible(void);
-void		issue_warnings_and_set_wal_level(void);
 void		output_completion_banner(char *deletion_script_file_name);
 void		check_cluster_versions(void);
 void		check_cluster_compatibility(bool live_check);
@@ -400,7 +419,8 @@ void		check_loadable_libraries(void);
 FileNameMap *gen_db_file_maps(DbInfo *old_db,
 							  DbInfo *new_db, int *nmaps, const char *old_pgdata,
 							  const char *new_pgdata);
-void		get_db_and_rel_infos(ClusterInfo *cluster);
+void		get_db_rel_and_slot_infos(ClusterInfo *cluster, bool live_check);
+int			count_old_cluster_logical_slots(void);
 
 /* option.c */
 
diff --git a/src/bin/pg_upgrade/server.c b/src/bin/pg_upgrade/server.c
index 0bc3d2806b..d7f6c268ef 100644
--- a/src/bin/pg_upgrade/server.c
+++ b/src/bin/pg_upgrade/server.c
@@ -201,6 +201,7 @@ start_postmaster(ClusterInfo *cluster, bool report_and_exit_on_error)
 	PGconn	   *conn;
 	bool		pg_ctl_return = false;
 	char		socket_string[MAXPGPATH + 200];
+	PQExpBufferData pgoptions;
 
 	static bool exit_hook_registered = false;
 
@@ -227,23 +228,41 @@ start_postmaster(ClusterInfo *cluster, bool report_and_exit_on_error)
 				 cluster->sockdir);
 #endif
 
+	initPQExpBuffer(&pgoptions);
+
 	/*
-	 * Use -b to disable autovacuum.
+	 * Construct a parameter string which is passed to the server process.
 	 *
 	 * Turn off durability requirements to improve object creation speed, and
 	 * we only modify the new cluster, so only use it there.  If there is a
 	 * crash, the new cluster has to be recreated anyway.  fsync=off is a big
 	 * win on ext4.
 	 */
+	if (cluster == &new_cluster)
+		appendPQExpBufferStr(&pgoptions, " -c synchronous_commit=off -c fsync=off -c full_page_writes=off");
+
+	/*
+	 * Use max_slot_wal_keep_size as -1 to prevent the WAL removal by the
+	 * checkpointer process.  If WALs required by logical replication slots
+	 * are removed, the slots are unusable.  This setting prevents the
+	 * invalidation of slots during the upgrade. We set this option only when
+	 * the cluster is PG17 or later because logical replication slots can only
+	 * be migrated since then. Note that max_slot_wal_keep_size was added in PG13.
+	 */
+	if (GET_MAJOR_VERSION(cluster->major_version) >= 1700)
+		appendPQExpBufferStr(&pgoptions, " -c max_slot_wal_keep_size=-1");
+
+	/* Use -b to disable autovacuum. */
 	snprintf(cmd, sizeof(cmd),
 			 "\"%s/pg_ctl\" -w -l \"%s/%s\" -D \"%s\" -o \"-p %d -b%s %s%s\" start",
 			 cluster->bindir,
 			 log_opts.logdir,
 			 SERVER_LOG_FILE, cluster->pgconfig, cluster->port,
-			 (cluster == &new_cluster) ?
-			 " -c synchronous_commit=off -c fsync=off -c full_page_writes=off" : "",
+			 pgoptions.data,
 			 cluster->pgopts ? cluster->pgopts : "", socket_string);
 
+	termPQExpBuffer(&pgoptions);
+
 	/*
 	 * Don't throw an error right away, let connecting throw the error because
 	 * it might supply a reason for the failure.
diff --git a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
new file mode 100644
index 0000000000..13bcc344fd
--- /dev/null
+++ b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
@@ -0,0 +1,232 @@
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+# Tests for upgrading replication slots
+
+use strict;
+use warnings;
+
+use File::Path qw(rmtree);
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Can be changed to test the other modes
+my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
+
+# Initialize old cluster
+my $old_publisher = PostgreSQL::Test::Cluster->new('old_publisher');
+$old_publisher->init(allows_streaming => 'logical');
+
+# Initialize new cluster
+my $new_publisher = PostgreSQL::Test::Cluster->new('new_publisher');
+$new_publisher->init(allows_streaming => 'replica');
+
+# Initialize subscriber cluster
+my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
+$subscriber->init(allows_streaming => 'logical');
+
+my $bindir = $new_publisher->config_data('--bindir');
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when new cluster wal_level is not 'logical'
+
+# Preparations for the subsequent test:
+# 1. Create a slot on the old cluster
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT pg_create_logical_replication_slot('test_slot1', 'test_decoding', false, true);"
+);
+$old_publisher->stop;
+
+# pg_upgrade will fail because the new cluster wal_level is 'replica'
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade where the new cluster has the wrong wal_level');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when max_replication_slots on a new cluster is
+#		too low
+
+# Preparations for the subsequent test:
+# 1. Create a second slot on the old cluster
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT pg_create_logical_replication_slot('test_slot2', 'test_decoding', false, true);"
+);
+
+# 2. Consume WAL records to avoid another type of upgrade failure. It will be
+#	 tested in subsequent cases.
+$old_publisher->safe_psql('postgres',
+	"SELECT count(*) FROM pg_logical_slot_get_changes('test_slot1', NULL, NULL);"
+);
+$old_publisher->stop;
+
+# 3. max_replication_slots is set to smaller than the number of slots (2)
+#	 present on the old cluster
+$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 1");
+
+# 4. wal_level is set correctly on the new cluster
+$new_publisher->append_conf('postgresql.conf', "wal_level = 'logical'");
+
+# pg_upgrade will fail because the new cluster has insufficient max_replication_slots
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade where the new cluster has insufficient max_replication_slots');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when the slot still has unconsumed WAL records
+
+# Preparations for the subsequent test:
+# 1. Remove the slot 'test_slot2', leaving only 1 slot on the old cluster, so
+#    the new cluster config  max_replication_slots=1 will now be enough.
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT * FROM pg_drop_replication_slot('test_slot2');"
+);
+
+# 2. Generate extra WAL records. Because these WAL records do not get consumed
+#	 it will cause the upcoming pg_upgrade test to fail.
+$old_publisher->safe_psql('postgres',
+	"CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;"
+);
+$old_publisher->stop;
+
+# pg_upgrade will fail because the slot still has unconsumed WAL records
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade of old cluster with idle replication slots');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+# Remove the remaining slot
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT * FROM pg_drop_replication_slot('test_slot1');"
+);
+
+# ------------------------------
+# TEST: Successful upgrade
+
+# Preparations for the subsequent test:
+# 1. Setup logical replication
+my $old_connstr = $old_publisher->connstr . ' dbname=postgres';
+$old_publisher->safe_psql('postgres',
+	"CREATE PUBLICATION pub FOR ALL TABLES;"
+);
+$subscriber->start;
+$subscriber->safe_psql(
+	'postgres', qq[
+	CREATE TABLE tbl (a int);
+	CREATE SUBSCRIPTION regress_sub CONNECTION '$old_connstr' PUBLICATION pub WITH (two_phase = 'true')
+]);
+$subscriber->wait_for_subscription_sync($old_publisher, 'regress_sub');
+
+# 2. Temporarily disable the subscription
+$subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION regress_sub DISABLE");
+$old_publisher->stop;
+
+# Dry run, successful check is expected. This is not a live check, so a
+# shutdown checkpoint record would be inserted. We want to test that a
+# subsequent upgrade is successful by skipping such an expected WAL record.
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,        '--check'
+	],
+	'run of pg_upgrade of old cluster');
+ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ removed after pg_upgrade success");
+
+# Actual run, successful upgrade is expected
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade of old cluster');
+ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ removed after pg_upgrade success");
+
+# Check that the slot 'regress_sub' has migrated to the new cluster
+$new_publisher->start;
+my $result = $new_publisher->safe_psql('postgres',
+	"SELECT slot_name, two_phase FROM pg_replication_slots");
+is($result, qq(regress_sub|t), 'check the slot exists on new cluster');
+
+# Update the connection
+my $new_connstr = $new_publisher->connstr . ' dbname=postgres';
+$subscriber->safe_psql(
+	'postgres', qq[
+	ALTER SUBSCRIPTION regress_sub CONNECTION '$new_connstr';
+	ALTER SUBSCRIPTION regress_sub ENABLE;
+]);
+
+# Check whether changes on the new publisher get replicated to the subscriber
+$new_publisher->safe_psql('postgres',
+	"INSERT INTO tbl VALUES (generate_series(11, 20))");
+$new_publisher->wait_for_catchup('regress_sub');
+$result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
+is($result, qq(20), 'check changes are replicated to the subscriber');
+
+# Clean up
+$subscriber->stop();
+$new_publisher->stop();
+
+done_testing();
diff --git a/src/include/access/xlogutils.h b/src/include/access/xlogutils.h
index 5b77b11f50..1cf31aa24f 100644
--- a/src/include/access/xlogutils.h
+++ b/src/include/access/xlogutils.h
@@ -115,4 +115,7 @@ extern void XLogReadDetermineTimeline(XLogReaderState *state,
 
 extern void WALReadRaiseError(WALReadError *errinfo);
 
+extern XLogReaderState *InitXLogReaderState(XLogRecPtr lsn);
+extern XLogRecord *ReadNextXLogRecord(XLogReaderState *xlogreader);
+
 #endif
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 9805bc6118..8f15e97257 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11370,6 +11370,12 @@
   proname => 'binary_upgrade_set_next_pg_tablespace_oid', provolatile => 'v',
   proparallel => 'u', prorettype => 'void', proargtypes => 'oid',
   prosrc => 'binary_upgrade_set_next_pg_tablespace_oid' },
+{ oid => '8046', descr => 'for use by pg_upgrade',
+  proname => 'binary_upgrade_validate_wal_records',
+  prorows => '10', proretset => 't', provolatile => 's', prorettype => 'bool',
+  proargtypes => 'pg_lsn', proallargtypes => '{pg_lsn,bool}',
+  proargmodes => '{i,o}', proargnames => '{start_lsn,is_ok}',
+  prosrc => 'binary_upgrade_validate_wal_records' },
 
 # conversion functions
 { oid => '4302',
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index b5bbdd1608..ff6cd495a3 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1502,7 +1502,10 @@ LogicalRepTupleData
 LogicalRepTyp
 LogicalRepWorker
 LogicalRepWorkerType
+LogicalReplicationSlotInfo
 LogicalRewriteMappingData
+LogicalSlotInfo
+LogicalSlotInfoArr
 LogicalTape
 LogicalTapeSet
 LsnReadQueue
-- 
2.27.0

#250Dilip Kumar
dilipbalaut@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#249)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Wed, Sep 20, 2023 at 11:00 AM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

Dear Amit,

Thank you for reviewing! PSA new version. In this version I ran pgindent again.

+ /*
+ * There is a possibility that following records may be generated
+ * during the upgrade.
+ */
+ if (!CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID, XLOG_CHECKPOINT_SHUTDOWN) &&
+ !CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID, XLOG_CHECKPOINT_ONLINE) &&
+ !CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID, XLOG_SWITCH) &&
+ !CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID, XLOG_FPI_FOR_HINT) &&
+ !CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID, XLOG_PARAMETER_CHANGE) &&
+ !CHECK_WAL_RECORD(rmid, info, RM_STANDBY_ID, XLOG_RUNNING_XACTS) &&
+ !CHECK_WAL_RECORD(rmid, info, RM_HEAP2_ID, XLOG_HEAP2_PRUNE))
+ is_valid = false;
+
+ CHECK_FOR_INTERRUPTS();

Just wondering why XLOG_HEAP2_VACUUM or other vacuum-related commands
can not occur during the upgrade?

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#251Amit Kapila
amit.kapila16@gmail.com
In reply to: Dilip Kumar (#250)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Wed, Sep 20, 2023 at 11:51 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Wed, Sep 20, 2023 at 11:00 AM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

Dear Amit,

Thank you for reviewing! PSA new version. In this version I ran pgindent again.

+ /*
+ * There is a possibility that following records may be generated
+ * during the upgrade.
+ */
+ if (!CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID, XLOG_CHECKPOINT_SHUTDOWN) &&
+ !CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID, XLOG_CHECKPOINT_ONLINE) &&
+ !CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID, XLOG_SWITCH) &&
+ !CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID, XLOG_FPI_FOR_HINT) &&
+ !CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID, XLOG_PARAMETER_CHANGE) &&
+ !CHECK_WAL_RECORD(rmid, info, RM_STANDBY_ID, XLOG_RUNNING_XACTS) &&
+ !CHECK_WAL_RECORD(rmid, info, RM_HEAP2_ID, XLOG_HEAP2_PRUNE))
+ is_valid = false;
+
+ CHECK_FOR_INTERRUPTS();

Just wondering why XLOG_HEAP2_VACUUM or other vacuum-related commands
can not occur during the upgrade?

Because autovacuum is disabled during upgrade. See comment: "Use -b to
disable autovacuum" in start_postmaster().

--
With Regards,
Amit Kapila.

#252Amit Kapila
amit.kapila16@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#249)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Wed, Sep 20, 2023 at 11:00 AM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

Dear Amit,

+int
+count_old_cluster_logical_slots(void)
+{
+ int dbnum;
+ int slot_count = 0;
+
+ for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+ slot_count += old_cluster.dbarr.dbs[dbnum].slot_arr.nslots;
+
+ return slot_count;
+}

In this code, aren't we assuming that 'slot_arr.nslots' will be zero
for versions <=PG16? On my Windows machine, this value is not zero but
rather some uninitialized negative value which makes its caller try to
allocate some undefined memory and fail. I think you need to
initialize this in get_old_cluster_logical_slot_infos() for lower
versions.
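
For instance, a minimal sketch of the kind of initialization meant here
(hypothetical, not the exact fix) would be to set the slot_arr fields on the
pre-PG17 early-return path of get_old_cluster_logical_slot_infos():

	/* Logical slots can be migrated since PG17. */
	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
	{
		/* Hypothetical sketch: older clusters must report zero slots. */
		dbinfo->slot_arr.slots = NULL;
		dbinfo->slot_arr.nslots = 0;
		return;
	}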

--
With Regards,
Amit Kapila.

#253Dilip Kumar
dilipbalaut@gmail.com
In reply to: Amit Kapila (#251)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Wed, Sep 20, 2023 at 12:12 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Sep 20, 2023 at 11:51 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Wed, Sep 20, 2023 at 11:00 AM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

Dear Amit,

Thank you for reviewing! PSA new version. In this version I ran pgindent again.

+ /*
+ * There is a possibility that following records may be generated
+ * during the upgrade.
+ */
+ if (!CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID, XLOG_CHECKPOINT_SHUTDOWN) &&
+ !CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID, XLOG_CHECKPOINT_ONLINE) &&
+ !CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID, XLOG_SWITCH) &&
+ !CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID, XLOG_FPI_FOR_HINT) &&
+ !CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID, XLOG_PARAMETER_CHANGE) &&
+ !CHECK_WAL_RECORD(rmid, info, RM_STANDBY_ID, XLOG_RUNNING_XACTS) &&
+ !CHECK_WAL_RECORD(rmid, info, RM_HEAP2_ID, XLOG_HEAP2_PRUNE))
+ is_valid = false;
+
+ CHECK_FOR_INTERRUPTS();

Just wondering why XLOG_HEAP2_VACUUM or other vacuum-related commands
can not occur during the upgrade?

Because autovacuum is disabled during upgrade. See comment: "Use -b to
disable autovacuum" in start_postmaster().

Okay got it, thanks.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#254Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#252)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Wed, Sep 20, 2023 at 12:16 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Sep 20, 2023 at 11:00 AM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

Dear Amit,

+int
+count_old_cluster_logical_slots(void)
+{
+ int dbnum;
+ int slot_count = 0;
+
+ for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+ slot_count += old_cluster.dbarr.dbs[dbnum].slot_arr.nslots;
+
+ return slot_count;
+}

In this code, aren't we assuming that 'slot_arr.nslots' will be zero
for versions <=PG16? On my Windows machine, this value is not zero but
rather some uninitialized negative value which makes its caller try to
allocate some undefined memory and fail. I think you need to
initialize this in get_old_cluster_logical_slot_infos() for lower
versions.

+{ oid => '8046', descr => 'for use by pg_upgrade',
+  proname => 'binary_upgrade_validate_wal_records',
+  prorows => '10', proretset => 't', provolatile => 's', prorettype => 'bool',
+  proargtypes => 'pg_lsn', proallargtypes => '{pg_lsn,bool}',
+  proargmodes => '{i,o}', proargnames => '{start_lsn,is_ok}',
+  prosrc => 'binary_upgrade_validate_wal_records' },

In this entry, many of the fields seem bogus. For example, we don't need
prorows => '10', proretset => 't' for this function. Similarly,
proargmodes also looks incorrect as we don't have any OUT parameter.
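
Just to illustrate (not a concrete proposal), a trimmed entry without the
set-returning and OUT-parameter fields could look something like:

{ oid => '8046', descr => 'for use by pg_upgrade',
  proname => 'binary_upgrade_validate_wal_records', provolatile => 's',
  prorettype => 'bool', proargtypes => 'pg_lsn',
  prosrc => 'binary_upgrade_validate_wal_records' },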

--
With Regards,
Amit Kapila.

#255Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Amit Kapila (#252)
1 attachment(s)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Amit,

+int
+count_old_cluster_logical_slots(void)
+{
+ int dbnum;
+ int slot_count = 0;
+
+ for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+ slot_count += old_cluster.dbarr.dbs[dbnum].slot_arr.nslots;
+
+ return slot_count;
+}

In this code, aren't we assuming that 'slot_arr.nslots' will be zero
for versions <=PG16? On my Windows machine, this value is not zero but
rather some uninitialized negative value which makes its caller try to
allocate some undefined memory and fail. I think you need to
initialize this in get_old_cluster_logical_slot_infos() for lower
versions.

Good catch; I did not notice it because it worked well on my RHEL machine. Here
is the updated version.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:

v42-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patch (application/octet-stream)
From 5a29bc4ca9e7a20ca2b4b9445edabbaa455f06ab Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Tue, 4 Apr 2023 05:49:34 +0000
Subject: [PATCH v42] pg_upgrade: Allow to replicate logical replication slots
 to new node

This commit allows nodes with logical replication slots to be upgraded. While
reading information from the old cluster, a list of logical replication slots is
fetched. At a later stage of the upgrade, pg_upgrade revisits the list and
restores the slots by executing pg_create_logical_replication_slot() on the new
cluster. Migration of logical replication slots is only supported when the old
cluster is version 17.0 or later.

If the old node has slots with the status 'lost' or with unconsumed WAL records,
pg_upgrade fails. These checks are needed to prevent data loss.

Note that the pg_resetwal command would remove WAL files, which are required for
restart_lsn. If WALs required by logical replication slots are removed, the
slots are unusable. Therefore, during the upgrade, slot restoration is done
after the final pg_resetwal command. The workflow ensures that required WALs are
retained.

The significant advantage of this commit is that it makes it easy to continue
logical replication even after upgrading the publisher node. Previously,
pg_upgrade allowed copying publications to a new node. With this new commit,
adjusting the connection string to the new publisher will cause the apply
worker on the subscriber to connect to the new publisher automatically. This
enables seamless continuation of logical replication, even after an upgrade.

Author: Hayato Kuroda
Co-authored-by: Hou Zhijie
Reviewed-by: Peter Smith, Julien Rouhaud, Vignesh C, Wang Wei, Masahiko Sawada,
             Dilip Kumar
---
 contrib/pg_walinspect/pg_walinspect.c         |  94 -------
 doc/src/sgml/ref/pgupgrade.sgml               |  76 +++++-
 src/backend/access/transam/xlogutils.c        |  92 +++++++
 src/backend/replication/slot.c                |  12 +
 src/backend/utils/adt/pg_upgrade_support.c    |  88 +++++++
 src/bin/pg_upgrade/Makefile                   |   3 +
 src/bin/pg_upgrade/check.c                    | 198 +++++++++++++--
 src/bin/pg_upgrade/function.c                 |  31 ++-
 src/bin/pg_upgrade/info.c                     | 164 ++++++++++++-
 src/bin/pg_upgrade/meson.build                |   1 +
 src/bin/pg_upgrade/pg_upgrade.c               | 107 +++++++-
 src/bin/pg_upgrade/pg_upgrade.h               |  24 +-
 src/bin/pg_upgrade/server.c                   |  25 +-
 .../t/003_logical_replication_slots.pl        | 232 ++++++++++++++++++
 src/include/access/xlogutils.h                |   3 +
 src/include/catalog/pg_proc.dat               |   5 +
 src/tools/pgindent/typedefs.list              |   3 +
 17 files changed, 1023 insertions(+), 135 deletions(-)
 create mode 100644 src/bin/pg_upgrade/t/003_logical_replication_slots.pl

diff --git a/contrib/pg_walinspect/pg_walinspect.c b/contrib/pg_walinspect/pg_walinspect.c
index 796a74f322..49f4f92e98 100644
--- a/contrib/pg_walinspect/pg_walinspect.c
+++ b/contrib/pg_walinspect/pg_walinspect.c
@@ -40,8 +40,6 @@ PG_FUNCTION_INFO_V1(pg_get_wal_stats_till_end_of_wal);
 
 static void ValidateInputLSNs(XLogRecPtr start_lsn, XLogRecPtr *end_lsn);
 static XLogRecPtr GetCurrentLSN(void);
-static XLogReaderState *InitXLogReaderState(XLogRecPtr lsn);
-static XLogRecord *ReadNextXLogRecord(XLogReaderState *xlogreader);
 static void GetWALRecordInfo(XLogReaderState *record, Datum *values,
 							 bool *nulls, uint32 ncols);
 static void GetWALRecordsInfo(FunctionCallInfo fcinfo,
@@ -84,98 +82,6 @@ GetCurrentLSN(void)
 	return curr_lsn;
 }
 
-/*
- * Initialize WAL reader and identify first valid LSN.
- */
-static XLogReaderState *
-InitXLogReaderState(XLogRecPtr lsn)
-{
-	XLogReaderState *xlogreader;
-	ReadLocalXLogPageNoWaitPrivate *private_data;
-	XLogRecPtr	first_valid_record;
-
-	/*
-	 * Reading WAL below the first page of the first segments isn't allowed.
-	 * This is a bootstrap WAL page and the page_read callback fails to read
-	 * it.
-	 */
-	if (lsn < XLOG_BLCKSZ)
-		ereport(ERROR,
-				(errmsg("could not read WAL at LSN %X/%X",
-						LSN_FORMAT_ARGS(lsn))));
-
-	private_data = (ReadLocalXLogPageNoWaitPrivate *)
-		palloc0(sizeof(ReadLocalXLogPageNoWaitPrivate));
-
-	xlogreader = XLogReaderAllocate(wal_segment_size, NULL,
-									XL_ROUTINE(.page_read = &read_local_xlog_page_no_wait,
-											   .segment_open = &wal_segment_open,
-											   .segment_close = &wal_segment_close),
-									private_data);
-
-	if (xlogreader == NULL)
-		ereport(ERROR,
-				(errcode(ERRCODE_OUT_OF_MEMORY),
-				 errmsg("out of memory"),
-				 errdetail("Failed while allocating a WAL reading processor.")));
-
-	/* first find a valid recptr to start from */
-	first_valid_record = XLogFindNextRecord(xlogreader, lsn);
-
-	if (XLogRecPtrIsInvalid(first_valid_record))
-		ereport(ERROR,
-				(errmsg("could not find a valid record after %X/%X",
-						LSN_FORMAT_ARGS(lsn))));
-
-	return xlogreader;
-}
-
-/*
- * Read next WAL record.
- *
- * By design, to be less intrusive in a running system, no slot is allocated
- * to reserve the WAL we're about to read. Therefore this function can
- * encounter read errors for historical WAL.
- *
- * We guard against ordinary errors trying to read WAL that hasn't been
- * written yet by limiting end_lsn to the flushed WAL, but that can also
- * encounter errors if the flush pointer falls in the middle of a record. In
- * that case we'll return NULL.
- */
-static XLogRecord *
-ReadNextXLogRecord(XLogReaderState *xlogreader)
-{
-	XLogRecord *record;
-	char	   *errormsg;
-
-	record = XLogReadRecord(xlogreader, &errormsg);
-
-	if (record == NULL)
-	{
-		ReadLocalXLogPageNoWaitPrivate *private_data;
-
-		/* return NULL, if end of WAL is reached */
-		private_data = (ReadLocalXLogPageNoWaitPrivate *)
-			xlogreader->private_data;
-
-		if (private_data->end_of_wal)
-			return NULL;
-
-		if (errormsg)
-			ereport(ERROR,
-					(errcode_for_file_access(),
-					 errmsg("could not read WAL at %X/%X: %s",
-							LSN_FORMAT_ARGS(xlogreader->EndRecPtr), errormsg)));
-		else
-			ereport(ERROR,
-					(errcode_for_file_access(),
-					 errmsg("could not read WAL at %X/%X",
-							LSN_FORMAT_ARGS(xlogreader->EndRecPtr))));
-	}
-
-	return record;
-}
-
 /*
  * Output values that make up a row describing caller's WAL record.
  *
diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index bea0d1b93f..1a17572d14 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -383,6 +383,77 @@ make prefix=/usr/local/pgsql.new install
     </para>
    </step>
 
+   <step>
+    <title>Prepare for publisher upgrades</title>
+
+    <para>
+     <application>pg_upgrade</application> attempts to migrate logical
+     replication slots. This helps avoid the need for manually defining the
+     same replication slots on the new publisher. Migration of logical
+     replication slots is only supported when the old cluster is version 17.0
+     or later.
+    </para>
+
+    <para>
+     Before you start upgrading the publisher cluster, ensure that the
+     subscription is temporarily disabled, by executing
+     <link linkend="sql-altersubscription"><command>ALTER SUBSCRIPTION ... DISABLE</command></link>.
+     Re-enable the subscription after the upgrade.
+    </para>
+
+    <para>
+     There are some prerequisites for <application>pg_upgrade</application> to
+     be able to upgrade the replication slots. If these are not met, an error
+     will be reported.
+    </para>
+
+    <itemizedlist>
+     <listitem>
+      <para>
+       All slots on the old cluster must be usable, i.e., there are no slots
+       whose
+       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>conflicting</structfield>
+       is <literal>true</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The old cluster has replicated all the changes to subscribers.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The output plugins referenced by the slots on the old cluster must be
+       installed in the new PostgreSQL executable directory.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must not have permanent logical replication slots, i.e.,
+       there must be no slots where
+       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>temporary</structfield>
+       is <literal>false</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must have
+       <link linkend="guc-wal-level"><varname>wal_level</varname></link> as
+       <literal>logical</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must have
+       <link linkend="guc-max-replication-slots"><varname>max_replication_slots</varname></link>
+       configured to a value greater than or equal to the number of slots
+       present in the old cluster.
+      </para>
+     </listitem>
+    </itemizedlist>
+
+   </step>
+
    <step>
     <title>Stop both servers</title>
 
@@ -652,8 +723,9 @@ rsync --archive --delete --hard-links --size-only --no-inc-recursive /vol1/pg_tb
        Configure the servers for log shipping.  (You do not need to run
        <function>pg_backup_start()</function> and <function>pg_backup_stop()</function>
        or take a file system backup as the standbys are still synchronized
-       with the primary.)  Replication slots are not copied and must
-       be recreated.
+       with the primary.)  Only logical slots on the primary are copied to the
+       new standby, but other slots on the old standby are not copied, so they
+       must be recreated manually.
       </para>
      </step>
 
diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c
index 43f7b31205..e2cabfef32 100644
--- a/src/backend/access/transam/xlogutils.c
+++ b/src/backend/access/transam/xlogutils.c
@@ -1048,3 +1048,95 @@ WALReadRaiseError(WALReadError *errinfo)
 						errinfo->wre_req)));
 	}
 }
+
+/*
+ * Initialize WAL reader and identify first valid LSN.
+ */
+XLogReaderState *
+InitXLogReaderState(XLogRecPtr lsn)
+{
+	XLogReaderState *xlogreader;
+	ReadLocalXLogPageNoWaitPrivate *private_data;
+	XLogRecPtr	first_valid_record;
+
+	/*
+	 * Reading WAL below the first page of the first segments isn't allowed.
+	 * This is a bootstrap WAL page and the page_read callback fails to read
+	 * it.
+	 */
+	if (lsn < XLOG_BLCKSZ)
+		ereport(ERROR,
+				(errmsg("could not read WAL at LSN %X/%X",
+						LSN_FORMAT_ARGS(lsn))));
+
+	private_data = (ReadLocalXLogPageNoWaitPrivate *)
+		palloc0(sizeof(ReadLocalXLogPageNoWaitPrivate));
+
+	xlogreader = XLogReaderAllocate(wal_segment_size, NULL,
+									XL_ROUTINE(.page_read = &read_local_xlog_page_no_wait,
+											   .segment_open = &wal_segment_open,
+											   .segment_close = &wal_segment_close),
+									private_data);
+
+	if (xlogreader == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("out of memory"),
+				 errdetail("Failed while allocating a WAL reading processor.")));
+
+	/* first find a valid recptr to start from */
+	first_valid_record = XLogFindNextRecord(xlogreader, lsn);
+
+	if (XLogRecPtrIsInvalid(first_valid_record))
+		ereport(ERROR,
+				(errmsg("could not find a valid record after %X/%X",
+						LSN_FORMAT_ARGS(lsn))));
+
+	return xlogreader;
+}
+
+/*
+ * Read next WAL record.
+ *
+ * By design, to be less intrusive in a running system, no slot is allocated
+ * to reserve the WAL we're about to read. Therefore this function can
+ * encounter read errors for historical WAL.
+ *
+ * We guard against ordinary errors trying to read WAL that hasn't been
+ * written yet by limiting end_lsn to the flushed WAL, but that can also
+ * encounter errors if the flush pointer falls in the middle of a record. In
+ * that case we'll return NULL.
+ */
+XLogRecord *
+ReadNextXLogRecord(XLogReaderState *xlogreader)
+{
+	XLogRecord *record;
+	char	   *errormsg;
+
+	record = XLogReadRecord(xlogreader, &errormsg);
+
+	if (record == NULL)
+	{
+		ReadLocalXLogPageNoWaitPrivate *private_data;
+
+		/* return NULL, if end of WAL is reached */
+		private_data = (ReadLocalXLogPageNoWaitPrivate *)
+			xlogreader->private_data;
+
+		if (private_data->end_of_wal)
+			return NULL;
+
+		if (errormsg)
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not read WAL at %X/%X: %s",
+							LSN_FORMAT_ARGS(xlogreader->EndRecPtr), errormsg)));
+		else
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not read WAL at %X/%X",
+							LSN_FORMAT_ARGS(xlogreader->EndRecPtr))));
+	}
+
+	return record;
+}
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 3ded3c1473..a91d412f91 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1423,6 +1423,18 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause,
 
 		SpinLockRelease(&s->mutex);
 
+		/*
+		 * The logical replication slots shouldn't be invalidated as
+		 * max_slot_wal_keep_size GUC is set to -1 during the upgrade.
+		 *
+		 * The following is just a sanity check.
+		 */
+		if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade)
+		{
+			Assert(max_slot_wal_keep_size_mb == -1);
+			elog(ERROR, "replication slots must not be invalidated during the upgrade");
+		}
+
 		if (active_pid != 0)
 		{
 			/*
diff --git a/src/backend/utils/adt/pg_upgrade_support.c b/src/backend/utils/adt/pg_upgrade_support.c
index 0186636d9f..26351a1e1c 100644
--- a/src/backend/utils/adt/pg_upgrade_support.c
+++ b/src/backend/utils/adt/pg_upgrade_support.c
@@ -11,14 +11,20 @@
 
 #include "postgres.h"
 
+#include "access/heapam_xlog.h"
+#include "access/xlog.h"
+#include "access/xlogutils.h"
 #include "catalog/binary_upgrade.h"
 #include "catalog/heap.h"
 #include "catalog/namespace.h"
+#include "catalog/pg_control.h"
 #include "catalog/pg_type.h"
 #include "commands/extension.h"
 #include "miscadmin.h"
+#include "storage/standbydefs.h"
 #include "utils/array.h"
 #include "utils/builtins.h"
+#include "utils/pg_lsn.h"
 
 
 #define CHECK_IS_BINARY_UPGRADE									\
@@ -29,6 +35,9 @@ do {															\
 				 errmsg("function can only be called when server is in binary upgrade mode"))); \
 } while (0)
 
+#define CHECK_WAL_RECORD(rmgrid, info, expected_rmgrid, expected_info) \
+	(rmgrid == expected_rmgrid && info == expected_info)
+
 Datum
 binary_upgrade_set_next_pg_tablespace_oid(PG_FUNCTION_ARGS)
 {
@@ -261,3 +270,82 @@ binary_upgrade_set_missing_value(PG_FUNCTION_ARGS)
 
 	PG_RETURN_VOID();
 }
+
+/*
+ * Return false if we found unexpected WAL records, otherwise true.
+ *
+ * This function is used to verify that there are no WAL records (except some
+ * types) after confirmed_flush_lsn of logical slots, which means all the
+ * changes were replicated to the subscriber. There is a possibility that some
+ * WALs are inserted during upgrade, so such types would be ignored.
+ *
+ * XLOG_CHECKPOINT_SHUTDOWN and XLOG_SWITCH are ignored because they would be
+ * inserted after the walsender exits. Moreover, the following types of records
+ * could be generated during the pg_upgrade --check, so they are ignored too:
+ * XLOG_CHECKPOINT_ONLINE, XLOG_RUNNING_XACTS, XLOG_FPI_FOR_HINT,
+ * XLOG_HEAP2_PRUNE, XLOG_PARAMETER_CHANGE.
+ */
+Datum
+binary_upgrade_validate_wal_records(PG_FUNCTION_ARGS)
+{
+	XLogRecPtr	start_lsn = PG_GETARG_LSN(0);
+	XLogReaderState *xlogreader;
+	bool		initial_record = true;
+	bool		is_valid = true;
+
+	CHECK_IS_BINARY_UPGRADE;
+
+	if (PG_ARGISNULL(0))
+		elog(ERROR, "null argument to binary_upgrade_validate_wal_records is not allowed");
+
+	/* Quick exit if the given lsn is larger than or equal to the current one */
+	if (start_lsn >= GetFlushRecPtr(NULL))
+		PG_RETURN_BOOL(false);
+
+	xlogreader = InitXLogReaderState(start_lsn);
+
+	/* Loop until all WALs are read, or unexpected record is found */
+	while (is_valid && ReadNextXLogRecord(xlogreader))
+	{
+		RmgrIds		rmid;
+		uint8		info;
+
+		/* Check the type of WAL */
+		rmid = XLogRecGetRmid(xlogreader);
+		info = XLogRecGetInfo(xlogreader) & ~XLR_INFO_MASK;
+
+		if (initial_record)
+		{
+			/*
+			 * Initial record must be either XLOG_CHECKPOINT_SHUTDOWN or
+			 * XLOG_SWITCH.
+			 */
+			if (!CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID, XLOG_CHECKPOINT_SHUTDOWN) &&
+				!CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID, XLOG_SWITCH))
+				is_valid = false;
+
+			initial_record = false;
+			continue;
+		}
+
+		/*
+		 * There is a possibility that following records may be generated
+		 * during the upgrade.
+		 */
+		if (!CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID, XLOG_CHECKPOINT_SHUTDOWN) &&
+			!CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID, XLOG_CHECKPOINT_ONLINE) &&
+			!CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID, XLOG_SWITCH) &&
+			!CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID, XLOG_FPI_FOR_HINT) &&
+			!CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID, XLOG_PARAMETER_CHANGE) &&
+			!CHECK_WAL_RECORD(rmid, info, RM_STANDBY_ID, XLOG_RUNNING_XACTS) &&
+			!CHECK_WAL_RECORD(rmid, info, RM_HEAP2_ID, XLOG_HEAP2_PRUNE))
+			is_valid = false;
+
+		CHECK_FOR_INTERRUPTS();
+	}
+
+	pfree(xlogreader->private_data);
+	XLogReaderFree(xlogreader);
+
+	PG_RETURN_BOOL(is_valid);
+}
diff --git a/src/bin/pg_upgrade/Makefile b/src/bin/pg_upgrade/Makefile
index 5834513add..815d1a7ca1 100644
--- a/src/bin/pg_upgrade/Makefile
+++ b/src/bin/pg_upgrade/Makefile
@@ -3,6 +3,9 @@
 PGFILEDESC = "pg_upgrade - an in-place binary upgrade utility"
 PGAPPICON = win32
 
+# required for 003_logical_replication_slots.pl
+EXTRA_INSTALL=contrib/test_decoding
+
 subdir = src/bin/pg_upgrade
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 56e313f562..1bb3873ace 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -30,6 +30,8 @@ static void check_for_jsonb_9_4_usage(ClusterInfo *cluster);
 static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(void);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
+static void check_new_cluster_logical_replication_slots(void);
+static void check_old_cluster_for_valid_slots(bool live_check);
 
 
 /*
@@ -86,8 +88,11 @@ check_and_dump_old_cluster(bool live_check)
 	if (!live_check)
 		start_postmaster(&old_cluster, true);
 
-	/* Extract a list of databases and tables from the old cluster */
-	get_db_and_rel_infos(&old_cluster);
+	/*
+	 * Extract a list of databases, tables, and logical replication slots from
+	 * the old cluster.
+	 */
+	get_db_rel_and_slot_infos(&old_cluster, live_check);
 
 	init_tablespaces();
 
@@ -104,6 +109,13 @@ check_and_dump_old_cluster(bool live_check)
 	check_for_reg_data_type_usage(&old_cluster);
 	check_for_isn_and_int8_passing_mismatch(&old_cluster);
 
+	/*
+	 * Logical replication slots can be migrated since PG17. See comments atop
+	 * get_old_cluster_logical_slot_infos().
+	 */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
+		check_old_cluster_for_valid_slots(live_check);
+
 	/*
 	 * PG 16 increased the size of the 'aclitem' type, which breaks the
 	 * on-disk format for existing data.
@@ -187,7 +199,7 @@ check_and_dump_old_cluster(bool live_check)
 void
 check_new_cluster(void)
 {
-	get_db_and_rel_infos(&new_cluster);
+	get_db_rel_and_slot_infos(&new_cluster, false);
 
 	check_new_cluster_is_empty();
 
@@ -210,6 +222,8 @@ check_new_cluster(void)
 	check_for_prepared_transactions(&new_cluster);
 
 	check_for_new_tablespace_dir();
+
+	check_new_cluster_logical_replication_slots();
 }
 
 
@@ -232,27 +246,6 @@ report_clusters_compatible(void)
 }
 
 
-void
-issue_warnings_and_set_wal_level(void)
-{
-	/*
-	 * We unconditionally start/stop the new server because pg_resetwal -o set
-	 * wal_level to 'minimum'.  If the user is upgrading standby servers using
-	 * the rsync instructions, they will need pg_upgrade to write its final
-	 * WAL record showing wal_level as 'replica'.
-	 */
-	start_postmaster(&new_cluster, true);
-
-	/* Reindex hash indexes for old < 10.0 */
-	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 906)
-		old_9_6_invalidate_hash_indexes(&new_cluster, false);
-
-	report_extension_updates(&new_cluster);
-
-	stop_postmaster(false);
-}
-
-
 void
 output_completion_banner(char *deletion_script_file_name)
 {
@@ -1402,3 +1395,160 @@ check_for_user_defined_encoding_conversions(ClusterInfo *cluster)
 	else
 		check_ok();
 }
+
+/*
+ * check_new_cluster_logical_replication_slots()
+ *
+ * Verify that there are no logical replication slots on the new cluster and
+ * that the parameter settings necessary for creating slots are sufficient.
+ */
+static void
+check_new_cluster_logical_replication_slots(void)
+{
+	PGresult   *res;
+	PGconn	   *conn;
+	int			nslots_on_old;
+	int			nslots_on_new;
+	int			max_replication_slots;
+	char	   *wal_level;
+
+	/* Logical slots can be migrated since PG17. */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+		return;
+
+	nslots_on_old = count_old_cluster_logical_slots();
+
+	/* Quick return if there are no logical slots to be migrated. */
+	if (nslots_on_old == 0)
+		return;
+
+	conn = connectToServer(&new_cluster, "template1");
+
+	prep_status("Checking for new cluster logical replication slots");
+
+	res = executeQueryOrDie(conn, "SELECT count(*) "
+							"FROM pg_catalog.pg_replication_slots "
+							"WHERE slot_type = 'logical' AND "
+							"temporary IS FALSE;");
+
+	if (PQntuples(res) != 1)
+		pg_fatal("could not count the number of logical replication slots");
+
+	nslots_on_new = atoi(PQgetvalue(res, 0, 0));
+
+	if (nslots_on_new)
+		pg_fatal("expected 0 logical replication slots but found %d",
+				 nslots_on_new);
+
+	PQclear(res);
+
+	res = executeQueryOrDie(conn, "SHOW wal_level;");
+
+	if (PQntuples(res) != 1)
+		pg_fatal("could not determine wal_level");
+
+	wal_level = PQgetvalue(res, 0, 0);
+
+	if (strcmp(wal_level, "logical") != 0)
+		pg_fatal("wal_level must be \"logical\", but is set to \"%s\"",
+				 wal_level);
+
+	PQclear(res);
+
+	res = executeQueryOrDie(conn, "SHOW max_replication_slots;");
+
+	if (PQntuples(res) != 1)
+		pg_fatal("could not determine max_replication_slots");
+
+	max_replication_slots = atoi(PQgetvalue(res, 0, 0));
+
+	if (nslots_on_old > max_replication_slots)
+		pg_fatal("max_replication_slots (%d) must be greater than or equal to the number of "
+				 "logical replication slots (%d) on the old cluster",
+				 max_replication_slots, nslots_on_old);
+
+	PQclear(res);
+	PQfinish(conn);
+
+	check_ok();
+}
+
+/*
+ * check_old_cluster_for_valid_slots()
+ *
+ * Verify that all the logical slots are usable and have consumed all the WAL
+ * before shutdown.
+ */
+static void
+check_old_cluster_for_valid_slots(bool live_check)
+{
+	int			dbnum;
+	char		output_path[MAXPGPATH];
+	FILE	   *script = NULL;
+
+	prep_status("Checking for valid logical replication slots");
+
+	snprintf(output_path, sizeof(output_path), "%s/%s",
+			 log_opts.basedir,
+			 "invalid_logical_replication_slots.txt");
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		int			slotnum;
+		LogicalSlotInfoArr *slot_arr = &old_cluster.dbarr.dbs[dbnum].slot_arr;
+
+		for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		{
+			LogicalSlotInfo *slot = &slot_arr->slots[slotnum];
+
+			/* Is the slot usable? */
+			if (slot->invalid)
+			{
+				if (script == NULL &&
+					(script = fopen_priv(output_path, "w")) == NULL)
+					pg_fatal("could not open file \"%s\": %s",
+							 output_path, strerror(errno));
+
+				fprintf(script, "The slot \"%s\" is invalid\n",
+						slot->slotname);
+
+				/* No need to check this slot, seek to new one */
+				continue;
+			}
+
+			/*
+			 * Do additional check to ensure that all logical replication
+			 * slots have consumed all the WAL before shutdown.
+			 *
+			 * Note: This can be satisfied only when the old cluster has been
+			 * shut down, so we skip this for live checks.
+			 */
+			if (!live_check && !slot->caught_up)
+			{
+				if (script == NULL &&
+					(script = fopen_priv(output_path, "w")) == NULL)
+					pg_fatal("could not open file \"%s\": %s",
+							 output_path, strerror(errno));
+
+				fprintf(script,
+						"The slot \"%s\" has not consumed the WAL yet\n",
+						slot->slotname);
+			}
+		}
+	}
+
+	if (script)
+	{
+		fclose(script);
+
+		pg_log(PG_REPORT, "fatal");
+		pg_fatal("Your installation contains invalid logical replication slots.\n"
+				 "These slots can't be copied, so this cluster cannot be upgraded.\n"
+				 "Consider removing such slots or consuming the pending WAL if any,\n"
+				 "and then restart the upgrade.\n"
+				 "A list of invalid logical replication slots is in the file:\n"
+				 "    %s", output_path);
+	}
+
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/function.c b/src/bin/pg_upgrade/function.c
index dc8800c7cd..c0f5e58fa2 100644
--- a/src/bin/pg_upgrade/function.c
+++ b/src/bin/pg_upgrade/function.c
@@ -46,7 +46,9 @@ library_name_compare(const void *p1, const void *p2)
 /*
  * get_loadable_libraries()
  *
- *	Fetch the names of all old libraries containing C-language functions.
+ *	Fetch the names of all old libraries containing either C-language functions
+ *	or logical replication output plugins.
+ *
  *	We will later check that they all exist in the new installation.
  */
 void
@@ -55,6 +57,7 @@ get_loadable_libraries(void)
 	PGresult  **ress;
 	int			totaltups;
 	int			dbnum;
+	int			n_libinfos;
 
 	ress = (PGresult **) pg_malloc(old_cluster.dbarr.ndbs * sizeof(PGresult *));
 	totaltups = 0;
@@ -81,7 +84,12 @@ get_loadable_libraries(void)
 		PQfinish(conn);
 	}
 
-	os_info.libraries = (LibraryInfo *) pg_malloc(totaltups * sizeof(LibraryInfo));
+	/*
+	 * Allocate memory for required libraries and logical replication output
+	 * plugins.
+	 */
+	n_libinfos = totaltups + count_old_cluster_logical_slots();
+	os_info.libraries = (LibraryInfo *) pg_malloc(sizeof(LibraryInfo) * n_libinfos);
 	totaltups = 0;
 
 	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
@@ -89,6 +97,8 @@ get_loadable_libraries(void)
 		PGresult   *res = ress[dbnum];
 		int			ntups;
 		int			rowno;
+		int			slotno;
+		LogicalSlotInfoArr *slot_arr = &old_cluster.dbarr.dbs[dbnum].slot_arr;
 
 		ntups = PQntuples(res);
 		for (rowno = 0; rowno < ntups; rowno++)
@@ -101,6 +111,23 @@ get_loadable_libraries(void)
 			totaltups++;
 		}
 		PQclear(res);
+
+		/*
+		 * Store the names of output plugins as well. There is a possibility
+		 * that duplicated plugins are set, but the consumer function
+		 * check_loadable_libraries() will avoid checking the same library, so
+		 * we do not have to consider their uniqueness here.
+		 */
+		for (slotno = 0; slotno < slot_arr->nslots; slotno++)
+		{
+			if (slot_arr->slots[slotno].invalid)
+				continue;
+
+			os_info.libraries[totaltups].name = pg_strdup(slot_arr->slots[slotno].plugin);
+			os_info.libraries[totaltups].dbnum = dbnum;
+
+			totaltups++;
+		}
 	}
 
 	pg_free(ress);
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index aa5faca4d6..1baa94c49c 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -26,6 +26,8 @@ static void get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo);
 static void free_rel_infos(RelInfoArr *rel_arr);
 static void print_db_infos(DbInfoArr *db_arr);
 static void print_rel_infos(RelInfoArr *rel_arr);
+static void print_slot_infos(LogicalSlotInfoArr *slot_arr);
+static void get_old_cluster_logical_slot_infos(DbInfo *dbinfo, bool live_check);
 
 
 /*
@@ -266,13 +268,15 @@ report_unmatched_relation(const RelInfo *rel, const DbInfo *db, bool is_new_db)
 }
 
 /*
- * get_db_and_rel_infos()
+ * get_db_rel_and_slot_infos()
  *
  * higher level routine to generate dbinfos for the database running
  * on the given "port". Assumes that server is already running.
+ *
+ * live_check would be used only when the target is the old cluster.
  */
 void
-get_db_and_rel_infos(ClusterInfo *cluster)
+get_db_rel_and_slot_infos(ClusterInfo *cluster, bool live_check)
 {
 	int			dbnum;
 
@@ -283,7 +287,17 @@ get_db_and_rel_infos(ClusterInfo *cluster)
 	get_db_infos(cluster);
 
 	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
-		get_rel_infos(cluster, &cluster->dbarr.dbs[dbnum]);
+	{
+		DbInfo	   *pDbInfo = &cluster->dbarr.dbs[dbnum];
+
+		get_rel_infos(cluster, pDbInfo);
+
+		/*
+		 * Retrieve the logical replication slot information for the old cluster.
+		 */
+		if (cluster == &old_cluster)
+			get_old_cluster_logical_slot_infos(pDbInfo, live_check);
+	}
 
 	if (cluster == &old_cluster)
 		pg_log(PG_VERBOSE, "\nsource databases:");
@@ -600,6 +614,121 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 	dbinfo->rel_arr.nrels = num_rels;
 }
 
+/*
+ * get_old_cluster_logical_slot_infos()
+ *
+ * gets the LogicalSlotInfos for all the logical replication slots of the
+ * database referred to by "dbinfo".
+ *
+ * Note: This function will not do anything if the old cluster is pre-PG17.
+ * This is because before that the logical slots are not saved at shutdown, so
+ * there is no guarantee that the latest confirmed_flush_lsn is saved to disk
+ * which can lead to data loss. It is still not guaranteed for manually created
+ * slots in PG17, so subsequent checks done in
+ * check_old_cluster_for_valid_slots() would raise a FATAL error if such slots
+ * are included.
+ */
+static void
+get_old_cluster_logical_slot_infos(DbInfo *dbinfo, bool live_check)
+{
+	PGconn	   *conn;
+	PGresult   *res;
+	LogicalSlotInfo *slotinfos = NULL;
+	int			num_slots = 0;
+
+	/* Logical slots can be migrated since PG17. */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+	{
+		dbinfo->slot_arr.slots = slotinfos;
+		dbinfo->slot_arr.nslots = num_slots;
+		return;
+	}
+
+	conn = connectToServer(&old_cluster, dbinfo->db_name);
+
+	/*
+	 * Fetch the logical replication slot information. The slot is considered
+	 * caught up if all the WAL is consumed except for records that could be
+	 * generated during the upgrade. Note that we can't ensure whether the
+	 * slot is caught up during a live check, as new WAL records could still
+	 * be generated. We also intentionally skip checking the WAL for
+	 * invalidated slots, as the corresponding WAL could have been removed
+	 * for such slots.
+	 *
+	 * The temporary slots are explicitly ignored while checking because such
+	 * slots cannot exist after the upgrade. During the upgrade, clusters are
+	 * started and stopped several times causing any temporary slots to be
+	 * removed.
+	 */
+	res = executeQueryOrDie(conn, "SELECT slot_name, plugin, two_phase, "
+							"%s as caught_up, conflicting as invalid "
+							"FROM pg_catalog.pg_replication_slots "
+							"WHERE slot_type = 'logical' AND "
+							"database = current_database() AND "
+							"temporary IS FALSE;",
+							live_check ? "FALSE" :
+							"(CASE WHEN conflicting THEN FALSE "
+							"ELSE (SELECT pg_catalog.binary_upgrade_validate_wal_records(confirmed_flush_lsn)) "
+							"END)");
+
+	num_slots = PQntuples(res);
+
+	if (num_slots)
+	{
+		int			slotnum;
+		int			i_slotname;
+		int			i_plugin;
+		int			i_twophase;
+		int			i_caught_up;
+		int			i_invalid;
+
+		slotinfos = (LogicalSlotInfo *) pg_malloc(sizeof(LogicalSlotInfo) * num_slots);
+
+		i_slotname = PQfnumber(res, "slot_name");
+		i_plugin = PQfnumber(res, "plugin");
+		i_twophase = PQfnumber(res, "two_phase");
+		i_caught_up = PQfnumber(res, "caught_up");
+		i_invalid = PQfnumber(res, "invalid");
+
+		for (slotnum = 0; slotnum < num_slots; slotnum++)
+		{
+			LogicalSlotInfo *curr = &slotinfos[slotnum];
+
+			curr->slotname = pg_strdup(PQgetvalue(res, slotnum, i_slotname));
+			curr->plugin = pg_strdup(PQgetvalue(res, slotnum, i_plugin));
+			curr->two_phase = (strcmp(PQgetvalue(res, slotnum, i_twophase), "t") == 0);
+			curr->caught_up = (strcmp(PQgetvalue(res, slotnum, i_caught_up), "t") == 0);
+			curr->invalid = (strcmp(PQgetvalue(res, slotnum, i_invalid), "t") == 0);
+		}
+	}
+
+	PQclear(res);
+	PQfinish(conn);
+
+	dbinfo->slot_arr.slots = slotinfos;
+	dbinfo->slot_arr.nslots = num_slots;
+}
+
+
+/*
+ * count_old_cluster_logical_slots()
+ *
+ * Returns the number of logical replication slots for all databases.
+ *
+ * Note: this function always returns 0 if the old_cluster is PG16 and prior
+ * because we gather slot information only for cluster versions greater than or
+ * equal to PG17. See get_old_cluster_logical_slot_infos().
+ */
+int
+count_old_cluster_logical_slots(void)
+{
+	int			dbnum;
+	int			slot_count = 0;
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+		slot_count += old_cluster.dbarr.dbs[dbnum].slot_arr.nslots;
+
+	return slot_count;
+}
 
 static void
 free_db_and_rel_infos(DbInfoArr *db_arr)
@@ -642,8 +771,11 @@ print_db_infos(DbInfoArr *db_arr)
 
 	for (dbnum = 0; dbnum < db_arr->ndbs; dbnum++)
 	{
-		pg_log(PG_VERBOSE, "Database: \"%s\"", db_arr->dbs[dbnum].db_name);
-		print_rel_infos(&db_arr->dbs[dbnum].rel_arr);
+		DbInfo	   *pDbInfo = &db_arr->dbs[dbnum];
+
+		pg_log(PG_VERBOSE, "Database: \"%s\"", pDbInfo->db_name);
+		print_rel_infos(&pDbInfo->rel_arr);
+		print_slot_infos(&pDbInfo->slot_arr);
 	}
 }
 
@@ -660,3 +792,25 @@ print_rel_infos(RelInfoArr *rel_arr)
 			   rel_arr->rels[relnum].reloid,
 			   rel_arr->rels[relnum].tablespace);
 }
+
+static void
+print_slot_infos(LogicalSlotInfoArr *slot_arr)
+{
+	int			slotnum;
+
+	/* Quick return if there are no logical slots. */
+	if (slot_arr->nslots == 0)
+		return;
+
+	pg_log(PG_VERBOSE, "Logical replication slots within the database:");
+
+	for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+	{
+		LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+
+		pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\", two_phase: %d",
+			   slot_info->slotname,
+			   slot_info->plugin,
+			   slot_info->two_phase);
+	}
+}
diff --git a/src/bin/pg_upgrade/meson.build b/src/bin/pg_upgrade/meson.build
index 12a97f84e2..228f29b688 100644
--- a/src/bin/pg_upgrade/meson.build
+++ b/src/bin/pg_upgrade/meson.build
@@ -42,6 +42,7 @@ tests += {
     'tests': [
       't/001_basic.pl',
       't/002_pg_upgrade.pl',
+      't/003_logical_replication_slots.pl',
     ],
     'test_kwargs': {'priority': 40}, # pg_upgrade tests are slow
   },
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 96bfb67167..3ddfc31070 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -59,6 +59,8 @@ static void copy_xact_xlog_xid(void);
 static void set_frozenxids(bool minmxid_only);
 static void make_outputdirs(char *pgdata);
 static void setup(char *argv0, bool *live_check);
+static void create_logical_replication_slots(void);
+static void setup_new_cluster(void);
 
 ClusterInfo old_cluster,
 			new_cluster;
@@ -188,6 +190,8 @@ main(int argc, char **argv)
 			  new_cluster.pgdata);
 	check_ok();
 
+	setup_new_cluster();
+
 	if (user_opts.do_sync)
 	{
 		prep_status("Sync data directory to disk");
@@ -201,8 +205,6 @@ main(int argc, char **argv)
 
 	create_script_for_old_cluster_deletion(&deletion_script_file_name);
 
-	issue_warnings_and_set_wal_level();
-
 	pg_log(PG_REPORT,
 		   "\n"
 		   "Upgrade Complete\n"
@@ -593,7 +595,7 @@ create_new_objects(void)
 		set_frozenxids(true);
 
 	/* update new_cluster info now that we have objects in the databases */
-	get_db_and_rel_infos(&new_cluster);
+	get_db_rel_and_slot_infos(&new_cluster, false);
 }
 
 /*
@@ -862,3 +864,102 @@ set_frozenxids(bool minmxid_only)
 
 	check_ok();
 }
+
+/*
+ * create_logical_replication_slots()
+ *
+ * Similar to create_new_objects() but only restores logical replication slots.
+ */
+static void
+create_logical_replication_slots(void)
+{
+	int			dbnum;
+
+	prep_status_progress("Restoring logical replication slots in the new cluster");
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		DbInfo	   *old_db = &old_cluster.dbarr.dbs[dbnum];
+		LogicalSlotInfoArr *slot_arr = &old_db->slot_arr;
+		PGconn	   *conn;
+		PQExpBuffer query;
+		int			slotnum;
+		char		log_file_name[MAXPGPATH];
+
+		/* Skip this database if there are no slots */
+		if (slot_arr->nslots == 0)
+			continue;
+
+		conn = connectToServer(&new_cluster, old_db->db_name);
+		query = createPQExpBuffer();
+
+		snprintf(log_file_name, sizeof(log_file_name),
+				 DB_DUMP_LOG_FILE_MASK, old_db->db_oid);
+
+		pg_log(PG_STATUS, "%s", old_db->db_name);
+
+		for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		{
+			LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+
+			/* Constructs a query for creating logical replication slots. */
+			appendPQExpBuffer(query, "SELECT pg_catalog.pg_create_logical_replication_slot(");
+			appendStringLiteralConn(query, slot_info->slotname, conn);
+			appendPQExpBuffer(query, ", ");
+			appendStringLiteralConn(query, slot_info->plugin, conn);
+			appendPQExpBuffer(query, ", false, %s);",
+							  slot_info->two_phase ? "true" : "false");
+
+			PQclear(executeQueryOrDie(conn, "%s", query->data));
+
+			resetPQExpBuffer(query);
+		}
+
+		PQfinish(conn);
+
+		destroyPQExpBuffer(query);
+	}
+
+	end_progress_output();
+	check_ok();
+}
+
+/*
+ *	setup_new_cluster()
+ *
+ * Starts the new cluster to update wal_level in the control file, then
+ * performs the final setup steps. Logical slots are also created here.
+ */
+static void
+setup_new_cluster(void)
+{
+	/*
+	 * We unconditionally start/stop the new server because pg_resetwal -o set
+	 * wal_level to 'minimum'.  If the user is upgrading standby servers using
+	 * the rsync instructions, they will need pg_upgrade to write its final
+	 * WAL record showing wal_level as 'replica'.
+	 */
+	start_postmaster(&new_cluster, true);
+
+	/*
+	 * If the old cluster has logical slots, migrate them to the new cluster.
+	 *
+	 * Note: This must be done after the pg_resetwal command has been executed
+	 * by the caller, because pg_resetwal would remove the required WAL files.
+	 */
+	if (count_old_cluster_logical_slots())
+		create_logical_replication_slots();
+
+	/*
+	 * Reindex hash indexes for old < 10.0. count_old_cluster_logical_slots()
+	 * can return a non-zero value only when the old cluster is PG17 or
+	 * later, so it's OK to use "else if" here. See comments atop
+	 * count_old_cluster_logical_slots() and get_old_cluster_logical_slot_infos().
+	 */
+	else if (GET_MAJOR_VERSION(old_cluster.major_version) <= 906)
+		old_9_6_invalidate_hash_indexes(&new_cluster, false);
+
+	report_extension_updates(&new_cluster);
+
+	stop_postmaster(false);
+}
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 842f3b6cd3..b2531862e4 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -150,6 +150,25 @@ typedef struct
 	int			nrels;
 } RelInfoArr;
 
+/*
+ * Structure to store logical replication slot information
+ */
+typedef struct
+{
+	char	   *slotname;		/* slot name */
+	char	   *plugin;			/* plugin */
+	bool		two_phase;		/* can the slot decode 2PC? */
+	bool		caught_up;		/* Is confirmed_flush_lsn the same as latest
+								 * checkpoint LSN? */
+	bool		invalid;		/* If true, the slot is unusable. */
+} LogicalSlotInfo;
+
+typedef struct
+{
+	int			nslots;			/* number of logical slot infos */
+	LogicalSlotInfo *slots;		/* array of logical slot infos */
+} LogicalSlotInfoArr;
+
 /*
  * The following structure represents a relation mapping.
  */
@@ -176,6 +195,7 @@ typedef struct
 	char		db_tablespace[MAXPGPATH];	/* database default tablespace
 											 * path */
 	RelInfoArr	rel_arr;		/* array of all user relinfos */
+	LogicalSlotInfoArr slot_arr;	/* array of all LogicalSlotInfo */
 } DbInfo;
 
 /*
@@ -345,7 +365,6 @@ void		output_check_banner(bool live_check);
 void		check_and_dump_old_cluster(bool live_check);
 void		check_new_cluster(void);
 void		report_clusters_compatible(void);
-void		issue_warnings_and_set_wal_level(void);
 void		output_completion_banner(char *deletion_script_file_name);
 void		check_cluster_versions(void);
 void		check_cluster_compatibility(bool live_check);
@@ -400,7 +419,8 @@ void		check_loadable_libraries(void);
 FileNameMap *gen_db_file_maps(DbInfo *old_db,
 							  DbInfo *new_db, int *nmaps, const char *old_pgdata,
 							  const char *new_pgdata);
-void		get_db_and_rel_infos(ClusterInfo *cluster);
+void		get_db_rel_and_slot_infos(ClusterInfo *cluster, bool live_check);
+int			count_old_cluster_logical_slots(void);
 
 /* option.c */
 
diff --git a/src/bin/pg_upgrade/server.c b/src/bin/pg_upgrade/server.c
index 0bc3d2806b..d7f6c268ef 100644
--- a/src/bin/pg_upgrade/server.c
+++ b/src/bin/pg_upgrade/server.c
@@ -201,6 +201,7 @@ start_postmaster(ClusterInfo *cluster, bool report_and_exit_on_error)
 	PGconn	   *conn;
 	bool		pg_ctl_return = false;
 	char		socket_string[MAXPGPATH + 200];
+	PQExpBufferData pgoptions;
 
 	static bool exit_hook_registered = false;
 
@@ -227,23 +228,41 @@ start_postmaster(ClusterInfo *cluster, bool report_and_exit_on_error)
 				 cluster->sockdir);
 #endif
 
+	initPQExpBuffer(&pgoptions);
+
 	/*
-	 * Use -b to disable autovacuum.
+	 * Construct a parameter string which is passed to the server process.
 	 *
 	 * Turn off durability requirements to improve object creation speed, and
 	 * we only modify the new cluster, so only use it there.  If there is a
 	 * crash, the new cluster has to be recreated anyway.  fsync=off is a big
 	 * win on ext4.
 	 */
+	if (cluster == &new_cluster)
+		appendPQExpBufferStr(&pgoptions, " -c synchronous_commit=off -c fsync=off -c full_page_writes=off");
+
+	/*
+	 * Set max_slot_wal_keep_size to -1 to prevent WAL removal by the
+	 * checkpointer process.  If the WAL required by logical replication
+	 * slots is removed, the slots become unusable.  This setting prevents
+	 * the invalidation of slots during the upgrade. We set this option only
+	 * when the cluster is PG17 or later because logical replication slots
+	 * can only be migrated since then. Besides, max_slot_wal_keep_size was
+	 * added in PG13.
+	 */
+	if (GET_MAJOR_VERSION(cluster->major_version) >= 1700)
+		appendPQExpBufferStr(&pgoptions, " -c max_slot_wal_keep_size=-1");
+
+	/* Use -b to disable autovacuum. */
 	snprintf(cmd, sizeof(cmd),
 			 "\"%s/pg_ctl\" -w -l \"%s/%s\" -D \"%s\" -o \"-p %d -b%s %s%s\" start",
 			 cluster->bindir,
 			 log_opts.logdir,
 			 SERVER_LOG_FILE, cluster->pgconfig, cluster->port,
-			 (cluster == &new_cluster) ?
-			 " -c synchronous_commit=off -c fsync=off -c full_page_writes=off" : "",
+			 pgoptions.data,
 			 cluster->pgopts ? cluster->pgopts : "", socket_string);
 
+	termPQExpBuffer(&pgoptions);
+
 	/*
 	 * Don't throw an error right away, let connecting throw the error because
 	 * it might supply a reason for the failure.
diff --git a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
new file mode 100644
index 0000000000..13bcc344fd
--- /dev/null
+++ b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
@@ -0,0 +1,232 @@
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+# Tests for upgrading replication slots
+
+use strict;
+use warnings;
+
+use File::Path qw(rmtree);
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Can be changed to test the other modes
+my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
+
+# Initialize old cluster
+my $old_publisher = PostgreSQL::Test::Cluster->new('old_publisher');
+$old_publisher->init(allows_streaming => 'logical');
+
+# Initialize new cluster
+my $new_publisher = PostgreSQL::Test::Cluster->new('new_publisher');
+$new_publisher->init(allows_streaming => 'replica');
+
+# Initialize subscriber cluster
+my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
+$subscriber->init(allows_streaming => 'logical');
+
+my $bindir = $new_publisher->config_data('--bindir');
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when new cluster wal_level is not 'logical'
+
+# Preparations for the subsequent test:
+# 1. Create a slot on the old cluster
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT pg_create_logical_replication_slot('test_slot1', 'test_decoding', false, true);"
+);
+$old_publisher->stop;
+
+# pg_upgrade will fail because the new cluster wal_level is 'replica'
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade where the new cluster has the wrong wal_level');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when max_replication_slots on a new cluster is
+#		too low
+
+# Preparations for the subsequent test:
+# 1. Create a second slot on the old cluster
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT pg_create_logical_replication_slot('test_slot2', 'test_decoding', false, true);"
+);
+
+# 2. Consume WAL records to avoid another type of upgrade failure. It will be
+#	 tested in subsequent cases.
+$old_publisher->safe_psql('postgres',
+	"SELECT count(*) FROM pg_logical_slot_get_changes('test_slot1', NULL, NULL);"
+);
+$old_publisher->stop;
+
+# 3. max_replication_slots is set to smaller than the number of slots (2)
+#	 present on the old cluster
+$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 1");
+
+# 4. wal_level is set correctly on the new cluster
+$new_publisher->append_conf('postgresql.conf', "wal_level = 'logical'");
+
+# pg_upgrade will fail because the new cluster has insufficient max_replication_slots
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade where the new cluster has insufficient max_replication_slots');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when the slot still has unconsumed WAL records
+
+# Preparations for the subsequent test:
+# 1. Remove the slot 'test_slot2', leaving only 1 slot on the old cluster, so
+#    the new cluster config  max_replication_slots=1 will now be enough.
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT * FROM pg_drop_replication_slot('test_slot2');"
+);
+
+# 2. Generate extra WAL records. Because these WAL records are not consumed,
+#	 the upcoming pg_upgrade test will fail.
+$old_publisher->safe_psql('postgres',
+	"CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;"
+);
+$old_publisher->stop;
+
+# pg_upgrade will fail because the slot still has unconsumed WAL records
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade of old cluster with idle replication slots');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+# Remove the remaining slot
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT * FROM pg_drop_replication_slot('test_slot1');"
+);
+
+# ------------------------------
+# TEST: Successful upgrade
+
+# Preparations for the subsequent test:
+# 1. Setup logical replication
+my $old_connstr = $old_publisher->connstr . ' dbname=postgres';
+$old_publisher->safe_psql('postgres',
+	"CREATE PUBLICATION pub FOR ALL TABLES;"
+);
+$subscriber->start;
+$subscriber->safe_psql(
+	'postgres', qq[
+	CREATE TABLE tbl (a int);
+	CREATE SUBSCRIPTION regress_sub CONNECTION '$old_connstr' PUBLICATION pub WITH (two_phase = 'true')
+]);
+$subscriber->wait_for_subscription_sync($old_publisher, 'regress_sub');
+
+# 2. Temporarily disable the subscription
+$subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION regress_sub DISABLE");
+$old_publisher->stop;
+
+# Dry run, successful check is expected. This is not a live check, so a
+# shutdown checkpoint record would be inserted. We want to test that a
+# subsequent upgrade is successful by skipping such an expected WAL record.
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,        '--check'
+	],
+	'run of pg_upgrade of old cluster');
+ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ removed after pg_upgrade success");
+
+# Actual run, successful upgrade is expected
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade of old cluster');
+ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ removed after pg_upgrade success");
+
+# Check that the slot 'regress_sub' has migrated to the new cluster
+$new_publisher->start;
+my $result = $new_publisher->safe_psql('postgres',
+	"SELECT slot_name, two_phase FROM pg_replication_slots");
+is($result, qq(regress_sub|t), 'check the slot exists on new cluster');
+
+# Update the connection
+my $new_connstr = $new_publisher->connstr . ' dbname=postgres';
+$subscriber->safe_psql(
+	'postgres', qq[
+	ALTER SUBSCRIPTION regress_sub CONNECTION '$new_connstr';
+	ALTER SUBSCRIPTION regress_sub ENABLE;
+]);
+
+# Check whether changes on the new publisher get replicated to the subscriber
+$new_publisher->safe_psql('postgres',
+	"INSERT INTO tbl VALUES (generate_series(11, 20))");
+$new_publisher->wait_for_catchup('regress_sub');
+$result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
+is($result, qq(20), 'check changes are replicated to the subscriber');
+
+# Clean up
+$subscriber->stop();
+$new_publisher->stop();
+
+done_testing();
diff --git a/src/include/access/xlogutils.h b/src/include/access/xlogutils.h
index 5b77b11f50..1cf31aa24f 100644
--- a/src/include/access/xlogutils.h
+++ b/src/include/access/xlogutils.h
@@ -115,4 +115,7 @@ extern void XLogReadDetermineTimeline(XLogReaderState *state,
 
 extern void WALReadRaiseError(WALReadError *errinfo);
 
+extern XLogReaderState *InitXLogReaderState(XLogRecPtr lsn);
+extern XLogRecord *ReadNextXLogRecord(XLogReaderState *xlogreader);
+
 #endif
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 9805bc6118..3162809888 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11370,6 +11370,11 @@
   proname => 'binary_upgrade_set_next_pg_tablespace_oid', provolatile => 'v',
   proparallel => 'u', prorettype => 'void', proargtypes => 'oid',
   prosrc => 'binary_upgrade_set_next_pg_tablespace_oid' },
+{ oid => '8046', descr => 'for use by pg_upgrade',
+  proname => 'binary_upgrade_validate_wal_records', proisstrict => 'f',
+  provolatile => 'v', proparallel => 'u', prorettype => 'bool',
+  proargtypes => 'pg_lsn',
+  prosrc => 'binary_upgrade_validate_wal_records' },
 
 # conversion functions
 { oid => '4302',
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index b5bbdd1608..ff6cd495a3 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1502,7 +1502,10 @@ LogicalRepTupleData
 LogicalRepTyp
 LogicalRepWorker
 LogicalRepWorkerType
+LogicalReplicationSlotInfo
 LogicalRewriteMappingData
+LogicalSlotInfo
+LogicalSlotInfoArr
 LogicalTape
 LogicalTapeSet
 LsnReadQueue
-- 
2.27.0

#256Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Amit Kapila (#254)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Amit,

Thank you for reviewing! The new version is available in [1]/messages/by-id/TYAPR01MB586615579356A84A8CF29A00F5F9A@TYAPR01MB5866.jpnprd01.prod.outlook.com.

+{ oid => '8046', descr => 'for use by pg_upgrade',
+  proname => 'binary_upgrade_validate_wal_records',
+  prorows => '10', proretset => 't', provolatile => 's', prorettype => 'bool',
+  proargtypes => 'pg_lsn', proallargtypes => '{pg_lsn,bool}',
+  proargmodes => '{i,o}', proargnames => '{start_lsn,is_ok}',
+  prosrc => 'binary_upgrade_validate_wal_records' },

In this many of the fields seem bogus. For example, we don't need
prorows => '10', proretset => 't' for this function. Similarly
proargmodes also look incorrect as we don't have any out parameter.

That part was written in an old version and has been kept until now. I rechecked
the fields and changed them as below:

* This function just returns a boolean, so proretset was changed to 'f'.
* Based on the above, prorows should be zero, so it was removed.
* The returned value depends heavily on the internal state, so provolatile was
changed to 'v'.
* There are no OUT or INOUT arguments, so there is no need to set proallargtypes
and proargmodes. Removed.
* Anonymous arguments are allowed, so proargnames was removed (it defaults to NULL).
* This function is not expected to be called in parallel, so proparallel was set to 'u'.
* The argument must not be NULL, and we should error out in that case, so
proisstrict was changed to 'f'. Also, a check for that was added to the function.
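
The resulting entry looks like this:

{ oid => '8046', descr => 'for use by pg_upgrade',
  proname => 'binary_upgrade_validate_wal_records', proisstrict => 'f',
  provolatile => 'v', proparallel => 'u', prorettype => 'bool',
  proargtypes => 'pg_lsn',
  prosrc => 'binary_upgrade_validate_wal_records' },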

[1]: /messages/by-id/TYAPR01MB586615579356A84A8CF29A00F5F9A@TYAPR01MB5866.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#257Michael Paquier
michael@paquier.xyz
In reply to: Hayato Kuroda (Fujitsu) (#255)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Wed, Sep 20, 2023 at 11:28:33AM +0000, Hayato Kuroda (Fujitsu) wrote:

Good catch, I could not notice because it worked well in my RHEL. Here is the
updated version.

I am getting slowly up to date with this patch.. But before going in
depth with more review, there is something that I got to ask: why is
there no option to control if the slots are copied across the upgrade?
At least, I would have imagined that an option to disable the copy of
the slots would be adopted, say a --no-slot-copy or similar to get
back to the old behavior if need be.

+ * This is because before that the logical slots are not saved at shutdown, so
+ * there is no guarantee that the latest confirmed_flush_lsn is saved to disk

Is this comment in get_old_cluster_logical_slot_infos() still true
after e0b2eed047d?
--
Michael

#258Amit Kapila
amit.kapila16@gmail.com
In reply to: Michael Paquier (#257)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Thu, Sep 21, 2023 at 1:10 PM Michael Paquier <michael@paquier.xyz> wrote:

On Wed, Sep 20, 2023 at 11:28:33AM +0000, Hayato Kuroda (Fujitsu) wrote:

Good catch, I could not notice because it worked well in my RHEL. Here is the
updated version.

I am getting slowly up to date with this patch.. But before going in
depth with more review, there is something that I got to ask: why is
there no option to control if the slots are copied across the upgrade?
At least, I would have imagined that an option to disable the copy of
the slots would be adopted, say a --no-slot-copy or similar to get
back to the old behavior if need be.

We have discussed this point. Normally, we don't have such options in
pg_upgrade, so we were hesitant to add a new one for this, but there is a
discussion about adding an --exclude-logical-slots option. We are planning
to add that as a separate patch after getting some more consensus on
it. Right now, the idea is to get the main patch ready.

+ * This is because before that the logical slots are not saved at shutdown, so
+ * there is no guarantee that the latest confirmed_flush_lsn is saved to disk

Is this comment in get_old_cluster_logical_slot_infos() still true
after e0b2eed047d?

Yes, we didn't backpatch it, so slots from pre-17 won't be flushed
at shutdown time even if required.

--
With Regards,
Amit Kapila.

#259Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Hayato Kuroda (Fujitsu) (#255)
1 attachment(s)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Hackers,

Good catch, I could not notice because it worked well in my RHEL. Here is the
updated version.

I made some cosmetic changes to the patch; the functionality was not changed.
E.g., a macro function was replaced with an inline function.
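
For reference, the helper in the attached v43 is now a static inline (from
pg_upgrade_support.c in the patch):

static inline bool
is_xlog_record_type(RmgrId rmgrid, uint8 info,
					RmgrId expected_rmgrid, uint8 expected_info)
{
	return (rmgrid == expected_rmgrid) && (info == expected_info);
}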

Note that cfbot got angry at the old patch, but that seemed to be an
infrastructure-side error. Let's see again.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:

v43-0001-pg_upgrade-Allow-to-replicate-logical-replicati.patchapplication/octet-stream; name=v43-0001-pg_upgrade-Allow-to-replicate-logical-replicati.patchDownload
From 70f0666d253988b20719a197281d0c2ca8078d2d Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Tue, 4 Apr 2023 05:49:34 +0000
Subject: [PATCH v430] pg_upgrade: Allow to replicate logical replication slots
 to new node

This commit allows nodes with logical replication slots to be upgraded. While
reading information from the old cluster, a list of logical replication slots is
fetched. At the later part of upgrading, pg_upgrade revisits the list and
restores slots by executing pg_create_logical_replication_slots() on the new
cluster. Migration of logical replication slots is only supported when the old
cluster is version 17.0 or later.

If the old node has slots with the status 'lost' or with unconsumed WAL records,
the pg_upgrade fails. These checks are needed to prevent data loss.

Note that the pg_resetwal command would remove WAL files, which are required for
restart_lsn. If WALs required by logical replication slots are removed, the
slots are unusable. Therefore, during the upgrade, slot restoration is done
after the final pg_resetwal command. The workflow ensures that required WALs are
retained.

The significant advantage of this commit is that it makes it easy to continue
logical replication even after upgrading the publisher node. Previously,
pg_upgrade allowed copying publications to a new node. With this new commit,
adjusting the connection string to the new publisher will cause the apply
worker on the subscriber to connect to the new publisher automatically. This
enables seamless continuation of logical replication, even after an upgrade.

Author: Hayato Kuroda
Co-authored-by: Hou Zhijie
Reviewed-by: Peter Smith, Julien Rouhaud, Vignesh C, Wang Wei, Masahiko Sawada,
             Dilip Kumar
---
 contrib/pg_walinspect/pg_walinspect.c         |  94 -------
 doc/src/sgml/ref/pgupgrade.sgml               |  76 +++++-
 src/backend/access/transam/xlogutils.c        |  92 +++++++
 src/backend/replication/slot.c                |  12 +
 src/backend/utils/adt/pg_upgrade_support.c    |  93 +++++++
 src/bin/pg_upgrade/Makefile                   |   3 +
 src/bin/pg_upgrade/check.c                    | 198 +++++++++++++--
 src/bin/pg_upgrade/function.c                 |  31 ++-
 src/bin/pg_upgrade/info.c                     | 167 ++++++++++++-
 src/bin/pg_upgrade/meson.build                |   1 +
 src/bin/pg_upgrade/pg_upgrade.c               | 107 +++++++-
 src/bin/pg_upgrade/pg_upgrade.h               |  24 +-
 src/bin/pg_upgrade/server.c                   |  25 +-
 .../t/003_logical_replication_slots.pl        | 232 ++++++++++++++++++
 src/include/access/xlogutils.h                |   3 +
 src/include/catalog/pg_proc.dat               |   5 +
 src/tools/pgindent/typedefs.list              |   3 +
 17 files changed, 1031 insertions(+), 135 deletions(-)
 create mode 100644 src/bin/pg_upgrade/t/003_logical_replication_slots.pl

diff --git a/contrib/pg_walinspect/pg_walinspect.c b/contrib/pg_walinspect/pg_walinspect.c
index 796a74f322..49f4f92e98 100644
--- a/contrib/pg_walinspect/pg_walinspect.c
+++ b/contrib/pg_walinspect/pg_walinspect.c
@@ -40,8 +40,6 @@ PG_FUNCTION_INFO_V1(pg_get_wal_stats_till_end_of_wal);
 
 static void ValidateInputLSNs(XLogRecPtr start_lsn, XLogRecPtr *end_lsn);
 static XLogRecPtr GetCurrentLSN(void);
-static XLogReaderState *InitXLogReaderState(XLogRecPtr lsn);
-static XLogRecord *ReadNextXLogRecord(XLogReaderState *xlogreader);
 static void GetWALRecordInfo(XLogReaderState *record, Datum *values,
 							 bool *nulls, uint32 ncols);
 static void GetWALRecordsInfo(FunctionCallInfo fcinfo,
@@ -84,98 +82,6 @@ GetCurrentLSN(void)
 	return curr_lsn;
 }
 
-/*
- * Initialize WAL reader and identify first valid LSN.
- */
-static XLogReaderState *
-InitXLogReaderState(XLogRecPtr lsn)
-{
-	XLogReaderState *xlogreader;
-	ReadLocalXLogPageNoWaitPrivate *private_data;
-	XLogRecPtr	first_valid_record;
-
-	/*
-	 * Reading WAL below the first page of the first segments isn't allowed.
-	 * This is a bootstrap WAL page and the page_read callback fails to read
-	 * it.
-	 */
-	if (lsn < XLOG_BLCKSZ)
-		ereport(ERROR,
-				(errmsg("could not read WAL at LSN %X/%X",
-						LSN_FORMAT_ARGS(lsn))));
-
-	private_data = (ReadLocalXLogPageNoWaitPrivate *)
-		palloc0(sizeof(ReadLocalXLogPageNoWaitPrivate));
-
-	xlogreader = XLogReaderAllocate(wal_segment_size, NULL,
-									XL_ROUTINE(.page_read = &read_local_xlog_page_no_wait,
-											   .segment_open = &wal_segment_open,
-											   .segment_close = &wal_segment_close),
-									private_data);
-
-	if (xlogreader == NULL)
-		ereport(ERROR,
-				(errcode(ERRCODE_OUT_OF_MEMORY),
-				 errmsg("out of memory"),
-				 errdetail("Failed while allocating a WAL reading processor.")));
-
-	/* first find a valid recptr to start from */
-	first_valid_record = XLogFindNextRecord(xlogreader, lsn);
-
-	if (XLogRecPtrIsInvalid(first_valid_record))
-		ereport(ERROR,
-				(errmsg("could not find a valid record after %X/%X",
-						LSN_FORMAT_ARGS(lsn))));
-
-	return xlogreader;
-}
-
-/*
- * Read next WAL record.
- *
- * By design, to be less intrusive in a running system, no slot is allocated
- * to reserve the WAL we're about to read. Therefore this function can
- * encounter read errors for historical WAL.
- *
- * We guard against ordinary errors trying to read WAL that hasn't been
- * written yet by limiting end_lsn to the flushed WAL, but that can also
- * encounter errors if the flush pointer falls in the middle of a record. In
- * that case we'll return NULL.
- */
-static XLogRecord *
-ReadNextXLogRecord(XLogReaderState *xlogreader)
-{
-	XLogRecord *record;
-	char	   *errormsg;
-
-	record = XLogReadRecord(xlogreader, &errormsg);
-
-	if (record == NULL)
-	{
-		ReadLocalXLogPageNoWaitPrivate *private_data;
-
-		/* return NULL, if end of WAL is reached */
-		private_data = (ReadLocalXLogPageNoWaitPrivate *)
-			xlogreader->private_data;
-
-		if (private_data->end_of_wal)
-			return NULL;
-
-		if (errormsg)
-			ereport(ERROR,
-					(errcode_for_file_access(),
-					 errmsg("could not read WAL at %X/%X: %s",
-							LSN_FORMAT_ARGS(xlogreader->EndRecPtr), errormsg)));
-		else
-			ereport(ERROR,
-					(errcode_for_file_access(),
-					 errmsg("could not read WAL at %X/%X",
-							LSN_FORMAT_ARGS(xlogreader->EndRecPtr))));
-	}
-
-	return record;
-}
-
 /*
  * Output values that make up a row describing caller's WAL record.
  *
diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index bea0d1b93f..1a17572d14 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -383,6 +383,77 @@ make prefix=/usr/local/pgsql.new install
     </para>
    </step>
 
+   <step>
+    <title>Prepare for publisher upgrades</title>
+
+    <para>
+     <application>pg_upgrade</application> attempts to migrate logical
+     replication slots. This helps avoid the need for manually defining the
+     same replication slots on the new publisher. Migration of logical
+     replication slots is only supported when the old cluster is version 17.0
+     or later.
+    </para>
+
+    <para>
+     Before you start upgrading the publisher cluster, ensure that the
+     subscription is temporarily disabled, by executing
+     <link linkend="sql-altersubscription"><command>ALTER SUBSCRIPTION ... DISABLE</command></link>.
+     Re-enable the subscription after the upgrade.
+    </para>
+
+    <para>
+     There are some prerequisites for <application>pg_upgrade</application> to
+     be able to upgrade the replication slots. If these are not met an error
+     will be reported.
+    </para>
+
+    <itemizedlist>
+     <listitem>
+      <para>
+       All slots on the old cluster must be usable, i.e., there are no slots
+       whose
+       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>conflicting</structfield>
+       is <literal>true</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The old cluster has replicated all the changes to subscribers.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The output plugins referenced by the slots on the old cluster must be
+       installed in the new PostgreSQL executable directory.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must not have permanent logical replication slots, i.e.,
+       there must be no slots where
+       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>temporary</structfield>
+       is <literal>false</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must have
+       <link linkend="guc-wal-level"><varname>wal_level</varname></link> as
+       <literal>logical</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must have
+       <link linkend="guc-max-replication-slots"><varname>max_replication_slots</varname></link>
+       configured to a value greater than or equal to the number of slots
+       present in the old cluster.
+      </para>
+     </listitem>
+    </itemizedlist>
+
+   </step>
+
    <step>
     <title>Stop both servers</title>
 
@@ -652,8 +723,9 @@ rsync --archive --delete --hard-links --size-only --no-inc-recursive /vol1/pg_tb
        Configure the servers for log shipping.  (You do not need to run
        <function>pg_backup_start()</function> and <function>pg_backup_stop()</function>
        or take a file system backup as the standbys are still synchronized
-       with the primary.)  Replication slots are not copied and must
-       be recreated.
+       with the primary.)  Only logical slots on the primary are copied to the
+       new standby, but other slots on the old standby are not copied, so they
+       must be recreated manually.
       </para>
      </step>
 
diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c
index 43f7b31205..e2cabfef32 100644
--- a/src/backend/access/transam/xlogutils.c
+++ b/src/backend/access/transam/xlogutils.c
@@ -1048,3 +1048,95 @@ WALReadRaiseError(WALReadError *errinfo)
 						errinfo->wre_req)));
 	}
 }
+
+/*
+ * Initialize WAL reader and identify first valid LSN.
+ */
+XLogReaderState *
+InitXLogReaderState(XLogRecPtr lsn)
+{
+	XLogReaderState *xlogreader;
+	ReadLocalXLogPageNoWaitPrivate *private_data;
+	XLogRecPtr	first_valid_record;
+
+	/*
+	 * Reading WAL below the first page of the first segments isn't allowed.
+	 * This is a bootstrap WAL page and the page_read callback fails to read
+	 * it.
+	 */
+	if (lsn < XLOG_BLCKSZ)
+		ereport(ERROR,
+				(errmsg("could not read WAL at LSN %X/%X",
+						LSN_FORMAT_ARGS(lsn))));
+
+	private_data = (ReadLocalXLogPageNoWaitPrivate *)
+		palloc0(sizeof(ReadLocalXLogPageNoWaitPrivate));
+
+	xlogreader = XLogReaderAllocate(wal_segment_size, NULL,
+									XL_ROUTINE(.page_read = &read_local_xlog_page_no_wait,
+											   .segment_open = &wal_segment_open,
+											   .segment_close = &wal_segment_close),
+									private_data);
+
+	if (xlogreader == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("out of memory"),
+				 errdetail("Failed while allocating a WAL reading processor.")));
+
+	/* first find a valid recptr to start from */
+	first_valid_record = XLogFindNextRecord(xlogreader, lsn);
+
+	if (XLogRecPtrIsInvalid(first_valid_record))
+		ereport(ERROR,
+				(errmsg("could not find a valid record after %X/%X",
+						LSN_FORMAT_ARGS(lsn))));
+
+	return xlogreader;
+}
+
+/*
+ * Read next WAL record.
+ *
+ * By design, to be less intrusive in a running system, no slot is allocated
+ * to reserve the WAL we're about to read. Therefore this function can
+ * encounter read errors for historical WAL.
+ *
+ * We guard against ordinary errors trying to read WAL that hasn't been
+ * written yet by limiting end_lsn to the flushed WAL, but that can also
+ * encounter errors if the flush pointer falls in the middle of a record. In
+ * that case we'll return NULL.
+ */
+XLogRecord *
+ReadNextXLogRecord(XLogReaderState *xlogreader)
+{
+	XLogRecord *record;
+	char	   *errormsg;
+
+	record = XLogReadRecord(xlogreader, &errormsg);
+
+	if (record == NULL)
+	{
+		ReadLocalXLogPageNoWaitPrivate *private_data;
+
+		/* return NULL, if end of WAL is reached */
+		private_data = (ReadLocalXLogPageNoWaitPrivate *)
+			xlogreader->private_data;
+
+		if (private_data->end_of_wal)
+			return NULL;
+
+		if (errormsg)
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not read WAL at %X/%X: %s",
+							LSN_FORMAT_ARGS(xlogreader->EndRecPtr), errormsg)));
+		else
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not read WAL at %X/%X",
+							LSN_FORMAT_ARGS(xlogreader->EndRecPtr))));
+	}
+
+	return record;
+}
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 3ded3c1473..a91d412f91 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1423,6 +1423,18 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause,
 
 		SpinLockRelease(&s->mutex);
 
+		/*
+		 * The logical replication slots shouldn't be invalidated as
+		 * max_slot_wal_keep_size GUC is set to -1 during the upgrade.
+		 *
+		 * The following is just a sanity check.
+		 */
+		if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade)
+		{
+			Assert(max_slot_wal_keep_size_mb == -1);
+			elog(ERROR, "replication slots must not be invalidated during the upgrade");
+		}
+
 		if (active_pid != 0)
 		{
 			/*
diff --git a/src/backend/utils/adt/pg_upgrade_support.c b/src/backend/utils/adt/pg_upgrade_support.c
index 0186636d9f..dba39b185e 100644
--- a/src/backend/utils/adt/pg_upgrade_support.c
+++ b/src/backend/utils/adt/pg_upgrade_support.c
@@ -11,14 +11,20 @@
 
 #include "postgres.h"
 
+#include "access/heapam_xlog.h"
+#include "access/xlog.h"
+#include "access/xlogutils.h"
 #include "catalog/binary_upgrade.h"
 #include "catalog/heap.h"
 #include "catalog/namespace.h"
+#include "catalog/pg_control.h"
 #include "catalog/pg_type.h"
 #include "commands/extension.h"
 #include "miscadmin.h"
+#include "storage/standbydefs.h"
 #include "utils/array.h"
 #include "utils/builtins.h"
+#include "utils/pg_lsn.h"
 
 
 #define CHECK_IS_BINARY_UPGRADE									\
@@ -261,3 +267,90 @@ binary_upgrade_set_missing_value(PG_FUNCTION_ARGS)
 
 	PG_RETURN_VOID();
 }
+
+/*
+ * Helper function for binary_upgrade_validate_wal_records().
+ */
+static inline bool
+is_xlog_record_type(RmgrId rmgrid, uint8 info,
+					RmgrId expected_rmgrid, uint8 expected_info)
+{
+	return (rmgrid == expected_rmgrid) && (info == expected_info);
+}
+
+/*
+ * Return false if we found unexpected WAL records, otherwise true.
+ *
+ * This function is used to verify that there are no WAL records (except some
+ * types) after confirmed_flush_lsn of logical slots, which means all the
+ * changes were replicated to the subscriber. Some WAL records can be inserted
+ * during the upgrade itself, so such record types are ignored.
+ *
+ * XLOG_CHECKPOINT_SHUTDOWN and XLOG_SWITCH are ignored because they would be
+ * inserted after the walsender exits. Moreover, the following types of records
+ * could be generated during the pg_upgrade --check, so they are ignored too:
+ * XLOG_CHECKPOINT_ONLINE, XLOG_RUNNING_XACTS, XLOG_FPI_FOR_HINT,
+ * XLOG_HEAP2_PRUNE, XLOG_PARAMETER_CHANGE.
+ */
+Datum
+binary_upgrade_validate_wal_records(PG_FUNCTION_ARGS)
+{
+	XLogRecPtr	start_lsn = PG_GETARG_LSN(0);
+	XLogReaderState *xlogreader;
+	bool		initial_record = true;
+	bool		is_valid = true;
+
+	CHECK_IS_BINARY_UPGRADE;
+
+	if (PG_ARGISNULL(0))
+		elog(ERROR, "null argument to binary_upgrade_validate_wal_records is not allowed");
+
+	/* Quick exit if the given LSN is at or beyond the current flush LSN */
+	if (start_lsn >= GetFlushRecPtr(NULL))
+		PG_RETURN_BOOL(false);
+
+	xlogreader = InitXLogReaderState(start_lsn);
+
+	/* Loop until all WALs are read, or unexpected record is found */
+	while (is_valid && ReadNextXLogRecord(xlogreader))
+	{
+		RmgrIds		rmid;
+		uint8		info;
+
+		/* Check the type of WAL */
+		rmid = XLogRecGetRmid(xlogreader);
+		info = XLogRecGetInfo(xlogreader) & ~XLR_INFO_MASK;
+
+		if (initial_record)
+		{
+			/*
+			 * Initial record must be either XLOG_CHECKPOINT_SHUTDOWN or
+			 * XLOG_SWITCH.
+			 */
+			is_valid = is_xlog_record_type(rmid, info, RM_XLOG_ID, XLOG_CHECKPOINT_SHUTDOWN) ||
+				is_xlog_record_type(rmid, info, RM_XLOG_ID, XLOG_SWITCH);
+
+			initial_record = false;
+			continue;
+		}
+
+		/*
+		 * There is a possibility that following records may be generated
+		 * during the upgrade.
+		 */
+		is_valid = is_xlog_record_type(rmid, info, RM_XLOG_ID, XLOG_CHECKPOINT_SHUTDOWN) ||
+			is_xlog_record_type(rmid, info, RM_XLOG_ID, XLOG_CHECKPOINT_ONLINE) ||
+			is_xlog_record_type(rmid, info, RM_XLOG_ID, XLOG_SWITCH) ||
+			is_xlog_record_type(rmid, info, RM_XLOG_ID, XLOG_FPI_FOR_HINT) ||
+			is_xlog_record_type(rmid, info, RM_XLOG_ID, XLOG_PARAMETER_CHANGE) ||
+			is_xlog_record_type(rmid, info, RM_STANDBY_ID, XLOG_RUNNING_XACTS) ||
+			is_xlog_record_type(rmid, info, RM_HEAP2_ID, XLOG_HEAP2_PRUNE);
+
+		CHECK_FOR_INTERRUPTS();
+	}
+
+	pfree(xlogreader->private_data);
+	XLogReaderFree(xlogreader);
+
+	PG_RETURN_BOOL(is_valid);
+}
diff --git a/src/bin/pg_upgrade/Makefile b/src/bin/pg_upgrade/Makefile
index 5834513add..815d1a7ca1 100644
--- a/src/bin/pg_upgrade/Makefile
+++ b/src/bin/pg_upgrade/Makefile
@@ -3,6 +3,9 @@
 PGFILEDESC = "pg_upgrade - an in-place binary upgrade utility"
 PGAPPICON = win32
 
+# required for 003_logical_replication_slots.pl
+EXTRA_INSTALL=contrib/test_decoding
+
 subdir = src/bin/pg_upgrade
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 56e313f562..2f5fd571ea 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -30,6 +30,8 @@ static void check_for_jsonb_9_4_usage(ClusterInfo *cluster);
 static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(void);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
+static void check_new_cluster_logical_replication_slots(void);
+static void check_old_cluster_for_valid_slots(bool live_check);
 
 
 /*
@@ -86,8 +88,11 @@ check_and_dump_old_cluster(bool live_check)
 	if (!live_check)
 		start_postmaster(&old_cluster, true);
 
-	/* Extract a list of databases and tables from the old cluster */
-	get_db_and_rel_infos(&old_cluster);
+	/*
+	 * Extract a list of databases, tables, and logical replication slots from
+	 * the old cluster.
+	 */
+	get_db_rel_and_slot_infos(&old_cluster, live_check);
 
 	init_tablespaces();
 
@@ -104,6 +109,13 @@ check_and_dump_old_cluster(bool live_check)
 	check_for_reg_data_type_usage(&old_cluster);
 	check_for_isn_and_int8_passing_mismatch(&old_cluster);
 
+	/*
+	 * Logical replication slots can be migrated since PG17. See comments atop
+	 * get_old_cluster_logical_slot_infos().
+	 */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
+		check_old_cluster_for_valid_slots(live_check);
+
 	/*
 	 * PG 16 increased the size of the 'aclitem' type, which breaks the
 	 * on-disk format for existing data.
@@ -187,7 +199,7 @@ check_and_dump_old_cluster(bool live_check)
 void
 check_new_cluster(void)
 {
-	get_db_and_rel_infos(&new_cluster);
+	get_db_rel_and_slot_infos(&new_cluster, false);
 
 	check_new_cluster_is_empty();
 
@@ -210,6 +222,8 @@ check_new_cluster(void)
 	check_for_prepared_transactions(&new_cluster);
 
 	check_for_new_tablespace_dir();
+
+	check_new_cluster_logical_replication_slots();
 }
 
 
@@ -232,27 +246,6 @@ report_clusters_compatible(void)
 }
 
 
-void
-issue_warnings_and_set_wal_level(void)
-{
-	/*
-	 * We unconditionally start/stop the new server because pg_resetwal -o set
-	 * wal_level to 'minimum'.  If the user is upgrading standby servers using
-	 * the rsync instructions, they will need pg_upgrade to write its final
-	 * WAL record showing wal_level as 'replica'.
-	 */
-	start_postmaster(&new_cluster, true);
-
-	/* Reindex hash indexes for old < 10.0 */
-	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 906)
-		old_9_6_invalidate_hash_indexes(&new_cluster, false);
-
-	report_extension_updates(&new_cluster);
-
-	stop_postmaster(false);
-}
-
-
 void
 output_completion_banner(char *deletion_script_file_name)
 {
@@ -1402,3 +1395,160 @@ check_for_user_defined_encoding_conversions(ClusterInfo *cluster)
 	else
 		check_ok();
 }
+
+/*
+ * check_new_cluster_logical_replication_slots()
+ *
+ * Verify that there are no logical replication slots on the new cluster and
+ * that the parameter settings necessary for creating slots are sufficient.
+ */
+static void
+check_new_cluster_logical_replication_slots(void)
+{
+	PGresult   *res;
+	PGconn	   *conn;
+	int			nslots_on_old;
+	int			nslots_on_new;
+	int			max_replication_slots;
+	char	   *wal_level;
+
+	/* Logical slots can be migrated since PG17. */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+		return;
+
+	nslots_on_old = count_old_cluster_logical_slots();
+
+	/* Quick return if there are no logical slots to be migrated. */
+	if (nslots_on_old == 0)
+		return;
+
+	conn = connectToServer(&new_cluster, "template1");
+
+	prep_status("Checking for new cluster logical replication slots");
+
+	res = executeQueryOrDie(conn, "SELECT count(*) "
+							"FROM pg_catalog.pg_replication_slots "
+							"WHERE slot_type = 'logical' AND "
+							"temporary IS FALSE;");
+
+	if (PQntuples(res) != 1)
+		pg_fatal("could not count the number of logical replication slots");
+
+	nslots_on_new = atoi(PQgetvalue(res, 0, 0));
+
+	if (nslots_on_new)
+		pg_fatal("expected 0 logical replication slots but found %d",
+				 nslots_on_new);
+
+	PQclear(res);
+
+	res = executeQueryOrDie(conn, "SHOW wal_level;");
+
+	if (PQntuples(res) != 1)
+		pg_fatal("could not determine wal_level");
+
+	wal_level = PQgetvalue(res, 0, 0);
+
+	if (strcmp(wal_level, "logical") != 0)
+		pg_fatal("wal_level must be \"logical\", but is set to \"%s\"",
+				 wal_level);
+
+	PQclear(res);
+
+	res = executeQueryOrDie(conn, "SHOW max_replication_slots;");
+
+	if (PQntuples(res) != 1)
+		pg_fatal("could not determine max_replication_slots");
+
+	max_replication_slots = atoi(PQgetvalue(res, 0, 0));
+
+	if (nslots_on_old > max_replication_slots)
+		pg_fatal("max_replication_slots (%d) must be greater than or equal to the number of "
+				 "logical replication slots (%d) on the old cluster",
+				 max_replication_slots, nslots_on_old);
+
+	PQclear(res);
+	PQfinish(conn);
+
+	check_ok();
+}
+
+/*
+ * check_old_cluster_for_valid_slots()
+ *
+ * Verify that all the logical slots are usable and have consumed all the WAL
+ * before shutdown.
+ */
+static void
+check_old_cluster_for_valid_slots(bool live_check)
+{
+	int			dbnum;
+	char		output_path[MAXPGPATH];
+	FILE	   *script = NULL;
+
+	prep_status("Checking for valid logical replication slots");
+
+	snprintf(output_path, sizeof(output_path), "%s/%s",
+			 log_opts.basedir,
+			 "invalid_logical_replication_slots.txt");
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		int			slotnum;
+		LogicalSlotInfoArr *slot_arr = &old_cluster.dbarr.dbs[dbnum].slot_arr;
+
+		for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		{
+			LogicalSlotInfo *slot = &slot_arr->slots[slotnum];
+
+			/* Is the slot usable? */
+			if (slot->invalid)
+			{
+				if (script == NULL &&
+					(script = fopen_priv(output_path, "w")) == NULL)
+					pg_fatal("could not open file \"%s\": %s",
+							 output_path, strerror(errno));
+
+				fprintf(script, "The slot \"%s\" is invalid\n",
+						slot->slotname);
+
+				/* No need to check this slot, seek to new one */
+				continue;
+			}
+
+			/*
+			 * Do additional check to ensure that all logical replication
+			 * slots have consumed all the WAL before shutdown.
+			 *
+			 * Note: This can be satisfied only when the old cluster has been
+			 * shut down, so we skip this for live checks.
+			 */
+			if (!live_check && !slot->caught_up)
+			{
+				if (script == NULL &&
+					(script = fopen_priv(output_path, "w")) == NULL)
+					pg_fatal("could not open file \"%s\": %s",
+							 output_path, strerror(errno));
+
+				fprintf(script,
+						"The slot \"%s\" has not consumed the WAL yet\n",
+						slot->slotname);
+			}
+		}
+	}
+
+	if (script)
+	{
+		fclose(script);
+
+		pg_log(PG_REPORT, "fatal");
+		pg_fatal("Your installation contains invalid logical replication slots.\n"
+				 "These slots can't be copied, so this cluster cannot be upgraded.\n"
+				 "Consider removing such slots or consuming the pending WAL if any,\n"
+				 "and then restart the upgrade.\n"
+				 "A list of invalid logical replication slots is in the file:\n"
+				 "    %s", output_path);
+	}
+
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/function.c b/src/bin/pg_upgrade/function.c
index dc8800c7cd..c0f5e58fa2 100644
--- a/src/bin/pg_upgrade/function.c
+++ b/src/bin/pg_upgrade/function.c
@@ -46,7 +46,9 @@ library_name_compare(const void *p1, const void *p2)
 /*
  * get_loadable_libraries()
  *
- *	Fetch the names of all old libraries containing C-language functions.
+ *	Fetch the names of all old libraries that contain C-language functions
+ *	or correspond to logical replication output plugins.
+ *
  *	We will later check that they all exist in the new installation.
  */
 void
@@ -55,6 +57,7 @@ get_loadable_libraries(void)
 	PGresult  **ress;
 	int			totaltups;
 	int			dbnum;
+	int			n_libinfos;
 
 	ress = (PGresult **) pg_malloc(old_cluster.dbarr.ndbs * sizeof(PGresult *));
 	totaltups = 0;
@@ -81,7 +84,12 @@ get_loadable_libraries(void)
 		PQfinish(conn);
 	}
 
-	os_info.libraries = (LibraryInfo *) pg_malloc(totaltups * sizeof(LibraryInfo));
+	/*
+	 * Allocate memory for required libraries and logical replication output
+	 * plugins.
+	 */
+	n_libinfos = totaltups + count_old_cluster_logical_slots();
+	os_info.libraries = (LibraryInfo *) pg_malloc(sizeof(LibraryInfo) * n_libinfos);
 	totaltups = 0;
 
 	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
@@ -89,6 +97,8 @@ get_loadable_libraries(void)
 		PGresult   *res = ress[dbnum];
 		int			ntups;
 		int			rowno;
+		int			slotno;
+		LogicalSlotInfoArr *slot_arr = &old_cluster.dbarr.dbs[dbnum].slot_arr;
 
 		ntups = PQntuples(res);
 		for (rowno = 0; rowno < ntups; rowno++)
@@ -101,6 +111,23 @@ get_loadable_libraries(void)
 			totaltups++;
 		}
 		PQclear(res);
+
+		/*
+		 * Store the names of output plugins as well. There is a possibility
+		 * that duplicated plugins are set, but the consumer function
+		 * check_loadable_libraries() will avoid checking the same library, so
+		 * we do not have to consider their uniqueness here.
+		 */
+		for (slotno = 0; slotno < slot_arr->nslots; slotno++)
+		{
+			if (slot_arr->slots[slotno].invalid)
+				continue;
+
+			os_info.libraries[totaltups].name = pg_strdup(slot_arr->slots[slotno].plugin);
+			os_info.libraries[totaltups].dbnum = dbnum;
+
+			totaltups++;
+		}
 	}
 
 	pg_free(ress);
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index aa5faca4d6..a452860617 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -26,6 +26,8 @@ static void get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo);
 static void free_rel_infos(RelInfoArr *rel_arr);
 static void print_db_infos(DbInfoArr *db_arr);
 static void print_rel_infos(RelInfoArr *rel_arr);
+static void print_slot_infos(LogicalSlotInfoArr *slot_arr);
+static void get_old_cluster_logical_slot_infos(DbInfo *dbinfo, bool live_check);
 
 
 /*
@@ -266,13 +268,15 @@ report_unmatched_relation(const RelInfo *rel, const DbInfo *db, bool is_new_db)
 }
 
 /*
- * get_db_and_rel_infos()
+ * get_db_rel_and_slot_infos()
  *
  * higher level routine to generate dbinfos for the database running
  * on the given "port". Assumes that server is already running.
+ *
+ * live_check would be used only when the target is the old cluster.
  */
 void
-get_db_and_rel_infos(ClusterInfo *cluster)
+get_db_rel_and_slot_infos(ClusterInfo *cluster, bool live_check)
 {
 	int			dbnum;
 
@@ -283,7 +287,17 @@ get_db_and_rel_infos(ClusterInfo *cluster)
 	get_db_infos(cluster);
 
 	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
-		get_rel_infos(cluster, &cluster->dbarr.dbs[dbnum]);
+	{
+		DbInfo	   *pDbInfo = &cluster->dbarr.dbs[dbnum];
+
+		get_rel_infos(cluster, pDbInfo);
+
+		/*
+		 * Retrieve the logical replication slot information for the old cluster.
+		 */
+		if (cluster == &old_cluster)
+			get_old_cluster_logical_slot_infos(pDbInfo, live_check);
+	}
 
 	if (cluster == &old_cluster)
 		pg_log(PG_VERBOSE, "\nsource databases:");
@@ -600,6 +614,124 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 	dbinfo->rel_arr.nrels = num_rels;
 }
 
+/*
+ * get_old_cluster_logical_slot_infos()
+ *
+ * gets the LogicalSlotInfos for all the logical replication slots of the
+ * database referred to by "dbinfo".
+ *
+ * Note: This function will not do anything if the old cluster is pre-PG17.
+ * This is because before that the logical slots are not saved at shutdown, so
+ * there is no guarantee that the latest confirmed_flush_lsn is saved to disk
+ * which can lead to data loss. It is still not guaranteed for manually created
+ * slots in PG17, so subsequent checks done in
+ * check_old_cluster_for_valid_slots() would raise a FATAL error if such slots
+ * are included.
+ */
+static void
+get_old_cluster_logical_slot_infos(DbInfo *dbinfo, bool live_check)
+{
+	PGconn	   *conn;
+	PGresult   *res;
+	LogicalSlotInfo *slotinfos = NULL;
+	int			num_slots = 0;
+
+	/* Logical slots can be migrated since PG17. */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+	{
+		dbinfo->slot_arr.slots = slotinfos;
+		dbinfo->slot_arr.nslots = num_slots;
+		return;
+	}
+
+	conn = connectToServer(&old_cluster, dbinfo->db_name);
+
+	/*
+	 * Fetch the logical replication slot information. The slot is considered
+	 * caught up if all the WAL is consumed except for records that could be
+	 * generated during the upgrade.
+	 *
+	 * Note that we cannot tell whether the slot is caught up during
+	 * live_check, as new WAL records could still be generated.
+	 *
+	 * We intentionally skip checking the WALs for invalidated slots as the
+	 * corresponding WALs could have been removed for such slots.
+	 *
+	 * The temporary slots are explicitly ignored while checking because such
+	 * slots cannot exist after the upgrade. During the upgrade, clusters are
+	 * started and stopped several times causing any temporary slots to be
+	 * removed.
+	 */
+	res = executeQueryOrDie(conn, "SELECT slot_name, plugin, two_phase, "
+							"%s as caught_up, conflicting as invalid "
+							"FROM pg_catalog.pg_replication_slots "
+							"WHERE slot_type = 'logical' AND "
+							"database = current_database() AND "
+							"temporary IS FALSE;",
+							live_check ? "FALSE" :
+							"(CASE WHEN conflicting THEN FALSE "
+							"ELSE (SELECT pg_catalog.binary_upgrade_validate_wal_records(confirmed_flush_lsn)) "
+							"END)");
+
+	num_slots = PQntuples(res);
+
+	if (num_slots)
+	{
+		int			slotnum;
+		int			i_slotname;
+		int			i_plugin;
+		int			i_twophase;
+		int			i_caught_up;
+		int			i_invalid;
+
+		slotinfos = (LogicalSlotInfo *) pg_malloc(sizeof(LogicalSlotInfo) * num_slots);
+
+		i_slotname = PQfnumber(res, "slot_name");
+		i_plugin = PQfnumber(res, "plugin");
+		i_twophase = PQfnumber(res, "two_phase");
+		i_caught_up = PQfnumber(res, "caught_up");
+		i_invalid = PQfnumber(res, "invalid");
+
+		for (slotnum = 0; slotnum < num_slots; slotnum++)
+		{
+			LogicalSlotInfo *curr = &slotinfos[slotnum];
+
+			curr->slotname = pg_strdup(PQgetvalue(res, slotnum, i_slotname));
+			curr->plugin = pg_strdup(PQgetvalue(res, slotnum, i_plugin));
+			curr->two_phase = (strcmp(PQgetvalue(res, slotnum, i_twophase), "t") == 0);
+			curr->caught_up = (strcmp(PQgetvalue(res, slotnum, i_caught_up), "t") == 0);
+			curr->invalid = (strcmp(PQgetvalue(res, slotnum, i_invalid), "t") == 0);
+		}
+	}
+
+	PQclear(res);
+	PQfinish(conn);
+
+	dbinfo->slot_arr.slots = slotinfos;
+	dbinfo->slot_arr.nslots = num_slots;
+}
+
+
+/*
+ * count_old_cluster_logical_slots()
+ *
+ * Returns the number of logical replication slots for all databases.
+ *
+ * Note: this function always returns 0 if the old_cluster is PG16 and prior
+ * because we gather slot information only for cluster versions greater than or
+ * equal to PG17. See get_old_cluster_logical_slot_infos().
+ */
+int
+count_old_cluster_logical_slots(void)
+{
+	int			dbnum;
+	int			slot_count = 0;
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+		slot_count += old_cluster.dbarr.dbs[dbnum].slot_arr.nslots;
+
+	return slot_count;
+}
 
 static void
 free_db_and_rel_infos(DbInfoArr *db_arr)
@@ -642,8 +774,11 @@ print_db_infos(DbInfoArr *db_arr)
 
 	for (dbnum = 0; dbnum < db_arr->ndbs; dbnum++)
 	{
-		pg_log(PG_VERBOSE, "Database: \"%s\"", db_arr->dbs[dbnum].db_name);
-		print_rel_infos(&db_arr->dbs[dbnum].rel_arr);
+		DbInfo	   *pDbInfo = &db_arr->dbs[dbnum];
+
+		pg_log(PG_VERBOSE, "Database: \"%s\"", pDbInfo->db_name);
+		print_rel_infos(&pDbInfo->rel_arr);
+		print_slot_infos(&pDbInfo->slot_arr);
 	}
 }
 
@@ -660,3 +795,25 @@ print_rel_infos(RelInfoArr *rel_arr)
 			   rel_arr->rels[relnum].reloid,
 			   rel_arr->rels[relnum].tablespace);
 }
+
+static void
+print_slot_infos(LogicalSlotInfoArr *slot_arr)
+{
+	int			slotnum;
+
+	/* Quick return if there are no logical slots. */
+	if (slot_arr->nslots == 0)
+		return;
+
+	pg_log(PG_VERBOSE, "Logical replication slots within the database:");
+
+	for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+	{
+		LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+
+		pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\", two_phase: %d",
+			   slot_info->slotname,
+			   slot_info->plugin,
+			   slot_info->two_phase);
+	}
+}
diff --git a/src/bin/pg_upgrade/meson.build b/src/bin/pg_upgrade/meson.build
index 12a97f84e2..228f29b688 100644
--- a/src/bin/pg_upgrade/meson.build
+++ b/src/bin/pg_upgrade/meson.build
@@ -42,6 +42,7 @@ tests += {
     'tests': [
       't/001_basic.pl',
       't/002_pg_upgrade.pl',
+      't/003_logical_replication_slots.pl',
     ],
     'test_kwargs': {'priority': 40}, # pg_upgrade tests are slow
   },
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 96bfb67167..3ddfc31070 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -59,6 +59,8 @@ static void copy_xact_xlog_xid(void);
 static void set_frozenxids(bool minmxid_only);
 static void make_outputdirs(char *pgdata);
 static void setup(char *argv0, bool *live_check);
+static void create_logical_replication_slots(void);
+static void setup_new_cluster(void);
 
 ClusterInfo old_cluster,
 			new_cluster;
@@ -188,6 +190,8 @@ main(int argc, char **argv)
 			  new_cluster.pgdata);
 	check_ok();
 
+	setup_new_cluster();
+
 	if (user_opts.do_sync)
 	{
 		prep_status("Sync data directory to disk");
@@ -201,8 +205,6 @@ main(int argc, char **argv)
 
 	create_script_for_old_cluster_deletion(&deletion_script_file_name);
 
-	issue_warnings_and_set_wal_level();
-
 	pg_log(PG_REPORT,
 		   "\n"
 		   "Upgrade Complete\n"
@@ -593,7 +595,7 @@ create_new_objects(void)
 		set_frozenxids(true);
 
 	/* update new_cluster info now that we have objects in the databases */
-	get_db_and_rel_infos(&new_cluster);
+	get_db_rel_and_slot_infos(&new_cluster, false);
 }
 
 /*
@@ -862,3 +864,102 @@ set_frozenxids(bool minmxid_only)
 
 	check_ok();
 }
+
+/*
+ * create_logical_replication_slots()
+ *
+ * Similar to create_new_objects() but only restores logical replication slots.
+ */
+static void
+create_logical_replication_slots(void)
+{
+	int			dbnum;
+
+	prep_status_progress("Restoring logical replication slots in the new cluster");
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		DbInfo	   *old_db = &old_cluster.dbarr.dbs[dbnum];
+		LogicalSlotInfoArr *slot_arr = &old_db->slot_arr;
+		PGconn	   *conn;
+		PQExpBuffer query;
+		int			slotnum;
+		char		log_file_name[MAXPGPATH];
+
+		/* Skip this database if there are no slots */
+		if (slot_arr->nslots == 0)
+			continue;
+
+		conn = connectToServer(&new_cluster, old_db->db_name);
+		query = createPQExpBuffer();
+
+		snprintf(log_file_name, sizeof(log_file_name),
+				 DB_DUMP_LOG_FILE_MASK, old_db->db_oid);
+
+		pg_log(PG_STATUS, "%s", old_db->db_name);
+
+		for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		{
+			LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+
+			/* Constructs a query for creating logical replication slots. */
+			appendPQExpBuffer(query, "SELECT pg_catalog.pg_create_logical_replication_slot(");
+			appendStringLiteralConn(query, slot_info->slotname, conn);
+			appendPQExpBuffer(query, ", ");
+			appendStringLiteralConn(query, slot_info->plugin, conn);
+			appendPQExpBuffer(query, ", false, %s);",
+							  slot_info->two_phase ? "true" : "false");
+
+			PQclear(executeQueryOrDie(conn, "%s", query->data));
+
+			resetPQExpBuffer(query);
+		}
+
+		PQfinish(conn);
+
+		destroyPQExpBuffer(query);
+	}
+
+	end_progress_output();
+	check_ok();
+}
+
+/*
+ *	setup_new_cluster()
+ *
+ * Starts the new cluster to update wal_level in the control file, then
+ * performs the final setup steps. Logical slots are also created here.
+ */
+static void
+setup_new_cluster(void)
+{
+	/*
+	 * We unconditionally start/stop the new server because pg_resetwal -o set
+	 * wal_level to 'minimum'.  If the user is upgrading standby servers using
+	 * the rsync instructions, they will need pg_upgrade to write its final
+	 * WAL record showing wal_level as 'replica'.
+	 */
+	start_postmaster(&new_cluster, true);
+
+	/*
+	 * If the old cluster has logical slots, migrate them to a new cluster.
+	 *
+	 * Note: This must be done after executing pg_resetwal command in the
+	 * caller because pg_resetwal would remove required WALs.
+	 */
+	if (count_old_cluster_logical_slots())
+		create_logical_replication_slots();
+
+	/*
+	 * Reindex hash indexes for old < 10.0. count_old_cluster_logical_slots()
+	 * can return non-zero only when the old cluster is PG17 or later, so it's
+	 * OK to use "else if" here. See comments atop
+	 * count_old_cluster_logical_slots() and get_old_cluster_logical_slot_infos().
+	 */
+	else if (GET_MAJOR_VERSION(old_cluster.major_version) <= 906)
+		old_9_6_invalidate_hash_indexes(&new_cluster, false);
+
+	report_extension_updates(&new_cluster);
+
+	stop_postmaster(false);
+}
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 842f3b6cd3..fb7ee26569 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -150,6 +150,25 @@ typedef struct
 	int			nrels;
 } RelInfoArr;
 
+/*
+ * Structure to store logical replication slot information.
+ */
+typedef struct
+{
+	char	   *slotname;		/* slot name */
+	char	   *plugin;			/* plugin */
+	bool		two_phase;		/* can the slot decode 2PC? */
+	bool		caught_up;		/* Is confirmed_flush_lsn the same as latest
+								 * checkpoint LSN? */
+	bool		invalid;		/* if true, the slot is unusable */
+} LogicalSlotInfo;
+
+typedef struct
+{
+	int			nslots;			/* number of logical slot infos */
+	LogicalSlotInfo *slots;		/* array of logical slot infos */
+} LogicalSlotInfoArr;
+
 /*
  * The following structure represents a relation mapping.
  */
@@ -176,6 +195,7 @@ typedef struct
 	char		db_tablespace[MAXPGPATH];	/* database default tablespace
 											 * path */
 	RelInfoArr	rel_arr;		/* array of all user relinfos */
+	LogicalSlotInfoArr slot_arr;	/* array of all LogicalSlotInfo */
 } DbInfo;
 
 /*
@@ -345,7 +365,6 @@ void		output_check_banner(bool live_check);
 void		check_and_dump_old_cluster(bool live_check);
 void		check_new_cluster(void);
 void		report_clusters_compatible(void);
-void		issue_warnings_and_set_wal_level(void);
 void		output_completion_banner(char *deletion_script_file_name);
 void		check_cluster_versions(void);
 void		check_cluster_compatibility(bool live_check);
@@ -400,7 +419,8 @@ void		check_loadable_libraries(void);
 FileNameMap *gen_db_file_maps(DbInfo *old_db,
 							  DbInfo *new_db, int *nmaps, const char *old_pgdata,
 							  const char *new_pgdata);
-void		get_db_and_rel_infos(ClusterInfo *cluster);
+void		get_db_rel_and_slot_infos(ClusterInfo *cluster, bool live_check);
+int			count_old_cluster_logical_slots(void);
 
 /* option.c */
 
diff --git a/src/bin/pg_upgrade/server.c b/src/bin/pg_upgrade/server.c
index 0bc3d2806b..d7f6c268ef 100644
--- a/src/bin/pg_upgrade/server.c
+++ b/src/bin/pg_upgrade/server.c
@@ -201,6 +201,7 @@ start_postmaster(ClusterInfo *cluster, bool report_and_exit_on_error)
 	PGconn	   *conn;
 	bool		pg_ctl_return = false;
 	char		socket_string[MAXPGPATH + 200];
+	PQExpBufferData pgoptions;
 
 	static bool exit_hook_registered = false;
 
@@ -227,23 +228,41 @@ start_postmaster(ClusterInfo *cluster, bool report_and_exit_on_error)
 				 cluster->sockdir);
 #endif
 
+	initPQExpBuffer(&pgoptions);
+
 	/*
-	 * Use -b to disable autovacuum.
+	 * Construct a parameter string which is passed to the server process.
 	 *
 	 * Turn off durability requirements to improve object creation speed, and
 	 * we only modify the new cluster, so only use it there.  If there is a
 	 * crash, the new cluster has to be recreated anyway.  fsync=off is a big
 	 * win on ext4.
 	 */
+	if (cluster == &new_cluster)
+		appendPQExpBufferStr(&pgoptions, " -c synchronous_commit=off -c fsync=off -c full_page_writes=off");
+
+	/*
+	 * Use max_slot_wal_keep_size as -1 to prevent the WAL removal by the
+	 * checkpointer process.  If WALs required by logical replication slots
+	 * are removed, the slots are unusable.  This setting prevents the
+	 * invalidation of slots during the upgrade. We set this option when the
+	 * cluster is PG17 or later because logical replication slots can only be
+	 * migrated since then. Besides, max_slot_wal_keep_size was added in PG13.
+	 */
+	if (GET_MAJOR_VERSION(cluster->major_version) >= 1700)
+		appendPQExpBufferStr(&pgoptions, " -c max_slot_wal_keep_size=-1");
+
+	/* Use -b to disable autovacuum. */
 	snprintf(cmd, sizeof(cmd),
 			 "\"%s/pg_ctl\" -w -l \"%s/%s\" -D \"%s\" -o \"-p %d -b%s %s%s\" start",
 			 cluster->bindir,
 			 log_opts.logdir,
 			 SERVER_LOG_FILE, cluster->pgconfig, cluster->port,
-			 (cluster == &new_cluster) ?
-			 " -c synchronous_commit=off -c fsync=off -c full_page_writes=off" : "",
+			 pgoptions.data,
 			 cluster->pgopts ? cluster->pgopts : "", socket_string);
 
+	termPQExpBuffer(&pgoptions);
+
 	/*
 	 * Don't throw an error right away, let connecting throw the error because
 	 * it might supply a reason for the failure.
diff --git a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
new file mode 100644
index 0000000000..13bcc344fd
--- /dev/null
+++ b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
@@ -0,0 +1,232 @@
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+# Tests for upgrading replication slots
+
+use strict;
+use warnings;
+
+use File::Path qw(rmtree);
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Can be changed to test the other modes
+my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
+
+# Initialize old cluster
+my $old_publisher = PostgreSQL::Test::Cluster->new('old_publisher');
+$old_publisher->init(allows_streaming => 'logical');
+
+# Initialize new cluster
+my $new_publisher = PostgreSQL::Test::Cluster->new('new_publisher');
+$new_publisher->init(allows_streaming => 'replica');
+
+# Initialize subscriber cluster
+my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
+$subscriber->init(allows_streaming => 'logical');
+
+my $bindir = $new_publisher->config_data('--bindir');
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when new cluster wal_level is not 'logical'
+
+# Preparations for the subsequent test:
+# 1. Create a slot on the old cluster
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT pg_create_logical_replication_slot('test_slot1', 'test_decoding', false, true);"
+);
+$old_publisher->stop;
+
+# pg_upgrade will fail because the new cluster wal_level is 'replica'
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade where the new cluster has the wrong wal_level');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when max_replication_slots on a new cluster is
+#		too low
+
+# Preparations for the subsequent test:
+# 1. Create a second slot on the old cluster
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT pg_create_logical_replication_slot('test_slot2', 'test_decoding', false, true);"
+);
+
+# 2. Consume WAL records to avoid another type of upgrade failure. It will be
+#	 tested in subsequent cases.
+$old_publisher->safe_psql('postgres',
+	"SELECT count(*) FROM pg_logical_slot_get_changes('test_slot1', NULL, NULL);"
+);
+$old_publisher->stop;
+
+# 3. max_replication_slots is set to smaller than the number of slots (2)
+#	 present on the old cluster
+$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 1");
+
+# 4. wal_level is set correctly on the new cluster
+$new_publisher->append_conf('postgresql.conf', "wal_level = 'logical'");
+
+# pg_upgrade will fail because the new cluster has insufficient max_replication_slots
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade where the new cluster has insufficient max_replication_slots');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when the slot still has unconsumed WAL records
+
+# Preparations for the subsequent test:
+# 1. Remove the slot 'test_slot2', leaving only 1 slot on the old cluster, so
+#    the new cluster config  max_replication_slots=1 will now be enough.
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT * FROM pg_drop_replication_slot('test_slot2');"
+);
+
+# 2. Generate extra WAL records. Because these WAL records do not get consumed
+#	 it will cause the upcoming pg_upgrade test to fail.
+$old_publisher->safe_psql('postgres',
+	"CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;"
+);
+$old_publisher->stop;
+
+# pg_upgrade will fail because the slot still has unconsumed WAL records
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade of old cluster with idle replication slots');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+# Remove the remaining slot
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT * FROM pg_drop_replication_slot('test_slot1');"
+);
+
+# ------------------------------
+# TEST: Successful upgrade
+
+# Preparations for the subsequent test:
+# 1. Setup logical replication
+my $old_connstr = $old_publisher->connstr . ' dbname=postgres';
+$old_publisher->safe_psql('postgres',
+	"CREATE PUBLICATION pub FOR ALL TABLES;"
+);
+$subscriber->start;
+$subscriber->safe_psql(
+	'postgres', qq[
+	CREATE TABLE tbl (a int);
+	CREATE SUBSCRIPTION regress_sub CONNECTION '$old_connstr' PUBLICATION pub WITH (two_phase = 'true')
+]);
+$subscriber->wait_for_subscription_sync($old_publisher, 'regress_sub');
+
+# 2. Temporarily disable the subscription
+$subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION regress_sub DISABLE");
+$old_publisher->stop;
+
+# Dry run, successful check is expected. This is not a live check, so a
+# shutdown checkpoint record would be inserted. We want to test that a
+# subsequent upgrade is successful by skipping such an expected WAL record.
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,        '--check'
+	],
+	'run of pg_upgrade of old cluster');
+ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ removed after pg_upgrade success");
+
+# Actual run, successful upgrade is expected
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade of old cluster');
+ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ removed after pg_upgrade success");
+
+# Check that the slot 'regress_sub' has migrated to the new cluster
+$new_publisher->start;
+my $result = $new_publisher->safe_psql('postgres',
+	"SELECT slot_name, two_phase FROM pg_replication_slots");
+is($result, qq(regress_sub|t), 'check the slot exists on new cluster');
+
+# Update the connection
+my $new_connstr = $new_publisher->connstr . ' dbname=postgres';
+$subscriber->safe_psql(
+	'postgres', qq[
+	ALTER SUBSCRIPTION regress_sub CONNECTION '$new_connstr';
+	ALTER SUBSCRIPTION regress_sub ENABLE;
+]);
+
+# Check whether changes on the new publisher get replicated to the subscriber
+$new_publisher->safe_psql('postgres',
+	"INSERT INTO tbl VALUES (generate_series(11, 20))");
+$new_publisher->wait_for_catchup('regress_sub');
+$result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
+is($result, qq(20), 'check changes are replicated to the subscriber');
+
+# Clean up
+$subscriber->stop();
+$new_publisher->stop();
+
+done_testing();
diff --git a/src/include/access/xlogutils.h b/src/include/access/xlogutils.h
index 5b77b11f50..1cf31aa24f 100644
--- a/src/include/access/xlogutils.h
+++ b/src/include/access/xlogutils.h
@@ -115,4 +115,7 @@ extern void XLogReadDetermineTimeline(XLogReaderState *state,
 
 extern void WALReadRaiseError(WALReadError *errinfo);
 
+extern XLogReaderState *InitXLogReaderState(XLogRecPtr lsn);
+extern XLogRecord *ReadNextXLogRecord(XLogReaderState *xlogreader);
+
 #endif
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 9805bc6118..3162809888 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11370,6 +11370,11 @@
   proname => 'binary_upgrade_set_next_pg_tablespace_oid', provolatile => 'v',
   proparallel => 'u', prorettype => 'void', proargtypes => 'oid',
   prosrc => 'binary_upgrade_set_next_pg_tablespace_oid' },
+{ oid => '8046', descr => 'for use by pg_upgrade',
+  proname => 'binary_upgrade_validate_wal_records', proisstrict => 'f',
+  provolatile => 'v', proparallel => 'u', prorettype => 'bool',
+  proargtypes => 'pg_lsn',
+  prosrc => 'binary_upgrade_validate_wal_records' },
 
 # conversion functions
 { oid => '4302',
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index b5bbdd1608..ff6cd495a3 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1502,7 +1502,10 @@ LogicalRepTupleData
 LogicalRepTyp
 LogicalRepWorker
 LogicalRepWorkerType
+LogicalReplicationSlotInfo
 LogicalRewriteMappingData
+LogicalSlotInfo
+LogicalSlotInfoArr
 LogicalTape
 LogicalTapeSet
 LsnReadQueue
-- 
2.27.0

#260Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#255)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Wed, Sep 20, 2023 at 7:20 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

Good catch, I could not notice because it worked well in my RHEL. Here is the
updated version.

Thanks for the patch. I have some comments on v42:

1.
+{ oid => '8046', descr => 'for use by pg_upgrade',
+  proname => 'binary_upgrade_validate_wal_records', proisstrict => 'f',
+  provolatile => 'v', proparallel => 'u', prorettype => 'bool',
+    if (PG_ARGISNULL(0))
+        elog(ERROR, "null argument to
binary_upgrade_validate_wal_records is not allowed");

Can proisstrict => 'f' be removed so that there's no need for explicit
PG_ARGISNULL check? Any specific reason to keep it?

Also, the arg is read before the ISNULL check, which isn't good.

2.
+Datum
+binary_upgrade_validate_wal_records(PG_FUNCTION_ARGS)

The function name looks too generic in the sense that it validates WAL
records for correctness/corruption, but it is not. Can it be something
like binary_upgrade_{check_for_wal_logical_end,
check_for_logical_end_of_wal} or such?

3.
+    /* Quick exit if the given lsn is larger than current one */
+    if (start_lsn >= GetFlushRecPtr(NULL))
+        PG_RETURN_BOOL(false);
+

An LSN that doesn't exist yet is an error IMO, maybe an error would be better here?

4.
+ * This function is used to verify that there are no WAL records (except some
+ * types) after confirmed_flush_lsn of logical slots, which means all the
+ * changes were replicated to the subscriber. There is a possibility that some
+ * WALs are inserted during upgrade, so such types would be ignored.
+ *

This comment before the function better be at the callsite of the
function, because as far as this function is concerned, it checks if
there are any WAL records that are not "certain" types after the given
LSN, it doesn't know logical slots or confirmed_flush_lsn or such.

5. Trying to understand the interaction of this feature with custom
WAL records that a custom WAL resource manager puts in. Is it okay to
have custom WAL records after the "logical WAL end"?
+        /*
+         * There is a possibility that following records may be generated
+         * during the upgrade.
+         */
6.
+    if (PQntuples(res) != 1)
+        pg_fatal("could not count the number of logical replication slots");
+

Is it an error when not even a single logical replication slot exists? I think it
must be if (PQntuples(res) == 0) return;?

7. A nit:
+    nslots_on_new = atoi(PQgetvalue(res, 0, 0));
+
+    if (nslots_on_new)

Just do if(atoi(PQgetvalue(res, 0, 0)) > 0) and get rid of nslots_on_new?

8.
+    if (nslots_on_new)
+        pg_fatal("expected 0 logical replication slots but found %d",
+                 nslots_on_new);

How about "New cluster database is containing logical replication
slots"? Note that some of the fatal messages start with an
upper-case letter.

9.
+    res = executeQueryOrDie(conn, "SHOW wal_level;");
+    res = executeQueryOrDie(conn, "SHOW max_replication_slots;");

Instead of 2 queries to determine required parameters, isn't it better
with a single query like the following?

select setting from pg_settings where name in ('wal_level',
'max_replication_slots') order by name;

10.
Why just wal_level and max_replication_slots, why not
max_worker_processes and max_wal_senders too? I'm looking at
RecoveryRequiresIntParameter, and if they are different on the upgraded
instance, chances are that logical replication won't work, no?

11.
+# 2. Generate extra WAL records. Because these WAL records do not get consumed
+#     it will cause the upcoming pg_upgrade test to fail.
+$old_publisher->safe_psql('postgres',
+    "CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;"
+);
+$old_publisher->stop;

This might be a recipe for sporadic test failures - how is it
guaranteed that the newly generated WAL records aren't consumed?

Maybe stop the subscriber or temporarily disable the subscription and
then generate WAL records?
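
For example, something along these lines (assuming the test's subscription
regress_sub already exists at that point; illustrative only):

    -- on the subscriber: pause consumption temporarily
    ALTER SUBSCRIPTION regress_sub DISABLE;
    -- on the old publisher: generate WAL that will remain unconsumed
    CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;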

12.
+extern XLogReaderState *InitXLogReaderState(XLogRecPtr lsn);
+extern XLogRecord *ReadNextXLogRecord(XLogReaderState *xlogreader);
+

Why not define these functions in xlogreader.h with elog/ereport
in #ifndef FRONTEND #endif blocks? IMO, xlogreader.h seems the right
location for these functions.

13.
+LogicalReplicationSlotInfo

Where is this structure defined?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#261Amit Kapila
amit.kapila16@gmail.com
In reply to: Bharath Rupireddy (#260)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Thu, Sep 21, 2023 at 4:57 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Wed, Sep 20, 2023 at 7:20 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

Good catch, I could not notice because it worked well in my RHEL. Here is the
updated version.

Thanks for the patch. I have some comments on v42:

1.
+{ oid => '8046', descr => 'for use by pg_upgrade',
+  proname => 'binary_upgrade_validate_wal_records', proisstrict => 'f',
+  provolatile => 'v', proparallel => 'u', prorettype => 'bool',
+    if (PG_ARGISNULL(0))
+        elog(ERROR, "null argument to
binary_upgrade_validate_wal_records is not allowed");

Can proisstrict => 'f' be removed so that there's no need for explicit
PG_ARGISNULL check? Any specific reason to keep it?

Probably trying to keep it similar to
binary_upgrade_create_empty_extension(). I think it depends on what
behaviour we expect for NULL input.

Also, the arg is read before the ISNULL check, which isn't good.

Right.

2.
+Datum
+binary_upgrade_validate_wal_records(PG_FUNCTION_ARGS)

The function name looks too generic in the sense that it validates WAL
records for correctness/corruption, but it is not. Can it be something
like binary_upgrade_{check_for_wal_logical_end,
check_for_logical_end_of_wal} or such?

How about a slightly modified version like
binary_upgrade_validate_wal_logical_end?

3.
+    /* Quick exit if the given lsn is larger than current one */
+    if (start_lsn >= GetFlushRecPtr(NULL))
+        PG_RETURN_BOOL(false);
+

An LSN that doesn't exist yet is an error IMO, maybe an error would be better here?

It will anyway lead to an error at a later point, but we will provide more
information about all the slots that have an invalid value of
confirmed_flush LSN.

4.
+ * This function is used to verify that there are no WAL records (except some
+ * types) after confirmed_flush_lsn of logical slots, which means all the
+ * changes were replicated to the subscriber. There is a possibility that some
+ * WALs are inserted during upgrade, so such types would be ignored.
+ *

This comment before the function better be at the callsite of the
function, because as far as this function is concerned, it checks if
there are any WAL records that are not "certain" types after the given
LSN, it doesn't know logical slots or confirmed_flush_lsn or such.

Yeah, we should give information at the callsite but I guess we need
to give some context atop this function as well so that it is easier
to explain the functionality.

5. Trying to understand the interaction of this feature with custom
WAL records that a custom WAL resource manager puts in. Is it okay to
have custom WAL records after the "logical WAL end"?
+        /*
+         * There is a possibility that following records may be generated
+         * during the upgrade.
+         */

I don't think so. The only valid records for the checks in this
function are probably the ones that can get generated by the upgrade
process because we ensure that walsender sends all the records before
it exits at shutdown time.

10.
Why just wal_level and max_replication_slots, why not
max_worker_processes and max_wal_senders too?

Isn't it sufficient to check the parameters that are required to
create a slot aka what we check in the function
CheckLogicalDecodingRequirements()? We are only creating logical slots
here so I think that should be sufficient.

--
With Regards,
Amit Kapila.

#262Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Bharath Rupireddy (#260)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Bharath,

Thank you for reviewing! Before addressing them, I would like to reply to some of the comments.

6.
+    if (PQntuples(res) != 1)
+        pg_fatal("could not count the number of logical replication slots");
+

Is it an error when not even a single logical replication slot exists? I think it
must be if (PQntuples(res) == 0) return;?

The query executes "SELECT count(*)...", which IIUC returns exactly 1 row.
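
For reference, the check in question is essentially the following, and count(*)
yields a single row even when no logical slots exist:

    SELECT count(*)
    FROM pg_catalog.pg_replication_slots
    WHERE slot_type = 'logical' AND temporary IS FALSE;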

7. A nit:
+    nslots_on_new = atoi(PQgetvalue(res, 0, 0));
+
+    if (nslots_on_new)

Just do if(atoi(PQgetvalue(res, 0, 0)) > 0) and get rid of nslots_on_new?

Note that the value would be used for the upcoming pg_fatal. I prefer the current
style because calling atoi(PQgetvalue(res, 0, 0)) multiple times would not be so beautiful.

11.
+# 2. Generate extra WAL records. Because these WAL records do not get
consumed
+#     it will cause the upcoming pg_upgrade test to fail.
+$old_publisher->safe_psql('postgres',
+    "CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;"
+);
+$old_publisher->stop;

This might be a recipe for sporadic test failures - how is it
guaranteed that the newly generated WAL records aren't consumed?

You mentioned line 118, but at that point the logical replication setup has not been created yet.
The subscriber is created at line 163.
Therefore the WAL would not be consumed automatically.
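
In other words, at that point the WAL has to be consumed explicitly, e.g.:

    SELECT count(*)
    FROM pg_logical_slot_get_changes('test_slot1', NULL, NULL);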

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#263Michael Paquier
michael@paquier.xyz
In reply to: Amit Kapila (#258)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Thu, Sep 21, 2023 at 01:50:28PM +0530, Amit Kapila wrote:

We have discussed this point. Normally, we don't have such options in
upgrade, so we were hesitent to add a new one for this but there is a
discussion to add an --exclude-logical-slots option. We are planning
to add that as a separate patch after getting some more consensus on
it. Right now, the idea is to get the main patch ready.

Okay. I am wondering if the subscriber part is OK now without an
option, but that could also be considered separately, as well. At
least I hope so.
--
Michael

#264Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Amit Kapila (#261)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Thu, Sep 21, 2023 at 5:45 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

Thanks for the patch. I have some comments on v42:

Probably trying to keep it similar to
binary_upgrade_create_empty_extension(). I think it depends on what
behaviour we expect for NULL input.

confirmed_flush_lsn for a logical slot can be null (for instance,
before confirmed_flush is updated for a newly created logical slot if
someone calls pg_stat_replication -> pg_get_replication_slots) and
when it is so, the binary_upgrade_create_empty_extension errors out.
Is this behaviour wanted? I think the function returning null on null
input is a better behaviour here.
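
For reference, the value in question can be observed with:

    SELECT slot_name, confirmed_flush_lsn
    FROM pg_replication_slots
    WHERE slot_type = 'logical';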

2.
+Datum
+binary_upgrade_validate_wal_records(PG_FUNCTION_ARGS)

The function name looks too generic in the sense that it validates WAL
records for correctness/corruption, but it is not. Can it be something
like binary_upgrade_{check_for_wal_logical_end,
check_for_logical_end_of_wal} or such?

How about a slightly modified version like
binary_upgrade_validate_wal_logical_end?

Works for me.

3.
+    /* Quick exit if the given lsn is larger than current one */
+    if (start_lsn >= GetFlushRecPtr(NULL))
+        PG_RETURN_BOOL(false);
+

An LSN that doesn't exist yet is an error IMO, maybe an error would be better here?

It will anyway lead to an error at a later point, but we will provide more
information about all the slots that have an invalid value of
confirmed_flush LSN.

I disagree with the function returning false for a non-existing LSN.
IMO, failing fast when an LSN that doesn't exist yet is supplied to
the function is the right approach. We never know, the slot's on-disk
content can get corrupted for some reason and confirmed_flush_lsn is
'FFFFFFFF/FFFFFFFF' or a non-existing LSN.

4.
+ * This function is used to verify that there are no WAL records (except some
+ * types) after confirmed_flush_lsn of logical slots, which means all the
+ * changes were replicated to the subscriber. There is a possibility that some
+ * WALs are inserted during upgrade, so such types would be ignored.
+ *

This comment before the function better be at the callsite of the
function, because as far as this function is concerned, it checks if
there are any WAL records that are not "certain" types after the given
LSN, it doesn't know logical slots or confirmed_flush_lsn or such.

Yeah, we should give information at the callsite but I guess we need
to give some context atop this function as well so that it is easier
to explain the functionality.

At the callsite a detailed description is good. At the function
definition just a reference to the callsite is good.

5. Trying to understand the interaction of this feature with custom
WAL records that a custom WAL resource manager puts in. Is it okay to
have custom WAL records after the "logical WAL end"?
+        /*
+         * There is a possibility that following records may be generated
+         * during the upgrade.
+         */

I don't think so. The only valid records for the checks in this
function are probably the ones that can get generated by the upgrade
process because we ensure that walsender sends all the records before
it exits at shutdown time.

Can you help me understand how the list of WAL records that pg_upgrade
can generate was put together? Did you identify them after running some tests?

10.
Why just wal_level and max_replication_slots, why not
max_worker_processes and max_wal_senders too?

Isn't it sufficient to check the parameters that are required to
create a slot aka what we check in the function
CheckLogicalDecodingRequirements()? We are only creating logical slots
here so I think that should be sufficient.

Ah, that makes sense.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#265Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#262)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Thu, Sep 21, 2023 at 6:54 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

6.
+    if (PQntuples(res) != 1)
+        pg_fatal("could not count the number of logical replication slots");
+

Is it an error when not even a single logical replication slot exists? I think it
must be if (PQntuples(res) == 0) return;?

The query executes "SELECT count(*)...", IIUC it exactly returns 1 row.

Ah, got it.

7. A nit:
+    nslots_on_new = atoi(PQgetvalue(res, 0, 0));
+
+    if (nslots_on_new)

Just do if(atoi(PQgetvalue(res, 0, 0)) > 0) and get rid of nslots_on_new?

Note that the value would be used for the upcoming pg_fatal. I prefer the current
style because calling atoi(PQgetvalue(res, 0, 0)) multiple times would not be so beautiful.

+1.

You mentioned line 118, but at that point the logical replication setup has not been created yet.
The subscriber is created at line 163.
Therefore the WAL would not be consumed automatically.

So, not calling pg_logical_slot_get_changes() on test_slot1 won't
consume the WAL?

A few more comments:

1.
+    /*
+     * Use max_slot_wal_keep_size as -1 to prevent the WAL removal by the
+     * checkpointer process.  If WALs required by logical replication slots
+     * are removed, the slots are unusable.  This setting prevents the
+     * invalidation of slots during the upgrade. We set this option when

IIUC, during upgrade we don't want the checkpointer to remove WAL that
may be needed by logical slots, so the patch overrides the user-set
value for max_slot_wal_keep_size. What if the WAL is removed
because of the wal_keep_size setting?

2.
+++ b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl

How about a more descriptive and pointed name for the TAP test file,
something like 003_upgrade_logical_replication_slots.pl?

3. Does this patch support upgrading of logical replication slots on a
streaming standby? If yes, isn't it a good idea to add one test for
upgrading a standby with logical replication slots?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#266Amit Kapila
amit.kapila16@gmail.com
In reply to: Bharath Rupireddy (#264)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Fri, Sep 22, 2023 at 10:57 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Thu, Sep 21, 2023 at 5:45 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

3.
+    /* Quick exit if the given lsn is larger than current one */
+    if (start_lsn >= GetFlushRecPtr(NULL))
+        PG_RETURN_BOOL(false);
+

An LSN that doesn't exist yet is an error IMO, maybe an error would be better here?

It will anyway lead to an error at a later point, but we will provide more
information about all the slots that have an invalid value of
confirmed_flush LSN.

I disagree with the function returning false for a non-existing LSN.
IMO, failing fast when an LSN that doesn't exist yet is supplied to
the function is the right approach. We never know, the slot's on-disk
content can get corrupted for some reason and confirmed_flush_lsn is
'FFFFFFFF/FFFFFFFF' or a non-existing LSN.

I don't think it is a big deal whether we fail immediately or slightly
later with more information about the slot. It could be better to do it
later because various slots can have the same problem, so we can
mention all such slots together.

5. Trying to understand the interaction of this feature with custom
WAL records that a custom WAL resource manager puts in. Is it okay to
have custom WAL records after the "logical WAL end"?
+        /*
+         * There is a possibility that following records may be generated
+         * during the upgrade.
+         */

I don't think so. The only valid records for the checks in this
function are probably the ones that can get generated by the upgrade
process because we ensure that walsender sends all the records before
it exits at shutdown time.

Can you help me understand how the list of WAL records that pg_upgrade
can generate was put together? Did you identify them after running some tests?

Yeah, both by tests and by manually verifying the WAL records. Basically,
we need to care about records that could be generated by background
processes like the checkpointer/bgwriter or that can be generated during
system table scans. You may want to read my latest email for a summary on
how we reached this design choice [1].
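
Just as an illustration (the starting LSN below is a made-up placeholder, and
the pg_walinspect extension must be installed), one can look at which record
types appear after a given LSN like this:

    SELECT resource_manager, record_type
    FROM pg_get_wal_records_info('0/1549850', pg_current_wal_lsn());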

[1]: /messages/by-id/CAA4eK1JVKZGRHLOEotWi+e+09jucNedqpkkc-Do4dh5FTAU+5w@mail.gmail.com
--
With Regards,
Amit Kapila.

#267Amit Kapila
amit.kapila16@gmail.com
In reply to: Bharath Rupireddy (#265)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Fri, Sep 22, 2023 at 11:59 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Thu, Sep 21, 2023 at 6:54 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

1.
+    /*
+     * Use max_slot_wal_keep_size as -1 to prevent the WAL removal by the
+     * checkpointer process.  If WALs required by logical replication slots
+     * are removed, the slots are unusable.  This setting prevents the
+     * invalidation of slots during the upgrade. We set this option when

IIUC, during upgrade we don't want the checkpointer to remove WAL that
may be needed by logical slots, for that the patch overrides the user
set value for max_slot_wal_keep_size. What if the WAL is removed
because of the wal_keep_size setting?

We are fine with the WAL removal unless it can invalidate the slots,
which is prevented by max_slot_wal_keep_size.
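
As a side note, whether a slot has already lost required WAL is visible in
pg_replication_slots, for example:

    -- wal_status becomes 'lost' once required WAL has been removed
    SELECT slot_name, wal_status FROM pg_catalog.pg_replication_slots;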

3. Does this patch support upgrading of logical replication slots on a
streaming standby?

No, and a note has been added by the patch for the same.

--
With Regards,
Amit Kapila.

#268Amit Kapila
amit.kapila16@gmail.com
In reply to: Bharath Rupireddy (#264)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Fri, Sep 22, 2023 at 10:57 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Thu, Sep 21, 2023 at 5:45 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

Thanks for the patch. I have some comments on v42:

Probably trying to keep it similar to
binary_upgrade_create_empty_extension(). I think it depends on what
behaviour we expect for NULL input.

confirmed_flush_lsn for a logical slot can be null (for instance,
before confirmed_flush is updated for a newly created logical slot if
someone calls pg_stat_replication -> pg_get_replication_slots) and
when it is so, the binary_upgrade_create_empty_extension errors out.
Is this behaviour wanted? I think the function returning null on null
input is a better behaviour here.

I think if we adopt the return-null-on-null-input behaviour then the caller
needs to add a special case for the null value, as this function returns
bool. We can probably return false in that case. Does that help to address
your concern?
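
A rough sketch of the special case the caller would then need (mirroring the
query shape already used in the patch; the function only runs in
binary-upgrade mode, so this is illustrative only):

    SELECT slot_name,
           CASE WHEN confirmed_flush_lsn IS NULL THEN false
                ELSE pg_catalog.binary_upgrade_validate_wal_records(confirmed_flush_lsn)
           END AS caught_up
    FROM pg_catalog.pg_replication_slots
    WHERE slot_type = 'logical' AND temporary IS FALSE;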

--
With Regards,
Amit Kapila.

#269Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Bharath Rupireddy (#260)
1 attachment(s)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Bharath,

Again, thank you for reviewing! Here is a new version of the patch.

1.
+{ oid => '8046', descr => 'for use by pg_upgrade',
+  proname => 'binary_upgrade_validate_wal_records', proisstrict => 'f',
+  provolatile => 'v', proparallel => 'u', prorettype => 'bool',
+    if (PG_ARGISNULL(0))
+        elog(ERROR, "null argument to
binary_upgrade_validate_wal_records is not allowed");

Can proisstrict => 'f' be removed so that there's no need for explicit
PG_ARGISNULL check? Any specific reason to keep it?

Theoretically it could be, but I was not sure. I think you wanted us to follow
the specs of the pg_walinspect functions, but this is just an upgrade function;
normally users cannot call it. Also, as Amit said [1], the caller must consider
the special case. Currently the function returns false in that case; we can
change to a more appropriate style later.

Also, the arg is read before the ISNULL check, which isn't good.

Right, fixed.

2.
+Datum
+binary_upgrade_validate_wal_records(PG_FUNCTION_ARGS)

The function name looks too generic in the sense that it validates WAL
records for correctness/corruption, but it is not. Can it be something
like binary_upgrade_{check_for_wal_logical_end,
check_for_logical_end_of_wal} or such?

Per discussion [2], changed to binary_upgrade_validate_wal_logical_end.

3.
+    /* Quick exit if the given lsn is larger than current one */
+    if (start_lsn >= GetFlushRecPtr(NULL))
+        PG_RETURN_BOOL(false);
+

An LSN that doesn't exist yet is an error IMO, maybe an error would be better here?

We think that the invalid slots should be listed at the end, so basically we do
not want to error out. This could also be changed if there are better opinions.

4.
+ * This function is used to verify that there are no WAL records (except some
+ * types) after confirmed_flush_lsn of logical slots, which means all the
+ * changes were replicated to the subscriber. There is a possibility that some
+ * WALs are inserted during upgrade, so such types would be ignored.
+ *

This comment before the function better be at the callsite of the
function, because as far as this function is concerned, it checks if
there are any WAL records that are not "certain" types after the given
LSN, it doesn't know logical slots or confirmed_flush_lsn or such.

Hmm, I think it is better to do the reverse, because otherwise we would need to
repeat the same explanation at any other callers of the function. So, I have
adjusted the comments atop the function and at the caller. Thoughts?

8.
+    if (nslots_on_new)
+        pg_fatal("expected 0 logical replication slots but found %d",
+                 nslots_on_new);

How about "New cluster database is containing logical replication
slots"? Note that some of the fatal messages start with an
upper-case letter.

I did not use your suggestion, but changed the message to start with an upper-case letter.
Actually, the upper-case rule is already broken elsewhere in the file. Here I regarded
this sentence as a hint message.

9.
+    res = executeQueryOrDie(conn, "SHOW wal_level;");
+    res = executeQueryOrDie(conn, "SHOW max_replication_slots;");

Instead of 2 queries to determine required parameters, isn't it better
with a single query like the following?

select setting from pg_settings where name in ('wal_level',
'max_replication_slots') order by name;

Modified, but using ORDER BY ... DESC. This comes from a previous comment [3].
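
A sketch of how the reordered query could look (DESC so that wal_level is
returned first):

    SELECT setting
    FROM pg_catalog.pg_settings
    WHERE name IN ('wal_level', 'max_replication_slots')
    ORDER BY name DESC;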

12.
+extern XLogReaderState *InitXLogReaderState(XLogRecPtr lsn);
+extern XLogRecord *ReadNextXLogRecord(XLogReaderState *xlogreader);
+

Why not define these functions in xlogreader.h with elog/ereport
in #ifndef FRONTEND #endif blocks? IMO, xlogreader.h seems the right
location for these functions.

I checked comments atop both files, and xlogreader.h seems better. Fixed.

13.
+LogicalReplicationSlotInfo

Where is this structure defined?

Oops, removed.

[1]: /messages/by-id/CAA4eK1LxPDeSkTttEAG2MPEWO=83vQe_Bja9F4QcCjVn=Wt9rA@mail.gmail.com
[2]: /messages/by-id/CAA4eK1L9oJmdxprFR3oob5KLpHUnkJAt5Le4woxO3wHz-SZ+TA@mail.gmail.com
[3]: /messages/by-id/CAA4eK1LHH_=wbxsEn20=W+qz1193OqFj-vvJ-u0uHLMmwLHbRw@mail.gmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:

v44-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patch (application/octet-stream)
From 7226ebcb7729ea2aa6e33430e82881a1ac5eea91 Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Tue, 4 Apr 2023 05:49:34 +0000
Subject: [PATCH v44] pg_upgrade: Allow to replicate logical replication slots
 to new node

This commit allows nodes with logical replication slots to be upgraded. While
reading information from the old cluster, a list of logical replication slots is
fetched. In a later part of the upgrade, pg_upgrade revisits the list and
restores the slots by executing pg_create_logical_replication_slot() on the new
cluster. Migration of logical replication slots is only supported when the old
cluster is version 17.0 or later.

If the old node has slots with the status 'lost' or with unconsumed WAL records,
pg_upgrade fails. These checks are needed to prevent data loss.

Note that the pg_resetwal command would remove WAL files, which are required for
restart_lsn. If WALs required by logical replication slots are removed, the
slots are unusable. Therefore, during the upgrade, slot restoration is done
after the final pg_resetwal command. The workflow ensures that required WALs are
retained.

The significant advantage of this commit is that it makes it easy to continue
logical replication even after upgrading the publisher node. Previously,
pg_upgrade allowed copying publications to a new node. With this new commit,
adjusting the connection string to the new publisher will cause the apply
worker on the subscriber to connect to the new publisher automatically. This
enables seamless continuation of logical replication, even after an upgrade.

Author: Hayato Kuroda
Co-authored-by: Hou Zhijie
Reviewed-by: Peter Smith, Julien Rouhaud, Vignesh C, Wang Wei, Masahiko Sawada,
             Dilip Kumar, Bharath Rupireddy
---
 contrib/pg_walinspect/pg_walinspect.c         |  94 -------
 doc/src/sgml/ref/pgupgrade.sgml               |  76 +++++-
 src/backend/access/transam/xlogreader.c       |  93 +++++++
 src/backend/replication/slot.c                |  12 +
 src/backend/utils/adt/pg_upgrade_support.c    | 101 ++++++++
 src/bin/pg_upgrade/Makefile                   |   3 +
 src/bin/pg_upgrade/check.c                    | 193 +++++++++++++--
 src/bin/pg_upgrade/function.c                 |  31 ++-
 src/bin/pg_upgrade/info.c                     | 168 ++++++++++++-
 src/bin/pg_upgrade/meson.build                |   1 +
 src/bin/pg_upgrade/pg_upgrade.c               | 107 +++++++-
 src/bin/pg_upgrade/pg_upgrade.h               |  24 +-
 src/bin/pg_upgrade/server.c                   |  25 +-
 .../003_upgrade_logical_replication_slots.pl  | 232 ++++++++++++++++++
 src/include/access/xlogreader.h               |   2 +
 src/include/catalog/pg_proc.dat               |   5 +
 src/tools/pgindent/typedefs.list              |   2 +
 17 files changed, 1034 insertions(+), 135 deletions(-)
 create mode 100644 src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl

diff --git a/contrib/pg_walinspect/pg_walinspect.c b/contrib/pg_walinspect/pg_walinspect.c
index 796a74f322..49f4f92e98 100644
--- a/contrib/pg_walinspect/pg_walinspect.c
+++ b/contrib/pg_walinspect/pg_walinspect.c
@@ -40,8 +40,6 @@ PG_FUNCTION_INFO_V1(pg_get_wal_stats_till_end_of_wal);
 
 static void ValidateInputLSNs(XLogRecPtr start_lsn, XLogRecPtr *end_lsn);
 static XLogRecPtr GetCurrentLSN(void);
-static XLogReaderState *InitXLogReaderState(XLogRecPtr lsn);
-static XLogRecord *ReadNextXLogRecord(XLogReaderState *xlogreader);
 static void GetWALRecordInfo(XLogReaderState *record, Datum *values,
 							 bool *nulls, uint32 ncols);
 static void GetWALRecordsInfo(FunctionCallInfo fcinfo,
@@ -84,98 +82,6 @@ GetCurrentLSN(void)
 	return curr_lsn;
 }
 
-/*
- * Initialize WAL reader and identify first valid LSN.
- */
-static XLogReaderState *
-InitXLogReaderState(XLogRecPtr lsn)
-{
-	XLogReaderState *xlogreader;
-	ReadLocalXLogPageNoWaitPrivate *private_data;
-	XLogRecPtr	first_valid_record;
-
-	/*
-	 * Reading WAL below the first page of the first segments isn't allowed.
-	 * This is a bootstrap WAL page and the page_read callback fails to read
-	 * it.
-	 */
-	if (lsn < XLOG_BLCKSZ)
-		ereport(ERROR,
-				(errmsg("could not read WAL at LSN %X/%X",
-						LSN_FORMAT_ARGS(lsn))));
-
-	private_data = (ReadLocalXLogPageNoWaitPrivate *)
-		palloc0(sizeof(ReadLocalXLogPageNoWaitPrivate));
-
-	xlogreader = XLogReaderAllocate(wal_segment_size, NULL,
-									XL_ROUTINE(.page_read = &read_local_xlog_page_no_wait,
-											   .segment_open = &wal_segment_open,
-											   .segment_close = &wal_segment_close),
-									private_data);
-
-	if (xlogreader == NULL)
-		ereport(ERROR,
-				(errcode(ERRCODE_OUT_OF_MEMORY),
-				 errmsg("out of memory"),
-				 errdetail("Failed while allocating a WAL reading processor.")));
-
-	/* first find a valid recptr to start from */
-	first_valid_record = XLogFindNextRecord(xlogreader, lsn);
-
-	if (XLogRecPtrIsInvalid(first_valid_record))
-		ereport(ERROR,
-				(errmsg("could not find a valid record after %X/%X",
-						LSN_FORMAT_ARGS(lsn))));
-
-	return xlogreader;
-}
-
-/*
- * Read next WAL record.
- *
- * By design, to be less intrusive in a running system, no slot is allocated
- * to reserve the WAL we're about to read. Therefore this function can
- * encounter read errors for historical WAL.
- *
- * We guard against ordinary errors trying to read WAL that hasn't been
- * written yet by limiting end_lsn to the flushed WAL, but that can also
- * encounter errors if the flush pointer falls in the middle of a record. In
- * that case we'll return NULL.
- */
-static XLogRecord *
-ReadNextXLogRecord(XLogReaderState *xlogreader)
-{
-	XLogRecord *record;
-	char	   *errormsg;
-
-	record = XLogReadRecord(xlogreader, &errormsg);
-
-	if (record == NULL)
-	{
-		ReadLocalXLogPageNoWaitPrivate *private_data;
-
-		/* return NULL, if end of WAL is reached */
-		private_data = (ReadLocalXLogPageNoWaitPrivate *)
-			xlogreader->private_data;
-
-		if (private_data->end_of_wal)
-			return NULL;
-
-		if (errormsg)
-			ereport(ERROR,
-					(errcode_for_file_access(),
-					 errmsg("could not read WAL at %X/%X: %s",
-							LSN_FORMAT_ARGS(xlogreader->EndRecPtr), errormsg)));
-		else
-			ereport(ERROR,
-					(errcode_for_file_access(),
-					 errmsg("could not read WAL at %X/%X",
-							LSN_FORMAT_ARGS(xlogreader->EndRecPtr))));
-	}
-
-	return record;
-}
-
 /*
  * Output values that make up a row describing caller's WAL record.
  *
diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index bea0d1b93f..1a17572d14 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -383,6 +383,77 @@ make prefix=/usr/local/pgsql.new install
     </para>
    </step>
 
+   <step>
+    <title>Prepare for publisher upgrades</title>
+
+    <para>
+     <application>pg_upgrade</application> attempts to migrate logical
+     replication slots. This helps avoid the need for manually defining the
+     same replication slots on the new publisher. Migration of logical
+     replication slots is only supported when the old cluster is version 17.0
+     or later.
+    </para>
+
+    <para>
+     Before you start upgrading the publisher cluster, ensure that the
+     subscription is temporarily disabled, by executing
+     <link linkend="sql-altersubscription"><command>ALTER SUBSCRIPTION ... DISABLE</command></link>.
+     Re-enable the subscription after the upgrade.
+    </para>
+
+    <para>
+     There are some prerequisites for <application>pg_upgrade</application> to
+     be able to upgrade the replication slots. If these are not met an error
+     will be reported.
+    </para>
+
+    <itemizedlist>
+     <listitem>
+      <para>
+       All slots on the old cluster must be usable, i.e., there are no slots
+       whose
+       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>conflicting</structfield>
+       is <literal>true</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The old cluster has replicated all the changes to subscribers.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The output plugins referenced by the slots on the old cluster must be
+       installed in the new PostgreSQL executable directory.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must not have permanent logical replication slots, i.e.,
+       there must be no slots where
+       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>temporary</structfield>
+       is <literal>false</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must have
+       <link linkend="guc-wal-level"><varname>wal_level</varname></link> as
+       <literal>logical</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must have
+       <link linkend="guc-max-replication-slots"><varname>max_replication_slots</varname></link>
+       configured to a value greater than or equal to the number of slots
+       present in the old cluster.
+      </para>
+     </listitem>
+    </itemizedlist>
+
+   </step>
+
    <step>
     <title>Stop both servers</title>
 
@@ -652,8 +723,9 @@ rsync --archive --delete --hard-links --size-only --no-inc-recursive /vol1/pg_tb
        Configure the servers for log shipping.  (You do not need to run
        <function>pg_backup_start()</function> and <function>pg_backup_stop()</function>
        or take a file system backup as the standbys are still synchronized
-       with the primary.)  Replication slots are not copied and must
-       be recreated.
+       with the primary.)  Only logical slots on the primary are copied to the
+       new standby, but other slots on the old standby are not copied so must
+       be recreated manually.
       </para>
      </step>
 
diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index c9f9f6e98f..4a269374e5 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -29,6 +29,7 @@
 #include "access/xlog_internal.h"
 #include "access/xlogreader.h"
 #include "access/xlogrecord.h"
+#include "access/xlogutils.h"
 #include "catalog/pg_control.h"
 #include "common/pg_lzcompress.h"
 #include "replication/origin.h"
@@ -2176,4 +2177,96 @@ XLogRecGetFullXid(XLogReaderState *record)
 	return FullTransactionIdFromEpochAndXid(epoch, xid);
 }
 
+/*
+ * Initialize WAL reader and identify first valid LSN.
+ */
+XLogReaderState *
+InitXLogReaderState(XLogRecPtr lsn)
+{
+	XLogReaderState *xlogreader;
+	ReadLocalXLogPageNoWaitPrivate *private_data;
+	XLogRecPtr	first_valid_record;
+
+	/*
+	 * Reading WAL below the first page of the first segments isn't allowed.
+	 * This is a bootstrap WAL page and the page_read callback fails to read
+	 * it.
+	 */
+	if (lsn < XLOG_BLCKSZ)
+		ereport(ERROR,
+				(errmsg("could not read WAL at LSN %X/%X",
+						LSN_FORMAT_ARGS(lsn))));
+
+	private_data = (ReadLocalXLogPageNoWaitPrivate *)
+		palloc0(sizeof(ReadLocalXLogPageNoWaitPrivate));
+
+	xlogreader = XLogReaderAllocate(wal_segment_size, NULL,
+									XL_ROUTINE(.page_read = &read_local_xlog_page_no_wait,
+											   .segment_open = &wal_segment_open,
+											   .segment_close = &wal_segment_close),
+									private_data);
+
+	if (xlogreader == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("out of memory"),
+				 errdetail("Failed while allocating a WAL reading processor.")));
+
+	/* first find a valid recptr to start from */
+	first_valid_record = XLogFindNextRecord(xlogreader, lsn);
+
+	if (XLogRecPtrIsInvalid(first_valid_record))
+		ereport(ERROR,
+				(errmsg("could not find a valid record after %X/%X",
+						LSN_FORMAT_ARGS(lsn))));
+
+	return xlogreader;
+}
+
+/*
+ * Read next WAL record.
+ *
+ * By design, to be less intrusive in a running system, no slot is allocated
+ * to reserve the WAL we're about to read. Therefore this function can
+ * encounter read errors for historical WAL.
+ *
+ * We guard against ordinary errors trying to read WAL that hasn't been
+ * written yet by limiting end_lsn to the flushed WAL, but that can also
+ * encounter errors if the flush pointer falls in the middle of a record. In
+ * that case we'll return NULL.
+ */
+XLogRecord *
+ReadNextXLogRecord(XLogReaderState *xlogreader)
+{
+	XLogRecord *record;
+	char	   *errormsg;
+
+	record = XLogReadRecord(xlogreader, &errormsg);
+
+	if (record == NULL)
+	{
+		ReadLocalXLogPageNoWaitPrivate *private_data;
+
+		/* return NULL, if end of WAL is reached */
+		private_data = (ReadLocalXLogPageNoWaitPrivate *)
+			xlogreader->private_data;
+
+		if (private_data->end_of_wal)
+			return NULL;
+
+		if (errormsg)
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not read WAL at %X/%X: %s",
+							LSN_FORMAT_ARGS(xlogreader->EndRecPtr), errormsg)));
+		else
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not read WAL at %X/%X",
+							LSN_FORMAT_ARGS(xlogreader->EndRecPtr))));
+	}
+
+	return record;
+}
+
 #endif
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 3ded3c1473..a91d412f91 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1423,6 +1423,18 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause,
 
 		SpinLockRelease(&s->mutex);
 
+		/*
+		 * The logical replication slots shouldn't be invalidated as
+		 * max_slot_wal_keep_size GUC is set to -1 during the upgrade.
+		 *
+		 * The following is just a sanity check.
+		 */
+		if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade)
+		{
+			Assert(max_slot_wal_keep_size_mb == -1);
+			elog(ERROR, "replication slots must not be invalidated during the upgrade");
+		}
+
 		if (active_pid != 0)
 		{
 			/*
diff --git a/src/backend/utils/adt/pg_upgrade_support.c b/src/backend/utils/adt/pg_upgrade_support.c
index 0186636d9f..cfb472ac0f 100644
--- a/src/backend/utils/adt/pg_upgrade_support.c
+++ b/src/backend/utils/adt/pg_upgrade_support.c
@@ -11,14 +11,20 @@
 
 #include "postgres.h"
 
+#include "access/heapam_xlog.h"
+#include "access/xlog.h"
+#include "access/xlogutils.h"
 #include "catalog/binary_upgrade.h"
 #include "catalog/heap.h"
 #include "catalog/namespace.h"
+#include "catalog/pg_control.h"
 #include "catalog/pg_type.h"
 #include "commands/extension.h"
 #include "miscadmin.h"
+#include "storage/standbydefs.h"
 #include "utils/array.h"
 #include "utils/builtins.h"
+#include "utils/pg_lsn.h"
 
 
 #define CHECK_IS_BINARY_UPGRADE									\
@@ -261,3 +267,98 @@ binary_upgrade_set_missing_value(PG_FUNCTION_ARGS)
 
 	PG_RETURN_VOID();
 }
+
+/*
+ * Helper function for binary_upgrade_validate_wal_logical_end().
+ */
+static inline bool
+is_xlog_record_type(RmgrId rmgrid, uint8 info,
+					RmgrId expected_rmgrid, uint8 expected_info)
+{
+	return (rmgrid == expected_rmgrid) && (info == expected_info);
+}
+
+/*
+ * Return false if we found unexpected WAL records, otherwise true.
+ *
+ * This is a special purpose function to ensure that there are no WAL records
+ * pending to be decoded after the given LSN.
+ *
+ * It is used to ensure that there is no pending WAL to be consumed for
+ * the logical slots.
+ *
+ * Now, we have an exception here such that some of the WAL records could be
+ * generated during the upgrade process which actually doesn't need to be
+ * decoded. So, we need to ignore those.
+ *
+ * XLOG_CHECKPOINT_SHUTDOWN and XLOG_SWITCH are ignored because they would be
+ * inserted after the walsender exits. Moreover, the following types of records
+ * could be generated during the pg_upgrade --check, so they are ignored too:
+ * XLOG_CHECKPOINT_ONLINE, XLOG_RUNNING_XACTS, XLOG_FPI_FOR_HINT,
+ * XLOG_HEAP2_PRUNE, XLOG_PARAMETER_CHANGE.
+ */
+Datum
+binary_upgrade_validate_wal_logical_end(PG_FUNCTION_ARGS)
+{
+	XLogRecPtr	start_lsn;
+	XLogReaderState *xlogreader;
+	bool		initial_record = true;
+	bool		is_valid = true;
+
+	CHECK_IS_BINARY_UPGRADE;
+
+	/* Quick exit if the input is NULL */
+	if (PG_ARGISNULL(0))
+		PG_RETURN_BOOL(false);
+
+	start_lsn = PG_GETARG_LSN(0);
+
+	/* Quick exit if the given lsn is larger than current one */
+	if (start_lsn >= GetFlushRecPtr(NULL))
+		PG_RETURN_BOOL(false);
+
+	xlogreader = InitXLogReaderState(start_lsn);
+
+	/* Loop until all WALs are read, or unexpected record is found */
+	while (is_valid && ReadNextXLogRecord(xlogreader))
+	{
+		RmgrIds		rmid;
+		uint8		info;
+
+		/* Check the type of WAL */
+		rmid = XLogRecGetRmid(xlogreader);
+		info = XLogRecGetInfo(xlogreader) & ~XLR_INFO_MASK;
+
+		if (initial_record)
+		{
+			/*
+			 * Initial record must be either XLOG_CHECKPOINT_SHUTDOWN or
+			 * XLOG_SWITCH.
+			 */
+			is_valid = is_xlog_record_type(rmid, info, RM_XLOG_ID, XLOG_CHECKPOINT_SHUTDOWN) ||
+				is_xlog_record_type(rmid, info, RM_XLOG_ID, XLOG_SWITCH);
+
+			initial_record = false;
+			continue;
+		}
+
+		/*
+		 * There is a possibility that following records may be generated
+		 * during the upgrade.
+		 */
+		is_valid = is_xlog_record_type(rmid, info, RM_XLOG_ID, XLOG_CHECKPOINT_SHUTDOWN) ||
+			is_xlog_record_type(rmid, info, RM_XLOG_ID, XLOG_CHECKPOINT_ONLINE) ||
+			is_xlog_record_type(rmid, info, RM_XLOG_ID, XLOG_SWITCH) ||
+			is_xlog_record_type(rmid, info, RM_XLOG_ID, XLOG_FPI_FOR_HINT) ||
+			is_xlog_record_type(rmid, info, RM_XLOG_ID, XLOG_PARAMETER_CHANGE) ||
+			is_xlog_record_type(rmid, info, RM_STANDBY_ID, XLOG_RUNNING_XACTS) ||
+			is_xlog_record_type(rmid, info, RM_HEAP2_ID, XLOG_HEAP2_PRUNE);
+
+		CHECK_FOR_INTERRUPTS();
+	}
+
+	pfree(xlogreader->private_data);
+	XLogReaderFree(xlogreader);
+
+	PG_RETURN_BOOL(is_valid);
+}
diff --git a/src/bin/pg_upgrade/Makefile b/src/bin/pg_upgrade/Makefile
index 5834513add..815d1a7ca1 100644
--- a/src/bin/pg_upgrade/Makefile
+++ b/src/bin/pg_upgrade/Makefile
@@ -3,6 +3,9 @@
 PGFILEDESC = "pg_upgrade - an in-place binary upgrade utility"
 PGAPPICON = win32
 
+# required for 003_upgrade_logical_replication_slots.pl
+EXTRA_INSTALL=contrib/test_decoding
+
 subdir = src/bin/pg_upgrade
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 56e313f562..95cb28164f 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -30,6 +30,8 @@ static void check_for_jsonb_9_4_usage(ClusterInfo *cluster);
 static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(void);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
+static void check_new_cluster_logical_replication_slots(void);
+static void check_old_cluster_for_valid_slots(bool live_check);
 
 
 /*
@@ -86,8 +88,11 @@ check_and_dump_old_cluster(bool live_check)
 	if (!live_check)
 		start_postmaster(&old_cluster, true);
 
-	/* Extract a list of databases and tables from the old cluster */
-	get_db_and_rel_infos(&old_cluster);
+	/*
+	 * Extract a list of databases, tables, and logical replication slots from
+	 * the old cluster.
+	 */
+	get_db_rel_and_slot_infos(&old_cluster, live_check);
 
 	init_tablespaces();
 
@@ -104,6 +109,13 @@ check_and_dump_old_cluster(bool live_check)
 	check_for_reg_data_type_usage(&old_cluster);
 	check_for_isn_and_int8_passing_mismatch(&old_cluster);
 
+	/*
+	 * Logical replication slots can be migrated since PG17. See comments atop
+	 * get_old_cluster_logical_slot_infos().
+	 */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
+		check_old_cluster_for_valid_slots(live_check);
+
 	/*
 	 * PG 16 increased the size of the 'aclitem' type, which breaks the
 	 * on-disk format for existing data.
@@ -187,7 +199,7 @@ check_and_dump_old_cluster(bool live_check)
 void
 check_new_cluster(void)
 {
-	get_db_and_rel_infos(&new_cluster);
+	get_db_rel_and_slot_infos(&new_cluster, false);
 
 	check_new_cluster_is_empty();
 
@@ -210,6 +222,8 @@ check_new_cluster(void)
 	check_for_prepared_transactions(&new_cluster);
 
 	check_for_new_tablespace_dir();
+
+	check_new_cluster_logical_replication_slots();
 }
 
 
@@ -232,27 +246,6 @@ report_clusters_compatible(void)
 }
 
 
-void
-issue_warnings_and_set_wal_level(void)
-{
-	/*
-	 * We unconditionally start/stop the new server because pg_resetwal -o set
-	 * wal_level to 'minimum'.  If the user is upgrading standby servers using
-	 * the rsync instructions, they will need pg_upgrade to write its final
-	 * WAL record showing wal_level as 'replica'.
-	 */
-	start_postmaster(&new_cluster, true);
-
-	/* Reindex hash indexes for old < 10.0 */
-	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 906)
-		old_9_6_invalidate_hash_indexes(&new_cluster, false);
-
-	report_extension_updates(&new_cluster);
-
-	stop_postmaster(false);
-}
-
-
 void
 output_completion_banner(char *deletion_script_file_name)
 {
@@ -1402,3 +1395,155 @@ check_for_user_defined_encoding_conversions(ClusterInfo *cluster)
 	else
 		check_ok();
 }
+
+/*
+ * check_new_cluster_logical_replication_slots()
+ *
+ * Verify that there are no logical replication slots on the new cluster and
+ * that the parameter settings necessary for creating slots are sufficient.
+ */
+static void
+check_new_cluster_logical_replication_slots(void)
+{
+	PGresult   *res;
+	PGconn	   *conn;
+	int			nslots_on_old;
+	int			nslots_on_new;
+	int			max_replication_slots;
+	char	   *wal_level;
+
+	/* Logical slots can be migrated since PG17. */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+		return;
+
+	nslots_on_old = count_old_cluster_logical_slots();
+
+	/* Quick return if there are no logical slots to be migrated. */
+	if (nslots_on_old == 0)
+		return;
+
+	conn = connectToServer(&new_cluster, "template1");
+
+	prep_status("Checking for new cluster logical replication slots");
+
+	res = executeQueryOrDie(conn, "SELECT count(*) "
+							"FROM pg_catalog.pg_replication_slots "
+							"WHERE slot_type = 'logical' AND "
+							"temporary IS FALSE;");
+
+	if (PQntuples(res) != 1)
+		pg_fatal("could not count the number of logical replication slots");
+
+	nslots_on_new = atoi(PQgetvalue(res, 0, 0));
+
+	if (nslots_on_new)
+		pg_fatal("Expected 0 logical replication slots but found %d.",
+				 nslots_on_new);
+
+	PQclear(res);
+
+	res = executeQueryOrDie(conn, "SELECT setting FROM pg_settings "
+							"WHERE name IN ('max_replication_slots', 'wal_level') "
+							"ORDER BY name DESC;");
+
+	if (PQntuples(res) != 2)
+		pg_fatal("could not determine parameter settings on new cluster");
+
+	wal_level = PQgetvalue(res, 0, 0);
+
+	if (strcmp(wal_level, "logical") != 0)
+		pg_fatal("wal_level must be \"logical\", but is set to \"%s\"",
+				 wal_level);
+
+	max_replication_slots = atoi(PQgetvalue(res, 1, 0));
+
+	if (nslots_on_old > max_replication_slots)
+		pg_fatal("max_replication_slots (%d) must be greater than or equal to the number of "
+				 "logical replication slots (%d) on the old cluster",
+				 max_replication_slots, nslots_on_old);
+
+	PQclear(res);
+	PQfinish(conn);
+
+	check_ok();
+}
+
+/*
+ * check_old_cluster_for_valid_slots()
+ *
+ * Verify that all the logical slots are usable and have consumed all the WAL
+ * before shutdown.
+ */
+static void
+check_old_cluster_for_valid_slots(bool live_check)
+{
+	int			dbnum;
+	char		output_path[MAXPGPATH];
+	FILE	   *script = NULL;
+
+	prep_status("Checking for valid logical replication slots");
+
+	snprintf(output_path, sizeof(output_path), "%s/%s",
+			 log_opts.basedir,
+			 "invalid_logical_relication_slots.txt");
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		int			slotnum;
+		LogicalSlotInfoArr *slot_arr = &old_cluster.dbarr.dbs[dbnum].slot_arr;
+
+		for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		{
+			LogicalSlotInfo *slot = &slot_arr->slots[slotnum];
+
+			/* Is the slot usable? */
+			if (slot->invalid)
+			{
+				if (script == NULL &&
+					(script = fopen_priv(output_path, "w")) == NULL)
+					pg_fatal("could not open file \"%s\": %s",
+							 output_path, strerror(errno));
+
+				fprintf(script, "The slot \"%s\" is invalid\n",
+						slot->slotname);
+
+				/* No need to check this slot, seek to new one */
+				continue;
+			}
+
+			/*
+			 * Do additional check to ensure that all logical replication
+			 * slots have consumed all the WAL before shutdown.
+			 *
+			 * Note: This can be satisfied only when the old cluster has been
+			 * shut down, so we skip this for live checks.
+			 */
+			if (!live_check && !slot->caught_up)
+			{
+				if (script == NULL &&
+					(script = fopen_priv(output_path, "w")) == NULL)
+					pg_fatal("could not open file \"%s\": %s",
+							 output_path, strerror(errno));
+
+				fprintf(script,
+						"The slot \"%s\" has not consumed the WAL yet\n",
+						slot->slotname);
+			}
+		}
+	}
+
+	if (script)
+	{
+		fclose(script);
+
+		pg_log(PG_REPORT, "fatal");
+		pg_fatal("Your installation contains invalid logical replication slots.\n"
+				 "These slots can't be copied, so this cluster cannot be upgraded.\n"
+				 "Consider removing such slots or consuming the pending WAL if any,\n"
+				 "and then restart the upgrade.\n"
+				 "A list of invalid logical replication slots is in the file:\n"
+				 "    %s", output_path);
+	}
+
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/function.c b/src/bin/pg_upgrade/function.c
index dc8800c7cd..c0f5e58fa2 100644
--- a/src/bin/pg_upgrade/function.c
+++ b/src/bin/pg_upgrade/function.c
@@ -46,7 +46,9 @@ library_name_compare(const void *p1, const void *p2)
 /*
  * get_loadable_libraries()
  *
- *	Fetch the names of all old libraries containing C-language functions.
+ *	Fetch the names of all old libraries containing either C-language functions
+ *	or logical replication output plugins.
+ *
  *	We will later check that they all exist in the new installation.
  */
 void
@@ -55,6 +57,7 @@ get_loadable_libraries(void)
 	PGresult  **ress;
 	int			totaltups;
 	int			dbnum;
+	int			n_libinfos;
 
 	ress = (PGresult **) pg_malloc(old_cluster.dbarr.ndbs * sizeof(PGresult *));
 	totaltups = 0;
@@ -81,7 +84,12 @@ get_loadable_libraries(void)
 		PQfinish(conn);
 	}
 
-	os_info.libraries = (LibraryInfo *) pg_malloc(totaltups * sizeof(LibraryInfo));
+	/*
+	 * Allocate memory for required libraries and logical replication output
+	 * plugins.
+	 */
+	n_libinfos = totaltups + count_old_cluster_logical_slots();
+	os_info.libraries = (LibraryInfo *) pg_malloc(sizeof(LibraryInfo) * n_libinfos);
 	totaltups = 0;
 
 	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
@@ -89,6 +97,8 @@ get_loadable_libraries(void)
 		PGresult   *res = ress[dbnum];
 		int			ntups;
 		int			rowno;
+		int			slotno;
+		LogicalSlotInfoArr *slot_arr = &old_cluster.dbarr.dbs[dbnum].slot_arr;
 
 		ntups = PQntuples(res);
 		for (rowno = 0; rowno < ntups; rowno++)
@@ -101,6 +111,23 @@ get_loadable_libraries(void)
 			totaltups++;
 		}
 		PQclear(res);
+
+		/*
+		 * Store the names of output plugins as well. There is a possibility
+		 * that duplicated plugins are set, but the consumer function
+		 * check_loadable_libraries() will avoid checking the same library, so
+		 * we do not have to consider their uniqueness here.
+		 */
+		for (slotno = 0; slotno < slot_arr->nslots; slotno++)
+		{
+			if (slot_arr->slots[slotno].invalid)
+				continue;
+
+			os_info.libraries[totaltups].name = pg_strdup(slot_arr->slots[slotno].plugin);
+			os_info.libraries[totaltups].dbnum = dbnum;
+
+			totaltups++;
+		}
 	}
 
 	pg_free(ress);
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index aa5faca4d6..189a793106 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -26,6 +26,8 @@ static void get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo);
 static void free_rel_infos(RelInfoArr *rel_arr);
 static void print_db_infos(DbInfoArr *db_arr);
 static void print_rel_infos(RelInfoArr *rel_arr);
+static void print_slot_infos(LogicalSlotInfoArr *slot_arr);
+static void get_old_cluster_logical_slot_infos(DbInfo *dbinfo, bool live_check);
 
 
 /*
@@ -266,13 +268,15 @@ report_unmatched_relation(const RelInfo *rel, const DbInfo *db, bool is_new_db)
 }
 
 /*
- * get_db_and_rel_infos()
+ * get_db_rel_and_slot_infos()
  *
  * higher level routine to generate dbinfos for the database running
  * on the given "port". Assumes that server is already running.
+ *
+ * live_check would be used only when the target is the old cluster.
  */
 void
-get_db_and_rel_infos(ClusterInfo *cluster)
+get_db_rel_and_slot_infos(ClusterInfo *cluster, bool live_check)
 {
 	int			dbnum;
 
@@ -283,7 +287,17 @@ get_db_and_rel_infos(ClusterInfo *cluster)
 	get_db_infos(cluster);
 
 	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
-		get_rel_infos(cluster, &cluster->dbarr.dbs[dbnum]);
+	{
+		DbInfo	   *pDbInfo = &cluster->dbarr.dbs[dbnum];
+
+		get_rel_infos(cluster, pDbInfo);
+
+		/*
+		 * Retrieve the logical replication slots infos for the old cluster.
+		 */
+		if (cluster == &old_cluster)
+			get_old_cluster_logical_slot_infos(pDbInfo, live_check);
+	}
 
 	if (cluster == &old_cluster)
 		pg_log(PG_VERBOSE, "\nsource databases:");
@@ -600,6 +614,125 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 	dbinfo->rel_arr.nrels = num_rels;
 }
 
+/*
+ * get_old_cluster_logical_slot_infos()
+ *
+ * gets the LogicalSlotInfos for all the logical replication slots of the
+ * database referred to by "dbinfo".
+ *
+ * Note: This function will not do anything if the old cluster is pre-PG17.
+ * This is because before that the logical slots are not saved at shutdown, so
+ * there is no guarantee that the latest confirmed_flush_lsn is saved to disk
+ * which can lead to data loss. It is still not guaranteed for manually created
+ * slots in PG17, so subsequent checks done in
+ * check_old_cluster_for_valid_slots() would raise a FATAL error if such slots
+ * are included.
+ */
+static void
+get_old_cluster_logical_slot_infos(DbInfo *dbinfo, bool live_check)
+{
+	PGconn	   *conn;
+	PGresult   *res;
+	LogicalSlotInfo *slotinfos = NULL;
+	int			num_slots = 0;
+
+	/* Logical slots can be migrated since PG17. */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+	{
+		dbinfo->slot_arr.slots = slotinfos;
+		dbinfo->slot_arr.nslots = num_slots;
+		return;
+	}
+
+	conn = connectToServer(&old_cluster, dbinfo->db_name);
+
+	/*
+	 * Fetch the logical replication slot information. The slot is considered
+	 * caught up if all the WAL is consumed except for records that could be
+	 * generated during the upgrade. See
+	 * binary_upgrade_validate_wal_logical_end().
+	 *
+	 * Note that we can't ensure whether the slot is caught up during
+	 * live_check as the new WAL records could be generated.
+	 *
+	 * We intentionally skip checking the WALs for invalidated slots as the
+	 * corresponding WALs could have been removed for such slots.
+	 *
+	 * The temporary slots are explicitly ignored while checking because such
+	 * slots cannot exist after the upgrade. During the upgrade, clusters are
+	 * started and stopped several times causing any temporary slots to be
+	 * removed.
+	 */
+	res = executeQueryOrDie(conn, "SELECT slot_name, plugin, two_phase, "
+							"%s as caught_up, conflicting as invalid "
+							"FROM pg_catalog.pg_replication_slots "
+							"WHERE slot_type = 'logical' AND "
+							"database = current_database() AND "
+							"temporary IS FALSE;",
+							live_check ? "FALSE" :
+							"(CASE WHEN conflicting THEN FALSE "
+							"ELSE (SELECT pg_catalog.binary_upgrade_validate_wal_logical_end(confirmed_flush_lsn)) "
+							"END)");
+
+	num_slots = PQntuples(res);
+
+	if (num_slots)
+	{
+		int			slotnum;
+		int			i_slotname;
+		int			i_plugin;
+		int			i_twophase;
+		int			i_caught_up;
+		int			i_invalid;
+
+		slotinfos = (LogicalSlotInfo *) pg_malloc(sizeof(LogicalSlotInfo) * num_slots);
+
+		i_slotname = PQfnumber(res, "slot_name");
+		i_plugin = PQfnumber(res, "plugin");
+		i_twophase = PQfnumber(res, "two_phase");
+		i_caught_up = PQfnumber(res, "caught_up");
+		i_invalid = PQfnumber(res, "invalid");
+
+		for (slotnum = 0; slotnum < num_slots; slotnum++)
+		{
+			LogicalSlotInfo *curr = &slotinfos[slotnum];
+
+			curr->slotname = pg_strdup(PQgetvalue(res, slotnum, i_slotname));
+			curr->plugin = pg_strdup(PQgetvalue(res, slotnum, i_plugin));
+			curr->two_phase = (strcmp(PQgetvalue(res, slotnum, i_twophase), "t") == 0);
+			curr->caught_up = (strcmp(PQgetvalue(res, slotnum, i_caught_up), "t") == 0);
+			curr->invalid = (strcmp(PQgetvalue(res, slotnum, i_invalid), "t") == 0);
+		}
+	}
+
+	PQclear(res);
+	PQfinish(conn);
+
+	dbinfo->slot_arr.slots = slotinfos;
+	dbinfo->slot_arr.nslots = num_slots;
+}
+
+
+/*
+ * count_old_cluster_logical_slots()
+ *
+ * Returns the number of logical replication slots for all databases.
+ *
+ * Note: this function always returns 0 if the old_cluster is PG16 and prior
+ * because we gather slot information only for cluster versions greater than or
+ * equal to PG17. See get_old_cluster_logical_slot_infos().
+ */
+int
+count_old_cluster_logical_slots(void)
+{
+	int			dbnum;
+	int			slot_count = 0;
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+		slot_count += old_cluster.dbarr.dbs[dbnum].slot_arr.nslots;
+
+	return slot_count;
+}
 
 static void
 free_db_and_rel_infos(DbInfoArr *db_arr)
@@ -642,8 +775,11 @@ print_db_infos(DbInfoArr *db_arr)
 
 	for (dbnum = 0; dbnum < db_arr->ndbs; dbnum++)
 	{
-		pg_log(PG_VERBOSE, "Database: \"%s\"", db_arr->dbs[dbnum].db_name);
-		print_rel_infos(&db_arr->dbs[dbnum].rel_arr);
+		DbInfo	   *pDbInfo = &db_arr->dbs[dbnum];
+
+		pg_log(PG_VERBOSE, "Database: \"%s\"", pDbInfo->db_name);
+		print_rel_infos(&pDbInfo->rel_arr);
+		print_slot_infos(&pDbInfo->slot_arr);
 	}
 }
 
@@ -660,3 +796,25 @@ print_rel_infos(RelInfoArr *rel_arr)
 			   rel_arr->rels[relnum].reloid,
 			   rel_arr->rels[relnum].tablespace);
 }
+
+static void
+print_slot_infos(LogicalSlotInfoArr *slot_arr)
+{
+	int			slotnum;
+
+	/* Quick return if there are no logical slots. */
+	if (slot_arr->nslots == 0)
+		return;
+
+	pg_log(PG_VERBOSE, "Logical replication slots within the database:");
+
+	for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+	{
+		LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+
+		pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\", two_phase: %d",
+			   slot_info->slotname,
+			   slot_info->plugin,
+			   slot_info->two_phase);
+	}
+}
diff --git a/src/bin/pg_upgrade/meson.build b/src/bin/pg_upgrade/meson.build
index 12a97f84e2..2c4f38d865 100644
--- a/src/bin/pg_upgrade/meson.build
+++ b/src/bin/pg_upgrade/meson.build
@@ -42,6 +42,7 @@ tests += {
     'tests': [
       't/001_basic.pl',
       't/002_pg_upgrade.pl',
+      't/003_upgrade_logical_replication_slots.pl',
     ],
     'test_kwargs': {'priority': 40}, # pg_upgrade tests are slow
   },
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 96bfb67167..3ddfc31070 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -59,6 +59,8 @@ static void copy_xact_xlog_xid(void);
 static void set_frozenxids(bool minmxid_only);
 static void make_outputdirs(char *pgdata);
 static void setup(char *argv0, bool *live_check);
+static void create_logical_replication_slots(void);
+static void setup_new_cluster(void);
 
 ClusterInfo old_cluster,
 			new_cluster;
@@ -188,6 +190,8 @@ main(int argc, char **argv)
 			  new_cluster.pgdata);
 	check_ok();
 
+	setup_new_cluster();
+
 	if (user_opts.do_sync)
 	{
 		prep_status("Sync data directory to disk");
@@ -201,8 +205,6 @@ main(int argc, char **argv)
 
 	create_script_for_old_cluster_deletion(&deletion_script_file_name);
 
-	issue_warnings_and_set_wal_level();
-
 	pg_log(PG_REPORT,
 		   "\n"
 		   "Upgrade Complete\n"
@@ -593,7 +595,7 @@ create_new_objects(void)
 		set_frozenxids(true);
 
 	/* update new_cluster info now that we have objects in the databases */
-	get_db_and_rel_infos(&new_cluster);
+	get_db_rel_and_slot_infos(&new_cluster, false);
 }
 
 /*
@@ -862,3 +864,102 @@ set_frozenxids(bool minmxid_only)
 
 	check_ok();
 }
+
+/*
+ * create_logical_replication_slots()
+ *
+ * Similar to create_new_objects() but only restores logical replication slots.
+ */
+static void
+create_logical_replication_slots(void)
+{
+	int			dbnum;
+
+	prep_status_progress("Restoring logical replication slots in the new cluster");
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		DbInfo	   *old_db = &old_cluster.dbarr.dbs[dbnum];
+		LogicalSlotInfoArr *slot_arr = &old_db->slot_arr;
+		PGconn	   *conn;
+		PQExpBuffer query;
+		int			slotnum;
+		char		log_file_name[MAXPGPATH];
+
+		/* Skip this database if there are no slots */
+		if (slot_arr->nslots == 0)
+			continue;
+
+		conn = connectToServer(&new_cluster, old_db->db_name);
+		query = createPQExpBuffer();
+
+		snprintf(log_file_name, sizeof(log_file_name),
+				 DB_DUMP_LOG_FILE_MASK, old_db->db_oid);
+
+		pg_log(PG_STATUS, "%s", old_db->db_name);
+
+		for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		{
+			LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+
+			/* Constructs a query for creating logical replication slots. */
+			appendPQExpBuffer(query, "SELECT pg_catalog.pg_create_logical_replication_slot(");
+			appendStringLiteralConn(query, slot_info->slotname, conn);
+			appendPQExpBuffer(query, ", ");
+			appendStringLiteralConn(query, slot_info->plugin, conn);
+			appendPQExpBuffer(query, ", false, %s);",
+							  slot_info->two_phase ? "true" : "false");
+
+			PQclear(executeQueryOrDie(conn, "%s", query->data));
+
+			resetPQExpBuffer(query);
+		}
+
+		PQfinish(conn);
+
+		destroyPQExpBuffer(query);
+	}
+
+	end_progress_output();
+	check_ok();
+}
+
+/*
+ *	setup_new_cluster()
+ *
+ * Starts a new cluster for updating the wal_level in the control file, then
+ * does final setups. Logical slots are also created here.
+ */
+static void
+setup_new_cluster(void)
+{
+	/*
+	 * We unconditionally start/stop the new server because pg_resetwal -o set
+	 * wal_level to 'minimum'.  If the user is upgrading standby servers using
+	 * the rsync instructions, they will need pg_upgrade to write its final
+	 * WAL record showing wal_level as 'replica'.
+	 */
+	start_postmaster(&new_cluster, true);
+
+	/*
+	 * If the old cluster has logical slots, migrate them to a new cluster.
+	 *
+	 * Note: This must be done after executing pg_resetwal command in the
+	 * caller because pg_resetwal would remove required WALs.
+	 */
+	if (count_old_cluster_logical_slots())
+		create_logical_replication_slots();
+
+	/*
+	 * Reindex hash indexes for old < 10.0. count_old_cluster_logical_slots()
+	 * returns non-zero when the old_cluster is PG17 and later, so it's OK to
+	 * use "else if" here. See comments atop count_old_cluster_logical_slots()
+	 * and get_old_cluster_logical_slot_infos().
+	 */
+	else if (GET_MAJOR_VERSION(old_cluster.major_version) <= 906)
+		old_9_6_invalidate_hash_indexes(&new_cluster, false);
+
+	report_extension_updates(&new_cluster);
+
+	stop_postmaster(false);
+}
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 842f3b6cd3..fb7ee26569 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -150,6 +150,25 @@ typedef struct
 	int			nrels;
 } RelInfoArr;
 
+/*
+ * Structure to store logical replication slot information.
+ */
+typedef struct
+{
+	char	   *slotname;		/* slot name */
+	char	   *plugin;			/* plugin */
+	bool		two_phase;		/* can the slot decode 2PC? */
+	bool		caught_up;		/* Is confirmed_flush_lsn the same as latest
+								 * checkpoint LSN? */
+	bool		invalid;		/* if true, the slot is unusable */
+} LogicalSlotInfo;
+
+typedef struct
+{
+	int			nslots;			/* number of logical slot infos */
+	LogicalSlotInfo *slots;		/* array of logical slot infos */
+} LogicalSlotInfoArr;
+
 /*
  * The following structure represents a relation mapping.
  */
@@ -176,6 +195,7 @@ typedef struct
 	char		db_tablespace[MAXPGPATH];	/* database default tablespace
 											 * path */
 	RelInfoArr	rel_arr;		/* array of all user relinfos */
+	LogicalSlotInfoArr slot_arr;	/* array of all LogicalSlotInfo */
 } DbInfo;
 
 /*
@@ -345,7 +365,6 @@ void		output_check_banner(bool live_check);
 void		check_and_dump_old_cluster(bool live_check);
 void		check_new_cluster(void);
 void		report_clusters_compatible(void);
-void		issue_warnings_and_set_wal_level(void);
 void		output_completion_banner(char *deletion_script_file_name);
 void		check_cluster_versions(void);
 void		check_cluster_compatibility(bool live_check);
@@ -400,7 +419,8 @@ void		check_loadable_libraries(void);
 FileNameMap *gen_db_file_maps(DbInfo *old_db,
 							  DbInfo *new_db, int *nmaps, const char *old_pgdata,
 							  const char *new_pgdata);
-void		get_db_and_rel_infos(ClusterInfo *cluster);
+void		get_db_rel_and_slot_infos(ClusterInfo *cluster, bool live_check);
+int			count_old_cluster_logical_slots(void);
 
 /* option.c */
 
diff --git a/src/bin/pg_upgrade/server.c b/src/bin/pg_upgrade/server.c
index 0bc3d2806b..d7f6c268ef 100644
--- a/src/bin/pg_upgrade/server.c
+++ b/src/bin/pg_upgrade/server.c
@@ -201,6 +201,7 @@ start_postmaster(ClusterInfo *cluster, bool report_and_exit_on_error)
 	PGconn	   *conn;
 	bool		pg_ctl_return = false;
 	char		socket_string[MAXPGPATH + 200];
+	PQExpBufferData pgoptions;
 
 	static bool exit_hook_registered = false;
 
@@ -227,23 +228,41 @@ start_postmaster(ClusterInfo *cluster, bool report_and_exit_on_error)
 				 cluster->sockdir);
 #endif
 
+	initPQExpBuffer(&pgoptions);
+
 	/*
-	 * Use -b to disable autovacuum.
+	 * Construct a parameter string which is passed to the server process.
 	 *
 	 * Turn off durability requirements to improve object creation speed, and
 	 * we only modify the new cluster, so only use it there.  If there is a
 	 * crash, the new cluster has to be recreated anyway.  fsync=off is a big
 	 * win on ext4.
 	 */
+	if (cluster == &new_cluster)
+		appendPQExpBufferStr(&pgoptions, " -c synchronous_commit=off -c fsync=off -c full_page_writes=off");
+
+	/*
+	 * Use max_slot_wal_keep_size as -1 to prevent the WAL removal by the
+	 * checkpointer process.  If WALs required by logical replication slots
+	 * are removed, the slots are unusable.  This setting prevents the
+	 * invalidation of slots during the upgrade. We set this option when
+	 * cluster is PG17 or later because logical replication slots can only be
+	 * migrated since then. Besides, max_slot_wal_keep_size is added in PG13.
+	 */
+	if (GET_MAJOR_VERSION(cluster->major_version) >= 1700)
+		appendPQExpBufferStr(&pgoptions, " -c max_slot_wal_keep_size=-1");
+
+	/* Use -b to disable autovacuum. */
 	snprintf(cmd, sizeof(cmd),
 			 "\"%s/pg_ctl\" -w -l \"%s/%s\" -D \"%s\" -o \"-p %d -b%s %s%s\" start",
 			 cluster->bindir,
 			 log_opts.logdir,
 			 SERVER_LOG_FILE, cluster->pgconfig, cluster->port,
-			 (cluster == &new_cluster) ?
-			 " -c synchronous_commit=off -c fsync=off -c full_page_writes=off" : "",
+			 pgoptions.data,
 			 cluster->pgopts ? cluster->pgopts : "", socket_string);
 
+	termPQExpBuffer(&pgoptions);
+
 	/*
 	 * Don't throw an error right away, let connecting throw the error because
 	 * it might supply a reason for the failure.
diff --git a/src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl
new file mode 100644
index 0000000000..13bcc344fd
--- /dev/null
+++ b/src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl
@@ -0,0 +1,232 @@
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+# Tests for upgrading replication slots
+
+use strict;
+use warnings;
+
+use File::Path qw(rmtree);
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Can be changed to test the other modes
+my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
+
+# Initialize old cluster
+my $old_publisher = PostgreSQL::Test::Cluster->new('old_publisher');
+$old_publisher->init(allows_streaming => 'logical');
+
+# Initialize new cluster
+my $new_publisher = PostgreSQL::Test::Cluster->new('new_publisher');
+$new_publisher->init(allows_streaming => 'replica');
+
+# Initialize subscriber cluster
+my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
+$subscriber->init(allows_streaming => 'logical');
+
+my $bindir = $new_publisher->config_data('--bindir');
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when new cluster wal_level is not 'logical'
+
+# Preparations for the subsequent test:
+# 1. Create a slot on the old cluster
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT pg_create_logical_replication_slot('test_slot1', 'test_decoding', false, true);"
+);
+$old_publisher->stop;
+
+# pg_upgrade will fail because the new cluster wal_level is 'replica'
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade where the new cluster has the wrong wal_level');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when max_replication_slots on a new cluster is
+#		too low
+
+# Preparations for the subsequent test:
+# 1. Create a second slot on the old cluster
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT pg_create_logical_replication_slot('test_slot2', 'test_decoding', false, true);"
+);
+
+# 2. Consume WAL records to avoid another type of upgrade failure. It will be
+#	 tested in subsequent cases.
+$old_publisher->safe_psql('postgres',
+	"SELECT count(*) FROM pg_logical_slot_get_changes('test_slot1', NULL, NULL);"
+);
+$old_publisher->stop;
+
+# 3. max_replication_slots is set to smaller than the number of slots (2)
+#	 present on the old cluster
+$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 1");
+
+# 4. wal_level is set correctly on the new cluster
+$new_publisher->append_conf('postgresql.conf', "wal_level = 'logical'");
+
+# pg_upgrade will fail because the new cluster has insufficient max_replication_slots
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade where the new cluster has insufficient max_replication_slots');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when the slot still has unconsumed WAL records
+
+# Preparations for the subsequent test:
+# 1. Remove the slot 'test_slot2', leaving only 1 slot on the old cluster, so
+#    the new cluster config  max_replication_slots=1 will now be enough.
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT * FROM pg_drop_replication_slot('test_slot2');"
+);
+
+# 2. Generate extra WAL records. Because these WAL records do not get consumed
+#	 it will cause the upcoming pg_upgrade test to fail.
+$old_publisher->safe_psql('postgres',
+	"CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;"
+);
+$old_publisher->stop;
+
+# pg_upgrade will fail because the slot still has unconsumed WAL records
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade of old cluster with idle replication slots');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+# Remove the remaining slot
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT * FROM pg_drop_replication_slot('test_slot1');"
+);
+
+# ------------------------------
+# TEST: Successful upgrade
+
+# Preparations for the subsequent test:
+# 1. Setup logical replication
+my $old_connstr = $old_publisher->connstr . ' dbname=postgres';
+$old_publisher->safe_psql('postgres',
+	"CREATE PUBLICATION pub FOR ALL TABLES;"
+);
+$subscriber->start;
+$subscriber->safe_psql(
+	'postgres', qq[
+	CREATE TABLE tbl (a int);
+	CREATE SUBSCRIPTION regress_sub CONNECTION '$old_connstr' PUBLICATION pub WITH (two_phase = 'true')
+]);
+$subscriber->wait_for_subscription_sync($old_publisher, 'regress_sub');
+
+# 2. Temporarily disable the subscription
+$subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION regress_sub DISABLE");
+$old_publisher->stop;
+
+# Dry run, successful check is expected. This is not a live check, so a
+# shutdown checkpoint record would be inserted. We want to test that a
+# subsequent upgrade is successful by skipping such an expected WAL record.
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,        '--check'
+	],
+	'run of pg_upgrade of old cluster');
+ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ removed after pg_upgrade success");
+
+# Actual run, successful upgrade is expected
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $old_publisher->data_dir,
+		'-D',         $new_publisher->data_dir,
+		'-b',         $bindir,
+		'-B',         $bindir,
+		'-s',         $new_publisher->host,
+		'-p',         $old_publisher->port,
+		'-P',         $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade of old cluster');
+ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ removed after pg_upgrade success");
+
+# Check that the slot 'regress_sub' has migrated to the new cluster
+$new_publisher->start;
+my $result = $new_publisher->safe_psql('postgres',
+	"SELECT slot_name, two_phase FROM pg_replication_slots");
+is($result, qq(regress_sub|t), 'check the slot exists on new cluster');
+
+# Update the connection
+my $new_connstr = $new_publisher->connstr . ' dbname=postgres';
+$subscriber->safe_psql(
+	'postgres', qq[
+	ALTER SUBSCRIPTION regress_sub CONNECTION '$new_connstr';
+	ALTER SUBSCRIPTION regress_sub ENABLE;
+]);
+
+# Check whether changes on the new publisher get replicated to the subscriber
+$new_publisher->safe_psql('postgres',
+	"INSERT INTO tbl VALUES (generate_series(11, 20))");
+$new_publisher->wait_for_catchup('regress_sub');
+$result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
+is($result, qq(20), 'check changes are replicated to the subscriber');
+
+# Clean up
+$subscriber->stop();
+$new_publisher->stop();
+
+done_testing();
diff --git a/src/include/access/xlogreader.h b/src/include/access/xlogreader.h
index da32c7db77..ecec87d701 100644
--- a/src/include/access/xlogreader.h
+++ b/src/include/access/xlogreader.h
@@ -429,6 +429,8 @@ extern bool DecodeXLogRecord(XLogReaderState *state,
 
 #ifndef FRONTEND
 extern FullTransactionId XLogRecGetFullXid(XLogReaderState *record);
+extern XLogReaderState *InitXLogReaderState(XLogRecPtr lsn);
+extern XLogRecord *ReadNextXLogRecord(XLogReaderState *xlogreader);
 #endif
 
 extern bool RestoreBlockImage(XLogReaderState *record, uint8 block_id, char *page);
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 9805bc6118..4b6ea2d185 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11370,6 +11370,11 @@
   proname => 'binary_upgrade_set_next_pg_tablespace_oid', provolatile => 'v',
   proparallel => 'u', prorettype => 'void', proargtypes => 'oid',
   prosrc => 'binary_upgrade_set_next_pg_tablespace_oid' },
+{ oid => '8046', descr => 'for use by pg_upgrade',
+  proname => 'binary_upgrade_validate_wal_logical_end', proisstrict => 'f',
+  provolatile => 'v', proparallel => 'u', prorettype => 'bool',
+  proargtypes => 'pg_lsn',
+  prosrc => 'binary_upgrade_validate_wal_logical_end' },
 
 # conversion functions
 { oid => '4302',
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index b5bbdd1608..5c9f8ae4d3 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1503,6 +1503,8 @@ LogicalRepTyp
 LogicalRepWorker
 LogicalRepWorkerType
 LogicalRewriteMappingData
+LogicalSlotInfo
+LogicalSlotInfoArr
 LogicalTape
 LogicalTapeSet
 LsnReadQueue
-- 
2.27.0

#270Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Bharath Rupireddy (#265)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Bharath,

You mentioned at line 118, but at that time the logical replication system is not
yet created. The subscriber is created at line 163.
Therefore, the WALs would not be consumed automatically.

So, not calling pg_logical_slot_get_changes() on test_slot1 won't
consume the WAL?

Yes. This slot was created manually and nothing activates it automatically.
pg_logical_slot_get_changes() can consume the WAL, but it is never called.
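
For reference, the attached TAP test consumes that WAL explicitly with a call like:

SELECT count(*) FROM pg_logical_slot_get_changes('test_slot1', NULL, NULL);

Until such a call is made, the slot's confirmed_flush_lsn does not advance, so
the WAL is treated as unconsumed.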

2.
+++ b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl

How about a more descriptive and pointed name for the TAP test file,
something like 003_upgrade_logical_replication_slots.pl?

Good suggestion. Renamed.

3. Does this patch support upgrading of logical replication slots on a
streaming standby? If yes, isn't it a good idea to add a test for
upgrading a standby with logical replication slots?

IIUC pg_upgrade would not be used for a physical standby. The standby would be upgraded by:

* Recreating the database cluster, or
* Executing the rsync command.

For more details, please see the documentation.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#271Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Amit Kapila (#266)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Fri, Sep 22, 2023 at 12:11 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

Yeah, both by tests and manually verifying the WAL records. Basically,
we need to care about records that could be generated by background
processes like checkpointer/bgwriter or can be generated during system
table scans. You may want to read my latest email for a summary on how
we arrived at this design choice [1].

[1] - /messages/by-id/CAA4eK1JVKZGRHLOEotWi+e+09jucNedqpkkc-Do4dh5FTAU+5w@mail.gmail.com

+    /* Logical slots can be migrated since PG17. */
+    if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+    {

Why can't the patch allow migration of logical replication slots from
PG versions < 17 to say 17 or later? If done, it will be a main
advantage of the patch since it will enable seamless major version
upgrades of postgres database instances with logical replication
slots.

I'm looking at the changes to the postgres backend that this patch
makes - AFAICS, it does 2 things: 1) implements the
binary_upgrade_validate_wal_logical_end function, 2) adds an assertion
that the logical slots won't get invalidated. For (1), pg_upgrade
itself can read the WAL from the old cluster to determine the logical
WAL end (i.e. implement the functionality of
binary_upgrade_validate_wal_logical_end) because the xlogreader is
available to FRONTEND tools. For (2), it's just an assertion, and the
logic that determines the logical WAL end will anyway determine whether
or not the slots are valid; if needed, the assertion can be backported.

Is there anything else that stops this patch from supporting migration
of logical replication slots from PG versions < 17?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#272Dilip Kumar
dilipbalaut@gmail.com
In reply to: Bharath Rupireddy (#271)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Mon, Sep 25, 2023 at 11:15 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Fri, Sep 22, 2023 at 12:11 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

Yeah, both by tests and manually verifying the WAL records. Basically,
we need to care about records that could be generated by background
processes like checkpointer/bgwriter or can be generated during system
table scans. You may want to read my latest email for a summary on how
we arrived at this design choice [1].

[1] - /messages/by-id/CAA4eK1JVKZGRHLOEotWi+e+09jucNedqpkkc-Do4dh5FTAU+5w@mail.gmail.com

+    /* Logical slots can be migrated since PG17. */
+    if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+    {

Why can't the patch allow migration of logical replication slots from
PG versions < 17 to say 17 or later? If done, it will be a main
advantage of the patch since it will enable seamless major version
upgrades of postgres database instances with logical replication
slots.

I'm looking at the changes to the postgres backend that this patch
makes - AFAICS, it does 2 things: 1) implements the
binary_upgrade_validate_wal_logical_end function, 2) adds an assertion
that the logical slots won't get invalidated. For (1), pg_upgrade
itself can read the WAL from the old cluster to determine the logical
WAL end (i.e. implement the functionality of
binary_upgrade_validate_wal_logical_end) because the xlogreader is
available to FRONTEND tools. For (2), it's just an assertion, and the
logic that determines the logical WAL end will anyway determine whether
or not the slots are valid; if needed, the assertion can be backported.

Is there anything else that stops this patch from supporting migration
of logical replication slots from PG versions < 17?

IMHO one of the main changes we are making in PG 17 is that on the shutdown
checkpoint we ensure that if the confirmed flush lsn has been updated since
the last checkpoint and is not yet synced to disk, it gets flushed. I think
this is the most important change; otherwise, many slots for which we have
already streamed all the WAL might give an error on the assumption that
there is pending WAL for those slots which is not yet confirmed.
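
For reference, a rough way to eyeball this on a running old cluster is to
compare each slot's confirmed_flush_lsn with the current insert position. This
is an untested sketch, and only indicative: records written by background
processes (e.g. XLOG_RUNNING_XACTS) can make confirmed_flush_lsn lag slightly
even when everything relevant has been streamed.

-- Slots whose caught_up column is false may still have WAL to stream.
SELECT slot_name,
       confirmed_flush_lsn,
       pg_current_wal_insert_lsn() AS current_lsn,
       confirmed_flush_lsn >= pg_current_wal_insert_lsn() AS caught_up
FROM pg_catalog.pg_replication_slots
WHERE slot_type = 'logical' AND temporary IS FALSE;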

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#273Dilip Kumar
dilipbalaut@gmail.com
In reply to: Dilip Kumar (#272)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Mon, Sep 25, 2023 at 12:30 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Mon, Sep 25, 2023 at 11:15 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Fri, Sep 22, 2023 at 12:11 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

Yeah, both by tests and manually verifying the WAL records. Basically,
we need to care about records that could be generated by background
processes like checkpointer/bgwriter or can be generated during system
table scans. You may want to read my latest email for a summary on how
we reached this design choice [1].

[1] - /messages/by-id/CAA4eK1JVKZGRHLOEotWi+e+09jucNedqpkkc-Do4dh5FTAU+5w@mail.gmail.com

+    /* Logical slots can be migrated since PG17. */
+    if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+    {

Why can't the patch allow migration of logical replication slots from
PG versions < 17 to say 17 or later? If done, it will be a main
advantage of the patch since it will enable seamless major version
upgrades of postgres database instances with logical replication
slots.

I'm looking at the changes to the postgres backend that this patch
does - AFAICS, it does two things: 1) implements the
binary_upgrade_validate_wal_logical_end function, 2) adds an assertion
that the logical slots won't get invalidated. For (1), pg_upgrade
itself can read the WAL from the old cluster to determine the logical
WAL end (i.e. implement the functionality of
binary_upgrade_validate_wal_logical_end) because the xlogreader is
available to FRONTEND tools. For (2), it's just an assertion, and the
logical WAL end determining logic will anyway determine whether or not
the slots are valid; if needed, the assertion can be backported.

Is there anything else that stops this patch from supporting migration
of logical replication slots from PG versions < 17?

IMHO one of the main changes we are making in PG 17 is that on the shutdown
checkpoint we ensure that if the confirmed flush lsn has been updated since
the last checkpoint and is not yet synced to disk, it gets flushed. I think
this is the most important change; otherwise, many slots for which we have
already streamed all the WAL might give an error on the assumption that
there is pending WAL for those slots which is not yet confirmed.

You might need to refer to [1] for the change I am talking about

[1]: /messages/by-id/CAA4eK1+LtWDKXvxS7gnJ562VX+s3C6+0uQWamqu=UuD8hMfORg@mail.gmail.com

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#274Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#269)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Sat, Sep 23, 2023 at 10:18 AM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

Again, thank you for reviewing! Here is a new version patch.

Here are some more comments/thoughts on the v44 patch:

1.
+# pg_upgrade will fail because the slot still has unconsumed WAL records
+command_fails(
+    [

Add a test case to hit the fprintf(script, "The slot \"%s\" is invalid\n", ...)
code path as well?

2.
+    'run of pg_upgrade where the new cluster has insufficient
max_replication_slots');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+    "pg_upgrade_output.d/ not removed after pg_upgrade failure");
+    'run of pg_upgrade where the new cluster has the wrong wal_level');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+    "pg_upgrade_output.d/ not removed after pg_upgrade failure");
+    'run of pg_upgrade of old cluster with idle replication slots');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+    "pg_upgrade_output.d/ not removed after pg_upgrade failure");

How do these tests recognize the failures are the intended ones? I
mean, for instance when pg_upgrade fails for unused replication
slots/unconsumed WAL records, then just looking at the presence of
pg_upgrade_output.d might not be sufficient, no? Using
command_fails_like instead of command_fails and looking at the
contents of invalid_logical_relication_slots.txt might help make these
tests more focused.

3.
+        pg_log(PG_REPORT, "fatal");
+        pg_fatal("Your installation contains invalid logical
replication slots.\n"
+                 "These slots can't be copied, so this cluster cannot
be upgraded.\n"
+                 "Consider removing such slots or consuming the
pending WAL if any,\n"
+                 "and then restart the upgrade.\n"
+                 "A list of invalid logical replication slots is in
the file:\n"
+                 "    %s", output_path);

It's not just the invalid logical replication slots, but also the
slots with unconsumed WAL which aren't invalid and can be upgraded
provided the WAL is consumed. So, a better wording would be:
pg_fatal("Your installation contains logical replication slots
that cannot be upgraded.\n"
"List of all such logical replication slots is in the file:\n"
"These slots can't be copied, so this cluster cannot
be upgraded.\n"
"Consider removing invalid slots and/or consuming the
pending WAL if any,\n"
"and then restart the upgrade.\n"
" %s", output_path);

4.
+        /*
+         * There is a possibility that following records may be generated
+         * during the upgrade.
+         */
+        is_valid = is_xlog_record_type(rmid, info, RM_XLOG_ID,
XLOG_CHECKPOINT_SHUTDOWN) ||
+            is_xlog_record_type(rmid, info, RM_XLOG_ID,
XLOG_CHECKPOINT_ONLINE) ||
+            is_xlog_record_type(rmid, info, RM_XLOG_ID, XLOG_SWITCH) ||
+            is_xlog_record_type(rmid, info, RM_XLOG_ID, XLOG_FPI_FOR_HINT) ||
+            is_xlog_record_type(rmid, info, RM_XLOG_ID,
XLOG_PARAMETER_CHANGE) ||
+            is_xlog_record_type(rmid, info, RM_STANDBY_ID,
XLOG_RUNNING_XACTS) ||
+            is_xlog_record_type(rmid, info, RM_HEAP2_ID, XLOG_HEAP2_PRUNE);

What if we miss capturing the WAL records that may be generated
during the upgrade?

What happens if a custom WAL resource manager generates table/index AM
WAL records during upgrade?

What happens if new WAL records are added that may be generated during
the upgrade? Isn't keeping this code extensible and in sync with
future changes a problem? Or should we say that if any custom WAL
records are found after the slot's confirmed flush LSN, then the slot
isn't upgraded?

5. In continuation to the above comment:

Why can't this logic be something like this - if any WAL record seen
after a slot's confirmed flush LSN is of a type generated by a WAL
resource manager that has the rm_decode function defined, then the slot
can't be upgraded.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#275Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Dilip Kumar (#273)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Mon, Sep 25, 2023 at 12:32 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

Is there anything else that stops this patch from supporting migration
of logical replication slots from PG versions < 17?

IMHO one of the main changes we are making in PG 17 is that on the shutdown
checkpoint we ensure that if the confirmed flush lsn has been updated since
the last checkpoint and is not yet synced to disk, it gets flushed. I think
this is the most important change; otherwise, many slots for which we have
already streamed all the WAL might give an error on the assumption that
there is pending WAL for those slots which is not yet confirmed.

You might need to refer to [1] for the change I am talking about

[1] /messages/by-id/CAA4eK1+LtWDKXvxS7gnJ562VX+s3C6+0uQWamqu=UuD8hMfORg@mail.gmail.com

I see. IIUC, without that commit e0b2eed [1], it may happen that the
slot's on-disk confirmed_flush LSN value can be higher than the WAL
LSN that's flushed to disk, no? If so, can't it be detected if the WAL
at confirmed_flush LSN is valid or not when reading WAL with
xlogreader machinery?

What if the commit e0b2eed [1] is treated as fixing a bug with the
reasoning [2] and backpatched? When done so, it's easy to support
the upgrade/migration of logical replication slots from PG versions <
17, no?

[1]:
commit e0b2eed047df9045664da6f724cb42c10f8b12f0
Author: Amit Kapila <akapila@postgresql.org>
Date: Thu Sep 14 08:56:13 2023 +0530

Flush logical slots to disk during a shutdown checkpoint if required.

[2]:
It can also help avoid processing the same transactions again in some
boundary cases after the clean shutdown and restart. Say, we process
some transactions for which we didn't send anything downstream (the
changes got filtered) but the confirm_flush LSN is updated due to
keepalives. As we don't flush the latest value of confirm_flush LSN, it
may lead to processing the same changes again without this patch.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#276Dilip Kumar
dilipbalaut@gmail.com
In reply to: Bharath Rupireddy (#275)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Mon, Sep 25, 2023 at 1:23 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Mon, Sep 25, 2023 at 12:32 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

Is there anything else that stops this patch from supporting migration
of logical replication slots from PG versions < 17?

IMHO one of the main changes we are making in PG 17 is that on the shutdown
checkpoint we ensure that if the confirmed flush lsn has been updated since
the last checkpoint and is not yet synced to disk, it gets flushed. I think
this is the most important change; otherwise, many slots for which we have
already streamed all the WAL might give an error on the assumption that
there is pending WAL for those slots which is not yet confirmed.

You might need to refer to [1] for the change I am talking about

[1] /messages/by-id/CAA4eK1+LtWDKXvxS7gnJ562VX+s3C6+0uQWamqu=UuD8hMfORg@mail.gmail.com

I see. IIUC, without that commit e0b2eed [1], it may happen that the
slot's on-disk confirmed_flush LSN value can be higher than the WAL
LSN that's flushed to disk, no? If so, can't it be detected if the WAL
at confirmed_flush LSN is valid or not when reading WAL with
xlogreader machinery?

Actually, without this commit the slot's "confirmed_flush LSN" value
in memory can be higher than the one on disk, because if you look at
LogicalConfirmReceivedLocation(), when we change only the confirmed
flush LSN the slot is not marked dirty, which means that on shutdown
the slot will not be persisted to disk. But logically this will
not cause any issue, so we cannot treat it as a bug; it may cause us to
process some extra records after the restart, but that is not really a
bug.

What if the commit e0b2eed [1] is treated as fixing a bug with the
reasoning [2] and backpatched? When done so, it's easy to support
the upgrade/migration of logical replication slots from PG versions <
17, no?

Maybe this could be backpatched in order to support this upgrade from
the older version but not as a bug fix.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#277Amit Kapila
amit.kapila16@gmail.com
In reply to: Bharath Rupireddy (#275)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Mon, Sep 25, 2023 at 1:23 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Mon, Sep 25, 2023 at 12:32 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

Is there anything else that stops this patch from supporting migration
of logical replication slots from PG versions < 17?

IMHO one of the main changes we are making in PG 17 is that on the shutdown
checkpoint we ensure that if the confirmed flush lsn has been updated since
the last checkpoint and is not yet synced to disk, it gets flushed. I think
this is the most important change; otherwise, many slots for which we have
already streamed all the WAL might give an error on the assumption that
there is pending WAL for those slots which is not yet confirmed.

You might need to refer to [1] for the change I am talking about

[1] /messages/by-id/CAA4eK1+LtWDKXvxS7gnJ562VX+s3C6+0uQWamqu=UuD8hMfORg@mail.gmail.com

I see. IIUC, without that commit e0b2eed [1], it may happen that the
slot's on-disk confirmed_flush LSN value can be higher than the WAL
LSN that's flushed to disk, no?

No, without that commit, there is a very high possibility that even if
we have sent the WAL to the subscriber and got the acknowledgment of
the same, we would miss updating it before shutdown. This would lead
to upgrade failures because upgrades have no way to later identify
whether the remaining WAL records are sent to the subscriber.

If so, can't it be detected if the WAL
at confirmed_flush LSN is valid or not when reading WAL with
xlogreader machinery?

What if the commit e0b2eed [1] is treated as fixing a bug with the
reasoning [2] and backpatched? When done so, it's easy to support
the upgrade/migration of logical replication slots from PG versions <
17, no?

Yeah, we could try to make a case to backpatch it but when I raised
that point there was not much consensus on backpatching it. We are
aware and understand that if we could backpatch it then the prior
version slots could be upgraded, but the case to backpatch needs broader
consensus. For now, the idea is to get the core of the functionality
to be committed and then we can see if we get the consensus on
backpatching the commit you mentioned and probably changing the
version checks in this work.

--
With Regards,
Amit Kapila.

#278Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Bharath Rupireddy (#274)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Bharath,

Thank you for the comments! Before addressing all of them,
I wanted to reply to some of them first.

4.
+        /*
+         * There is a possibility that following records may be generated
+         * during the upgrade.
+         */
+        is_valid = is_xlog_record_type(rmid, info, RM_XLOG_ID,
XLOG_CHECKPOINT_SHUTDOWN) ||
+            is_xlog_record_type(rmid, info, RM_XLOG_ID,
XLOG_CHECKPOINT_ONLINE) ||
+            is_xlog_record_type(rmid, info, RM_XLOG_ID, XLOG_SWITCH) ||
+            is_xlog_record_type(rmid, info, RM_XLOG_ID,
XLOG_FPI_FOR_HINT) ||
+            is_xlog_record_type(rmid, info, RM_XLOG_ID,
XLOG_PARAMETER_CHANGE) ||
+            is_xlog_record_type(rmid, info, RM_STANDBY_ID,
XLOG_RUNNING_XACTS) ||
+            is_xlog_record_type(rmid, info, RM_HEAP2_ID,
XLOG_HEAP2_PRUNE);

What if we miss capturing the WAL records that may be generated
during the upgrade?

If such records are generated before calling binary_upgrade_validate_wal_logical_end(),
the upgrade would fail; otherwise it would succeed. Anyway, we don't care about
such records because they aren't required to be replicated. The main thing we
want to detect is that we don't miss any record generated before the server shutdown.
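
As a side note, on a running old cluster it is also possible to manually
inspect which records exist after a slot's confirmed_flush_lsn by using the
pg_walinspect extension. A rough, untested sketch; 'sub_slot' is a placeholder
slot name, and it assumes confirmed_flush_lsn is strictly below the current
flush LSN:

CREATE EXTENSION IF NOT EXISTS pg_walinspect;

-- List the resource manager and record type of everything written after
-- the slot's confirmed_flush_lsn, up to the currently flushed WAL.
SELECT w.start_lsn, w.resource_manager, w.record_type
FROM pg_catalog.pg_replication_slots s,
     LATERAL pg_get_wal_records_info(s.confirmed_flush_lsn,
                                     pg_current_wal_flush_lsn()) AS w
WHERE s.slot_type = 'logical' AND s.slot_name = 'sub_slot';

If the slot has really caught up, only records from the "harmless" set above
should show up there.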

What happens if a custom WAL resource manager generates table/index AM
WAL records during upgrade?

If such records are found, we definitely cannot tell whether they are acceptable.
We do not have a way to know the properties of custom WAL records. We didn't worry
about that case because the approach has other problems if such a facility is invoked.
Please see the similar discussion [1].

What happens if new WAL records are added that may be generated during
the upgrade? Isn't keeping this code extensible and in sync with
future changes a problem?

Actually, others also pointed out a similar point. Originally we just checked
confirmed_flush_lsn against the "latest checkpoint lsn" reported by pg_controldata,
but we found an issue where the upgrade could not pass if users ran pg_upgrade --check
just before the actual upgrade. We then discussed some ideas, but they had
disadvantages, so we settled on the current approach. A summary describing the
current situation is in [2], which would be quite helpful
(maybe you have already read it).

Or should we say that if any custom WAL
records are found after the slot's confirmed flush LSN, then the slot
isn't upgraded?

Once we conclude how to ensure this, we can add such a sentence accordingly.

5. In continuation to the above comment:

Why can't this logic be something like this - if any WAL record seen
after a slot's confirmed flush LSN is of a type generated by a WAL
resource manager that has the rm_decode function defined, then the slot
can't be upgraded.

Thank you for suggesting the new approach! We had never considered that approach
before, but at least the XLOG and HEAP2 rmgrs have a decode function, so
XLOG_CHECKPOINT_SHUTDOWN, XLOG_CHECKPOINT_ONLINE, and XLOG_HEAP2_PRUNE cannot
be ignored under that approach, which seems not appropriate.
If you have another approach, I'm very happy if you post it.

[1]: /messages/by-id/ZNZ4AxUMIrnMgRbo@momjian.us
[2]: /messages/by-id/CAA4eK1JVKZGRHLOEotWi+e+09jucNedqpkkc-Do4dh5FTAU+5w@mail.gmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#279Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Hayato Kuroda (Fujitsu) (#278)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

On Monday, September 25, 2023 7:01 PM Kuroda, Hayato/黒田 隼人 <kuroda.hayato@fujitsu.com> wrote:

5. In continuation to the above comment:

Why can't this logic be something like this - if any WAL record seen
after a slot's confirmed flush LSN is of a type generated by a WAL
resource manager that has the rm_decode function defined, then the slot
can't be upgraded.

Thank you for suggesting the new approach! We had never considered that approach
before, but at least the XLOG and HEAP2 rmgrs have a decode function, so
XLOG_CHECKPOINT_SHUTDOWN, XLOG_CHECKPOINT_ONLINE, and XLOG_HEAP2_PRUNE cannot
be ignored under that approach, which seems not appropriate.
If you have another approach, I'm very happy if you post it.

Another idea around decoding is to check if there is any decoding output for
the WAL records.

For example, we could create a temp slot and use test_decoding to decode the WAL
from the confirmed_flush_lsn of each existing logical replication slot. If there
is any output from the output plugin, then we consider that the WAL has not been
consumed yet.
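
A minimal sketch of that idea on the old cluster, using
pg_logical_slot_peek_changes() on an existing slot instead of a temporary one
(untested; 'sub_slot' is a placeholder, and the query must be run in the
database the slot belongs to):

-- Peeks at, without consuming, the changes the output plugin would still emit.
SELECT count(*) AS pending_changes
FROM pg_logical_slot_peek_changes('sub_slot', NULL, NULL);

A non-zero count would suggest unconsumed WAL, modulo the caveats in the next
paragraph.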

But this means we need to ignore some of the WALs like XLOG_XACT_INVALIDATIONS
which won't be decoded into the output. Also, this approach could be costly as
it needs to do the extra decoding and output, and we need to assume that "all the
WAL records including custom records will be decoded and output if they need to
be consumed" .

So it may not be better, but I am just sharing it for reference.

Best Regards,
Hou zj

#280Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Bharath Rupireddy (#274)
1 attachment(s)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Bharath,

Again, thank you for reviewing! PSA a new version.

Here are some more comments/thoughts on the v44 patch:

1.
+# pg_upgrade will fail because the slot still has unconsumed WAL records
+command_fails(
+    [

Add a test case to hit the fprintf(script, "The slot \"%s\" is invalid\n", ...)
code path as well?

Added. The test had not been included because 002_pg_upgrade.pl does not do similar
checks, but it is worth verifying. One difficulty was that the output directory has a
millisecond timestamp, so the absolute path could not be predicted; File::Find::find
was therefore used to locate the file.

2.
+    'run of pg_upgrade where the new cluster has insufficient
max_replication_slots');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+    "pg_upgrade_output.d/ not removed after pg_upgrade failure");
+    'run of pg_upgrade where the new cluster has the wrong wal_level');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+    "pg_upgrade_output.d/ not removed after pg_upgrade failure");
+    'run of pg_upgrade of old cluster with idle replication slots');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+    "pg_upgrade_output.d/ not removed after pg_upgrade failure");

How do these tests recognize the failures are the intended ones? I
mean, for instance when pg_upgrade fails for unused replication
slots/unconsumed WAL records, then just looking at the presence of
pg_upgrade_output.d might not be sufficient, no? Using
command_fails_like instead of command_fails and looking at the
contents of invalid_logical_relication_slots.txt might help make these
tests more focused.

Yeah, currently the output was not checked. I found that pg_upgrade
outputs all messages (including error messages) to stdout, so
command_fails_like() could not be used. Therefore, command_checks_all() was used
instead.

3.
+        pg_log(PG_REPORT, "fatal");
+        pg_fatal("Your installation contains invalid logical
replication slots.\n"
+                 "These slots can't be copied, so this cluster cannot
be upgraded.\n"
+                 "Consider removing such slots or consuming the
pending WAL if any,\n"
+                 "and then restart the upgrade.\n"
+                 "A list of invalid logical replication slots is in
the file:\n"
+                 "    %s", output_path);

It's not just the invalid logical replication slots, but also the
slots with unconsumed WAL which aren't invalid and can be upgraded
provided the WAL is consumed. So, a better wording would be:
pg_fatal("Your installation contains logical replication slots
that cannot be upgraded.\n"
"List of all such logical replication slots is in the file:\n"
"These slots can't be copied, so this cluster cannot
be upgraded.\n"
"Consider removing invalid slots and/or consuming the
pending WAL if any,\n"
"and then restart the upgrade.\n"
" %s", output_path);

Fixed.

Also, I ran pgperltidy. Some formatting was changed.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:

v45-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patchapplication/octet-stream; name=v45-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patchDownload
From b0f958c1fb13945b2b006c1f14b6bd557b04271d Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Tue, 4 Apr 2023 05:49:34 +0000
Subject: [PATCH v45] pg_upgrade: Allow to replicate logical replication slots
 to new node

This commit allows nodes with logical replication slots to be upgraded. While
reading information from the old cluster, a list of logical replication slots is
fetched. At the later part of upgrading, pg_upgrade revisits the list and
restores slots by executing pg_create_logical_replication_slots() on the new
cluster. Migration of logical replication slots is only supported when the old
cluster is version 17.0 or later.

If the old node has slots with the status 'lost' or with unconsumed WAL records,
the pg_upgrade fails. These checks are needed to prevent data loss.

Note that the pg_resetwal command would remove WAL files, which are required for
restart_lsn. If WALs required by logical replication slots are removed, the
slots are unusable. Therefore, during the upgrade, slot restoration is done
after the final pg_resetwal command. The workflow ensures that required WALs are
retained.

The significant advantage of this commit is that it makes it easy to continue
logical replication even after upgrading the publisher node. Previously,
pg_upgrade allowed copying publications to a new node. With this new commit,
adjusting the connection string to the new publisher will cause the apply
worker on the subscriber to connect to the new publisher automatically. This
enables seamless continuation of logical replication, even after an upgrade.

Author: Hayato Kuroda
Co-authored-by: Hou Zhijie
Reviewed-by: Peter Smith, Julien Rouhaud, Vignesh C, Wang Wei, Masahiko Sawada,
             Dilip Kumar, Bharath Rupireddy
---
 contrib/pg_walinspect/pg_walinspect.c         |  94 -------
 doc/src/sgml/ref/pgupgrade.sgml               |  76 ++++-
 src/backend/access/transam/xlogreader.c       |  93 ++++++
 src/backend/replication/slot.c                |  12 +
 src/backend/utils/adt/pg_upgrade_support.c    | 100 +++++++
 src/bin/pg_upgrade/Makefile                   |   3 +
 src/bin/pg_upgrade/check.c                    | 193 +++++++++++--
 src/bin/pg_upgrade/function.c                 |  31 +-
 src/bin/pg_upgrade/info.c                     | 168 ++++++++++-
 src/bin/pg_upgrade/meson.build                |   1 +
 src/bin/pg_upgrade/pg_upgrade.c               | 107 ++++++-
 src/bin/pg_upgrade/pg_upgrade.h               |  24 +-
 src/bin/pg_upgrade/server.c                   |  25 +-
 .../003_upgrade_logical_replication_slots.pl  | 266 ++++++++++++++++++
 src/include/access/xlogreader.h               |   2 +
 src/include/catalog/pg_proc.dat               |   5 +
 src/tools/pgindent/typedefs.list              |   2 +
 17 files changed, 1067 insertions(+), 135 deletions(-)
 create mode 100644 src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl

diff --git a/contrib/pg_walinspect/pg_walinspect.c b/contrib/pg_walinspect/pg_walinspect.c
index 796a74f322..49f4f92e98 100644
--- a/contrib/pg_walinspect/pg_walinspect.c
+++ b/contrib/pg_walinspect/pg_walinspect.c
@@ -40,8 +40,6 @@ PG_FUNCTION_INFO_V1(pg_get_wal_stats_till_end_of_wal);
 
 static void ValidateInputLSNs(XLogRecPtr start_lsn, XLogRecPtr *end_lsn);
 static XLogRecPtr GetCurrentLSN(void);
-static XLogReaderState *InitXLogReaderState(XLogRecPtr lsn);
-static XLogRecord *ReadNextXLogRecord(XLogReaderState *xlogreader);
 static void GetWALRecordInfo(XLogReaderState *record, Datum *values,
 							 bool *nulls, uint32 ncols);
 static void GetWALRecordsInfo(FunctionCallInfo fcinfo,
@@ -84,98 +82,6 @@ GetCurrentLSN(void)
 	return curr_lsn;
 }
 
-/*
- * Initialize WAL reader and identify first valid LSN.
- */
-static XLogReaderState *
-InitXLogReaderState(XLogRecPtr lsn)
-{
-	XLogReaderState *xlogreader;
-	ReadLocalXLogPageNoWaitPrivate *private_data;
-	XLogRecPtr	first_valid_record;
-
-	/*
-	 * Reading WAL below the first page of the first segments isn't allowed.
-	 * This is a bootstrap WAL page and the page_read callback fails to read
-	 * it.
-	 */
-	if (lsn < XLOG_BLCKSZ)
-		ereport(ERROR,
-				(errmsg("could not read WAL at LSN %X/%X",
-						LSN_FORMAT_ARGS(lsn))));
-
-	private_data = (ReadLocalXLogPageNoWaitPrivate *)
-		palloc0(sizeof(ReadLocalXLogPageNoWaitPrivate));
-
-	xlogreader = XLogReaderAllocate(wal_segment_size, NULL,
-									XL_ROUTINE(.page_read = &read_local_xlog_page_no_wait,
-											   .segment_open = &wal_segment_open,
-											   .segment_close = &wal_segment_close),
-									private_data);
-
-	if (xlogreader == NULL)
-		ereport(ERROR,
-				(errcode(ERRCODE_OUT_OF_MEMORY),
-				 errmsg("out of memory"),
-				 errdetail("Failed while allocating a WAL reading processor.")));
-
-	/* first find a valid recptr to start from */
-	first_valid_record = XLogFindNextRecord(xlogreader, lsn);
-
-	if (XLogRecPtrIsInvalid(first_valid_record))
-		ereport(ERROR,
-				(errmsg("could not find a valid record after %X/%X",
-						LSN_FORMAT_ARGS(lsn))));
-
-	return xlogreader;
-}
-
-/*
- * Read next WAL record.
- *
- * By design, to be less intrusive in a running system, no slot is allocated
- * to reserve the WAL we're about to read. Therefore this function can
- * encounter read errors for historical WAL.
- *
- * We guard against ordinary errors trying to read WAL that hasn't been
- * written yet by limiting end_lsn to the flushed WAL, but that can also
- * encounter errors if the flush pointer falls in the middle of a record. In
- * that case we'll return NULL.
- */
-static XLogRecord *
-ReadNextXLogRecord(XLogReaderState *xlogreader)
-{
-	XLogRecord *record;
-	char	   *errormsg;
-
-	record = XLogReadRecord(xlogreader, &errormsg);
-
-	if (record == NULL)
-	{
-		ReadLocalXLogPageNoWaitPrivate *private_data;
-
-		/* return NULL, if end of WAL is reached */
-		private_data = (ReadLocalXLogPageNoWaitPrivate *)
-			xlogreader->private_data;
-
-		if (private_data->end_of_wal)
-			return NULL;
-
-		if (errormsg)
-			ereport(ERROR,
-					(errcode_for_file_access(),
-					 errmsg("could not read WAL at %X/%X: %s",
-							LSN_FORMAT_ARGS(xlogreader->EndRecPtr), errormsg)));
-		else
-			ereport(ERROR,
-					(errcode_for_file_access(),
-					 errmsg("could not read WAL at %X/%X",
-							LSN_FORMAT_ARGS(xlogreader->EndRecPtr))));
-	}
-
-	return record;
-}
-
 /*
  * Output values that make up a row describing caller's WAL record.
  *
diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index bea0d1b93f..1a17572d14 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -383,6 +383,77 @@ make prefix=/usr/local/pgsql.new install
     </para>
    </step>
 
+   <step>
+    <title>Prepare for publisher upgrades</title>
+
+    <para>
+     <application>pg_upgrade</application> attempts to migrate logical
+     replication slots. This helps avoid the need for manually defining the
+     same replication slots on the new publisher. Migration of logical
+     replication slots is only supported when the old cluster is version 17.0
+     or later.
+    </para>
+
+    <para>
+     Before you start upgrading the publisher cluster, ensure that the
+     subscription is temporarily disabled, by executing
+     <link linkend="sql-altersubscription"><command>ALTER SUBSCRIPTION ... DISABLE</command></link>.
+     Re-enable the subscription after the upgrade.
+    </para>
+
+    <para>
+     There are some prerequisites for <application>pg_upgrade</application> to
+     be able to upgrade the replication slots. If these are not met an error
+     will be reported.
+    </para>
+
+    <itemizedlist>
+     <listitem>
+      <para>
+       All slots on the old cluster must be usable, i.e., there are no slots
+       whose
+       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>conflicting</structfield>
+       is <literal>true</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The old cluster has replicated all the changes to subscribers.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The output plugins referenced by the slots on the old cluster must be
+       installed in the new PostgreSQL executable directory.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must not have permanent logical replication slots, i.e.,
+       there must be no slots where
+       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>temporary</structfield>
+       is <literal>false</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must have
+       <link linkend="guc-wal-level"><varname>wal_level</varname></link> as
+       <literal>logical</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must have
+       <link linkend="guc-max-replication-slots"><varname>max_replication_slots</varname></link>
+       configured to a value greater than or equal to the number of slots
+       present in the old cluster.
+      </para>
+     </listitem>
+    </itemizedlist>
+
+   </step>
+
    <step>
     <title>Stop both servers</title>
 
@@ -652,8 +723,9 @@ rsync --archive --delete --hard-links --size-only --no-inc-recursive /vol1/pg_tb
        Configure the servers for log shipping.  (You do not need to run
        <function>pg_backup_start()</function> and <function>pg_backup_stop()</function>
        or take a file system backup as the standbys are still synchronized
-       with the primary.)  Replication slots are not copied and must
-       be recreated.
+       with the primary.)  Only logical slots on the primary are copied to the
+       new standby, but other slots on the old standby are not copied so must
+       be recreated manually.
       </para>
      </step>
 
diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index a17263df20..566e18a248 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -29,6 +29,7 @@
 #include "access/xlog_internal.h"
 #include "access/xlogreader.h"
 #include "access/xlogrecord.h"
+#include "access/xlogutils.h"
 #include "catalog/pg_control.h"
 #include "common/pg_lzcompress.h"
 #include "replication/origin.h"
@@ -2198,4 +2199,96 @@ XLogRecGetFullXid(XLogReaderState *record)
 	return FullTransactionIdFromEpochAndXid(epoch, xid);
 }
 
+/*
+ * Initialize WAL reader and identify first valid LSN.
+ */
+XLogReaderState *
+InitXLogReaderState(XLogRecPtr lsn)
+{
+	XLogReaderState *xlogreader;
+	ReadLocalXLogPageNoWaitPrivate *private_data;
+	XLogRecPtr	first_valid_record;
+
+	/*
+	 * Reading WAL below the first page of the first segments isn't allowed.
+	 * This is a bootstrap WAL page and the page_read callback fails to read
+	 * it.
+	 */
+	if (lsn < XLOG_BLCKSZ)
+		ereport(ERROR,
+				(errmsg("could not read WAL at LSN %X/%X",
+						LSN_FORMAT_ARGS(lsn))));
+
+	private_data = (ReadLocalXLogPageNoWaitPrivate *)
+		palloc0(sizeof(ReadLocalXLogPageNoWaitPrivate));
+
+	xlogreader = XLogReaderAllocate(wal_segment_size, NULL,
+									XL_ROUTINE(.page_read = &read_local_xlog_page_no_wait,
+											   .segment_open = &wal_segment_open,
+											   .segment_close = &wal_segment_close),
+									private_data);
+
+	if (xlogreader == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("out of memory"),
+				 errdetail("Failed while allocating a WAL reading processor.")));
+
+	/* first find a valid recptr to start from */
+	first_valid_record = XLogFindNextRecord(xlogreader, lsn);
+
+	if (XLogRecPtrIsInvalid(first_valid_record))
+		ereport(ERROR,
+				(errmsg("could not find a valid record after %X/%X",
+						LSN_FORMAT_ARGS(lsn))));
+
+	return xlogreader;
+}
+
+/*
+ * Read next WAL record.
+ *
+ * By design, to be less intrusive in a running system, no slot is allocated
+ * to reserve the WAL we're about to read. Therefore this function can
+ * encounter read errors for historical WAL.
+ *
+ * We guard against ordinary errors trying to read WAL that hasn't been
+ * written yet by limiting end_lsn to the flushed WAL, but that can also
+ * encounter errors if the flush pointer falls in the middle of a record. In
+ * that case we'll return NULL.
+ */
+XLogRecord *
+ReadNextXLogRecord(XLogReaderState *xlogreader)
+{
+	XLogRecord *record;
+	char	   *errormsg;
+
+	record = XLogReadRecord(xlogreader, &errormsg);
+
+	if (record == NULL)
+	{
+		ReadLocalXLogPageNoWaitPrivate *private_data;
+
+		/* return NULL, if end of WAL is reached */
+		private_data = (ReadLocalXLogPageNoWaitPrivate *)
+			xlogreader->private_data;
+
+		if (private_data->end_of_wal)
+			return NULL;
+
+		if (errormsg)
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not read WAL at %X/%X: %s",
+							LSN_FORMAT_ARGS(xlogreader->EndRecPtr), errormsg)));
+		else
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not read WAL at %X/%X",
+							LSN_FORMAT_ARGS(xlogreader->EndRecPtr))));
+	}
+
+	return record;
+}
+
 #endif
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 3ded3c1473..a91d412f91 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1423,6 +1423,18 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause,
 
 		SpinLockRelease(&s->mutex);
 
+		/*
+		 * The logical replication slots shouldn't be invalidated as
+		 * max_slot_wal_keep_size GUC is set to -1 during the upgrade.
+		 *
+		 * The following is just a sanity check.
+		 */
+		if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade)
+		{
+			Assert(max_slot_wal_keep_size_mb == -1);
+			elog(ERROR, "replication slots must not be invalidated during the upgrade");
+		}
+
 		if (active_pid != 0)
 		{
 			/*
diff --git a/src/backend/utils/adt/pg_upgrade_support.c b/src/backend/utils/adt/pg_upgrade_support.c
index 0186636d9f..bd64da7205 100644
--- a/src/backend/utils/adt/pg_upgrade_support.c
+++ b/src/backend/utils/adt/pg_upgrade_support.c
@@ -11,14 +11,20 @@
 
 #include "postgres.h"
 
+#include "access/heapam_xlog.h"
+#include "access/xlog.h"
+#include "access/xlogutils.h"
 #include "catalog/binary_upgrade.h"
 #include "catalog/heap.h"
 #include "catalog/namespace.h"
+#include "catalog/pg_control.h"
 #include "catalog/pg_type.h"
 #include "commands/extension.h"
 #include "miscadmin.h"
+#include "storage/standbydefs.h"
 #include "utils/array.h"
 #include "utils/builtins.h"
+#include "utils/pg_lsn.h"
 
 
 #define CHECK_IS_BINARY_UPGRADE									\
@@ -261,3 +267,97 @@ binary_upgrade_set_missing_value(PG_FUNCTION_ARGS)
 
 	PG_RETURN_VOID();
 }
+
+/*
+ * Helper function for binary_upgrade_validate_wal_logical_end().
+ */
+static inline bool
+is_xlog_record_type(RmgrId rmgrid, uint8 info,
+					RmgrId expected_rmgrid, uint8 expected_info)
+{
+	return (rmgrid == expected_rmgrid) && (info == expected_info);
+}
+
+/*
+ * Return false if we found unexpected WAL records, otherwise true.
+ *
+ * This is a special purpose function to ensure that there are no WAL records
+ * pending to be decoded after the given LSN.
+ *
+ * It is used to ensure that there is no pending WAL to be consumed for
+ * the logical slots.
+ *
+ * During the upgrade process there can be certain types of WAL records
+ * generated that don't need to be decoded. Such records are ignored.
+ *
+ * XLOG_CHECKPOINT_SHUTDOWN and XLOG_SWITCH are ignored because they would be
+ * inserted after the walsender exits. Moreover, the following types of records
+ * could be generated during the pg_upgrade --check, so they are ignored too:
+ * XLOG_CHECKPOINT_ONLINE, XLOG_RUNNING_XACTS, XLOG_FPI_FOR_HINT,
+ * XLOG_HEAP2_PRUNE, XLOG_PARAMETER_CHANGE.
+ */
+Datum
+binary_upgrade_validate_wal_logical_end(PG_FUNCTION_ARGS)
+{
+	XLogRecPtr	start_lsn;
+	XLogReaderState *xlogreader;
+	bool		initial_record = true;
+	bool		is_valid = true;
+
+	CHECK_IS_BINARY_UPGRADE;
+
+	/* Quick exit if the input is NULL */
+	if (PG_ARGISNULL(0))
+		PG_RETURN_BOOL(false);
+
+	start_lsn = PG_GETARG_LSN(0);
+
+	/* Quick exit if the given lsn is larger than current one */
+	if (start_lsn >= GetFlushRecPtr(NULL))
+		PG_RETURN_BOOL(false);
+
+	xlogreader = InitXLogReaderState(start_lsn);
+
+	/* Loop until all WALs are read, or unexpected record is found */
+	while (is_valid && ReadNextXLogRecord(xlogreader))
+	{
+		RmgrIds		rmid;
+		uint8		info;
+
+		/* Check the type of WAL */
+		rmid = XLogRecGetRmid(xlogreader);
+		info = XLogRecGetInfo(xlogreader) & ~XLR_INFO_MASK;
+
+		if (initial_record)
+		{
+			/*
+			 * Initial record must be either XLOG_CHECKPOINT_SHUTDOWN or
+			 * XLOG_SWITCH.
+			 */
+			is_valid = is_xlog_record_type(rmid, info, RM_XLOG_ID, XLOG_CHECKPOINT_SHUTDOWN) ||
+				is_xlog_record_type(rmid, info, RM_XLOG_ID, XLOG_SWITCH);
+
+			initial_record = false;
+			continue;
+		}
+
+		/*
+		 * There is a possibility that following records may be generated
+		 * during the upgrade.
+		 */
+		is_valid = is_xlog_record_type(rmid, info, RM_XLOG_ID, XLOG_CHECKPOINT_SHUTDOWN) ||
+			is_xlog_record_type(rmid, info, RM_XLOG_ID, XLOG_CHECKPOINT_ONLINE) ||
+			is_xlog_record_type(rmid, info, RM_XLOG_ID, XLOG_SWITCH) ||
+			is_xlog_record_type(rmid, info, RM_XLOG_ID, XLOG_FPI_FOR_HINT) ||
+			is_xlog_record_type(rmid, info, RM_XLOG_ID, XLOG_PARAMETER_CHANGE) ||
+			is_xlog_record_type(rmid, info, RM_STANDBY_ID, XLOG_RUNNING_XACTS) ||
+			is_xlog_record_type(rmid, info, RM_HEAP2_ID, XLOG_HEAP2_PRUNE);
+
+		CHECK_FOR_INTERRUPTS();
+	}
+
+	pfree(xlogreader->private_data);
+	XLogReaderFree(xlogreader);
+
+	PG_RETURN_BOOL(is_valid);
+}
diff --git a/src/bin/pg_upgrade/Makefile b/src/bin/pg_upgrade/Makefile
index 5834513add..815d1a7ca1 100644
--- a/src/bin/pg_upgrade/Makefile
+++ b/src/bin/pg_upgrade/Makefile
@@ -3,6 +3,9 @@
 PGFILEDESC = "pg_upgrade - an in-place binary upgrade utility"
 PGAPPICON = win32
 
+# required for 003_logical_replication_slots.pl
+EXTRA_INSTALL=contrib/test_decoding
+
 subdir = src/bin/pg_upgrade
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 21a0ff9e42..731b987d33 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -33,6 +33,8 @@ static void check_for_jsonb_9_4_usage(ClusterInfo *cluster);
 static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(void);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
+static void check_new_cluster_logical_replication_slots(void);
+static void check_old_cluster_for_valid_slots(bool live_check);
 
 
 /*
@@ -89,8 +91,11 @@ check_and_dump_old_cluster(bool live_check)
 	if (!live_check)
 		start_postmaster(&old_cluster, true);
 
-	/* Extract a list of databases and tables from the old cluster */
-	get_db_and_rel_infos(&old_cluster);
+	/*
+	 * Extract a list of databases, tables, and logical replication slots from
+	 * the old cluster.
+	 */
+	get_db_rel_and_slot_infos(&old_cluster, live_check);
 
 	init_tablespaces();
 
@@ -107,6 +112,13 @@ check_and_dump_old_cluster(bool live_check)
 	check_for_reg_data_type_usage(&old_cluster);
 	check_for_isn_and_int8_passing_mismatch(&old_cluster);
 
+	/*
+	 * Logical replication slots can be migrated since PG17. See comments atop
+	 * get_old_cluster_logical_slot_infos().
+	 */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
+		check_old_cluster_for_valid_slots(live_check);
+
 	/*
 	 * PG 16 increased the size of the 'aclitem' type, which breaks the
 	 * on-disk format for existing data.
@@ -200,7 +212,7 @@ check_and_dump_old_cluster(bool live_check)
 void
 check_new_cluster(void)
 {
-	get_db_and_rel_infos(&new_cluster);
+	get_db_rel_and_slot_infos(&new_cluster, false);
 
 	check_new_cluster_is_empty();
 
@@ -223,6 +235,8 @@ check_new_cluster(void)
 	check_for_prepared_transactions(&new_cluster);
 
 	check_for_new_tablespace_dir();
+
+	check_new_cluster_logical_replication_slots();
 }
 
 
@@ -245,27 +259,6 @@ report_clusters_compatible(void)
 }
 
 
-void
-issue_warnings_and_set_wal_level(void)
-{
-	/*
-	 * We unconditionally start/stop the new server because pg_resetwal -o set
-	 * wal_level to 'minimum'.  If the user is upgrading standby servers using
-	 * the rsync instructions, they will need pg_upgrade to write its final
-	 * WAL record showing wal_level as 'replica'.
-	 */
-	start_postmaster(&new_cluster, true);
-
-	/* Reindex hash indexes for old < 10.0 */
-	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 906)
-		old_9_6_invalidate_hash_indexes(&new_cluster, false);
-
-	report_extension_updates(&new_cluster);
-
-	stop_postmaster(false);
-}
-
-
 void
 output_completion_banner(char *deletion_script_file_name)
 {
@@ -1451,3 +1444,155 @@ check_for_user_defined_encoding_conversions(ClusterInfo *cluster)
 	else
 		check_ok();
 }
+
+/*
+ * check_new_cluster_logical_replication_slots()
+ *
+ * Verify that there are no logical replication slots on the new cluster and
+ * that the parameter settings necessary for creating slots are sufficient.
+ */
+static void
+check_new_cluster_logical_replication_slots(void)
+{
+	PGresult   *res;
+	PGconn	   *conn;
+	int			nslots_on_old;
+	int			nslots_on_new;
+	int			max_replication_slots;
+	char	   *wal_level;
+
+	/* Logical slots can be migrated since PG17. */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+		return;
+
+	nslots_on_old = count_old_cluster_logical_slots();
+
+	/* Quick return if there are no logical slots to be migrated. */
+	if (nslots_on_old == 0)
+		return;
+
+	conn = connectToServer(&new_cluster, "template1");
+
+	prep_status("Checking for new cluster logical replication slots");
+
+	res = executeQueryOrDie(conn, "SELECT count(*) "
+							"FROM pg_catalog.pg_replication_slots "
+							"WHERE slot_type = 'logical' AND "
+							"temporary IS FALSE;");
+
+	if (PQntuples(res) != 1)
+		pg_fatal("could not count the number of logical replication slots");
+
+	nslots_on_new = atoi(PQgetvalue(res, 0, 0));
+
+	if (nslots_on_new)
+		pg_fatal("Expected 0 logical replication slots but found %d.",
+				 nslots_on_new);
+
+	PQclear(res);
+
+	res = executeQueryOrDie(conn, "SELECT setting FROM pg_settings "
+							"WHERE name IN ('max_replication_slots', 'wal_level') "
+							"ORDER BY name DESC;");
+
+	if (PQntuples(res) != 2)
+		pg_fatal("could not determine parameter settings on new cluster");
+
+	wal_level = PQgetvalue(res, 0, 0);
+
+	if (strcmp(wal_level, "logical") != 0)
+		pg_fatal("wal_level must be \"logical\", but is set to \"%s\"",
+				 wal_level);
+
+	max_replication_slots = atoi(PQgetvalue(res, 1, 0));
+
+	if (nslots_on_old > max_replication_slots)
+		pg_fatal("max_replication_slots (%d) must be greater than or equal to the number of "
+				 "logical replication slots (%d) on the old cluster",
+				 max_replication_slots, nslots_on_old);
+
+	PQclear(res);
+	PQfinish(conn);
+
+	check_ok();
+}
+
+/*
+ * check_old_cluster_for_valid_slots()
+ *
+ * Verify that all the logical slots are usable and have consumed all the WAL
+ * before shutdown.
+ */
+static void
+check_old_cluster_for_valid_slots(bool live_check)
+{
+	int			dbnum;
+	char		output_path[MAXPGPATH];
+	FILE	   *script = NULL;
+
+	prep_status("Checking for valid logical replication slots");
+
+	snprintf(output_path, sizeof(output_path), "%s/%s",
+			 log_opts.basedir,
+			 "invalid_logical_relication_slots.txt");
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		int			slotnum;
+		LogicalSlotInfoArr *slot_arr = &old_cluster.dbarr.dbs[dbnum].slot_arr;
+
+		for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		{
+			LogicalSlotInfo *slot = &slot_arr->slots[slotnum];
+
+			/* Is the slot usable? */
+			if (slot->invalid)
+			{
+				if (script == NULL &&
+					(script = fopen_priv(output_path, "w")) == NULL)
+					pg_fatal("could not open file \"%s\": %s",
+							 output_path, strerror(errno));
+
+				fprintf(script, "The slot \"%s\" is invalid\n",
+						slot->slotname);
+
+				/* No need to check this slot, seek to new one */
+				continue;
+			}
+
+			/*
+			 * Do additional check to ensure that all logical replication
+			 * slots have consumed all the WAL before shutdown.
+			 *
+			 * Note: This can be satisfied only when the old cluster has been
+			 * shut down, so we skip this for live checks.
+			 */
+			if (!live_check && !slot->caught_up)
+			{
+				if (script == NULL &&
+					(script = fopen_priv(output_path, "w")) == NULL)
+					pg_fatal("could not open file \"%s\": %s",
+							 output_path, strerror(errno));
+
+				fprintf(script,
+						"The slot \"%s\" has not consumed the WAL yet\n",
+						slot->slotname);
+			}
+		}
+	}
+
+	if (script)
+	{
+		fclose(script);
+
+		pg_log(PG_REPORT, "fatal");
+		pg_fatal("Your installation contains logical replication slots that cannot be upgraded.\n"
+				 "These slots can't be copied, so this cluster cannot be upgraded.\n"
+				 "Consider removing invalid slots and/or consuming the pending WAL if any,\n"
+				 "and then restart the upgrade.\n"
+				 "A list of all such logical replication slots is in the file:\n"
+				 "    %s", output_path);
+	}
+
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/function.c b/src/bin/pg_upgrade/function.c
index dc8800c7cd..c0f5e58fa2 100644
--- a/src/bin/pg_upgrade/function.c
+++ b/src/bin/pg_upgrade/function.c
@@ -46,7 +46,9 @@ library_name_compare(const void *p1, const void *p2)
 /*
  * get_loadable_libraries()
  *
- *	Fetch the names of all old libraries containing C-language functions.
+ *	Fetch the names of all old libraries containing either C-language functions
+ *	or are corresponding to logical replication output plugins.
+ *
  *	We will later check that they all exist in the new installation.
  */
 void
@@ -55,6 +57,7 @@ get_loadable_libraries(void)
 	PGresult  **ress;
 	int			totaltups;
 	int			dbnum;
+	int			n_libinfos;
 
 	ress = (PGresult **) pg_malloc(old_cluster.dbarr.ndbs * sizeof(PGresult *));
 	totaltups = 0;
@@ -81,7 +84,12 @@ get_loadable_libraries(void)
 		PQfinish(conn);
 	}
 
-	os_info.libraries = (LibraryInfo *) pg_malloc(totaltups * sizeof(LibraryInfo));
+	/*
+	 * Allocate memory for required libraries and logical replication output
+	 * plugins.
+	 */
+	n_libinfos = totaltups + count_old_cluster_logical_slots();
+	os_info.libraries = (LibraryInfo *) pg_malloc(sizeof(LibraryInfo) * n_libinfos);
 	totaltups = 0;
 
 	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
@@ -89,6 +97,8 @@ get_loadable_libraries(void)
 		PGresult   *res = ress[dbnum];
 		int			ntups;
 		int			rowno;
+		int			slotno;
+		LogicalSlotInfoArr *slot_arr = &old_cluster.dbarr.dbs[dbnum].slot_arr;
 
 		ntups = PQntuples(res);
 		for (rowno = 0; rowno < ntups; rowno++)
@@ -101,6 +111,23 @@ get_loadable_libraries(void)
 			totaltups++;
 		}
 		PQclear(res);
+
+		/*
+		 * Store the names of output plugins as well. There is a possibility
+		 * that duplicated plugins are set, but the consumer function
+		 * check_loadable_libraries() will avoid checking the same library, so
+		 * we do not have to consider their uniqueness here.
+		 */
+		for (slotno = 0; slotno < slot_arr->nslots; slotno++)
+		{
+			if (slot_arr->slots[slotno].invalid)
+				continue;
+
+			os_info.libraries[totaltups].name = pg_strdup(slot_arr->slots[slotno].plugin);
+			os_info.libraries[totaltups].dbnum = dbnum;
+
+			totaltups++;
+		}
 	}
 
 	pg_free(ress);
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index aa5faca4d6..189a793106 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -26,6 +26,8 @@ static void get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo);
 static void free_rel_infos(RelInfoArr *rel_arr);
 static void print_db_infos(DbInfoArr *db_arr);
 static void print_rel_infos(RelInfoArr *rel_arr);
+static void print_slot_infos(LogicalSlotInfoArr *slot_arr);
+static void get_old_cluster_logical_slot_infos(DbInfo *dbinfo, bool live_check);
 
 
 /*
@@ -266,13 +268,15 @@ report_unmatched_relation(const RelInfo *rel, const DbInfo *db, bool is_new_db)
 }
 
 /*
- * get_db_and_rel_infos()
+ * get_db_rel_and_slot_infos()
  *
  * higher level routine to generate dbinfos for the database running
  * on the given "port". Assumes that server is already running.
+ *
+ * live_check would be used only when the target is the old cluster.
  */
 void
-get_db_and_rel_infos(ClusterInfo *cluster)
+get_db_rel_and_slot_infos(ClusterInfo *cluster, bool live_check)
 {
 	int			dbnum;
 
@@ -283,7 +287,17 @@ get_db_and_rel_infos(ClusterInfo *cluster)
 	get_db_infos(cluster);
 
 	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
-		get_rel_infos(cluster, &cluster->dbarr.dbs[dbnum]);
+	{
+		DbInfo	   *pDbInfo = &cluster->dbarr.dbs[dbnum];
+
+		get_rel_infos(cluster, pDbInfo);
+
+		/*
+		 * Retrieve the logical replication slots infos for the old cluster.
+		 */
+		if (cluster == &old_cluster)
+			get_old_cluster_logical_slot_infos(pDbInfo, live_check);
+	}
 
 	if (cluster == &old_cluster)
 		pg_log(PG_VERBOSE, "\nsource databases:");
@@ -600,6 +614,125 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 	dbinfo->rel_arr.nrels = num_rels;
 }
 
+/*
+ * get_old_cluster_logical_slot_infos()
+ *
+ * gets the LogicalSlotInfos for all the logical replication slots of the
+ * database referred to by "dbinfo".
+ *
+ * Note: This function will not do anything if the old cluster is pre-PG17.
+ * This is because before that the logical slots are not saved at shutdown, so
+ * there is no guarantee that the latest confirmed_flush_lsn is saved to disk
+ * which can lead to data loss. It is still not guaranteed for manually created
+ * slots in PG17, so subsequent checks done in
+ * check_old_cluster_for_valid_slots() would raise a FATAL error if such slots
+ * are included.
+ */
+static void
+get_old_cluster_logical_slot_infos(DbInfo *dbinfo, bool live_check)
+{
+	PGconn	   *conn;
+	PGresult   *res;
+	LogicalSlotInfo *slotinfos = NULL;
+	int			num_slots = 0;
+
+	/* Logical slots can be migrated since PG17. */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+	{
+		dbinfo->slot_arr.slots = slotinfos;
+		dbinfo->slot_arr.nslots = num_slots;
+		return;
+	}
+
+	conn = connectToServer(&old_cluster, dbinfo->db_name);
+
+	/*
+	 * Fetch the logical replication slot information. The slot is considered
+	 * caught up if all the WAL is consumed except for records that could be
+	 * generated during the upgrade. See
+	 * binary_upgrade_validate_wal_logical_end().
+	 *
+	 * Note that we can't ensure whether the slot is caught up during
+	 * live_check as the new WAL records could be generated.
+	 *
+	 * We intentionally skip checking the WALs for invalidated slots as the
+	 * corresponding WALs could have been removed for such slots.
+	 *
+	 * The temporary slots are explicitly ignored while checking because such
+	 * slots cannot exist after the upgrade. During the upgrade, clusters are
+	 * started and stopped several times causing any temporary slots to be
+	 * removed.
+	 */
+	res = executeQueryOrDie(conn, "SELECT slot_name, plugin, two_phase, "
+							"%s as caught_up, conflicting as invalid "
+							"FROM pg_catalog.pg_replication_slots "
+							"WHERE slot_type = 'logical' AND "
+							"database = current_database() AND "
+							"temporary IS FALSE;",
+							live_check ? "FALSE" :
+							"(CASE WHEN conflicting THEN FALSE "
+							"ELSE (SELECT pg_catalog.binary_upgrade_validate_wal_logical_end(confirmed_flush_lsn)) "
+							"END)");
+
+	num_slots = PQntuples(res);
+
+	if (num_slots)
+	{
+		int			slotnum;
+		int			i_slotname;
+		int			i_plugin;
+		int			i_twophase;
+		int			i_caught_up;
+		int			i_invalid;
+
+		slotinfos = (LogicalSlotInfo *) pg_malloc(sizeof(LogicalSlotInfo) * num_slots);
+
+		i_slotname = PQfnumber(res, "slot_name");
+		i_plugin = PQfnumber(res, "plugin");
+		i_twophase = PQfnumber(res, "two_phase");
+		i_caught_up = PQfnumber(res, "caught_up");
+		i_invalid = PQfnumber(res, "invalid");
+
+		for (slotnum = 0; slotnum < num_slots; slotnum++)
+		{
+			LogicalSlotInfo *curr = &slotinfos[slotnum];
+
+			curr->slotname = pg_strdup(PQgetvalue(res, slotnum, i_slotname));
+			curr->plugin = pg_strdup(PQgetvalue(res, slotnum, i_plugin));
+			curr->two_phase = (strcmp(PQgetvalue(res, slotnum, i_twophase), "t") == 0);
+			curr->caught_up = (strcmp(PQgetvalue(res, slotnum, i_caught_up), "t") == 0);
+			curr->invalid = (strcmp(PQgetvalue(res, slotnum, i_invalid), "t") == 0);
+		}
+	}
+
+	PQclear(res);
+	PQfinish(conn);
+
+	dbinfo->slot_arr.slots = slotinfos;
+	dbinfo->slot_arr.nslots = num_slots;
+}
+
+
+/*
+ * count_old_cluster_logical_slots()
+ *
+ * Returns the number of logical replication slots for all databases.
+ *
+ * Note: this function always returns 0 if the old_cluster is PG16 and prior
+ * because we gather slot information only for cluster versions greater than or
+ * equal to PG17. See get_old_cluster_logical_slot_infos().
+ */
+int
+count_old_cluster_logical_slots(void)
+{
+	int			dbnum;
+	int			slot_count = 0;
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+		slot_count += old_cluster.dbarr.dbs[dbnum].slot_arr.nslots;
+
+	return slot_count;
+}
 
 static void
 free_db_and_rel_infos(DbInfoArr *db_arr)
@@ -642,8 +775,11 @@ print_db_infos(DbInfoArr *db_arr)
 
 	for (dbnum = 0; dbnum < db_arr->ndbs; dbnum++)
 	{
-		pg_log(PG_VERBOSE, "Database: \"%s\"", db_arr->dbs[dbnum].db_name);
-		print_rel_infos(&db_arr->dbs[dbnum].rel_arr);
+		DbInfo	   *pDbInfo = &db_arr->dbs[dbnum];
+
+		pg_log(PG_VERBOSE, "Database: \"%s\"", pDbInfo->db_name);
+		print_rel_infos(&pDbInfo->rel_arr);
+		print_slot_infos(&pDbInfo->slot_arr);
 	}
 }
 
@@ -660,3 +796,25 @@ print_rel_infos(RelInfoArr *rel_arr)
 			   rel_arr->rels[relnum].reloid,
 			   rel_arr->rels[relnum].tablespace);
 }
+
+static void
+print_slot_infos(LogicalSlotInfoArr *slot_arr)
+{
+	int			slotnum;
+
+	/* Quick return if there are no logical slots. */
+	if (slot_arr->nslots == 0)
+		return;
+
+	pg_log(PG_VERBOSE, "Logical replication slots within the database:");
+
+	for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+	{
+		LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+
+		pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\", two_phase: %d",
+			   slot_info->slotname,
+			   slot_info->plugin,
+			   slot_info->two_phase);
+	}
+}
diff --git a/src/bin/pg_upgrade/meson.build b/src/bin/pg_upgrade/meson.build
index 12a97f84e2..2c4f38d865 100644
--- a/src/bin/pg_upgrade/meson.build
+++ b/src/bin/pg_upgrade/meson.build
@@ -42,6 +42,7 @@ tests += {
     'tests': [
       't/001_basic.pl',
       't/002_pg_upgrade.pl',
+      't/003_upgrade_logical_replication_slots.pl',
     ],
     'test_kwargs': {'priority': 40}, # pg_upgrade tests are slow
   },
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 96bfb67167..3ddfc31070 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -59,6 +59,8 @@ static void copy_xact_xlog_xid(void);
 static void set_frozenxids(bool minmxid_only);
 static void make_outputdirs(char *pgdata);
 static void setup(char *argv0, bool *live_check);
+static void create_logical_replication_slots(void);
+static void setup_new_cluster(void);
 
 ClusterInfo old_cluster,
 			new_cluster;
@@ -188,6 +190,8 @@ main(int argc, char **argv)
 			  new_cluster.pgdata);
 	check_ok();
 
+	setup_new_cluster();
+
 	if (user_opts.do_sync)
 	{
 		prep_status("Sync data directory to disk");
@@ -201,8 +205,6 @@ main(int argc, char **argv)
 
 	create_script_for_old_cluster_deletion(&deletion_script_file_name);
 
-	issue_warnings_and_set_wal_level();
-
 	pg_log(PG_REPORT,
 		   "\n"
 		   "Upgrade Complete\n"
@@ -593,7 +595,7 @@ create_new_objects(void)
 		set_frozenxids(true);
 
 	/* update new_cluster info now that we have objects in the databases */
-	get_db_and_rel_infos(&new_cluster);
+	get_db_rel_and_slot_infos(&new_cluster, false);
 }
 
 /*
@@ -862,3 +864,102 @@ set_frozenxids(bool minmxid_only)
 
 	check_ok();
 }
+
+/*
+ * create_logical_replication_slots()
+ *
+ * Similar to create_new_objects() but only restores logical replication slots.
+ */
+static void
+create_logical_replication_slots(void)
+{
+	int			dbnum;
+
+	prep_status_progress("Restoring logical replication slots in the new cluster");
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		DbInfo	   *old_db = &old_cluster.dbarr.dbs[dbnum];
+		LogicalSlotInfoArr *slot_arr = &old_db->slot_arr;
+		PGconn	   *conn;
+		PQExpBuffer query;
+		int			slotnum;
+		char		log_file_name[MAXPGPATH];
+
+		/* Skip this database if there are no slots */
+		if (slot_arr->nslots == 0)
+			continue;
+
+		conn = connectToServer(&new_cluster, old_db->db_name);
+		query = createPQExpBuffer();
+
+		snprintf(log_file_name, sizeof(log_file_name),
+				 DB_DUMP_LOG_FILE_MASK, old_db->db_oid);
+
+		pg_log(PG_STATUS, "%s", old_db->db_name);
+
+		for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		{
+			LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+
+			/* Constructs a query for creating logical replication slots. */
+			appendPQExpBuffer(query, "SELECT pg_catalog.pg_create_logical_replication_slot(");
+			appendStringLiteralConn(query, slot_info->slotname, conn);
+			appendPQExpBuffer(query, ", ");
+			appendStringLiteralConn(query, slot_info->plugin, conn);
+			appendPQExpBuffer(query, ", false, %s);",
+							  slot_info->two_phase ? "true" : "false");
+
+			PQclear(executeQueryOrDie(conn, "%s", query->data));
+
+			resetPQExpBuffer(query);
+		}
+
+		PQfinish(conn);
+
+		destroyPQExpBuffer(query);
+	}
+
+	end_progress_output();
+	check_ok();
+}
+
+/*
+ *	setup_new_cluster()
+ *
+ * Starts a new cluster for updating the wal_level in the control file, then
+ * does final setups. Logical slots are also created here.
+ */
+static void
+setup_new_cluster(void)
+{
+	/*
+	 * We unconditionally start/stop the new server because pg_resetwal -o set
+	 * wal_level to 'minimum'.  If the user is upgrading standby servers using
+	 * the rsync instructions, they will need pg_upgrade to write its final
+	 * WAL record showing wal_level as 'replica'.
+	 */
+	start_postmaster(&new_cluster, true);
+
+	/*
+	 * If the old cluster has logical slots, migrate them to a new cluster.
+	 *
+	 * Note: This must be done after executing pg_resetwal command in the
+	 * caller because pg_resetwal would remove required WALs.
+	 */
+	if (count_old_cluster_logical_slots())
+		create_logical_replication_slots();
+
+	/*
+	 * Reindex hash indexes for old < 10.0. count_old_cluster_logical_slots()
+	 * returns non-zero when the old_cluster is PG17 and later, so it's OK to
+	 * use "else if" here. See comments atop count_old_cluster_logical_slots()
+	 * and get_old_cluster_logical_slot_infos().
+	 */
+	else if (GET_MAJOR_VERSION(old_cluster.major_version) <= 906)
+		old_9_6_invalidate_hash_indexes(&new_cluster, false);
+
+	report_extension_updates(&new_cluster);
+
+	stop_postmaster(false);
+}
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 842f3b6cd3..fb7ee26569 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -150,6 +150,25 @@ typedef struct
 	int			nrels;
 } RelInfoArr;
 
+/*
+ * Structure to store logical replication slot information.
+ */
+typedef struct
+{
+	char	   *slotname;		/* slot name */
+	char	   *plugin;			/* plugin */
+	bool		two_phase;		/* can the slot decode 2PC? */
+	bool		caught_up;		/* Is confirmed_flush_lsn the same as latest
+								 * checkpoint LSN? */
+	bool		invalid;		/* if true, the slot is unusable */
+} LogicalSlotInfo;
+
+typedef struct
+{
+	int			nslots;			/* number of logical slot infos */
+	LogicalSlotInfo *slots;		/* array of logical slot infos */
+} LogicalSlotInfoArr;
+
 /*
  * The following structure represents a relation mapping.
  */
@@ -176,6 +195,7 @@ typedef struct
 	char		db_tablespace[MAXPGPATH];	/* database default tablespace
 											 * path */
 	RelInfoArr	rel_arr;		/* array of all user relinfos */
+	LogicalSlotInfoArr slot_arr;	/* array of all LogicalSlotInfo */
 } DbInfo;
 
 /*
@@ -345,7 +365,6 @@ void		output_check_banner(bool live_check);
 void		check_and_dump_old_cluster(bool live_check);
 void		check_new_cluster(void);
 void		report_clusters_compatible(void);
-void		issue_warnings_and_set_wal_level(void);
 void		output_completion_banner(char *deletion_script_file_name);
 void		check_cluster_versions(void);
 void		check_cluster_compatibility(bool live_check);
@@ -400,7 +419,8 @@ void		check_loadable_libraries(void);
 FileNameMap *gen_db_file_maps(DbInfo *old_db,
 							  DbInfo *new_db, int *nmaps, const char *old_pgdata,
 							  const char *new_pgdata);
-void		get_db_and_rel_infos(ClusterInfo *cluster);
+void		get_db_rel_and_slot_infos(ClusterInfo *cluster, bool live_check);
+int			count_old_cluster_logical_slots(void);
 
 /* option.c */
 
diff --git a/src/bin/pg_upgrade/server.c b/src/bin/pg_upgrade/server.c
index 0bc3d2806b..d7f6c268ef 100644
--- a/src/bin/pg_upgrade/server.c
+++ b/src/bin/pg_upgrade/server.c
@@ -201,6 +201,7 @@ start_postmaster(ClusterInfo *cluster, bool report_and_exit_on_error)
 	PGconn	   *conn;
 	bool		pg_ctl_return = false;
 	char		socket_string[MAXPGPATH + 200];
+	PQExpBufferData pgoptions;
 
 	static bool exit_hook_registered = false;
 
@@ -227,23 +228,41 @@ start_postmaster(ClusterInfo *cluster, bool report_and_exit_on_error)
 				 cluster->sockdir);
 #endif
 
+	initPQExpBuffer(&pgoptions);
+
 	/*
-	 * Use -b to disable autovacuum.
+	 * Construct a parameter string which is passed to the server process.
 	 *
 	 * Turn off durability requirements to improve object creation speed, and
 	 * we only modify the new cluster, so only use it there.  If there is a
 	 * crash, the new cluster has to be recreated anyway.  fsync=off is a big
 	 * win on ext4.
 	 */
+	if (cluster == &new_cluster)
+		appendPQExpBufferStr(&pgoptions, " -c synchronous_commit=off -c fsync=off -c full_page_writes=off");
+
+	/*
+	 * Use max_slot_wal_keep_size as -1 to prevent the WAL removal by the
+	 * checkpointer process.  If WALs required by logical replication slots
+	 * are removed, the slots are unusable.  This setting prevents the
+	 * invalidation of slots during the upgrade. We set this option when
+	 * cluster is PG17 or later because logical replication slots can only be
+	 * migrated since then. Besides, max_slot_wal_keep_size is added in PG13.
+	 */
+	if (GET_MAJOR_VERSION(cluster->major_version) >= 1700)
+		appendPQExpBufferStr(&pgoptions, " -c max_slot_wal_keep_size=-1");
+
+	/* Use -b to disable autovacuum. */
 	snprintf(cmd, sizeof(cmd),
 			 "\"%s/pg_ctl\" -w -l \"%s/%s\" -D \"%s\" -o \"-p %d -b%s %s%s\" start",
 			 cluster->bindir,
 			 log_opts.logdir,
 			 SERVER_LOG_FILE, cluster->pgconfig, cluster->port,
-			 (cluster == &new_cluster) ?
-			 " -c synchronous_commit=off -c fsync=off -c full_page_writes=off" : "",
+			 pgoptions.data,
 			 cluster->pgopts ? cluster->pgopts : "", socket_string);
 
+	termPQExpBuffer(&pgoptions);
+
 	/*
 	 * Don't throw an error right away, let connecting throw the error because
 	 * it might supply a reason for the failure.
diff --git a/src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl
new file mode 100644
index 0000000000..270044d75e
--- /dev/null
+++ b/src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl
@@ -0,0 +1,266 @@
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+# Tests for upgrading replication slots
+
+use strict;
+use warnings;
+
+use File::Find qw(find);
+use File::Path qw(rmtree);
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Can be changed to test the other modes
+my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
+
+# Initialize old cluster
+my $old_publisher = PostgreSQL::Test::Cluster->new('old_publisher');
+$old_publisher->init(allows_streaming => 'logical');
+
+# Initialize new cluster
+my $new_publisher = PostgreSQL::Test::Cluster->new('new_publisher');
+$new_publisher->init(allows_streaming => 'replica');
+
+# Initialize subscriber cluster
+my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
+$subscriber->init(allows_streaming => 'logical');
+
+my $bindir = $new_publisher->config_data('--bindir');
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when new cluster wal_level is not 'logical'
+
+# Preparations for the subsequent test:
+# 1. Create a slot on the old cluster
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT pg_create_logical_replication_slot('test_slot1', 'test_decoding', false, true);"
+);
+$old_publisher->stop;
+
+# pg_upgrade will fail because the new cluster wal_level is 'replica'
+command_checks_all(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d', $old_publisher->data_dir,
+		'-D', $new_publisher->data_dir,
+		'-b', $bindir,
+		'-B', $bindir,
+		'-s', $new_publisher->host,
+		'-p', $old_publisher->port,
+		'-P', $new_publisher->port,
+		$mode,
+	],
+	1,
+	[qr/wal_level must be \"logical\", but is set to \"replica\"/],
+	[qr//],
+	'run of pg_upgrade where the new cluster has the wrong wal_level');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when max_replication_slots on a new cluster is
+#		too low
+
+# Preparations for the subsequent test:
+# 1. Create a second slot on the old cluster
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT pg_create_logical_replication_slot('test_slot2', 'test_decoding', false, true);"
+);
+
+# 2. Consume WAL records to avoid another type of upgrade failure. It will be
+#	 tested in subsequent cases.
+$old_publisher->safe_psql('postgres',
+	"SELECT count(*) FROM pg_logical_slot_get_changes('test_slot1', NULL, NULL);"
+);
+$old_publisher->stop;
+
+# 3. max_replication_slots is set to smaller than the number of slots (2)
+#	 present on the old cluster
+$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 1");
+
+# 4. wal_level is set correctly on the new cluster
+$new_publisher->append_conf('postgresql.conf', "wal_level = 'logical'");
+
+# pg_upgrade will fail because the new cluster has insufficient max_replication_slots
+command_checks_all(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d', $old_publisher->data_dir,
+		'-D', $new_publisher->data_dir,
+		'-b', $bindir,
+		'-B', $bindir,
+		'-s', $new_publisher->host,
+		'-p', $old_publisher->port,
+		'-P', $new_publisher->port,
+		$mode,
+	],
+	1,
+	[
+		qr/max_replication_slots \(1\) must be greater than or equal to the number of logical replication slots \(2\) on the old cluster/
+	],
+	[qr//],
+	'run of pg_upgrade where the new cluster has insufficient max_replication_slots'
+);
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when the slot still has unconsumed WAL records
+
+# Preparations for the subsequent test:
+# 1. Remove the slot 'test_slot2', leaving only 1 slot on the old cluster, so
+#    the new cluster config  max_replication_slots=1 will now be enough.
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT * FROM pg_drop_replication_slot('test_slot2');");
+
+# 2. Generate extra WAL records. Because these WAL records do not get consumed
+#	 it will cause the upcoming pg_upgrade test to fail.
+$old_publisher->safe_psql('postgres',
+	"CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;");
+$old_publisher->stop;
+
+# pg_upgrade will fail because the slot still has unconsumed WAL records
+command_checks_all(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d', $old_publisher->data_dir,
+		'-D', $new_publisher->data_dir,
+		'-b', $bindir,
+		'-B', $bindir,
+		'-s', $new_publisher->host,
+		'-p', $old_publisher->port,
+		'-P', $new_publisher->port,
+		$mode,
+	],
+	1,
+	[
+		qr/Your installation contains logical replication slots that cannot be upgraded./
+	],
+	[qr//],
+	'run of pg_upgrade of old cluster with idle replication slots');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Verify the reason why the logical replication slot cannot be upgraded
+my $log_path = $new_publisher->data_dir . "/pg_upgrade_output.d";
+my $slots_filename;
+
+# Find a txt file that contains a list of logical replication slots that cannot
+# be upgraded. We cannot predict the file's path because the output directory
+# contains a milliseconds timestamp. File::Find::find must be used.
+find(
+	sub {
+		if ($File::Find::name =~ m/invalid_logical_relication_slots\.txt/)
+		{
+			$slots_filename = $File::Find::name;
+		}
+	},
+	$new_publisher->data_dir . "/pg_upgrade_output.d");
+
+# And check the content. The failure should be because there are unconsumed
+# WALs after confirmed_flush_lsn of test_slot1.
+like(
+	slurp_file($slots_filename),
+	qr/The slot \"test_slot1\" has not consumed the WAL yet/m,
+	'the previous test failed due to unconsumed WALs');
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+# Remove the remaining slot
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT * FROM pg_drop_replication_slot('test_slot1');");
+
+# ------------------------------
+# TEST: Successful upgrade
+
+# Preparations for the subsequent test:
+# 1. Setup logical replication
+my $old_connstr = $old_publisher->connstr . ' dbname=postgres';
+$old_publisher->safe_psql('postgres',
+	"CREATE PUBLICATION pub FOR ALL TABLES;");
+$subscriber->start;
+$subscriber->safe_psql(
+	'postgres', qq[
+	CREATE TABLE tbl (a int);
+	CREATE SUBSCRIPTION regress_sub CONNECTION '$old_connstr' PUBLICATION pub WITH (two_phase = 'true')
+]);
+$subscriber->wait_for_subscription_sync($old_publisher, 'regress_sub');
+
+# 2. Temporarily disable the subscription
+$subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION regress_sub DISABLE");
+$old_publisher->stop;
+
+# Dry run, successful check is expected. This is not a live check, so a
+# shutdown checkpoint record would be inserted. We want to test that a
+# subsequent upgrade is successful by skipping such an expected WAL record.
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d', $old_publisher->data_dir,
+		'-D', $new_publisher->data_dir,
+		'-b', $bindir,
+		'-B', $bindir,
+		'-s', $new_publisher->host,
+		'-p', $old_publisher->port,
+		'-P', $new_publisher->port,
+		$mode, '--check'
+	],
+	'run of pg_upgrade of old cluster');
+ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ removed after pg_upgrade success");
+
+# Actual run, successful upgrade is expected
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d', $old_publisher->data_dir,
+		'-D', $new_publisher->data_dir,
+		'-b', $bindir,
+		'-B', $bindir,
+		'-s', $new_publisher->host,
+		'-p', $old_publisher->port,
+		'-P', $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade of old cluster');
+ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ removed after pg_upgrade success");
+
+# Check that the slot 'regress_sub' has migrated to the new cluster
+$new_publisher->start;
+my $result = $new_publisher->safe_psql('postgres',
+	"SELECT slot_name, two_phase FROM pg_replication_slots");
+is($result, qq(regress_sub|t), 'check the slot exists on new cluster');
+
+# Update the connection
+my $new_connstr = $new_publisher->connstr . ' dbname=postgres';
+$subscriber->safe_psql(
+	'postgres', qq[
+	ALTER SUBSCRIPTION regress_sub CONNECTION '$new_connstr';
+	ALTER SUBSCRIPTION regress_sub ENABLE;
+]);
+
+# Check whether changes on the new publisher get replicated to the subscriber
+$new_publisher->safe_psql('postgres',
+	"INSERT INTO tbl VALUES (generate_series(11, 20))");
+$new_publisher->wait_for_catchup('regress_sub');
+$result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
+is($result, qq(20), 'check changes are replicated to the subscriber');
+
+# Clean up
+$subscriber->stop();
+$new_publisher->stop();
+
+done_testing();
diff --git a/src/include/access/xlogreader.h b/src/include/access/xlogreader.h
index da32c7db77..ecec87d701 100644
--- a/src/include/access/xlogreader.h
+++ b/src/include/access/xlogreader.h
@@ -429,6 +429,8 @@ extern bool DecodeXLogRecord(XLogReaderState *state,
 
 #ifndef FRONTEND
 extern FullTransactionId XLogRecGetFullXid(XLogReaderState *record);
+extern XLogReaderState *InitXLogReaderState(XLogRecPtr lsn);
+extern XLogRecord *ReadNextXLogRecord(XLogReaderState *xlogreader);
 #endif
 
 extern bool RestoreBlockImage(XLogReaderState *record, uint8 block_id, char *page);
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index f0b7b9cbd8..d00d70f2ef 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11370,6 +11370,11 @@
   proname => 'binary_upgrade_set_next_pg_tablespace_oid', provolatile => 'v',
   proparallel => 'u', prorettype => 'void', proargtypes => 'oid',
   prosrc => 'binary_upgrade_set_next_pg_tablespace_oid' },
+{ oid => '8046', descr => 'for use by pg_upgrade',
+  proname => 'binary_upgrade_validate_wal_logical_end', proisstrict => 'f',
+  provolatile => 'v', proparallel => 'u', prorettype => 'bool',
+  proargtypes => 'pg_lsn',
+  prosrc => 'binary_upgrade_validate_wal_logical_end' },
 
 # conversion functions
 { oid => '4302',
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index b5bbdd1608..5c9f8ae4d3 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1503,6 +1503,8 @@ LogicalRepTyp
 LogicalRepWorker
 LogicalRepWorkerType
 LogicalRewriteMappingData
+LogicalSlotInfo
+LogicalSlotInfoArr
 LogicalTape
 LogicalTapeSet
 LsnReadQueue
-- 
2.27.0

#281Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#280)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Tue, Sep 26, 2023 at 10:51 AM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

Again, thank you for reviewing! PSA a new version.

Thanks for the new patch. Here's a comment on v46:

1.
+Datum
+binary_upgrade_validate_wal_logical_end(PG_FUNCTION_ARGS
+{ oid => '8046', descr => 'for use by pg_upgrade',
+  proname => 'binary_upgrade_validate_wal_logical_end', proisstrict => 'f',
+  provolatile => 'v', proparallel => 'u', prorettype => 'bool',
+  proargtypes => 'pg_lsn',
+  prosrc => 'binary_upgrade_validate_wal_logical_end' },

I think this patch can avoid catalog changes by turning
binary_upgrade_validate_wal_logical_end into a FRONTEND-only function
sitting in xlogreader.c after making InitXLogReaderState() and
ReadNextXLogRecord() FRONTEND-friendly (replace elog/ereport with
pg_fatal or such). With this change and back-porting of commit
e0b2eed0 to save logical slots at shutdown, the patch can help support
upgrading logical replication slots on PG versions < 17.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#282Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Bharath Rupireddy (#281)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Bharath,

Thank you for reviewing!

Thanks for the new patch. Here's a comment on v46:

1.
+Datum
+binary_upgrade_validate_wal_logical_end(PG_FUNCTION_ARGS
+{ oid => '8046', descr => 'for use by pg_upgrade',
+  proname => 'binary_upgrade_validate_wal_logical_end', proisstrict => 'f',
+  provolatile => 'v', proparallel => 'u', prorettype => 'bool',
+  proargtypes => 'pg_lsn',
+  prosrc => 'binary_upgrade_validate_wal_logical_end' },

I think this patch can avoid catalog changes by turning
binary_upgrade_validate_wal_logical_end a FRONTEND-only function
sitting in xlogreader.c after making InitXLogReaderState(),
ReadNextXLogRecord() FRONTEND-friendly (replace elog/ereport with
pg_fatal or such). With this change and back-porting of commit
e0b2eed0 to save logical slots at shutdown, the patch can help support
upgrading logical replication slots on PG versions < 17.

Hmm, I think your suggestion may be questionable.

If we implement the upgrade function as FRONTEND-only (I have not checked its
feasibility), it means pg_upgrade would use the latest-version WAL reader API to
read WAL from the old-version cluster, which I don't think is advisable.

Each WAL page header has a magic number, XLOG_PAGE_MAGIC, which identifies the
WAL format. The value is bumped whenever the WAL contents change, and some
functions require that the magic number match the expected one; e.g., the
startup process and the pg_walinspect functions require that. Typically
XLogReaderValidatePageHeader() enforces the equality.

Now some functions are ported from pg_walinspect, so the upgrade function
carries the same restriction. I think we should not relax that restriction,
because it verifies the integrity of the files. The following are the call
stacks of the ported functions down to XLogReaderValidatePageHeader().

```
InitXLogReaderState()
XLogFindNextRecord()
ReadPageInternal()
XLogReaderValidatePageHeader()
```

```
ReadNextXLogRecord()
XLogReadRecord()
XLogReadAhead()
XLogDecodeNextRecord()
ReadPageInternal()
XLogReaderValidatePageHeader()
```
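
To make this concrete, here is a minimal sketch (an illustration only, not the
actual source) of the magic-number comparison that
XLogReaderValidatePageHeader() performs; the real function additionally
validates the header flags, system identifier, segment size, block size, and
page address:

```
#include "access/xlog_internal.h"	/* XLogPageHeader, XLOG_PAGE_MAGIC */

/*
 * Sketch: a WAL page whose header carries a magic number other than the
 * XLOG_PAGE_MAGIC this binary was compiled with is rejected.  WAL written
 * by a different major version has a different magic number, so a
 * new-version reader cannot simply read old-version WAL.
 */
static bool
page_magic_matches(XLogPageHeader hdr)
{
	return hdr->xlp_magic == XLOG_PAGE_MAGIC;
}
```

Since the ported functions go through this validation, the upgrade function can
only read WAL written by the same major version.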

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#283Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Amit Kapila (#277)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Mon, Sep 25, 2023 at 2:06 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

[1] /messages/by-id/CAA4eK1+LtWDKXvxS7gnJ562VX+s3C6+0uQWamqu=UuD8hMfORg@mail.gmail.com

I see. IIUC, without that commit e0b2eed [1], it may happen that the
slot's on-disk confirmed_flush LSN value can be higher than the WAL
LSN that's flushed to disk, no?

No, without that commit, there is a very high possibility that even if
we have sent the WAL to the subscriber and got the acknowledgment of
the same, we would miss updating it before shutdown. This would lead
to upgrade failures because upgrades have no way to later identify
whether the remaining WAL records are sent to the subscriber.

Thanks for clarifying. I'm trying to understand what happens without
commit e0b2eed0 with an illustration:

step 1: publisher - confirmed_flush LSN in replication slot on disk
structure is 80
step 2: publisher - sends WAL at LSN 100
step 3: subscriber - acknowledges the apply LSN or confirmed_flush LSN as 100
step 4: publisher - shuts down without writing the new confirmed_flush
LSN as 100 to disk, note that commit e0b2eed0 is not in place
step 5: publisher - restarts
step 6: subscriber - upon publisher restart, the subscriber requests
WAL from publisher from LSN 100 as it tracks the last applied LSN in
replication origin

Now, if the pg_upgrade with the patch in this thread is run on
publisher after step 4, it complains with "The slot \"%s\" has not
consumed the WAL yet".

Is my above understanding right?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#284Amit Kapila
amit.kapila16@gmail.com
In reply to: Bharath Rupireddy (#283)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Thu, Sep 28, 2023 at 10:44 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Mon, Sep 25, 2023 at 2:06 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

[1] /messages/by-id/CAA4eK1+LtWDKXvxS7gnJ562VX+s3C6+0uQWamqu=UuD8hMfORg@mail.gmail.com

I see. IIUC, without that commit e0b2eed [1], it may happen that the
slot's on-disk confirmed_flush LSN value can be higher than the WAL
LSN that's flushed to disk, no?

No, without that commit, there is a very high possibility that even if
we have sent the WAL to the subscriber and got the acknowledgment of
the same, we would miss updating it before shutdown. This would lead
to upgrade failures because upgrades have no way to later identify
whether the remaining WAL records are sent to the subscriber.

Thanks for clarifying. I'm trying to understand what happens without
commit e0b2eed0 with an illustration:

step 1: publisher - confirmed_flush LSN in replication slot on disk
structure is 80
step 2: publisher - sends WAL at LSN 100
step 3: subscriber - acknowledges the apply LSN or confirmed_flush LSN as 100
step 4: publisher - shuts down without writing the new confirmed_flush
LSN as 100 to disk, note that commit e0b2eed0 is not in place
step 5: publisher - restarts
step 6: subscriber - upon publisher restart, the subscriber requests
WAL from publisher from LSN 100 as it tracks the last applied LSN in
replication origin

Now, if the pg_upgrade with the patch in this thread is run on
publisher after step 4, it complains with "The slot \"%s\" has not
consumed the WAL yet".

Is my above understanding right?

Yes.

--
With Regards,
Amit Kapila.

#285Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Amit Kapila (#284)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Thu, Sep 28, 2023 at 1:06 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Sep 28, 2023 at 10:44 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

No, without that commit, there is a very high possibility that even if
we have sent the WAL to the subscriber and got the acknowledgment of
the same, we would miss updating it before shutdown. This would lead
to upgrade failures because upgrades have no way to later identify
whether the remaining WAL records are sent to the subscriber.

Thanks for clarifying. I'm trying to understand what happens without
commit e0b2eed0 with an illustration:

step 1: publisher - confirmed_flush LSN in replication slot on disk
structure is 80
step 2: publisher - sends WAL at LSN 100
step 3: subscriber - acknowledges the apply LSN or confirmed_flush LSN as 100
step 4: publisher - shuts down without writing the new confirmed_flush
LSN as 100 to disk, note that commit e0b2eed0 is not in place
step 5: publisher - restarts
step 6: subscriber - upon publisher restart, the subscriber requests
WAL from publisher from LSN 100 as it tracks the last applied LSN in
replication origin

Now, if the pg_upgrade with the patch in this thread is run on
publisher after step 4, it complains with "The slot \"%s\" has not
consumed the WAL yet".

Is my above understanding right?

Yes.

Thanks. Trying things with replication lag - when there's a lag, the
pg_upgrade can't proceed further and it complains "The slot "mysub"
has not consumed the WAL yet".

I think the best way to upgrade a postgres instance with logical
replication slots is: 1) ensure no replication lag for the logical
slots; 2) perform pg_upgrade --check first; 3) perform pg_upgrade if
there are no complaints.
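
As an illustration of step 1 (a minimal sketch; the exact monitoring query is
up to the user), one way to look for remaining lag on the running publisher is
to compare each logical slot's confirmed_flush_lsn with the current WAL insert
position:

```
-- Run on the publisher before shutting it down for the upgrade.  A large
-- lag_bytes value means the subscriber has not yet confirmed everything
-- that has been written.
SELECT slot_name,
       confirmed_flush_lsn,
       pg_current_wal_insert_lsn() AS current_lsn,
       pg_wal_lsn_diff(pg_current_wal_insert_lsn(),
                       confirmed_flush_lsn) AS lag_bytes
FROM pg_replication_slots
WHERE slot_type = 'logical';
```

While the publisher is still running, new WAL can of course appear right after
the query, so this is only an indication; the authoritative check is the one
pg_upgrade itself performs.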

With the above understanding, it looks to me that the commit e0b2eed0
isn't necessary for back branches, because without it pg_upgrade
complains "The slot "mysub" has not consumed the WAL yet", and then
the user has to restart the instance to ensure the WAL is consumed
(IOW, to get the correct confirmed_flush LSN to disk).

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#286Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Michael Paquier (#263)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Fri, Sep 22, 2023 at 9:40 AM Michael Paquier <michael@paquier.xyz> wrote:

On Thu, Sep 21, 2023 at 01:50:28PM +0530, Amit Kapila wrote:

We have discussed this point. Normally, we don't have such options in
upgrade, so we were hesitent to add a new one for this but there is a
discussion to add an --exclude-logical-slots option. We are planning
to add that as a separate patch after getting some more consensus on
it. Right now, the idea is to get the main patch ready.

Okay. I am wondering if the subscriber part is OK now without an
option, but that could also be considered separately, as well. At
least I hope so.

+1 for an option to skip upgrading logical replication slots for the
following reasons:
- one may not want the logical replication slots on the upgraded
instance immediately - unless the upgraded instance is tested and
determined to be performant.
- one may not want the logical replication slots on the upgraded
instance immediately - no logical replication setup is wanted on the
new instance perhaps because of an architectural/organizational
decision.
- one may take backup of the postgres instance with logical
replication slots using any of the file system/snapshot based backup
mechanisms (not pg_basebackup), essentially getting the on-disk
replication slots data as well; the pg_upgrade may fail on the
backed-up instance.

I agree to have it as a 0002 patch once the design and things are
finalized for the main patch.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#287Amit Kapila
amit.kapila16@gmail.com
In reply to: Bharath Rupireddy (#285)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Thu, Sep 28, 2023 at 1:24 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Thu, Sep 28, 2023 at 1:06 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Sep 28, 2023 at 10:44 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

No, without that commit, there is a very high possibility that even if
we have sent the WAL to the subscriber and got the acknowledgment of
the same, we would miss updating it before shutdown. This would lead
to upgrade failures because upgrades have no way to later identify
whether the remaining WAL records are sent to the subscriber.

Thanks for clarifying. I'm trying to understand what happens without
commit e0b2eed0 with an illustration:

step 1: publisher - confirmed_flush LSN in replication slot on disk
structure is 80
step 2: publisher - sends WAL at LSN 100
step 3: subscriber - acknowledges the apply LSN or confirmed_flush LSN as 100
step 4: publisher - shuts down without writing the new confirmed_flush
LSN as 100 to disk, note that commit e0b2eed0 is not in place
step 5: publisher - restarts
step 6: subscriber - upon publisher restart, the subscriber requests
WAL from publisher from LSN 100 as it tracks the last applied LSN in
replication origin

Now, if the pg_upgrade with the patch in this thread is run on
publisher after step 4, it complains with "The slot \"%s\" has not
consumed the WAL yet".

Is my above understanding right?

Yes.

Thanks. Trying things with replication lag - when there's a lag, the
pg_upgrade can't proceed further and it complains "The slot "mysub"
has not consumed the WAL yet".

I think the best way to upgrade a postgres instance with logical
replication slots is: 1) ensure no replication lag for the logical
slots; 2) perform pg_upgrade --check first; 3) perform pg_upgrade if
there are no complaints.

With the above understanding, it looks to me that the commit e0b2eed0
isn't necessary for back branches. Because, without it the pg_upgrade
complains "The slot "mysub" has not consumed the WAL yet", and then
the user has to restart the instance to ensure the WAL is consumed
(IOW, to get the correct confirmed_flush LSN to the disk).

The point is that it will be difficult for users to ensure that all the WAL
is consumed: it may already have been sent and acknowledged, and yet after a
restart and shutdown the check will still fail. I think the argument to
support upgrades from branches where we don't have commit e0b2eed0 has
some merit, and we can change the checks if there is broader agreement
on it. Let's try to agree on whether the core patch is good as is,
especially on what we want to achieve via validate_wal_records. Once we
agree on the main patch and commit it, the other work, including
considering having an option to upgrade slots, can be done as top-up
patches.

--
With Regards,
Amit Kapila.

#288Amit Kapila
amit.kapila16@gmail.com
In reply to: Bharath Rupireddy (#286)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Thu, Sep 28, 2023 at 2:22 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Fri, Sep 22, 2023 at 9:40 AM Michael Paquier <michael@paquier.xyz> wrote:

On Thu, Sep 21, 2023 at 01:50:28PM +0530, Amit Kapila wrote:

We have discussed this point. Normally, we don't have such options in
upgrade, so we were hesitent to add a new one for this but there is a
discussion to add an --exclude-logical-slots option. We are planning
to add that as a separate patch after getting some more consensus on
it. Right now, the idea is to get the main patch ready.

Okay. I am wondering if the subscriber part is OK now without an
option, but that could also be considered separately, as well. At
least I hope so.

+1 for an option to skip upgrading logical replication slots for the
following reasons:
- one may not want the logical replication slots on the upgraded
instance immediately - unless the upgraded instance is tested and
determined to be performant.
- one may not want the logical replication slots on the upgraded
instance immediately - no logical replication setup is wanted on the
new instance perhaps because of an architectural/organizational
decision.
- one may take backup of the postgres instance with logical
replication slots using any of the file system/snapshot based backup
mechanisms (not pg_basebackup), essentially getting the on-disk
replication slots data as well; the pg_upgrade may fail on the
backed-up instance.

I agree to have it as a 0002 patch once the design and things are
finalized for the main patch.

Thanks for understanding that it can be done as a 0002 patch, because
we don't have an agreement on this yet. Jonathan feels exactly the
opposite about having an option that doesn't migrate slots by default, as
users would then always need to use the option and they may want to have
slots migrated by default. So, we may consider having an --exclude-*
option.

--
With Regards,
Amit Kapila.

#289Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#278)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Mon, Sep 25, 2023 at 4:31 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

4.
+        /*
+         * There is a possibility that following records may be generated
+         * during the upgrade.
+         */
+        is_valid = is_xlog_record_type(rmid, info, RM_XLOG_ID,
XLOG_CHECKPOINT_SHUTDOWN) ||
+            is_xlog_record_type(rmid, info, RM_XLOG_ID,
XLOG_CHECKPOINT_ONLINE) ||
+            is_xlog_record_type(rmid, info, RM_XLOG_ID, XLOG_SWITCH) ||
+            is_xlog_record_type(rmid, info, RM_XLOG_ID,
XLOG_FPI_FOR_HINT) ||
+            is_xlog_record_type(rmid, info, RM_XLOG_ID,
XLOG_PARAMETER_CHANGE) ||
+            is_xlog_record_type(rmid, info, RM_STANDBY_ID,
XLOG_RUNNING_XACTS) ||
+            is_xlog_record_type(rmid, info, RM_HEAP2_ID,
XLOG_HEAP2_PRUNE);

What if we missed to capture the WAL records that may be generated
during upgrade?

If such records are generated before calling binary_upgrade_validate_wal_logical_end(),
the upgrading would fail. Otherwise it would succeed. Anyway, we don't care
about such records because those aren't required to be replicated. The main thing we
want to detect is that we don't miss any record generated before server shutdown.

I read this /messages/by-id/20230725170319.h423jbthfohwgnf7@awork3.anarazel.de
and understand that the current patch implements the approach
suggested there - "scan the end of the WAL for records that should
have been streamed out". I think the WAL records that should have been
streamed out are all WAL record types in XXXX_decode functions except
the ones that have a no-op or an op unrelated to logical decoding. For
instance,
- for xlog_decode, if the records of type {XLOG_CHECKPOINT_ONLINE,
XLOG_PARAMETER_CHANGE, XLOG_NOOP, XLOG_NEXTOID, XLOG_SWITCH,
XLOG_BACKUP_END, XLOG_RESTORE_POINT, XLOG_FPW_CHANGE,
XLOG_FPI_FOR_HINT, XLOG_FPI, XLOG_OVERWRITE_CONTRECORD} are found
after confirmed_flush LSN, it is fine.
- for xact_decode, if the records of type {XLOG_XACT_ASSIGNMENT} are
found after confirmed_flush LSN, it is fine.
- for standby_decode, if the records of type {XLOG_STANDBY_LOCK,
XLOG_INVALIDATIONS} are found after confirmed_flush LSN, it is fine.
- for heap2_decode, if the records of type {XLOG_HEAP2_REWRITE,
XLOG_HEAP2_FREEZE_PAGE, XLOG_HEAP2_PRUNE, XLOG_HEAP2_VACUUM,
XLOG_HEAP2_VISIBLE, XLOG_HEAP2_LOCK_UPDATED} are found after
confirmed_flush LSN, it is fine.
- for heap_decode, if the records of type {XLOG_HEAP_LOCK} are found
after confirmed_flush LSN, it is fine.

I think all of the above WAL records are okay to be present after
confirmed_flush LSN. If any WAL records other than the above are found
after confirmed_flush LSN, those are the ones that should have been
streamed out and the pg_upgrade must complain with "The slot "foo" has
not consumed the WAL yet" for all such slots, right? But, the function
binary_upgrade_validate_wal_logical_end checks for only a handful of
the above record types. I know that the list is arrived at based on
testing, but it may happen that any of the above WAL records may be
generated and present before/during/after pg_upgrade for which
pg_upgrade failure isn't wanted.

Perhaps, a function in logical/decode.c returning the WAL record as
valid if the record type is any of the above. A note in
replication/decode.h and/or access/rmgrlist.h asking rmgr adders to
categorize the WAL record type in the new function based on its
decoding operation might help with future new WAL record type
additions.

Thoughts?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#290Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Bharath Rupireddy (#289)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

On Thursday, September 28, 2023 5:32 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:

Hi,

On Mon, Sep 25, 2023 at 4:31 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

4.
+        /*
+         * There is a possibility that following records may be generated
+         * during the upgrade.
+         */
+        is_valid = is_xlog_record_type(rmid, info, RM_XLOG_ID,
XLOG_CHECKPOINT_SHUTDOWN) ||
+            is_xlog_record_type(rmid, info, RM_XLOG_ID,
XLOG_CHECKPOINT_ONLINE) ||

...

What if we missed to capture the WAL records that may be generated
during upgrade?

If such records are generated before calling
binary_upgrade_validate_wal_logical_end(),
the upgrading would fail. Otherwise it would succeed. Anyway, we
don't care about such records because those aren't required to be
replicated. The main thing we want to detect is that we don't miss any record
generated before server shutdown.

I read this
/messages/by-id/20230725170319.h423jbthfohwgnf7@awork3.anarazel.de
and understand that the current patch implements the approach suggested
there - "scan the end of the WAL for records that should have been streamed
out". I think the WAL records that should have been streamed out are all WAL
record types in XXXX_decode functions except the ones that have a no-op or an
op unrelated to logical decoding. For instance,
- for xlog_decode, if the records of type {XLOG_CHECKPOINT_ONLINE,
XLOG_PARAMETER_CHANGE, XLOG_NOOP, XLOG_NEXTOID, XLOG_SWITCH,
XLOG_BACKUP_END, XLOG_RESTORE_POINT, XLOG_FPW_CHANGE,
XLOG_FPI_FOR_HINT, XLOG_FPI, XLOG_OVERWRITE_CONTRECORD} are found
after confirmed_flush LSN, it is fine.
- for xact_decode, if the records of type {XLOG_XACT_ASSIGNMENT} are found
after confirmed_flush LSN, it is fine.
- for standby_decode, if the records of type {XLOG_STANDBY_LOCK,
XLOG_INVALIDATIONS} are found after confirmed_flush LSN, it is fine.
- for heap2_decode, if the records of type {XLOG_HEAP2_REWRITE,
XLOG_HEAP2_FREEZE_PAGE, XLOG_HEAP2_PRUNE, XLOG_HEAP2_VACUUM,
XLOG_HEAP2_VISIBLE, XLOG_HEAP2_LOCK_UPDATED} are found after
confirmed_flush LSN, it is fine.
- for heap_decode, if the records of type {XLOG_HEAP_LOCK} are found after
confirmed_flush LSN, it is fine.

I think all of the above WAL records are okay to be present after confirmed_flush
LSN. If any WAL records other than the above are found after confirmed_flush
LSN, those are the one that should have been streamed out and the pg_upgrade
must complain with "The slot "foo" has not consumed the WAL yet" for all such
slots, right? But, the function binary_upgrade_validate_wal_logical_end checks
for only a handful of the above record types. I know that the list is arrived at
based on testing, but it may happen that any of the above WAL records may be
generated and present before/during/after pg_upgrade for which pg_upgrade
failure isn't wanted.

Perhaps, a function in logical/decode.c returning the WAL record as valid if the
record type is any of the above. A note in replication/decode.h and/or
access/rmgrlist.h asking rmgr adders to categorize the WAL record type in the
new function based on its decoding operation might help with future new WAL
record type additions.

Thoughts?

I think this approach can work, but I am not sure if it's better than other
approaches. Mainly because it has almost the same maintenance burden as the
current approach, i.e. we need to verify and update the check function each
time we add a new WAL record type.

Apart from the WAL scan approach, we also considered an alternative approach
that does not impose an additional maintenance burden and could potentially be
less complex. For example, we can add a new field to pg_controldata to record
the last checkpoint that happens in non-upgrade mode, so that we can compare
the slot's confirmed_flush_lsn with this value. If they are the same, the WAL
should have been consumed; otherwise, we disallow upgrading this slot. I would
appreciate it if you could share your thoughts about this approach.
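
A minimal sketch of what that comparison could look like (everything below is
hypothetical; no such control-file field exists today, and the names are made
up):

```
#include "access/xlogdefs.h"	/* XLogRecPtr */

/*
 * Hypothetical: "last_non_upgrade_checkpoint" would be a new pg_controldata
 * field recording the latest checkpoint written outside binary-upgrade mode.
 * A slot would be treated as caught up only if it has confirmed everything
 * up to that checkpoint.
 */
static bool
slot_is_caught_up(XLogRecPtr confirmed_flush_lsn,
				  XLogRecPtr last_non_upgrade_checkpoint)
{
	return confirmed_flush_lsn == last_non_upgrade_checkpoint;
}
```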

And if we decided to use the WAL scan approach, instead of checking each
record, we could directly check whether the WAL records can be decoded into
meaningful results by using test_decoding to decode them. This approach also
doesn't add a new maintenance burden, as we anyway need to update
test_decoding if the decoding logic for any record changes. This was also
mentioned in [1].

What do you think ?

[1]: /messages/by-id/OS0PR01MB5716FC0F814D78E82E4CC3B894C3A@OS0PR01MB5716.jpnprd01.prod.outlook.com

Best Regards,
Hou zj

#291Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Zhijie Hou (Fujitsu) (#290)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Thu, Sep 28, 2023 at 6:08 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

On Thursday, September 28, 2023 5:32 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:

Perhaps, a function in logical/decode.c returning the WAL record as valid if the
record type is any of the above. A note in replication/decode.h and/or
access/rmgrlist.h asking rmgr adders to categorize the WAL record type in the
new function based on its decoding operation might help with future new WAL
record type additions.

Thoughts?

I think this approach can work, but I am not sure if it's better than other
approaches. Mainly because it has almost the same maintenance burden as the
current approach, i.e. we need to verify and update the check function each
time we add a new WAL record type.

I think that's not a big problem if we have comments in
replication/decode.h, access/rmgrlist.h, docs to categorize the new
WAL records as decodable. Currently, those adding new WAL record types will
have to do certain things based on notes in comments or docs anyways.

Another idea to enforce categorizing the decodability of WAL records is to
have a new RMGR API, rm_is_record_decodable or such; the RMGR
implementers would then add respective functions returning true/false
depending on whether a given WAL record is decodable or not:
    void        (*rm_decode) (struct LogicalDecodingContext *ctx,
                              struct XLogRecordBuffer *buf);
    bool        (*rm_is_record_decodable) (uint8 type);
} RmgrData;

PG_RMGR(RM_XLOG_ID, "XLOG", xlog_redo, xlog_desc, xlog_identify, NULL,
NULL, NULL, xlog_is_record_decodable), then the
xlog_is_record_decodable can look something like [1].

This approach can also enforce/help custom RMGR implementers to define
the decodability of the WAL records.
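
As a sketch of that, a custom resource manager could declare its own
decodability rules along these lines (illustration only; rm_is_record_decodable
is the hypothetical callback proposed above, not an existing API, and the
record type value is made up):

```
#include "access/xlogrecord.h"	/* XLR_INFO_MASK */

/*
 * Hypothetical callback for an extension's custom rmgr: only records that
 * produce logical changes return true, so leftover housekeeping records
 * found after confirmed_flush_lsn would not block the upgrade.
 */
static bool
my_rmgr_is_record_decodable(uint8 info)
{
	switch (info & ~XLR_INFO_MASK)
	{
		case 0x00:				/* made-up "data change" record type */
			return true;
		default:				/* metadata-only records */
			return false;
	}
}
```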

Apart from the WAL scan approach, we also considered alternative approach that
do not impose an additional maintenance burden and could potentially be less
complex. For example, we can add a new field in pg_controldata to record the
last checkpoint that happens in non-upgrade mode, so that we can compare the
slot's confirmed_flush_lsn with this value, If they are the same, the WAL
should have been consumed otherwise we disallow upgrading this slot. I would
appreciate if you can share your thought about this approach.

I read this /messages/by-id/CAA4eK1JVKZGRHLOEotWi+e+09jucNedqpkkc-Do4dh5FTAU+5w@mail.gmail.com
and I agree with the concern about adding a new field in pg_controldata
just for this purpose and spreading the IsBinaryUpgrade code in the
checkpointer. Another concern for me with the new pg_controldata field
approach is that it makes it hard to make this patch support back
branches. Therefore, -1 for this approach from me.

And if we decided to use WAL scan approach, instead of checking each record, we
could directly check if the WAL record can be decoded into meaningful results
by use test_decoding to decode them. This approach also doesn't add new
maintenance burden as we anyway need to update the test_decoding if any decode
logic for new record changes. This was also mentioned [1].

What do you think ?

[1] /messages/by-id/OS0PR01MB5716FC0F814D78E82E4CC3B894C3A@OS0PR01MB5716.jpnprd01.prod.outlook.com

-1 for decoding the WAL with test_decoding; I don't think it's a great
idea to create temp slots and launch walsenders during upgrade.

IMO, the WAL scanning approach looks better. However, if we were to optimize
it by not scanning WAL records for every replication slot's
confirmed_flush_lsn (CFL), we could start with the lowest CFL (the min of all
slots' CFLs) and scan till the end of WAL. The
binary_upgrade_validate_wal_logical_end function can return an array
of LSNs at which decodable WAL records are found. Then, use the CFLs of
all the other slots and this array to determine if the slots have
unconsumed WAL. Following is an illustration of this idea:

1. Slots s1, s2, s3, s4, s5 with CFLs 100, 90, 110, 70, 80 respectively.
2. Min of all CFLs is 70 for slot s4.
3. Start scanning WAL from min CFL 70 for slot s4, say there are
unconsumed WAL at LSN {85, 89}.
4. Now, without scanning WAL for rest of the slots, determine if they
have unconsumed WAL.
5.1. CFL of slot s1 is 100 and no unconsumed WAL at or after LSN 100 -
look at the array of unconsumed WAL LSNs {85, 89}.
5.2. CFL of slot s2 is 90 and no unconsumed WAL at or after LSN 90 -
look at the array of unconsumed WAL LSNs {85, 89}.
5.3. CFL of slot s3 is 110 and no unconsumed WAL at or after LSN 110 -
look at the array of unconsumed WAL LSNs {85, 89}.
5.4. CFL of slot s4 is 70 and there's unconsumed WAL at or after LSN
70 - look at the array of unconsumed WAL LSNs {85, 89}.
5.5. CFL of slot s5 is 80 and there's unconsumed WAL at or after LSN
80 - look at the array of unconsumed WAL LSNs {85, 89}.

With this approach, the WAL is scanned only once, as opposed to once per
slot with the approach the patch currently implements.
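
A minimal sketch of the per-slot test under this scheme (names are
illustrative; the single WAL scan that fills the array of decodable-record
LSNs is not shown):

```
#include "access/xlogdefs.h"	/* XLogRecPtr */

/*
 * pending_lsns[] holds the start LSNs, collected in one scan from the
 * minimum confirmed_flush_lsn, of records a logical slot would have to
 * decode.  A slot has unconsumed WAL iff any such record begins at or
 * after its own confirmed_flush_lsn.
 */
static bool
slot_has_unconsumed_wal(XLogRecPtr slot_cfl,
						const XLogRecPtr *pending_lsns, int npending)
{
	for (int i = 0; i < npending; i++)
	{
		if (pending_lsns[i] >= slot_cfl)
			return true;
	}
	return false;
}
```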

Thoughts?

[1]:
bool
xlog_is_record_decodable(uint8 info)
{
	switch (info)
	{
		case XLOG_CHECKPOINT_SHUTDOWN:
		case XLOG_END_OF_RECOVERY:
			return true;
		case XLOG_CHECKPOINT_ONLINE:
		case XLOG_PARAMETER_CHANGE:
		case XLOG_NOOP:
		case XLOG_NEXTOID:
		case XLOG_SWITCH:
		case XLOG_BACKUP_END:
		case XLOG_RESTORE_POINT:
		case XLOG_FPW_CHANGE:
		case XLOG_FPI_FOR_HINT:
		case XLOG_FPI:
		case XLOG_OVERWRITE_CONTRECORD:
			return false;
		default:
			elog(ERROR, "unexpected RM_XLOG_ID record type: %u", info);
	}
}

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#292Amit Kapila
amit.kapila16@gmail.com
In reply to: Bharath Rupireddy (#291)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Fri, Sep 29, 2023 at 1:00 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Thu, Sep 28, 2023 at 6:08 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

IMO, the WAL scanning approach looks better. However, if we were to optimize
it by not scanning WAL records for every replication slot's
confirmed_flush_lsn (CFL), we could start with the lowest CFL (the min of all
slots' CFLs) and scan till the end of WAL.

Earlier, I also thought of something like that, but I guess it won't
matter much, as most of the slots will be up-to-date at shutdown time.
That would mean we would read just one or two records. Personally, I
feel it is better to build consensus on the WAL scanning approach:
basically, is it okay to decide as the patch currently does, or
should we expose an API from the decode module as you are
proposing? OTOH, if we want to go with another approach, like adding a
field in pg_controldata, then we don't need to deal with WAL record
types at all.

--
With Regards,
Amit Kapila.

#293Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Bharath Rupireddy (#291)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Bharath,

Thanks for giving your idea!

I think this approach can work, but I am not sure if it's better than other
approaches. Mainly because it has almost the same maintenance burden as the
current approach, i.e. we need to verify and update the check function each
time we add a new WAL record type.

I think that's not a big problem if we have comments in
replication/decode.h, access/rmgrlist.h, docs to categorize the new
WAL records as decodable. Currently, the WAL record types adders will
have to do certain things based on notes in comments or docs anyways.

Another idea to enforce categorizing decodability of WAL records is to
have a new RMGR API rm_is_record_decodable or such, the RMGR
implementers will then add respective functions returning true/false
if a given WAL record is decodable or not:
void (*rm_decode) (struct LogicalDecodingContext *ctx,
struct XLogRecordBuffer *buf);
bool (*rm_is_record_decodable) (uint8 type);
} RmgrData;

PG_RMGR(RM_XLOG_ID, "XLOG", xlog_redo, xlog_desc, xlog_identify, NULL,
NULL, NULL, xlog_is_record_decodable), then the
xlog_is_record_decodable can look something like [1].

This approach can also enforce/help custom RMGR implementers to define
the decodability of the WAL records.

Yeah, the approach forces developers to consider decodability.
But the benefit seems smaller than the effort required, because the function
would be used only by pg_upgrade. Could you tell me if you have another use case
in mind? We may be able to adopt it if we do...
Also, this approach cannot be backported.

Anyway, let's see what senior members say.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#294Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#293)
2 attachment(s)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Fri, Sep 29, 2023 at 5:27 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

Yeah, the approach enforces developers to check the decodability.
But the benefit seems smaller than required efforts for it because the function
would be used only by pg_upgrade. Could you tell me if you have another use case
in mind? We may able to adopt if we have...

I'm attaching a 0002 patch (on top of v45) which implements the new
decodable callback approach that I have in mind. IMO, this new approach is
extensible and better than the current approach taken by the patch
(hard-coding the WAL record types that may be generated during pg_upgrade),
and it helps deal with the issue that custom WAL resource managers would
otherwise have with such hard-coding.
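
To sketch the caller side (an illustration only, not code from the attached
patches): with this callback in place, the WAL-end validation used by
pg_upgrade could ask each resource manager about a record instead of
hard-coding the ignorable record types. The helper name
record_needs_decoding below is hypothetical, and the rm_is_record_decodable
member is assumed from the 0002 patch:

#include "postgres.h"
#include "access/xlog_internal.h"

/*
 * Return true if the record at the reader's current position would still be
 * consumed by logical decoding, according to the owning rmgr's callback.
 */
static bool
record_needs_decoding(XLogReaderState *xlogreader)
{
	RmgrData	rmgr = GetRmgr(XLogRecGetRmid(xlogreader));
	uint8		info = XLogRecGetInfo(xlogreader) & ~XLR_INFO_MASK;

	/* Be conservative if the rmgr does not define the callback. */
	if (rmgr.rm_is_record_decodable == NULL)
		return true;

	return rmgr.rm_is_record_decodable(info);
}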

Also, this approach cannot be backported.

Neither can the current patch as-is. I'm not looking at backporting this
feature right now, but at making it as robust and extensible as possible
for PG17.

Thoughts?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

v45-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patchapplication/octet-stream; name=v45-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patchDownload
From b0f958c1fb13945b2b006c1f14b6bd557b04271d Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Tue, 4 Apr 2023 05:49:34 +0000
Subject: [PATCH v45] pg_upgrade: Allow to replicate logical replication slots
 to new node

This commit allows nodes with logical replication slots to be upgraded. While
reading information from the old cluster, a list of logical replication slots is
fetched. At the later part of upgrading, pg_upgrade revisits the list and
restores slots by executing pg_create_logical_replication_slot() on the new
cluster. Migration of logical replication slots is only supported when the old
cluster is version 17.0 or later.

If the old node has slots with the status 'lost' or with unconsumed WAL records,
the pg_upgrade fails. These checks are needed to prevent data loss.

Note that the pg_resetwal command would remove WAL files, which are required for
restart_lsn. If WALs required by logical replication slots are removed, the
slots are unusable. Therefore, during the upgrade, slot restoration is done
after the final pg_resetwal command. The workflow ensures that required WALs are
retained.

The significant advantage of this commit is that it makes it easy to continue
logical replication even after upgrading the publisher node. Previously,
pg_upgrade allowed copying publications to a new node. With this new commit,
adjusting the connection string to the new publisher will cause the apply
worker on the subscriber to connect to the new publisher automatically. This
enables seamless continuation of logical replication, even after an upgrade.

Author: Hayato Kuroda
Co-authored-by: Hou Zhijie
Reviewed-by: Peter Smith, Julien Rouhaud, Vignesh C, Wang Wei, Masahiko Sawada,
             Dilip Kumar, Bharath Rupireddy
---
 contrib/pg_walinspect/pg_walinspect.c         |  94 -------
 doc/src/sgml/ref/pgupgrade.sgml               |  76 ++++-
 src/backend/access/transam/xlogreader.c       |  93 ++++++
 src/backend/replication/slot.c                |  12 +
 src/backend/utils/adt/pg_upgrade_support.c    | 100 +++++++
 src/bin/pg_upgrade/Makefile                   |   3 +
 src/bin/pg_upgrade/check.c                    | 193 +++++++++++--
 src/bin/pg_upgrade/function.c                 |  31 +-
 src/bin/pg_upgrade/info.c                     | 168 ++++++++++-
 src/bin/pg_upgrade/meson.build                |   1 +
 src/bin/pg_upgrade/pg_upgrade.c               | 107 ++++++-
 src/bin/pg_upgrade/pg_upgrade.h               |  24 +-
 src/bin/pg_upgrade/server.c                   |  25 +-
 .../003_upgrade_logical_replication_slots.pl  | 266 ++++++++++++++++++
 src/include/access/xlogreader.h               |   2 +
 src/include/catalog/pg_proc.dat               |   5 +
 src/tools/pgindent/typedefs.list              |   2 +
 17 files changed, 1067 insertions(+), 135 deletions(-)
 create mode 100644 src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl

diff --git a/contrib/pg_walinspect/pg_walinspect.c b/contrib/pg_walinspect/pg_walinspect.c
index 796a74f322..49f4f92e98 100644
--- a/contrib/pg_walinspect/pg_walinspect.c
+++ b/contrib/pg_walinspect/pg_walinspect.c
@@ -40,8 +40,6 @@ PG_FUNCTION_INFO_V1(pg_get_wal_stats_till_end_of_wal);
 
 static void ValidateInputLSNs(XLogRecPtr start_lsn, XLogRecPtr *end_lsn);
 static XLogRecPtr GetCurrentLSN(void);
-static XLogReaderState *InitXLogReaderState(XLogRecPtr lsn);
-static XLogRecord *ReadNextXLogRecord(XLogReaderState *xlogreader);
 static void GetWALRecordInfo(XLogReaderState *record, Datum *values,
 							 bool *nulls, uint32 ncols);
 static void GetWALRecordsInfo(FunctionCallInfo fcinfo,
@@ -84,98 +82,6 @@ GetCurrentLSN(void)
 	return curr_lsn;
 }
 
-/*
- * Initialize WAL reader and identify first valid LSN.
- */
-static XLogReaderState *
-InitXLogReaderState(XLogRecPtr lsn)
-{
-	XLogReaderState *xlogreader;
-	ReadLocalXLogPageNoWaitPrivate *private_data;
-	XLogRecPtr	first_valid_record;
-
-	/*
-	 * Reading WAL below the first page of the first segments isn't allowed.
-	 * This is a bootstrap WAL page and the page_read callback fails to read
-	 * it.
-	 */
-	if (lsn < XLOG_BLCKSZ)
-		ereport(ERROR,
-				(errmsg("could not read WAL at LSN %X/%X",
-						LSN_FORMAT_ARGS(lsn))));
-
-	private_data = (ReadLocalXLogPageNoWaitPrivate *)
-		palloc0(sizeof(ReadLocalXLogPageNoWaitPrivate));
-
-	xlogreader = XLogReaderAllocate(wal_segment_size, NULL,
-									XL_ROUTINE(.page_read = &read_local_xlog_page_no_wait,
-											   .segment_open = &wal_segment_open,
-											   .segment_close = &wal_segment_close),
-									private_data);
-
-	if (xlogreader == NULL)
-		ereport(ERROR,
-				(errcode(ERRCODE_OUT_OF_MEMORY),
-				 errmsg("out of memory"),
-				 errdetail("Failed while allocating a WAL reading processor.")));
-
-	/* first find a valid recptr to start from */
-	first_valid_record = XLogFindNextRecord(xlogreader, lsn);
-
-	if (XLogRecPtrIsInvalid(first_valid_record))
-		ereport(ERROR,
-				(errmsg("could not find a valid record after %X/%X",
-						LSN_FORMAT_ARGS(lsn))));
-
-	return xlogreader;
-}
-
-/*
- * Read next WAL record.
- *
- * By design, to be less intrusive in a running system, no slot is allocated
- * to reserve the WAL we're about to read. Therefore this function can
- * encounter read errors for historical WAL.
- *
- * We guard against ordinary errors trying to read WAL that hasn't been
- * written yet by limiting end_lsn to the flushed WAL, but that can also
- * encounter errors if the flush pointer falls in the middle of a record. In
- * that case we'll return NULL.
- */
-static XLogRecord *
-ReadNextXLogRecord(XLogReaderState *xlogreader)
-{
-	XLogRecord *record;
-	char	   *errormsg;
-
-	record = XLogReadRecord(xlogreader, &errormsg);
-
-	if (record == NULL)
-	{
-		ReadLocalXLogPageNoWaitPrivate *private_data;
-
-		/* return NULL, if end of WAL is reached */
-		private_data = (ReadLocalXLogPageNoWaitPrivate *)
-			xlogreader->private_data;
-
-		if (private_data->end_of_wal)
-			return NULL;
-
-		if (errormsg)
-			ereport(ERROR,
-					(errcode_for_file_access(),
-					 errmsg("could not read WAL at %X/%X: %s",
-							LSN_FORMAT_ARGS(xlogreader->EndRecPtr), errormsg)));
-		else
-			ereport(ERROR,
-					(errcode_for_file_access(),
-					 errmsg("could not read WAL at %X/%X",
-							LSN_FORMAT_ARGS(xlogreader->EndRecPtr))));
-	}
-
-	return record;
-}
-
 /*
  * Output values that make up a row describing caller's WAL record.
  *
diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index bea0d1b93f..1a17572d14 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -383,6 +383,77 @@ make prefix=/usr/local/pgsql.new install
     </para>
    </step>
 
+   <step>
+    <title>Prepare for publisher upgrades</title>
+
+    <para>
+     <application>pg_upgrade</application> attempts to migrate logical
+     replication slots. This helps avoid the need for manually defining the
+     same replication slots on the new publisher. Migration of logical
+     replication slots is only supported when the old cluster is version 17.0
+     or later.
+    </para>
+
+    <para>
+     Before you start upgrading the publisher cluster, ensure that the
+     subscription is temporarily disabled, by executing
+     <link linkend="sql-altersubscription"><command>ALTER SUBSCRIPTION ... DISABLE</command></link>.
+     Re-enable the subscription after the upgrade.
+    </para>
+
+    <para>
+     There are some prerequisites for <application>pg_upgrade</application> to
+     be able to upgrade the replication slots. If these are not met an error
+     will be reported.
+    </para>
+
+    <itemizedlist>
+     <listitem>
+      <para>
+       All slots on the old cluster must be usable, i.e., there are no slots
+       whose
+       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>conflicting</structfield>
+       is <literal>true</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The old cluster has replicated all the changes to subscribers.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The output plugins referenced by the slots on the old cluster must be
+       installed in the new PostgreSQL executable directory.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must not have permanent logical replication slots, i.e.,
+       there must be no slots where
+       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>temporary</structfield>
+       is <literal>false</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must have
+       <link linkend="guc-wal-level"><varname>wal_level</varname></link> as
+       <literal>logical</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must have
+       <link linkend="guc-max-replication-slots"><varname>max_replication_slots</varname></link>
+       configured to a value greater than or equal to the number of slots
+       present in the old cluster.
+      </para>
+     </listitem>
+    </itemizedlist>
+
+   </step>
+
    <step>
     <title>Stop both servers</title>
 
@@ -652,8 +723,9 @@ rsync --archive --delete --hard-links --size-only --no-inc-recursive /vol1/pg_tb
        Configure the servers for log shipping.  (You do not need to run
        <function>pg_backup_start()</function> and <function>pg_backup_stop()</function>
        or take a file system backup as the standbys are still synchronized
-       with the primary.)  Replication slots are not copied and must
-       be recreated.
+       with the primary.)  Only logical slots on the primary are copied to the
+       new standby, but other slots on the old standby are not copied so must
+       be recreated manually.
       </para>
      </step>
 
diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index a17263df20..566e18a248 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -29,6 +29,7 @@
 #include "access/xlog_internal.h"
 #include "access/xlogreader.h"
 #include "access/xlogrecord.h"
+#include "access/xlogutils.h"
 #include "catalog/pg_control.h"
 #include "common/pg_lzcompress.h"
 #include "replication/origin.h"
@@ -2198,4 +2199,96 @@ XLogRecGetFullXid(XLogReaderState *record)
 	return FullTransactionIdFromEpochAndXid(epoch, xid);
 }
 
+/*
+ * Initialize WAL reader and identify first valid LSN.
+ */
+XLogReaderState *
+InitXLogReaderState(XLogRecPtr lsn)
+{
+	XLogReaderState *xlogreader;
+	ReadLocalXLogPageNoWaitPrivate *private_data;
+	XLogRecPtr	first_valid_record;
+
+	/*
+	 * Reading WAL below the first page of the first segments isn't allowed.
+	 * This is a bootstrap WAL page and the page_read callback fails to read
+	 * it.
+	 */
+	if (lsn < XLOG_BLCKSZ)
+		ereport(ERROR,
+				(errmsg("could not read WAL at LSN %X/%X",
+						LSN_FORMAT_ARGS(lsn))));
+
+	private_data = (ReadLocalXLogPageNoWaitPrivate *)
+		palloc0(sizeof(ReadLocalXLogPageNoWaitPrivate));
+
+	xlogreader = XLogReaderAllocate(wal_segment_size, NULL,
+									XL_ROUTINE(.page_read = &read_local_xlog_page_no_wait,
+											   .segment_open = &wal_segment_open,
+											   .segment_close = &wal_segment_close),
+									private_data);
+
+	if (xlogreader == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("out of memory"),
+				 errdetail("Failed while allocating a WAL reading processor.")));
+
+	/* first find a valid recptr to start from */
+	first_valid_record = XLogFindNextRecord(xlogreader, lsn);
+
+	if (XLogRecPtrIsInvalid(first_valid_record))
+		ereport(ERROR,
+				(errmsg("could not find a valid record after %X/%X",
+						LSN_FORMAT_ARGS(lsn))));
+
+	return xlogreader;
+}
+
+/*
+ * Read next WAL record.
+ *
+ * By design, to be less intrusive in a running system, no slot is allocated
+ * to reserve the WAL we're about to read. Therefore this function can
+ * encounter read errors for historical WAL.
+ *
+ * We guard against ordinary errors trying to read WAL that hasn't been
+ * written yet by limiting end_lsn to the flushed WAL, but that can also
+ * encounter errors if the flush pointer falls in the middle of a record. In
+ * that case we'll return NULL.
+ */
+XLogRecord *
+ReadNextXLogRecord(XLogReaderState *xlogreader)
+{
+	XLogRecord *record;
+	char	   *errormsg;
+
+	record = XLogReadRecord(xlogreader, &errormsg);
+
+	if (record == NULL)
+	{
+		ReadLocalXLogPageNoWaitPrivate *private_data;
+
+		/* return NULL, if end of WAL is reached */
+		private_data = (ReadLocalXLogPageNoWaitPrivate *)
+			xlogreader->private_data;
+
+		if (private_data->end_of_wal)
+			return NULL;
+
+		if (errormsg)
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not read WAL at %X/%X: %s",
+							LSN_FORMAT_ARGS(xlogreader->EndRecPtr), errormsg)));
+		else
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not read WAL at %X/%X",
+							LSN_FORMAT_ARGS(xlogreader->EndRecPtr))));
+	}
+
+	return record;
+}
+
 #endif
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 3ded3c1473..a91d412f91 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1423,6 +1423,18 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause,
 
 		SpinLockRelease(&s->mutex);
 
+		/*
+		 * The logical replication slots shouldn't be invalidated as
+		 * max_slot_wal_keep_size GUC is set to -1 during the upgrade.
+		 *
+		 * The following is just a sanity check.
+		 */
+		if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade)
+		{
+			Assert(max_slot_wal_keep_size_mb == -1);
+			elog(ERROR, "replication slots must not be invalidated during the upgrade");
+		}
+
 		if (active_pid != 0)
 		{
 			/*
diff --git a/src/backend/utils/adt/pg_upgrade_support.c b/src/backend/utils/adt/pg_upgrade_support.c
index 0186636d9f..bd64da7205 100644
--- a/src/backend/utils/adt/pg_upgrade_support.c
+++ b/src/backend/utils/adt/pg_upgrade_support.c
@@ -11,14 +11,20 @@
 
 #include "postgres.h"
 
+#include "access/heapam_xlog.h"
+#include "access/xlog.h"
+#include "access/xlogutils.h"
 #include "catalog/binary_upgrade.h"
 #include "catalog/heap.h"
 #include "catalog/namespace.h"
+#include "catalog/pg_control.h"
 #include "catalog/pg_type.h"
 #include "commands/extension.h"
 #include "miscadmin.h"
+#include "storage/standbydefs.h"
 #include "utils/array.h"
 #include "utils/builtins.h"
+#include "utils/pg_lsn.h"
 
 
 #define CHECK_IS_BINARY_UPGRADE									\
@@ -261,3 +267,97 @@ binary_upgrade_set_missing_value(PG_FUNCTION_ARGS)
 
 	PG_RETURN_VOID();
 }
+
+/*
+ * Helper function for binary_upgrade_validate_wal_logical_end().
+ */
+static inline bool
+is_xlog_record_type(RmgrId rmgrid, uint8 info,
+					RmgrId expected_rmgrid, uint8 expected_info)
+{
+	return (rmgrid == expected_rmgrid) && (info == expected_info);
+}
+
+/*
+ * Return false if we found unexpected WAL records, otherwise true.
+ *
+ * This is a special purpose function to ensure that there are no WAL records
+ * pending to be decoded after the given LSN.
+ *
+ * It is used to ensure that there is no pending WAL to be consumed for
+ * the logical slots.
+ *
+ * During the upgrade process there can be certain types of WAL records
+ * generated that don't need to be decoded. Such records are ignored.
+ *
+ * XLOG_CHECKPOINT_SHUTDOWN and XLOG_SWITCH are ignored because they would be
+ * inserted after the walsender exits. Moreover, the following types of records
+ * could be generated during the pg_upgrade --check, so they are ignored too:
+ * XLOG_CHECKPOINT_ONLINE, XLOG_RUNNING_XACTS, XLOG_FPI_FOR_HINT,
+ * XLOG_HEAP2_PRUNE, XLOG_PARAMETER_CHANGE.
+ */
+Datum
+binary_upgrade_validate_wal_logical_end(PG_FUNCTION_ARGS)
+{
+	XLogRecPtr	start_lsn;
+	XLogReaderState *xlogreader;
+	bool		initial_record = true;
+	bool		is_valid = true;
+
+	CHECK_IS_BINARY_UPGRADE;
+
+	/* Quick exit if the input is NULL */
+	if (PG_ARGISNULL(0))
+		PG_RETURN_BOOL(false);
+
+	start_lsn = PG_GETARG_LSN(0);
+
+	/* Quick exit if the given lsn is larger than current one */
+	if (start_lsn >= GetFlushRecPtr(NULL))
+		PG_RETURN_BOOL(false);
+
+	xlogreader = InitXLogReaderState(start_lsn);
+
+	/* Loop until all WALs are read, or unexpected record is found */
+	while (is_valid && ReadNextXLogRecord(xlogreader))
+	{
+		RmgrIds		rmid;
+		uint8		info;
+
+		/* Check the type of WAL */
+		rmid = XLogRecGetRmid(xlogreader);
+		info = XLogRecGetInfo(xlogreader) & ~XLR_INFO_MASK;
+
+		if (initial_record)
+		{
+			/*
+			 * Initial record must be either XLOG_CHECKPOINT_SHUTDOWN or
+			 * XLOG_SWITCH.
+			 */
+			is_valid = is_xlog_record_type(rmid, info, RM_XLOG_ID, XLOG_CHECKPOINT_SHUTDOWN) ||
+				is_xlog_record_type(rmid, info, RM_XLOG_ID, XLOG_SWITCH);
+
+			initial_record = false;
+			continue;
+		}
+
+		/*
+		 * There is a possibility that following records may be generated
+		 * during the upgrade.
+		 */
+		is_valid = is_xlog_record_type(rmid, info, RM_XLOG_ID, XLOG_CHECKPOINT_SHUTDOWN) ||
+			is_xlog_record_type(rmid, info, RM_XLOG_ID, XLOG_CHECKPOINT_ONLINE) ||
+			is_xlog_record_type(rmid, info, RM_XLOG_ID, XLOG_SWITCH) ||
+			is_xlog_record_type(rmid, info, RM_XLOG_ID, XLOG_FPI_FOR_HINT) ||
+			is_xlog_record_type(rmid, info, RM_XLOG_ID, XLOG_PARAMETER_CHANGE) ||
+			is_xlog_record_type(rmid, info, RM_STANDBY_ID, XLOG_RUNNING_XACTS) ||
+			is_xlog_record_type(rmid, info, RM_HEAP2_ID, XLOG_HEAP2_PRUNE);
+
+		CHECK_FOR_INTERRUPTS();
+	}
+
+	pfree(xlogreader->private_data);
+	XLogReaderFree(xlogreader);
+
+	PG_RETURN_BOOL(is_valid);
+}
diff --git a/src/bin/pg_upgrade/Makefile b/src/bin/pg_upgrade/Makefile
index 5834513add..815d1a7ca1 100644
--- a/src/bin/pg_upgrade/Makefile
+++ b/src/bin/pg_upgrade/Makefile
@@ -3,6 +3,9 @@
 PGFILEDESC = "pg_upgrade - an in-place binary upgrade utility"
 PGAPPICON = win32
 
+# required for 003_logical_replication_slots.pl
+EXTRA_INSTALL=contrib/test_decoding
+
 subdir = src/bin/pg_upgrade
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 21a0ff9e42..731b987d33 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -33,6 +33,8 @@ static void check_for_jsonb_9_4_usage(ClusterInfo *cluster);
 static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(void);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
+static void check_new_cluster_logical_replication_slots(void);
+static void check_old_cluster_for_valid_slots(bool live_check);
 
 
 /*
@@ -89,8 +91,11 @@ check_and_dump_old_cluster(bool live_check)
 	if (!live_check)
 		start_postmaster(&old_cluster, true);
 
-	/* Extract a list of databases and tables from the old cluster */
-	get_db_and_rel_infos(&old_cluster);
+	/*
+	 * Extract a list of databases, tables, and logical replication slots from
+	 * the old cluster.
+	 */
+	get_db_rel_and_slot_infos(&old_cluster, live_check);
 
 	init_tablespaces();
 
@@ -107,6 +112,13 @@ check_and_dump_old_cluster(bool live_check)
 	check_for_reg_data_type_usage(&old_cluster);
 	check_for_isn_and_int8_passing_mismatch(&old_cluster);
 
+	/*
+	 * Logical replication slots can be migrated since PG17. See comments atop
+	 * get_old_cluster_logical_slot_infos().
+	 */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
+		check_old_cluster_for_valid_slots(live_check);
+
 	/*
 	 * PG 16 increased the size of the 'aclitem' type, which breaks the
 	 * on-disk format for existing data.
@@ -200,7 +212,7 @@ check_and_dump_old_cluster(bool live_check)
 void
 check_new_cluster(void)
 {
-	get_db_and_rel_infos(&new_cluster);
+	get_db_rel_and_slot_infos(&new_cluster, false);
 
 	check_new_cluster_is_empty();
 
@@ -223,6 +235,8 @@ check_new_cluster(void)
 	check_for_prepared_transactions(&new_cluster);
 
 	check_for_new_tablespace_dir();
+
+	check_new_cluster_logical_replication_slots();
 }
 
 
@@ -245,27 +259,6 @@ report_clusters_compatible(void)
 }
 
 
-void
-issue_warnings_and_set_wal_level(void)
-{
-	/*
-	 * We unconditionally start/stop the new server because pg_resetwal -o set
-	 * wal_level to 'minimum'.  If the user is upgrading standby servers using
-	 * the rsync instructions, they will need pg_upgrade to write its final
-	 * WAL record showing wal_level as 'replica'.
-	 */
-	start_postmaster(&new_cluster, true);
-
-	/* Reindex hash indexes for old < 10.0 */
-	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 906)
-		old_9_6_invalidate_hash_indexes(&new_cluster, false);
-
-	report_extension_updates(&new_cluster);
-
-	stop_postmaster(false);
-}
-
-
 void
 output_completion_banner(char *deletion_script_file_name)
 {
@@ -1451,3 +1444,155 @@ check_for_user_defined_encoding_conversions(ClusterInfo *cluster)
 	else
 		check_ok();
 }
+
+/*
+ * check_new_cluster_logical_replication_slots()
+ *
+ * Verify that there are no logical replication slots on the new cluster and
+ * that the parameter settings necessary for creating slots are sufficient.
+ */
+static void
+check_new_cluster_logical_replication_slots(void)
+{
+	PGresult   *res;
+	PGconn	   *conn;
+	int			nslots_on_old;
+	int			nslots_on_new;
+	int			max_replication_slots;
+	char	   *wal_level;
+
+	/* Logical slots can be migrated since PG17. */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+		return;
+
+	nslots_on_old = count_old_cluster_logical_slots();
+
+	/* Quick return if there are no logical slots to be migrated. */
+	if (nslots_on_old == 0)
+		return;
+
+	conn = connectToServer(&new_cluster, "template1");
+
+	prep_status("Checking for new cluster logical replication slots");
+
+	res = executeQueryOrDie(conn, "SELECT count(*) "
+							"FROM pg_catalog.pg_replication_slots "
+							"WHERE slot_type = 'logical' AND "
+							"temporary IS FALSE;");
+
+	if (PQntuples(res) != 1)
+		pg_fatal("could not count the number of logical replication slots");
+
+	nslots_on_new = atoi(PQgetvalue(res, 0, 0));
+
+	if (nslots_on_new)
+		pg_fatal("Expected 0 logical replication slots but found %d.",
+				 nslots_on_new);
+
+	PQclear(res);
+
+	res = executeQueryOrDie(conn, "SELECT setting FROM pg_settings "
+							"WHERE name IN ('max_replication_slots', 'wal_level') "
+							"ORDER BY name DESC;");
+
+	if (PQntuples(res) != 2)
+		pg_fatal("could not determine parameter settings on new cluster");
+
+	wal_level = PQgetvalue(res, 0, 0);
+
+	if (strcmp(wal_level, "logical") != 0)
+		pg_fatal("wal_level must be \"logical\", but is set to \"%s\"",
+				 wal_level);
+
+	max_replication_slots = atoi(PQgetvalue(res, 1, 0));
+
+	if (nslots_on_old > max_replication_slots)
+		pg_fatal("max_replication_slots (%d) must be greater than or equal to the number of "
+				 "logical replication slots (%d) on the old cluster",
+				 max_replication_slots, nslots_on_old);
+
+	PQclear(res);
+	PQfinish(conn);
+
+	check_ok();
+}
+
+/*
+ * check_old_cluster_for_valid_slots()
+ *
+ * Verify that all the logical slots are usable and have consumed all the WAL
+ * before shutdown.
+ */
+static void
+check_old_cluster_for_valid_slots(bool live_check)
+{
+	int			dbnum;
+	char		output_path[MAXPGPATH];
+	FILE	   *script = NULL;
+
+	prep_status("Checking for valid logical replication slots");
+
+	snprintf(output_path, sizeof(output_path), "%s/%s",
+			 log_opts.basedir,
+			 "invalid_logical_relication_slots.txt");
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		int			slotnum;
+		LogicalSlotInfoArr *slot_arr = &old_cluster.dbarr.dbs[dbnum].slot_arr;
+
+		for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		{
+			LogicalSlotInfo *slot = &slot_arr->slots[slotnum];
+
+			/* Is the slot usable? */
+			if (slot->invalid)
+			{
+				if (script == NULL &&
+					(script = fopen_priv(output_path, "w")) == NULL)
+					pg_fatal("could not open file \"%s\": %s",
+							 output_path, strerror(errno));
+
+				fprintf(script, "The slot \"%s\" is invalid\n",
+						slot->slotname);
+
+				/* No need to check this slot, seek to new one */
+				continue;
+			}
+
+			/*
+			 * Do additional check to ensure that all logical replication
+			 * slots have consumed all the WAL before shutdown.
+			 *
+			 * Note: This can be satisfied only when the old cluster has been
+			 * shut down, so we skip this for live checks.
+			 */
+			if (!live_check && !slot->caught_up)
+			{
+				if (script == NULL &&
+					(script = fopen_priv(output_path, "w")) == NULL)
+					pg_fatal("could not open file \"%s\": %s",
+							 output_path, strerror(errno));
+
+				fprintf(script,
+						"The slot \"%s\" has not consumed the WAL yet\n",
+						slot->slotname);
+			}
+		}
+	}
+
+	if (script)
+	{
+		fclose(script);
+
+		pg_log(PG_REPORT, "fatal");
+		pg_fatal("Your installation contains logical replication slots that cannot be upgraded.\n"
+				 "These slots can't be copied, so this cluster cannot be upgraded.\n"
+				 "Consider removing invalid slots and/or consuming the pending WAL if any,\n"
+				 "and then restart the upgrade.\n"
+				 "A list of all such logical replication slots is in the file:\n"
+				 "    %s", output_path);
+	}
+
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/function.c b/src/bin/pg_upgrade/function.c
index dc8800c7cd..c0f5e58fa2 100644
--- a/src/bin/pg_upgrade/function.c
+++ b/src/bin/pg_upgrade/function.c
@@ -46,7 +46,9 @@ library_name_compare(const void *p1, const void *p2)
 /*
  * get_loadable_libraries()
  *
- *	Fetch the names of all old libraries containing C-language functions.
+ *	Fetch the names of all old libraries containing either C-language functions
+ *	or are corresponding to logical replication output plugins.
+ *
  *	We will later check that they all exist in the new installation.
  */
 void
@@ -55,6 +57,7 @@ get_loadable_libraries(void)
 	PGresult  **ress;
 	int			totaltups;
 	int			dbnum;
+	int			n_libinfos;
 
 	ress = (PGresult **) pg_malloc(old_cluster.dbarr.ndbs * sizeof(PGresult *));
 	totaltups = 0;
@@ -81,7 +84,12 @@ get_loadable_libraries(void)
 		PQfinish(conn);
 	}
 
-	os_info.libraries = (LibraryInfo *) pg_malloc(totaltups * sizeof(LibraryInfo));
+	/*
+	 * Allocate memory for required libraries and logical replication output
+	 * plugins.
+	 */
+	n_libinfos = totaltups + count_old_cluster_logical_slots();
+	os_info.libraries = (LibraryInfo *) pg_malloc(sizeof(LibraryInfo) * n_libinfos);
 	totaltups = 0;
 
 	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
@@ -89,6 +97,8 @@ get_loadable_libraries(void)
 		PGresult   *res = ress[dbnum];
 		int			ntups;
 		int			rowno;
+		int			slotno;
+		LogicalSlotInfoArr *slot_arr = &old_cluster.dbarr.dbs[dbnum].slot_arr;
 
 		ntups = PQntuples(res);
 		for (rowno = 0; rowno < ntups; rowno++)
@@ -101,6 +111,23 @@ get_loadable_libraries(void)
 			totaltups++;
 		}
 		PQclear(res);
+
+		/*
+		 * Store the names of output plugins as well. There is a possibility
+		 * that duplicated plugins are set, but the consumer function
+		 * check_loadable_libraries() will avoid checking the same library, so
+		 * we do not have to consider their uniqueness here.
+		 */
+		for (slotno = 0; slotno < slot_arr->nslots; slotno++)
+		{
+			if (slot_arr->slots[slotno].invalid)
+				continue;
+
+			os_info.libraries[totaltups].name = pg_strdup(slot_arr->slots[slotno].plugin);
+			os_info.libraries[totaltups].dbnum = dbnum;
+
+			totaltups++;
+		}
 	}
 
 	pg_free(ress);
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index aa5faca4d6..189a793106 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -26,6 +26,8 @@ static void get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo);
 static void free_rel_infos(RelInfoArr *rel_arr);
 static void print_db_infos(DbInfoArr *db_arr);
 static void print_rel_infos(RelInfoArr *rel_arr);
+static void print_slot_infos(LogicalSlotInfoArr *slot_arr);
+static void get_old_cluster_logical_slot_infos(DbInfo *dbinfo, bool live_check);
 
 
 /*
@@ -266,13 +268,15 @@ report_unmatched_relation(const RelInfo *rel, const DbInfo *db, bool is_new_db)
 }
 
 /*
- * get_db_and_rel_infos()
+ * get_db_rel_and_slot_infos()
  *
  * higher level routine to generate dbinfos for the database running
  * on the given "port". Assumes that server is already running.
+ *
+ * live_check would be used only when the target is the old cluster.
  */
 void
-get_db_and_rel_infos(ClusterInfo *cluster)
+get_db_rel_and_slot_infos(ClusterInfo *cluster, bool live_check)
 {
 	int			dbnum;
 
@@ -283,7 +287,17 @@ get_db_and_rel_infos(ClusterInfo *cluster)
 	get_db_infos(cluster);
 
 	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
-		get_rel_infos(cluster, &cluster->dbarr.dbs[dbnum]);
+	{
+		DbInfo	   *pDbInfo = &cluster->dbarr.dbs[dbnum];
+
+		get_rel_infos(cluster, pDbInfo);
+
+		/*
+		 * Retrieve the logical replication slots infos for the old cluster.
+		 */
+		if (cluster == &old_cluster)
+			get_old_cluster_logical_slot_infos(pDbInfo, live_check);
+	}
 
 	if (cluster == &old_cluster)
 		pg_log(PG_VERBOSE, "\nsource databases:");
@@ -600,6 +614,125 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 	dbinfo->rel_arr.nrels = num_rels;
 }
 
+/*
+ * get_old_cluster_logical_slot_infos()
+ *
+ * gets the LogicalSlotInfos for all the logical replication slots of the
+ * database referred to by "dbinfo".
+ *
+ * Note: This function will not do anything if the old cluster is pre-PG17.
+ * This is because before that the logical slots are not saved at shutdown, so
+ * there is no guarantee that the latest confirmed_flush_lsn is saved to disk
+ * which can lead to data loss. It is still not guaranteed for manually created
+ * slots in PG17, so subsequent checks done in
+ * check_old_cluster_for_valid_slots() would raise a FATAL error if such slots
+ * are included.
+ */
+static void
+get_old_cluster_logical_slot_infos(DbInfo *dbinfo, bool live_check)
+{
+	PGconn	   *conn;
+	PGresult   *res;
+	LogicalSlotInfo *slotinfos = NULL;
+	int			num_slots = 0;
+
+	/* Logical slots can be migrated since PG17. */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+	{
+		dbinfo->slot_arr.slots = slotinfos;
+		dbinfo->slot_arr.nslots = num_slots;
+		return;
+	}
+
+	conn = connectToServer(&old_cluster, dbinfo->db_name);
+
+	/*
+	 * Fetch the logical replication slot information. The slot is considered
+	 * caught up if all the WAL is consumed except for records that could be
+	 * generated during the upgrade. See
+	 * binary_upgrade_validate_wal_logical_end().
+	 *
+	 * Note that we can't ensure whether the slot is caught up during
+	 * live_check as the new WAL records could be generated.
+	 *
+	 * We intentionally skip checking the WALs for invalidated slots as the
+	 * corresponding WALs could have been removed for such slots.
+	 *
+	 * The temporary slots are explicitly ignored while checking because such
+	 * slots cannot exist after the upgrade. During the upgrade, clusters are
+	 * started and stopped several times causing any temporary slots to be
+	 * removed.
+	 */
+	res = executeQueryOrDie(conn, "SELECT slot_name, plugin, two_phase, "
+							"%s as caught_up, conflicting as invalid "
+							"FROM pg_catalog.pg_replication_slots "
+							"WHERE slot_type = 'logical' AND "
+							"database = current_database() AND "
+							"temporary IS FALSE;",
+							live_check ? "FALSE" :
+							"(CASE WHEN conflicting THEN FALSE "
+							"ELSE (SELECT pg_catalog.binary_upgrade_validate_wal_logical_end(confirmed_flush_lsn)) "
+							"END)");
+
+	num_slots = PQntuples(res);
+
+	if (num_slots)
+	{
+		int			slotnum;
+		int			i_slotname;
+		int			i_plugin;
+		int			i_twophase;
+		int			i_caught_up;
+		int			i_invalid;
+
+		slotinfos = (LogicalSlotInfo *) pg_malloc(sizeof(LogicalSlotInfo) * num_slots);
+
+		i_slotname = PQfnumber(res, "slot_name");
+		i_plugin = PQfnumber(res, "plugin");
+		i_twophase = PQfnumber(res, "two_phase");
+		i_caught_up = PQfnumber(res, "caught_up");
+		i_invalid = PQfnumber(res, "invalid");
+
+		for (slotnum = 0; slotnum < num_slots; slotnum++)
+		{
+			LogicalSlotInfo *curr = &slotinfos[slotnum];
+
+			curr->slotname = pg_strdup(PQgetvalue(res, slotnum, i_slotname));
+			curr->plugin = pg_strdup(PQgetvalue(res, slotnum, i_plugin));
+			curr->two_phase = (strcmp(PQgetvalue(res, slotnum, i_twophase), "t") == 0);
+			curr->caught_up = (strcmp(PQgetvalue(res, slotnum, i_caught_up), "t") == 0);
+			curr->invalid = (strcmp(PQgetvalue(res, slotnum, i_invalid), "t") == 0);
+		}
+	}
+
+	PQclear(res);
+	PQfinish(conn);
+
+	dbinfo->slot_arr.slots = slotinfos;
+	dbinfo->slot_arr.nslots = num_slots;
+}
+
+
+/*
+ * count_old_cluster_logical_slots()
+ *
+ * Returns the number of logical replication slots for all databases.
+ *
+ * Note: this function always returns 0 if the old_cluster is PG16 and prior
+ * because we gather slot information only for cluster versions greater than or
+ * equal to PG17. See get_old_cluster_logical_slot_infos().
+ */
+int
+count_old_cluster_logical_slots(void)
+{
+	int			dbnum;
+	int			slot_count = 0;
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+		slot_count += old_cluster.dbarr.dbs[dbnum].slot_arr.nslots;
+
+	return slot_count;
+}
 
 static void
 free_db_and_rel_infos(DbInfoArr *db_arr)
@@ -642,8 +775,11 @@ print_db_infos(DbInfoArr *db_arr)
 
 	for (dbnum = 0; dbnum < db_arr->ndbs; dbnum++)
 	{
-		pg_log(PG_VERBOSE, "Database: \"%s\"", db_arr->dbs[dbnum].db_name);
-		print_rel_infos(&db_arr->dbs[dbnum].rel_arr);
+		DbInfo	   *pDbInfo = &db_arr->dbs[dbnum];
+
+		pg_log(PG_VERBOSE, "Database: \"%s\"", pDbInfo->db_name);
+		print_rel_infos(&pDbInfo->rel_arr);
+		print_slot_infos(&pDbInfo->slot_arr);
 	}
 }
 
@@ -660,3 +796,25 @@ print_rel_infos(RelInfoArr *rel_arr)
 			   rel_arr->rels[relnum].reloid,
 			   rel_arr->rels[relnum].tablespace);
 }
+
+static void
+print_slot_infos(LogicalSlotInfoArr *slot_arr)
+{
+	int			slotnum;
+
+	/* Quick return if there are no logical slots. */
+	if (slot_arr->nslots == 0)
+		return;
+
+	pg_log(PG_VERBOSE, "Logical replication slots within the database:");
+
+	for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+	{
+		LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+
+		pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\", two_phase: %d",
+			   slot_info->slotname,
+			   slot_info->plugin,
+			   slot_info->two_phase);
+	}
+}
diff --git a/src/bin/pg_upgrade/meson.build b/src/bin/pg_upgrade/meson.build
index 12a97f84e2..2c4f38d865 100644
--- a/src/bin/pg_upgrade/meson.build
+++ b/src/bin/pg_upgrade/meson.build
@@ -42,6 +42,7 @@ tests += {
     'tests': [
       't/001_basic.pl',
       't/002_pg_upgrade.pl',
+      't/003_upgrade_logical_replication_slots.pl',
     ],
     'test_kwargs': {'priority': 40}, # pg_upgrade tests are slow
   },
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 96bfb67167..3ddfc31070 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -59,6 +59,8 @@ static void copy_xact_xlog_xid(void);
 static void set_frozenxids(bool minmxid_only);
 static void make_outputdirs(char *pgdata);
 static void setup(char *argv0, bool *live_check);
+static void create_logical_replication_slots(void);
+static void setup_new_cluster(void);
 
 ClusterInfo old_cluster,
 			new_cluster;
@@ -188,6 +190,8 @@ main(int argc, char **argv)
 			  new_cluster.pgdata);
 	check_ok();
 
+	setup_new_cluster();
+
 	if (user_opts.do_sync)
 	{
 		prep_status("Sync data directory to disk");
@@ -201,8 +205,6 @@ main(int argc, char **argv)
 
 	create_script_for_old_cluster_deletion(&deletion_script_file_name);
 
-	issue_warnings_and_set_wal_level();
-
 	pg_log(PG_REPORT,
 		   "\n"
 		   "Upgrade Complete\n"
@@ -593,7 +595,7 @@ create_new_objects(void)
 		set_frozenxids(true);
 
 	/* update new_cluster info now that we have objects in the databases */
-	get_db_and_rel_infos(&new_cluster);
+	get_db_rel_and_slot_infos(&new_cluster, false);
 }
 
 /*
@@ -862,3 +864,102 @@ set_frozenxids(bool minmxid_only)
 
 	check_ok();
 }
+
+/*
+ * create_logical_replication_slots()
+ *
+ * Similar to create_new_objects() but only restores logical replication slots.
+ */
+static void
+create_logical_replication_slots(void)
+{
+	int			dbnum;
+
+	prep_status_progress("Restoring logical replication slots in the new cluster");
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		DbInfo	   *old_db = &old_cluster.dbarr.dbs[dbnum];
+		LogicalSlotInfoArr *slot_arr = &old_db->slot_arr;
+		PGconn	   *conn;
+		PQExpBuffer query;
+		int			slotnum;
+		char		log_file_name[MAXPGPATH];
+
+		/* Skip this database if there are no slots */
+		if (slot_arr->nslots == 0)
+			continue;
+
+		conn = connectToServer(&new_cluster, old_db->db_name);
+		query = createPQExpBuffer();
+
+		snprintf(log_file_name, sizeof(log_file_name),
+				 DB_DUMP_LOG_FILE_MASK, old_db->db_oid);
+
+		pg_log(PG_STATUS, "%s", old_db->db_name);
+
+		for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		{
+			LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+
+			/* Constructs a query for creating logical replication slots. */
+			appendPQExpBuffer(query, "SELECT pg_catalog.pg_create_logical_replication_slot(");
+			appendStringLiteralConn(query, slot_info->slotname, conn);
+			appendPQExpBuffer(query, ", ");
+			appendStringLiteralConn(query, slot_info->plugin, conn);
+			appendPQExpBuffer(query, ", false, %s);",
+							  slot_info->two_phase ? "true" : "false");
+
+			PQclear(executeQueryOrDie(conn, "%s", query->data));
+
+			resetPQExpBuffer(query);
+		}
+
+		PQfinish(conn);
+
+		destroyPQExpBuffer(query);
+	}
+
+	end_progress_output();
+	check_ok();
+}
+
+/*
+ *	setup_new_cluster()
+ *
+ * Starts a new cluster for updating the wal_level in the control file, then
+ * does final setups. Logical slots are also created here.
+ */
+static void
+setup_new_cluster(void)
+{
+	/*
+	 * We unconditionally start/stop the new server because pg_resetwal -o set
+	 * wal_level to 'minimum'.  If the user is upgrading standby servers using
+	 * the rsync instructions, they will need pg_upgrade to write its final
+	 * WAL record showing wal_level as 'replica'.
+	 */
+	start_postmaster(&new_cluster, true);
+
+	/*
+	 * If the old cluster has logical slots, migrate them to a new cluster.
+	 *
+	 * Note: This must be done after executing pg_resetwal command in the
+	 * caller because pg_resetwal would remove required WALs.
+	 */
+	if (count_old_cluster_logical_slots())
+		create_logical_replication_slots();
+
+	/*
+	 * Reindex hash indexes for old < 10.0. count_old_cluster_logical_slots()
+	 * returns non-zero when the old_cluster is PG17 and later, so it's OK to
+	 * use "else if" here. See comments atop count_old_cluster_logical_slots()
+	 * and get_old_cluster_logical_slot_infos().
+	 */
+	else if (GET_MAJOR_VERSION(old_cluster.major_version) <= 906)
+		old_9_6_invalidate_hash_indexes(&new_cluster, false);
+
+	report_extension_updates(&new_cluster);
+
+	stop_postmaster(false);
+}
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 842f3b6cd3..fb7ee26569 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -150,6 +150,25 @@ typedef struct
 	int			nrels;
 } RelInfoArr;
 
+/*
+ * Structure to store logical replication slot information.
+ */
+typedef struct
+{
+	char	   *slotname;		/* slot name */
+	char	   *plugin;			/* plugin */
+	bool		two_phase;		/* can the slot decode 2PC? */
+	bool		caught_up;		/* Is confirmed_flush_lsn the same as latest
+								 * checkpoint LSN? */
+	bool		invalid;		/* if true, the slot is unusable */
+} LogicalSlotInfo;
+
+typedef struct
+{
+	int			nslots;			/* number of logical slot infos */
+	LogicalSlotInfo *slots;		/* array of logical slot infos */
+} LogicalSlotInfoArr;
+
 /*
  * The following structure represents a relation mapping.
  */
@@ -176,6 +195,7 @@ typedef struct
 	char		db_tablespace[MAXPGPATH];	/* database default tablespace
 											 * path */
 	RelInfoArr	rel_arr;		/* array of all user relinfos */
+	LogicalSlotInfoArr slot_arr;	/* array of all LogicalSlotInfo */
 } DbInfo;
 
 /*
@@ -345,7 +365,6 @@ void		output_check_banner(bool live_check);
 void		check_and_dump_old_cluster(bool live_check);
 void		check_new_cluster(void);
 void		report_clusters_compatible(void);
-void		issue_warnings_and_set_wal_level(void);
 void		output_completion_banner(char *deletion_script_file_name);
 void		check_cluster_versions(void);
 void		check_cluster_compatibility(bool live_check);
@@ -400,7 +419,8 @@ void		check_loadable_libraries(void);
 FileNameMap *gen_db_file_maps(DbInfo *old_db,
 							  DbInfo *new_db, int *nmaps, const char *old_pgdata,
 							  const char *new_pgdata);
-void		get_db_and_rel_infos(ClusterInfo *cluster);
+void		get_db_rel_and_slot_infos(ClusterInfo *cluster, bool live_check);
+int			count_old_cluster_logical_slots(void);
 
 /* option.c */
 
diff --git a/src/bin/pg_upgrade/server.c b/src/bin/pg_upgrade/server.c
index 0bc3d2806b..d7f6c268ef 100644
--- a/src/bin/pg_upgrade/server.c
+++ b/src/bin/pg_upgrade/server.c
@@ -201,6 +201,7 @@ start_postmaster(ClusterInfo *cluster, bool report_and_exit_on_error)
 	PGconn	   *conn;
 	bool		pg_ctl_return = false;
 	char		socket_string[MAXPGPATH + 200];
+	PQExpBufferData pgoptions;
 
 	static bool exit_hook_registered = false;
 
@@ -227,23 +228,41 @@ start_postmaster(ClusterInfo *cluster, bool report_and_exit_on_error)
 				 cluster->sockdir);
 #endif
 
+	initPQExpBuffer(&pgoptions);
+
 	/*
-	 * Use -b to disable autovacuum.
+	 * Construct a parameter string which is passed to the server process.
 	 *
 	 * Turn off durability requirements to improve object creation speed, and
 	 * we only modify the new cluster, so only use it there.  If there is a
 	 * crash, the new cluster has to be recreated anyway.  fsync=off is a big
 	 * win on ext4.
 	 */
+	if (cluster == &new_cluster)
+		appendPQExpBufferStr(&pgoptions, " -c synchronous_commit=off -c fsync=off -c full_page_writes=off");
+
+	/*
+	 * Use max_slot_wal_keep_size as -1 to prevent the WAL removal by the
+	 * checkpointer process.  If WALs required by logical replication slots
+	 * are removed, the slots are unusable.  This setting prevents the
+	 * invalidation of slots during the upgrade. We set this option when
+	 * cluster is PG17 or later because logical replication slots can only be
+	 * migrated since then. Besides, max_slot_wal_keep_size is added in PG13.
+	 */
+	if (GET_MAJOR_VERSION(cluster->major_version) >= 1700)
+		appendPQExpBufferStr(&pgoptions, " -c max_slot_wal_keep_size=-1");
+
+	/* Use -b to disable autovacuum. */
 	snprintf(cmd, sizeof(cmd),
 			 "\"%s/pg_ctl\" -w -l \"%s/%s\" -D \"%s\" -o \"-p %d -b%s %s%s\" start",
 			 cluster->bindir,
 			 log_opts.logdir,
 			 SERVER_LOG_FILE, cluster->pgconfig, cluster->port,
-			 (cluster == &new_cluster) ?
-			 " -c synchronous_commit=off -c fsync=off -c full_page_writes=off" : "",
+			 pgoptions.data,
 			 cluster->pgopts ? cluster->pgopts : "", socket_string);
 
+	termPQExpBuffer(&pgoptions);
+
 	/*
 	 * Don't throw an error right away, let connecting throw the error because
 	 * it might supply a reason for the failure.
diff --git a/src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl
new file mode 100644
index 0000000000..270044d75e
--- /dev/null
+++ b/src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl
@@ -0,0 +1,266 @@
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+# Tests for upgrading replication slots
+
+use strict;
+use warnings;
+
+use File::Find qw(find);
+use File::Path qw(rmtree);
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Can be changed to test the other modes
+my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
+
+# Initialize old cluster
+my $old_publisher = PostgreSQL::Test::Cluster->new('old_publisher');
+$old_publisher->init(allows_streaming => 'logical');
+
+# Initialize new cluster
+my $new_publisher = PostgreSQL::Test::Cluster->new('new_publisher');
+$new_publisher->init(allows_streaming => 'replica');
+
+# Initialize subscriber cluster
+my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
+$subscriber->init(allows_streaming => 'logical');
+
+my $bindir = $new_publisher->config_data('--bindir');
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when new cluster wal_level is not 'logical'
+
+# Preparations for the subsequent test:
+# 1. Create a slot on the old cluster
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT pg_create_logical_replication_slot('test_slot1', 'test_decoding', false, true);"
+);
+$old_publisher->stop;
+
+# pg_upgrade will fail because the new cluster wal_level is 'replica'
+command_checks_all(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d', $old_publisher->data_dir,
+		'-D', $new_publisher->data_dir,
+		'-b', $bindir,
+		'-B', $bindir,
+		'-s', $new_publisher->host,
+		'-p', $old_publisher->port,
+		'-P', $new_publisher->port,
+		$mode,
+	],
+	1,
+	[qr/wal_level must be \"logical\", but is set to \"replica\"/],
+	[qr//],
+	'run of pg_upgrade where the new cluster has the wrong wal_level');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when max_replication_slots on a new cluster is
+#		too low
+
+# Preparations for the subsequent test:
+# 1. Create a second slot on the old cluster
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT pg_create_logical_replication_slot('test_slot2', 'test_decoding', false, true);"
+);
+
+# 2. Consume WAL records to avoid another type of upgrade failure. It will be
+#	 tested in subsequent cases.
+$old_publisher->safe_psql('postgres',
+	"SELECT count(*) FROM pg_logical_slot_get_changes('test_slot1', NULL, NULL);"
+);
+$old_publisher->stop;
+
+# 3. max_replication_slots is set to smaller than the number of slots (2)
+#	 present on the old cluster
+$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 1");
+
+# 4. wal_level is set correctly on the new cluster
+$new_publisher->append_conf('postgresql.conf', "wal_level = 'logical'");
+
+# pg_upgrade will fail because the new cluster has insufficient max_replication_slots
+command_checks_all(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d', $old_publisher->data_dir,
+		'-D', $new_publisher->data_dir,
+		'-b', $bindir,
+		'-B', $bindir,
+		'-s', $new_publisher->host,
+		'-p', $old_publisher->port,
+		'-P', $new_publisher->port,
+		$mode,
+	],
+	1,
+	[
+		qr/max_replication_slots \(1\) must be greater than or equal to the number of logical replication slots \(2\) on the old cluster/
+	],
+	[qr//],
+	'run of pg_upgrade where the new cluster has insufficient max_replication_slots'
+);
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when the slot still has unconsumed WAL records
+
+# Preparations for the subsequent test:
+# 1. Remove the slot 'test_slot2', leaving only 1 slot on the old cluster, so
+#    the new cluster config  max_replication_slots=1 will now be enough.
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT * FROM pg_drop_replication_slot('test_slot2');");
+
+# 2. Generate extra WAL records. Because these WAL records do not get consumed
+#	 it will cause the upcoming pg_upgrade test to fail.
+$old_publisher->safe_psql('postgres',
+	"CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;");
+$old_publisher->stop;
+
+# pg_upgrade will fail because the slot still has unconsumed WAL records
+command_checks_all(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d', $old_publisher->data_dir,
+		'-D', $new_publisher->data_dir,
+		'-b', $bindir,
+		'-B', $bindir,
+		'-s', $new_publisher->host,
+		'-p', $old_publisher->port,
+		'-P', $new_publisher->port,
+		$mode,
+	],
+	1,
+	[
+		qr/Your installation contains logical replication slots that cannot be upgraded./
+	],
+	[qr//],
+	'run of pg_upgrade of old cluster with idle replication slots');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Verify the reason why the logical replication slot cannot be upgraded
+my $log_path = $new_publisher->data_dir . "/pg_upgrade_output.d";
+my $slots_filename;
+
+# Find a txt file that contains a list of logical replication slots that cannot
+# be upgraded. We cannot predict the file's path because the output directory
+# contains a milliseconds timestamp. File::Find::find must be used.
+find(
+	sub {
+		if ($File::Find::name =~ m/invalid_logical_relication_slots\.txt/)
+		{
+			$slots_filename = $File::Find::name;
+		}
+	},
+	$new_publisher->data_dir . "/pg_upgrade_output.d");
+
+# And check the content. The failure should be because there are unconsumed
+# WALs after confirmed_flush_lsn of test_slot1.
+like(
+	slurp_file($slots_filename),
+	qr/The slot \"test_slot1\" has not consumed the WAL yet/m,
+	'the previous test failed due to unconsumed WALs');
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+# Remove the remaining slot
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT * FROM pg_drop_replication_slot('test_slot1');");
+
+# ------------------------------
+# TEST: Successful upgrade
+
+# Preparations for the subsequent test:
+# 1. Setup logical replication
+my $old_connstr = $old_publisher->connstr . ' dbname=postgres';
+$old_publisher->safe_psql('postgres',
+	"CREATE PUBLICATION pub FOR ALL TABLES;");
+$subscriber->start;
+$subscriber->safe_psql(
+	'postgres', qq[
+	CREATE TABLE tbl (a int);
+	CREATE SUBSCRIPTION regress_sub CONNECTION '$old_connstr' PUBLICATION pub WITH (two_phase = 'true')
+]);
+$subscriber->wait_for_subscription_sync($old_publisher, 'regress_sub');
+
+# 2. Temporarily disable the subscription
+$subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION regress_sub DISABLE");
+$old_publisher->stop;
+
+# Dry run, successful check is expected. This is not a live check, so a
+# shutdown checkpoint record would be inserted. We want to test that a
+# subsequent upgrade is successful by skipping such an expected WAL record.
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d', $old_publisher->data_dir,
+		'-D', $new_publisher->data_dir,
+		'-b', $bindir,
+		'-B', $bindir,
+		'-s', $new_publisher->host,
+		'-p', $old_publisher->port,
+		'-P', $new_publisher->port,
+		$mode, '--check'
+	],
+	'run of pg_upgrade of old cluster');
+ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ removed after pg_upgrade success");
+
+# Actual run, successful upgrade is expected
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d', $old_publisher->data_dir,
+		'-D', $new_publisher->data_dir,
+		'-b', $bindir,
+		'-B', $bindir,
+		'-s', $new_publisher->host,
+		'-p', $old_publisher->port,
+		'-P', $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade of old cluster');
+ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ removed after pg_upgrade success");
+
+# Check that the slot 'regress_sub' has migrated to the new cluster
+$new_publisher->start;
+my $result = $new_publisher->safe_psql('postgres',
+	"SELECT slot_name, two_phase FROM pg_replication_slots");
+is($result, qq(regress_sub|t), 'check the slot exists on new cluster');
+
+# Update the connection
+my $new_connstr = $new_publisher->connstr . ' dbname=postgres';
+$subscriber->safe_psql(
+	'postgres', qq[
+	ALTER SUBSCRIPTION regress_sub CONNECTION '$new_connstr';
+	ALTER SUBSCRIPTION regress_sub ENABLE;
+]);
+
+# Check whether changes on the new publisher get replicated to the subscriber
+$new_publisher->safe_psql('postgres',
+	"INSERT INTO tbl VALUES (generate_series(11, 20))");
+$new_publisher->wait_for_catchup('regress_sub');
+$result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
+is($result, qq(20), 'check changes are replicated to the subscriber');
+
+# Clean up
+$subscriber->stop();
+$new_publisher->stop();
+
+done_testing();
diff --git a/src/include/access/xlogreader.h b/src/include/access/xlogreader.h
index da32c7db77..ecec87d701 100644
--- a/src/include/access/xlogreader.h
+++ b/src/include/access/xlogreader.h
@@ -429,6 +429,8 @@ extern bool DecodeXLogRecord(XLogReaderState *state,
 
 #ifndef FRONTEND
 extern FullTransactionId XLogRecGetFullXid(XLogReaderState *record);
+extern XLogReaderState *InitXLogReaderState(XLogRecPtr lsn);
+extern XLogRecord *ReadNextXLogRecord(XLogReaderState *xlogreader);
 #endif
 
 extern bool RestoreBlockImage(XLogReaderState *record, uint8 block_id, char *page);
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index f0b7b9cbd8..d00d70f2ef 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11370,6 +11370,11 @@
   proname => 'binary_upgrade_set_next_pg_tablespace_oid', provolatile => 'v',
   proparallel => 'u', prorettype => 'void', proargtypes => 'oid',
   prosrc => 'binary_upgrade_set_next_pg_tablespace_oid' },
+{ oid => '8046', descr => 'for use by pg_upgrade',
+  proname => 'binary_upgrade_validate_wal_logical_end', proisstrict => 'f',
+  provolatile => 'v', proparallel => 'u', prorettype => 'bool',
+  proargtypes => 'pg_lsn',
+  prosrc => 'binary_upgrade_validate_wal_logical_end' },
 
 # conversion functions
 { oid => '4302',
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index b5bbdd1608..5c9f8ae4d3 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1503,6 +1503,8 @@ LogicalRepTyp
 LogicalRepWorker
 LogicalRepWorkerType
 LogicalRewriteMappingData
+LogicalSlotInfo
+LogicalSlotInfoArr
 LogicalTape
 LogicalTapeSet
 LsnReadQueue
-- 
2.27.0

v45-0002-Add-rm_is_record_decodable-callback-for-WAL-rmgr.patchapplication/octet-stream; name=v45-0002-Add-rm_is_record_decodable-callback-for-WAL-rmgr.patchDownload
From 39ee4c84288bba30183a986aed458b16f0b12a58 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Date: Tue, 3 Oct 2023 04:01:05 +0000
Subject: [PATCH v45] Add rm_is_record_decodable callback for WAL rmgrs

This commit lets WAL resource managers (rmgrs) define if WAL
record types that they add are logically decodable. Every rmgr
including custom rmgrs must define the new callback
rm_is_record_decodable returning true if given WAL record type of
theirs is logically decodable. In other words, return true for
record types that have something to do with logical decoding in
their rm_decode functions.

An immediate use of this new callback is in pg_upgrade. During
pg_upgrade, one can know if there are any logically decodable WAL
records after a certain point in WAL.

Bump WAL version indicator XLOG_PAGE_MAGIC.
---
 doc/src/sgml/custom-rmgr.sgml              |   1 +
 src/backend/access/transam/rmgr.c          |   4 +-
 src/backend/replication/logical/decode.c   | 151 +++++++++++++++++++++
 src/backend/utils/adt/pg_upgrade_support.c |  31 +++--
 src/bin/pg_rewind/parsexlog.c              |   2 +-
 src/bin/pg_waldump/rmgrdesc.c              |   2 +-
 src/include/access/rmgr.h                  |   2 +-
 src/include/access/rmgrlist.h              |  46 +++----
 src/include/access/xlog_internal.h         |   3 +-
 src/include/replication/decode.h           |   7 +
 10 files changed, 207 insertions(+), 42 deletions(-)

diff --git a/doc/src/sgml/custom-rmgr.sgml b/doc/src/sgml/custom-rmgr.sgml
index baf86b1c07..e804ca0689 100644
--- a/doc/src/sgml/custom-rmgr.sgml
+++ b/doc/src/sgml/custom-rmgr.sgml
@@ -54,6 +54,7 @@ typedef struct RmgrData
     void        (*rm_mask) (char *pagedata, BlockNumber blkno);
     void        (*rm_decode) (struct LogicalDecodingContext *ctx,
                               struct XLogRecordBuffer *buf);
+    bool        (*rm_is_record_decodable) (uint8 type);
 } RmgrData;
 </programlisting>
  </para>
diff --git a/src/backend/access/transam/rmgr.c b/src/backend/access/transam/rmgr.c
index 7d67eda5f7..001bdf3535 100644
--- a/src/backend/access/transam/rmgr.c
+++ b/src/backend/access/transam/rmgr.c
@@ -35,8 +35,8 @@
 #include "utils/relmapper.h"
 
 /* must be kept in sync with RmgrData definition in xlog_internal.h */
-#define PG_RMGR(symname,name,redo,desc,identify,startup,cleanup,mask,decode) \
-	{ name, redo, desc, identify, startup, cleanup, mask, decode },
+#define PG_RMGR(symname,name,redo,desc,identify,startup,cleanup,mask,decode,is_record_decodable) \
+	{ name, redo, desc, identify, startup, cleanup, mask, decode, is_record_decodable },
 
 RmgrData	RmgrTable[RM_MAX_ID + 1] = {
 #include "access/rmgrlist.h"
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 730061c9da..60d26ae015 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -116,7 +116,17 @@ LogicalDecodingProcessRecord(LogicalDecodingContext *ctx, XLogReaderState *recor
 	rmgr = GetRmgr(XLogRecGetRmid(record));
 
 	if (rmgr.rm_decode != NULL)
+	{
+#ifdef USE_ASSERT_CHECKING
+		if (rmgr.rm_is_record_decodable == NULL)
+			ereport(ERROR,
+					errmsg("cannot check logical decodability for resource manager \"%s\" with ID %d",
+						   rmgr.rm_name, XLogRecGetRmid(record)),
+					errdetail("Logical decodability callback is not defined for the resource manager."));
+#endif
+
 		rmgr.rm_decode(ctx, &buf);
+	}
 	else
 	{
 		/* just deal with xid, and done */
@@ -196,6 +206,36 @@ xlog_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 	}
 }
 
+/*
+ * Return true if given XLOG_ID record type is logically decodable. In other
+ * words, return true for record types that have something to do with logical
+ * decoding in xlog_decode.
+ */
+bool
+xlog_is_record_decodable(uint8 info)
+{
+	switch (info)
+	{
+		case XLOG_CHECKPOINT_SHUTDOWN:
+		case XLOG_END_OF_RECOVERY:
+			return true;
+		case XLOG_CHECKPOINT_ONLINE:
+		case XLOG_PARAMETER_CHANGE:
+		case XLOG_NOOP:
+		case XLOG_NEXTOID:
+		case XLOG_SWITCH:
+		case XLOG_BACKUP_END:
+		case XLOG_RESTORE_POINT:
+		case XLOG_FPW_CHANGE:
+		case XLOG_FPI_FOR_HINT:
+		case XLOG_FPI:
+		case XLOG_OVERWRITE_CONTRECORD:
+			return false;
+		default:
+			elog(ERROR, "unexpected RM_XLOG_ID record type: %u", info);
+	}
+}
+
 /*
  * Handle rmgr XACT_ID records for LogicalDecodingProcessRecord().
  */
@@ -353,6 +393,30 @@ xact_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 	}
 }
 
+/*
+ * Return true if given XACT_ID record type is logically decodable. In other
+ * words, return true for record types that have something to do with logical
+ * decoding in xact_decode.
+ */
+bool
+xact_is_record_decodable(uint8 info)
+{
+	switch (info)
+	{
+		case XLOG_XACT_COMMIT:
+		case XLOG_XACT_COMMIT_PREPARED:
+		case XLOG_XACT_ABORT:
+		case XLOG_XACT_ABORT_PREPARED:
+		case XLOG_XACT_INVALIDATIONS:
+		case XLOG_XACT_PREPARE:
+			return true;
+		case XLOG_XACT_ASSIGNMENT:
+			return false;
+		default:
+			elog(ERROR, "unexpected RM_XACT_ID record type: %u", info);
+	}
+}
+
 /*
  * Handle rmgr STANDBY_ID records for LogicalDecodingProcessRecord().
  */
@@ -399,6 +463,26 @@ standby_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 	}
 }
 
+/*
+ * Return true if given STANDBY_ID record type is logically decodable. In other
+ * words, return true for record types that have something to do with logical
+ * decoding in standby_decode.
+ */
+bool
+standy_is_record_decodable(uint8 info)
+{
+	switch (info)
+	{
+		case XLOG_RUNNING_XACTS:
+			return true;
+		case XLOG_STANDBY_LOCK:
+		case XLOG_INVALIDATIONS:
+			return false;
+		default:
+			elog(ERROR, "unexpected RM_STANDBY_ID record type: %u", info);
+	}
+}
+
 /*
  * Handle rmgr HEAP2_ID records for LogicalDecodingProcessRecord().
  */
@@ -458,6 +542,31 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 	}
 }
 
+/*
+ * Return true if given HEAP2_ID record type is logically decodable. In other
+ * words, return true for record types that have something to do with logical
+ * decoding in heap2_decode.
+ */
+bool
+heap2_is_record_decodable(uint8 info)
+{
+	switch (info)
+	{
+		case XLOG_HEAP2_MULTI_INSERT:
+		case XLOG_HEAP2_NEW_CID:
+			return true;
+		case XLOG_HEAP2_REWRITE:
+		case XLOG_HEAP2_FREEZE_PAGE:
+		case XLOG_HEAP2_PRUNE:
+		case XLOG_HEAP2_VACUUM:
+		case XLOG_HEAP2_VISIBLE:
+		case XLOG_HEAP2_LOCK_UPDATED:
+			return false;
+		default:
+			elog(ERROR, "unexpected RM_HEAP2_ID record type: %u", info);
+	}
+}
+
 /*
  * Handle rmgr HEAP_ID records for LogicalDecodingProcessRecord().
  */
@@ -544,6 +653,31 @@ heap_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 	}
 }
 
+/*
+ * Return true if given HEAP_ID record type is logically decodable. In other
+ * words, return true for record types that have something to do with logical
+ * decoding in heap_decode.
+ */
+bool
+heap_is_record_decodable(uint8 info)
+{
+	switch (info)
+	{
+		case XLOG_HEAP_INSERT:
+		case XLOG_HEAP_HOT_UPDATE:
+		case XLOG_HEAP_UPDATE:
+		case XLOG_HEAP_DELETE:
+		case XLOG_HEAP_TRUNCATE:
+		case XLOG_HEAP_INPLACE:
+		case XLOG_HEAP_CONFIRM:
+			return true;
+		case XLOG_HEAP_LOCK:
+			return false;
+		default:
+			elog(ERROR, "unexpected RM_HEAP_ID record type: %u", info);
+	}
+}
+
 /*
  * Ask output plugin whether we want to skip this PREPARE and send
  * this transaction as a regular commit later.
@@ -640,6 +774,23 @@ logicalmsg_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 							  message->message + message->prefix_size);
 }
 
+/*
+ * Return true if given LOGICALMSG_ID record type is logically decodable. In
+ * other words, return true for record types that have something to do with
+ * logical decoding in logicalmsg_decode.
+ */
+bool
+logicalmsg_is_record_decodable(uint8 info)
+{
+	switch (info)
+	{
+		case XLOG_LOGICAL_MESSAGE:
+			return true;
+		default:
+			elog(ERROR, "unexpected RM_LOGICALMSG_ID record type: %u", info);
+	}
+}
+
 /*
  * Consolidated commit record handling between the different form of commit
  * records.
diff --git a/src/backend/utils/adt/pg_upgrade_support.c b/src/backend/utils/adt/pg_upgrade_support.c
index bd64da7205..cfd3e448b1 100644
--- a/src/backend/utils/adt/pg_upgrade_support.c
+++ b/src/backend/utils/adt/pg_upgrade_support.c
@@ -13,6 +13,7 @@
 
 #include "access/heapam_xlog.h"
 #include "access/xlog.h"
+#include "access/xlog_internal.h"
 #include "access/xlogutils.h"
 #include "catalog/binary_upgrade.h"
 #include "catalog/heap.h"
@@ -291,10 +292,7 @@ is_xlog_record_type(RmgrId rmgrid, uint8 info,
  * generated that don't need to be decoded. Such records are ignored.
  *
  * XLOG_CHECKPOINT_SHUTDOWN and XLOG_SWITCH are ignored because they would be
- * inserted after the walsender exits. Moreover, the following types of records
- * could be generated during the pg_upgrade --check, so they are ignored too:
- * XLOG_CHECKPOINT_ONLINE, XLOG_RUNNING_XACTS, XLOG_FPI_FOR_HINT,
- * XLOG_HEAP2_PRUNE, XLOG_PARAMETER_CHANGE.
+ * inserted after the walsender exits.
  */
 Datum
 binary_upgrade_validate_wal_logical_end(PG_FUNCTION_ARGS)
@@ -321,6 +319,7 @@ binary_upgrade_validate_wal_logical_end(PG_FUNCTION_ARGS)
 	/* Loop until all WALs are read, or unexpected record is found */
 	while (is_valid && ReadNextXLogRecord(xlogreader))
 	{
+		RmgrData	rmgr;
 		RmgrIds		rmid;
 		uint8		info;
 
@@ -342,16 +341,22 @@ binary_upgrade_validate_wal_logical_end(PG_FUNCTION_ARGS)
 		}
 
 		/*
-		 * There is a possibility that following records may be generated
-		 * during the upgrade.
+		 * Check if the WAL record is logically decodable. We do this because
+		 * there is a possibility that some of the WAL records may be
+		 * generated during the upgrade.
 		 */
-		is_valid = is_xlog_record_type(rmid, info, RM_XLOG_ID, XLOG_CHECKPOINT_SHUTDOWN) ||
-			is_xlog_record_type(rmid, info, RM_XLOG_ID, XLOG_CHECKPOINT_ONLINE) ||
-			is_xlog_record_type(rmid, info, RM_XLOG_ID, XLOG_SWITCH) ||
-			is_xlog_record_type(rmid, info, RM_XLOG_ID, XLOG_FPI_FOR_HINT) ||
-			is_xlog_record_type(rmid, info, RM_XLOG_ID, XLOG_PARAMETER_CHANGE) ||
-			is_xlog_record_type(rmid, info, RM_STANDBY_ID, XLOG_RUNNING_XACTS) ||
-			is_xlog_record_type(rmid, info, RM_HEAP2_ID, XLOG_HEAP2_PRUNE);
+		rmgr = GetRmgr(XLogRecGetRmid(xlogreader));
+
+		if (rmgr.rm_decode != NULL)
+		{
+			if (rmgr.rm_is_record_decodable != NULL)
+				is_valid = rmgr.rm_is_record_decodable(info);
+			else
+				ereport(ERROR,
+						errmsg("cannot check logical decodability for resource manager \"%s\" with ID %d",
+							   rmgr.rm_name, rmid),
+						errdetail("Logical decodability callback is not defined for the resource manager."));
+		}
 
 		CHECK_FOR_INTERRUPTS();
 	}
diff --git a/src/bin/pg_rewind/parsexlog.c b/src/bin/pg_rewind/parsexlog.c
index 0233ece88b..0a4f8a3bcd 100644
--- a/src/bin/pg_rewind/parsexlog.c
+++ b/src/bin/pg_rewind/parsexlog.c
@@ -28,7 +28,7 @@
  * RmgrNames is an array of the built-in resource manager names, to make error
  * messages a bit nicer.
  */
-#define PG_RMGR(symname,name,redo,desc,identify,startup,cleanup,mask,decode) \
+#define PG_RMGR(symname,name,redo,desc,identify,startup,cleanup,mask,decode,is_record_decodable) \
   name,
 
 static const char *const RmgrNames[RM_MAX_ID + 1] = {
diff --git a/src/bin/pg_waldump/rmgrdesc.c b/src/bin/pg_waldump/rmgrdesc.c
index 6b8c17bb4c..0b2110351c 100644
--- a/src/bin/pg_waldump/rmgrdesc.c
+++ b/src/bin/pg_waldump/rmgrdesc.c
@@ -32,7 +32,7 @@
 #include "storage/standbydefs.h"
 #include "utils/relmapper.h"
 
-#define PG_RMGR(symname,name,redo,desc,identify,startup,cleanup,mask,decode) \
+#define PG_RMGR(symname,name,redo,desc,identify,startup,cleanup,mask,decode,is_record_decodable) \
 	{ name, desc, identify},
 
 static const RmgrDescData RmgrDescTable[RM_N_BUILTIN_IDS] = {
diff --git a/src/include/access/rmgr.h b/src/include/access/rmgr.h
index 3b6a497e1b..8c9aed1bc6 100644
--- a/src/include/access/rmgr.h
+++ b/src/include/access/rmgr.h
@@ -19,7 +19,7 @@ typedef uint8 RmgrId;
  * Note: RM_MAX_ID must fit in RmgrId; widening that type will affect the XLOG
  * file format.
  */
-#define PG_RMGR(symname,name,redo,desc,identify,startup,cleanup,mask,decode) \
+#define PG_RMGR(symname,name,redo,desc,identify,startup,cleanup,mask,decode,is_record_decodable) \
 	symname,
 
 typedef enum RmgrIds
diff --git a/src/include/access/rmgrlist.h b/src/include/access/rmgrlist.h
index 463bcb67c5..a471e77a7c 100644
--- a/src/include/access/rmgrlist.h
+++ b/src/include/access/rmgrlist.h
@@ -24,26 +24,26 @@
  * Changes to this list possibly need an XLOG_PAGE_MAGIC bump.
  */
 
-/* symbol name, textual name, redo, desc, identify, startup, cleanup, mask, decode */
-PG_RMGR(RM_XLOG_ID, "XLOG", xlog_redo, xlog_desc, xlog_identify, NULL, NULL, NULL, xlog_decode)
-PG_RMGR(RM_XACT_ID, "Transaction", xact_redo, xact_desc, xact_identify, NULL, NULL, NULL, xact_decode)
-PG_RMGR(RM_SMGR_ID, "Storage", smgr_redo, smgr_desc, smgr_identify, NULL, NULL, NULL, NULL)
-PG_RMGR(RM_CLOG_ID, "CLOG", clog_redo, clog_desc, clog_identify, NULL, NULL, NULL, NULL)
-PG_RMGR(RM_DBASE_ID, "Database", dbase_redo, dbase_desc, dbase_identify, NULL, NULL, NULL, NULL)
-PG_RMGR(RM_TBLSPC_ID, "Tablespace", tblspc_redo, tblspc_desc, tblspc_identify, NULL, NULL, NULL, NULL)
-PG_RMGR(RM_MULTIXACT_ID, "MultiXact", multixact_redo, multixact_desc, multixact_identify, NULL, NULL, NULL, NULL)
-PG_RMGR(RM_RELMAP_ID, "RelMap", relmap_redo, relmap_desc, relmap_identify, NULL, NULL, NULL, NULL)
-PG_RMGR(RM_STANDBY_ID, "Standby", standby_redo, standby_desc, standby_identify, NULL, NULL, NULL, standby_decode)
-PG_RMGR(RM_HEAP2_ID, "Heap2", heap2_redo, heap2_desc, heap2_identify, NULL, NULL, heap_mask, heap2_decode)
-PG_RMGR(RM_HEAP_ID, "Heap", heap_redo, heap_desc, heap_identify, NULL, NULL, heap_mask, heap_decode)
-PG_RMGR(RM_BTREE_ID, "Btree", btree_redo, btree_desc, btree_identify, btree_xlog_startup, btree_xlog_cleanup, btree_mask, NULL)
-PG_RMGR(RM_HASH_ID, "Hash", hash_redo, hash_desc, hash_identify, NULL, NULL, hash_mask, NULL)
-PG_RMGR(RM_GIN_ID, "Gin", gin_redo, gin_desc, gin_identify, gin_xlog_startup, gin_xlog_cleanup, gin_mask, NULL)
-PG_RMGR(RM_GIST_ID, "Gist", gist_redo, gist_desc, gist_identify, gist_xlog_startup, gist_xlog_cleanup, gist_mask, NULL)
-PG_RMGR(RM_SEQ_ID, "Sequence", seq_redo, seq_desc, seq_identify, NULL, NULL, seq_mask, NULL)
-PG_RMGR(RM_SPGIST_ID, "SPGist", spg_redo, spg_desc, spg_identify, spg_xlog_startup, spg_xlog_cleanup, spg_mask, NULL)
-PG_RMGR(RM_BRIN_ID, "BRIN", brin_redo, brin_desc, brin_identify, NULL, NULL, brin_mask, NULL)
-PG_RMGR(RM_COMMIT_TS_ID, "CommitTs", commit_ts_redo, commit_ts_desc, commit_ts_identify, NULL, NULL, NULL, NULL)
-PG_RMGR(RM_REPLORIGIN_ID, "ReplicationOrigin", replorigin_redo, replorigin_desc, replorigin_identify, NULL, NULL, NULL, NULL)
-PG_RMGR(RM_GENERIC_ID, "Generic", generic_redo, generic_desc, generic_identify, NULL, NULL, generic_mask, NULL)
-PG_RMGR(RM_LOGICALMSG_ID, "LogicalMessage", logicalmsg_redo, logicalmsg_desc, logicalmsg_identify, NULL, NULL, NULL, logicalmsg_decode)
+/* symbol name, textual name, redo, desc, identify, startup, cleanup, mask, decode, is_record_decodable */
+PG_RMGR(RM_XLOG_ID, "XLOG", xlog_redo, xlog_desc, xlog_identify, NULL, NULL, NULL, xlog_decode, xlog_is_record_decodable)
+PG_RMGR(RM_XACT_ID, "Transaction", xact_redo, xact_desc, xact_identify, NULL, NULL, NULL, xact_decode, xact_is_record_decodable)
+PG_RMGR(RM_SMGR_ID, "Storage", smgr_redo, smgr_desc, smgr_identify, NULL, NULL, NULL, NULL, NULL)
+PG_RMGR(RM_CLOG_ID, "CLOG", clog_redo, clog_desc, clog_identify, NULL, NULL, NULL, NULL, NULL)
+PG_RMGR(RM_DBASE_ID, "Database", dbase_redo, dbase_desc, dbase_identify, NULL, NULL, NULL, NULL, NULL)
+PG_RMGR(RM_TBLSPC_ID, "Tablespace", tblspc_redo, tblspc_desc, tblspc_identify, NULL, NULL, NULL, NULL, NULL)
+PG_RMGR(RM_MULTIXACT_ID, "MultiXact", multixact_redo, multixact_desc, multixact_identify, NULL, NULL, NULL, NULL, NULL)
+PG_RMGR(RM_RELMAP_ID, "RelMap", relmap_redo, relmap_desc, relmap_identify, NULL, NULL, NULL, NULL, NULL)
+PG_RMGR(RM_STANDBY_ID, "Standby", standby_redo, standby_desc, standby_identify, NULL, NULL, NULL, standby_decode, standy_is_record_decodable)
+PG_RMGR(RM_HEAP2_ID, "Heap2", heap2_redo, heap2_desc, heap2_identify, NULL, NULL, heap_mask, heap2_decode, heap2_is_record_decodable)
+PG_RMGR(RM_HEAP_ID, "Heap", heap_redo, heap_desc, heap_identify, NULL, NULL, heap_mask, heap_decode, heap_is_record_decodable)
+PG_RMGR(RM_BTREE_ID, "Btree", btree_redo, btree_desc, btree_identify, btree_xlog_startup, btree_xlog_cleanup, btree_mask, NULL, NULL)
+PG_RMGR(RM_HASH_ID, "Hash", hash_redo, hash_desc, hash_identify, NULL, NULL, hash_mask, NULL, NULL)
+PG_RMGR(RM_GIN_ID, "Gin", gin_redo, gin_desc, gin_identify, gin_xlog_startup, gin_xlog_cleanup, gin_mask, NULL, NULL)
+PG_RMGR(RM_GIST_ID, "Gist", gist_redo, gist_desc, gist_identify, gist_xlog_startup, gist_xlog_cleanup, gist_mask, NULL, NULL)
+PG_RMGR(RM_SEQ_ID, "Sequence", seq_redo, seq_desc, seq_identify, NULL, NULL, seq_mask, NULL, NULL)
+PG_RMGR(RM_SPGIST_ID, "SPGist", spg_redo, spg_desc, spg_identify, spg_xlog_startup, spg_xlog_cleanup, spg_mask, NULL, NULL)
+PG_RMGR(RM_BRIN_ID, "BRIN", brin_redo, brin_desc, brin_identify, NULL, NULL, brin_mask, NULL, NULL)
+PG_RMGR(RM_COMMIT_TS_ID, "CommitTs", commit_ts_redo, commit_ts_desc, commit_ts_identify, NULL, NULL, NULL, NULL, NULL)
+PG_RMGR(RM_REPLORIGIN_ID, "ReplicationOrigin", replorigin_redo, replorigin_desc, replorigin_identify, NULL, NULL, NULL, NULL, NULL)
+PG_RMGR(RM_GENERIC_ID, "Generic", generic_redo, generic_desc, generic_identify, NULL, NULL, generic_mask, NULL, NULL)
+PG_RMGR(RM_LOGICALMSG_ID, "LogicalMessage", logicalmsg_redo, logicalmsg_desc, logicalmsg_identify, NULL, NULL, NULL, logicalmsg_decode, logicalmsg_is_record_decodable)
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index b0fd338a00..6e113ef53d 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -31,7 +31,7 @@
 /*
  * Each page of XLOG file has a header like this:
  */
-#define XLOG_PAGE_MAGIC 0xD113	/* can be used as WAL version indicator */
+#define XLOG_PAGE_MAGIC 0xD114	/* can be used as WAL version indicator */
 
 typedef struct XLogPageHeaderData
 {
@@ -356,6 +356,7 @@ typedef struct RmgrData
 	void		(*rm_mask) (char *pagedata, BlockNumber blkno);
 	void		(*rm_decode) (struct LogicalDecodingContext *ctx,
 							  struct XLogRecordBuffer *buf);
+	bool		(*rm_is_record_decodable) (uint8 type);
 } RmgrData;
 
 extern PGDLLIMPORT RmgrData RmgrTable[];
diff --git a/src/include/replication/decode.h b/src/include/replication/decode.h
index 14fa921ab4..3885ce671d 100644
--- a/src/include/replication/decode.h
+++ b/src/include/replication/decode.h
@@ -28,6 +28,13 @@ extern void xact_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf);
 extern void standby_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf);
 extern void logicalmsg_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf);
 
+extern bool xlog_is_record_decodable(uint8 info);
+extern bool xact_is_record_decodable(uint8 info);
+extern bool standy_is_record_decodable(uint8 info);
+extern bool heap2_is_record_decodable(uint8 info);
+extern bool heap_is_record_decodable(uint8 info);
+extern bool logicalmsg_is_record_decodable(uint8 info);
+
 extern void LogicalDecodingProcessRecord(LogicalDecodingContext *ctx,
 										 XLogReaderState *record);
 
-- 
2.34.1

#295Dilip Kumar
dilipbalaut@gmail.com
In reply to: Bharath Rupireddy (#294)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Tue, Oct 3, 2023 at 9:58 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Fri, Sep 29, 2023 at 5:27 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

Yeah, the approach enforces developers to check the decodability.
But the benefit seems smaller than required efforts for it because the function
would be used only by pg_upgrade. Could you tell me if you have another use case
in mind? We may able to adopt if we have...

I'm attaching 0002 patch (on top of v45) which implements the new
decodable callback approach that I have in mind. IMO, this new
approach is extensible, better than the current approach (hard-coding
of certain WAL records that may be generated during pg_upgrade) taken
by the patch, and helps deal with the issue that custom WAL resource
managers can have with the current approach taken by the patch.

I did not see the patch, but I like this approach better. I mean, this
approach does not check what record types are generated during upgrade;
instead it directly targets which types of records should not be
generated after the confirmed_flush_lsn. So if the rmgr says that no
decodable record was generated after the confirmed_flush_lsn, then we are
safe to upgrade that slot. So this seems an extensible approach.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#296Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Bharath Rupireddy (#294)
1 attachment(s)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Bharath,

I'm attaching 0002 patch (on top of v45) which implements the new
decodable callback approach that I have in mind. IMO, this new
approach is extensible, better than the current approach (hard-coding
of certain WAL records that may be generated during pg_upgrade) taken
by the patch, and helps deal with the issue that custom WAL resource
managers can have with the current approach taken by the patch.

Thanks for sharing your PoC! I tested it and it worked well. I have also made
the decoding approach locally, but your approach is conceptually faster. I think
it still checks the types one by one, so I am not sure it is acceptable, but at
least the checks are centralized. We must hear opinions from others. What do others think?

Here are some comments for your patch. I attached a txt file; please include it if it is OK.

1.
According to your post, we must have comments notifying developers that the
is_decodable API must be implemented. Please share them too if you have an idea.

2.
The existence of is_decodable should be checked in RegisterCustomRmgr().

3.
Another rmgr API (rm_identify) requires a raw uint8 without any bit operation
applied; the callbacks do "info & ~XLR_INFO_MASK" themselves. Should we follow that?

4.
It would be helpful for developers to add a function to the test_custom_rmgrs module.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:

kuroda_mod.txttext/plain; name=kuroda_mod.txtDownload
diff --git a/src/backend/access/transam/rmgr.c b/src/backend/access/transam/rmgr.c
index 001bdf3535..850ba7829a 100644
--- a/src/backend/access/transam/rmgr.c
+++ b/src/backend/access/transam/rmgr.c
@@ -117,6 +117,11 @@ RegisterCustomRmgr(RmgrId rmid, const RmgrData *rmgr)
 				 errdetail("Custom resource manager \"%s\" already registered with the same ID.",
 						   RmgrTable[rmid].rm_name)));
 
+	if (rmgr->rm_decode && rmgr->rm_is_record_decodable == NULL)
+		ereport(ERROR,
+				(errmsg("failed to register custom resource manager \"%s\" with ID %d", rmgr->rm_name, rmid),
+				 errdetail("Custom resource manager which has a decode function must have is_reacode_decodable function too.")));
+
 	/* check for existing rmgr with the same name */
 	for (int existing_rmid = 0; existing_rmid <= RM_MAX_ID; existing_rmid++)
 	{
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 60d26ae015..2e97962e60 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -214,7 +214,7 @@ xlog_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 bool
 xlog_is_record_decodable(uint8 info)
 {
-	switch (info)
+	switch (info & ~XLR_INFO_MASK)
 	{
 		case XLOG_CHECKPOINT_SHUTDOWN:
 		case XLOG_END_OF_RECOVERY:
@@ -401,7 +401,7 @@ xact_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 bool
 xact_is_record_decodable(uint8 info)
 {
-	switch (info)
+	switch (info & ~XLR_INFO_MASK)
 	{
 		case XLOG_XACT_COMMIT:
 		case XLOG_XACT_COMMIT_PREPARED:
@@ -471,7 +471,7 @@ standby_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 bool
 standy_is_record_decodable(uint8 info)
 {
-	switch (info)
+	switch (info & ~XLR_INFO_MASK)
 	{
 		case XLOG_RUNNING_XACTS:
 			return true;
@@ -550,7 +550,7 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 bool
 heap2_is_record_decodable(uint8 info)
 {
-	switch (info)
+	switch (info & ~XLR_INFO_MASK)
 	{
 		case XLOG_HEAP2_MULTI_INSERT:
 		case XLOG_HEAP2_NEW_CID:
@@ -661,7 +661,7 @@ heap_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 bool
 heap_is_record_decodable(uint8 info)
 {
-	switch (info)
+	switch (info & ~XLR_INFO_MASK)
 	{
 		case XLOG_HEAP_INSERT:
 		case XLOG_HEAP_HOT_UPDATE:
@@ -782,7 +782,7 @@ logicalmsg_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 bool
 logicalmsg_is_record_decodable(uint8 info)
 {
-	switch (info)
+	switch (info & ~XLR_INFO_MASK)
 	{
 		case XLOG_LOGICAL_MESSAGE:
 			return true;
diff --git a/src/backend/utils/adt/pg_upgrade_support.c b/src/backend/utils/adt/pg_upgrade_support.c
index cfd3e448b1..f19cb68d92 100644
--- a/src/backend/utils/adt/pg_upgrade_support.c
+++ b/src/backend/utils/adt/pg_upgrade_support.c
@@ -320,15 +320,13 @@ binary_upgrade_validate_wal_logical_end(PG_FUNCTION_ARGS)
 	while (is_valid && ReadNextXLogRecord(xlogreader))
 	{
 		RmgrData	rmgr;
-		RmgrIds		rmid;
-		uint8		info;
-
-		/* Check the type of WAL */
-		rmid = XLogRecGetRmid(xlogreader);
-		info = XLogRecGetInfo(xlogreader) & ~XLR_INFO_MASK;
 
 		if (initial_record)
 		{
+			/* Check the type of WAL */
+			RmgrIds		rmid = XLogRecGetRmid(xlogreader);
+			uint8		info = XLogRecGetInfo(xlogreader) & ~XLR_INFO_MASK;
+
 			/*
 			 * Initial record must be either XLOG_CHECKPOINT_SHUTDOWN or
 			 * XLOG_SWITCH.
@@ -350,11 +348,11 @@ binary_upgrade_validate_wal_logical_end(PG_FUNCTION_ARGS)
 		if (rmgr.rm_decode != NULL)
 		{
 			if (rmgr.rm_is_record_decodable != NULL)
-				is_valid = rmgr.rm_is_record_decodable(info);
+				is_valid = rmgr.rm_is_record_decodable(XLogRecGetInfo(xlogreader));
 			else
 				ereport(ERROR,
 						errmsg("cannot check logical decodability for resource manager \"%s\" with ID %d",
-							   rmgr.rm_name, rmid),
+							   rmgr.rm_name, XLogRecGetRmid(xlogreader)),
 						errdetail("Logical decodability callback is not defined for the resource manager."));
 		}
 
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index 6e113ef53d..44e10c0a94 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -342,6 +342,9 @@ struct XLogRecordBuffer;
  * rm_mask takes as input a page modified by the resource manager and masks
  * out bits that shouldn't be flagged by wal_consistency_checking.
  *
+ * If a resource manager implements rm_decode function, rm_is_record_decodable
+ * function must be also implemented.
+ *
  * RmgrTable[] is indexed by RmgrId values (see rmgrlist.h). If rm_name is
  * NULL, the corresponding RmgrTable entry is considered invalid.
  */
@@ -356,7 +359,7 @@ typedef struct RmgrData
 	void		(*rm_mask) (char *pagedata, BlockNumber blkno);
 	void		(*rm_decode) (struct LogicalDecodingContext *ctx,
 							  struct XLogRecordBuffer *buf);
-	bool		(*rm_is_record_decodable) (uint8 type);
+	bool		(*rm_is_record_decodable) (uint8 info);
 } RmgrData;
 
 extern PGDLLIMPORT RmgrData RmgrTable[];
diff --git a/src/include/replication/decode.h b/src/include/replication/decode.h
index 3885ce671d..d8a912296c 100644
--- a/src/include/replication/decode.h
+++ b/src/include/replication/decode.h
@@ -21,6 +21,12 @@ typedef struct XLogRecordBuffer
 	XLogReaderState *record;
 } XLogRecordBuffer;
 
+/*
+ * Decode functions for resource managers.
+ *
+ * Note that if a rmgr has rm_decode function, it must have
+ * rm_is_record_decodable function as well.
+ */
 extern void xlog_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf);
 extern void heap_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf);
 extern void heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf);
@@ -28,6 +34,7 @@ extern void xact_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf);
 extern void standby_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf);
 extern void logicalmsg_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf);
 
+/* is_record_decodable functions */
 extern bool xlog_is_record_decodable(uint8 info);
 extern bool xact_is_record_decodable(uint8 info);
 extern bool standy_is_record_decodable(uint8 info);
diff --git a/src/test/modules/test_custom_rmgrs/test_custom_rmgrs.c b/src/test/modules/test_custom_rmgrs/test_custom_rmgrs.c
index a304ba54bb..7ac90633f4 100644
--- a/src/test/modules/test_custom_rmgrs/test_custom_rmgrs.c
+++ b/src/test/modules/test_custom_rmgrs/test_custom_rmgrs.c
@@ -10,7 +10,7 @@
  *		src/test/modules/test_custom_rmgrs/test_custom_rmgrs.c
  *
  * Custom WAL resource manager for records containing a simple textual
- * payload, no-op redo, and no decoding.
+ * payload, no-op redo and decode.
  *
  * -------------------------------------------------------------------------
  */
@@ -21,6 +21,7 @@
 #include "access/xlog_internal.h"
 #include "access/xloginsert.h"
 #include "fmgr.h"
+#include "replication/decode.h"
 #include "utils/pg_lsn.h"
 #include "varatt.h"
 
@@ -51,12 +52,17 @@ typedef struct xl_testcustomrmgrs_message
 void		testcustomrmgrs_redo(XLogReaderState *record);
 void		testcustomrmgrs_desc(StringInfo buf, XLogReaderState *record);
 const char *testcustomrmgrs_identify(uint8 info);
+void		testcustomrmgrs_decode(struct LogicalDecodingContext *ctx,
+								   struct XLogRecordBuffer *buf);
+bool		testcustomrmgrs_is_record_decodable(uint8 info);
 
 static const RmgrData testcustomrmgrs_rmgr = {
 	.rm_name = TESTCUSTOMRMGRS_NAME,
 	.rm_redo = testcustomrmgrs_redo,
 	.rm_desc = testcustomrmgrs_desc,
-	.rm_identify = testcustomrmgrs_identify
+	.rm_identify = testcustomrmgrs_identify,
+	.rm_decode = testcustomrmgrs_decode,
+	.rm_is_record_decodable = testcustomrmgrs_is_record_decodable
 };
 
 /*
@@ -111,6 +117,30 @@ testcustomrmgrs_identify(uint8 info)
 	return NULL;
 }
 
+void
+testcustomrmgrs_decode(struct LogicalDecodingContext *ctx,
+					   struct XLogRecordBuffer *buf)
+{
+	XLogReaderState *r = buf->record;
+	uint8		info = XLogRecGetInfo(r) & ~XLR_INFO_MASK;
+
+	if (info != XLOG_TEST_CUSTOM_RMGRS_MESSAGE)
+		elog(PANIC, "testcustomrmgrs_redo: unknown op code %u", info);
+}
+
+bool
+testcustomrmgrs_is_record_decodable(uint8 info)
+{
+	switch (info & ~XLR_INFO_MASK)
+	{
+		case XLOG_TEST_CUSTOM_RMGRS_MESSAGE:
+			return true;
+		default:
+			elog(ERROR, "unexpected RM_TESTCUSTOMRMGRS_ID record type: %u",
+				 info);
+	}
+}
+
 /*
  * SQL function for writing a simple message into WAL with the help of custom
  * WAL resource manager.
#297Amit Kapila
amit.kapila16@gmail.com
In reply to: Bharath Rupireddy (#294)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Tue, Oct 3, 2023 at 9:58 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Fri, Sep 29, 2023 at 5:27 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

Yeah, the approach enforces developers to check the decodability.
But the benefit seems smaller than required efforts for it because the function
would be used only by pg_upgrade. Could you tell me if you have another use case
in mind? We may able to adopt if we have...

I'm attaching 0002 patch (on top of v45) which implements the new
decodable callback approach that I have in mind. IMO, this new
approach is extensible, better than the current approach (hard-coding
of certain WAL records that may be generated during pg_upgrade) taken
by the patch, and helps deal with the issue that custom WAL resource
managers can have with the current approach taken by the patch.

+xlog_is_record_decodable(uint8 info)
+{
+ switch (info)
+ {
+ case XLOG_CHECKPOINT_SHUTDOWN:
+ case XLOG_END_OF_RECOVERY:
+ return true;
+ case XLOG_CHECKPOINT_ONLINE:
+ case XLOG_PARAMETER_CHANGE:
...
+ return false;
}

I think this won't behave correctly. Without your patch, we consider
both XLOG_CHECKPOINT_SHUTDOWN and XLOG_CHECKPOINT_ONLINE as valid
records, but after the patch only one of these will be considered valid,
which won't lead to the desired behavior.

BTW, the API proposed in your patch returns the WAL record type as
valid if there is something we do for it during decoding, but the check
in the upgrade function expects the reverse value. For example, for WAL
record type XLOG_HEAP_INSERT, the API returns true, and that is an
indication to the caller that this is an expected record after the
confirmed_flush LSN location, which doesn't seem correct. Am I missing
something?

--
With Regards,
Amit Kapila.

#298Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Hayato Kuroda (Fujitsu) (#296)
1 attachment(s)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Bharath,

While checking more, I found some problems in your PoC.

1. rm_is_record_decodable() returns true when WAL records are decodable.
Based on that, should is_valid be false when the function is true?
E.g., XLOG_HEAP_INSERT is accepted in the PoC.
2. XLOG_CHECKPOINT_SHUTDOWN and XLOG_RUNNING_XACTS should return false because
these records may be generated during the upgrade but they are acceptable.
3. Bit operations are done for extracting a WAL type, but the mask is
different depending on the rmgr. E.g., XLOG uses XLR_INFO_MASK, but XACT uses
XLOG_XACT_OPMASK.
4. There is a possibility that "XLOG_HEAP_INSERT | XLOG_HEAP_INIT_PAGE" is inserted,
but it is not handled.

Regarding 2., maybe we should say "if the reorderbuffer is modified while decoding,
rm_is_record_decodable must return false" or something like that. If so, the return
values of XLOG_END_OF_RECOVERY and XLOG_HEAP2_NEW_CID should also be changed.

I attached a fix patch for the above. What do you think?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:

v2_kuroda_mod.txttext/plain; name=v2_kuroda_mod.txtDownload
diff --git a/src/backend/access/transam/rmgr.c b/src/backend/access/transam/rmgr.c
index 001bdf3535..850ba7829a 100644
--- a/src/backend/access/transam/rmgr.c
+++ b/src/backend/access/transam/rmgr.c
@@ -117,6 +117,11 @@ RegisterCustomRmgr(RmgrId rmid, const RmgrData *rmgr)
 				 errdetail("Custom resource manager \"%s\" already registered with the same ID.",
 						   RmgrTable[rmid].rm_name)));
 
+	if (rmgr->rm_decode && rmgr->rm_is_record_decodable == NULL)
+		ereport(ERROR,
+				(errmsg("failed to register custom resource manager \"%s\" with ID %d", rmgr->rm_name, rmid),
+				 errdetail("Custom resource manager which has a decode function must have is_reacode_decodable function too.")));
+
 	/* check for existing rmgr with the same name */
 	for (int existing_rmid = 0; existing_rmid <= RM_MAX_ID; existing_rmid++)
 	{
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 60d26ae015..72a542a06b 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -214,11 +214,10 @@ xlog_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 bool
 xlog_is_record_decodable(uint8 info)
 {
-	switch (info)
+	switch (info & ~XLR_INFO_MASK)
 	{
 		case XLOG_CHECKPOINT_SHUTDOWN:
 		case XLOG_END_OF_RECOVERY:
-			return true;
 		case XLOG_CHECKPOINT_ONLINE:
 		case XLOG_PARAMETER_CHANGE:
 		case XLOG_NOOP:
@@ -401,7 +400,7 @@ xact_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 bool
 xact_is_record_decodable(uint8 info)
 {
-	switch (info)
+	switch (info & XLOG_XACT_OPMASK)
 	{
 		case XLOG_XACT_COMMIT:
 		case XLOG_XACT_COMMIT_PREPARED:
@@ -471,10 +470,9 @@ standby_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 bool
 standy_is_record_decodable(uint8 info)
 {
-	switch (info)
+	switch (info & ~XLR_INFO_MASK)
 	{
 		case XLOG_RUNNING_XACTS:
-			return true;
 		case XLOG_STANDBY_LOCK:
 		case XLOG_INVALIDATIONS:
 			return false;
@@ -550,11 +548,11 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 bool
 heap2_is_record_decodable(uint8 info)
 {
-	switch (info)
+	switch (info & XLOG_HEAP_OPMASK)
 	{
 		case XLOG_HEAP2_MULTI_INSERT:
-		case XLOG_HEAP2_NEW_CID:
 			return true;
+		case XLOG_HEAP2_NEW_CID:
 		case XLOG_HEAP2_REWRITE:
 		case XLOG_HEAP2_FREEZE_PAGE:
 		case XLOG_HEAP2_PRUNE:
@@ -661,9 +659,10 @@ heap_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 bool
 heap_is_record_decodable(uint8 info)
 {
-	switch (info)
+	switch (info & XLOG_HEAP_OPMASK)
 	{
 		case XLOG_HEAP_INSERT:
+		case XLOG_HEAP_INSERT | XLOG_HEAP_INIT_PAGE:
 		case XLOG_HEAP_HOT_UPDATE:
 		case XLOG_HEAP_UPDATE:
 		case XLOG_HEAP_DELETE:
@@ -782,7 +781,7 @@ logicalmsg_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 bool
 logicalmsg_is_record_decodable(uint8 info)
 {
-	switch (info)
+	switch (info & ~XLR_INFO_MASK)
 	{
 		case XLOG_LOGICAL_MESSAGE:
 			return true;
diff --git a/src/backend/utils/adt/pg_upgrade_support.c b/src/backend/utils/adt/pg_upgrade_support.c
index cfd3e448b1..52084dc644 100644
--- a/src/backend/utils/adt/pg_upgrade_support.c
+++ b/src/backend/utils/adt/pg_upgrade_support.c
@@ -320,19 +320,18 @@ binary_upgrade_validate_wal_logical_end(PG_FUNCTION_ARGS)
 	while (is_valid && ReadNextXLogRecord(xlogreader))
 	{
 		RmgrData	rmgr;
-		RmgrIds		rmid;
-		uint8		info;
-
-		/* Check the type of WAL */
-		rmid = XLogRecGetRmid(xlogreader);
-		info = XLogRecGetInfo(xlogreader) & ~XLR_INFO_MASK;
 
 		if (initial_record)
 		{
 			/*
-			 * Initial record must be either XLOG_CHECKPOINT_SHUTDOWN or
-			 * XLOG_SWITCH.
+			 * Verify that the initial record is either
+			 * XLOG_CHECKPOINT_SHUTDOWN or XLOG_SWITCH. Both of record types
+			 * are in the RM_XLOG_ID rmgr, so it's OK to use XLR_INFO_MASK as
+			 * mask.
 			 */
+			RmgrIds		rmid = XLogRecGetRmid(xlogreader);
+			uint8		info = XLogRecGetInfo(xlogreader) & ~XLR_INFO_MASK;
+
 			is_valid = is_xlog_record_type(rmid, info, RM_XLOG_ID, XLOG_CHECKPOINT_SHUTDOWN) ||
 				is_xlog_record_type(rmid, info, RM_XLOG_ID, XLOG_SWITCH);
 
@@ -350,11 +349,14 @@ binary_upgrade_validate_wal_logical_end(PG_FUNCTION_ARGS)
 		if (rmgr.rm_decode != NULL)
 		{
 			if (rmgr.rm_is_record_decodable != NULL)
-				is_valid = rmgr.rm_is_record_decodable(info);
+			{
+				/* If the record is decodable, the upgrade should fail */
+				is_valid = !rmgr.rm_is_record_decodable(XLogRecGetInfo(xlogreader));
+			}
 			else
 				ereport(ERROR,
 						errmsg("cannot check logical decodability for resource manager \"%s\" with ID %d",
-							   rmgr.rm_name, rmid),
+							   rmgr.rm_name, XLogRecGetRmid(xlogreader)),
 						errdetail("Logical decodability callback is not defined for the resource manager."));
 		}
 
diff --git a/src/bin/pg_upgrade/meson.build b/src/bin/pg_upgrade/meson.build
index 2c4f38d865..0248079566 100644
--- a/src/bin/pg_upgrade/meson.build
+++ b/src/bin/pg_upgrade/meson.build
@@ -40,8 +40,8 @@ tests += {
   'tap': {
     'env': {'with_icu': icu.found() ? 'yes' : 'no'},
     'tests': [
-      't/001_basic.pl',
-      't/002_pg_upgrade.pl',
+#      't/001_basic.pl',
+#      't/002_pg_upgrade.pl',
       't/003_upgrade_logical_replication_slots.pl',
     ],
     'test_kwargs': {'priority': 40}, # pg_upgrade tests are slow
diff --git a/src/include/access/rmgrlist.h b/src/include/access/rmgrlist.h
index a471e77a7c..8d6c7ff0ca 100644
--- a/src/include/access/rmgrlist.h
+++ b/src/include/access/rmgrlist.h
@@ -21,6 +21,9 @@
  * entries should be added at the end, to avoid changing IDs of existing
  * entries.
  *
+ * rm_is_record_decodable must return false when the reorderbuffer is modified
+ * while decoding, it returns true otherwise.
+ *
  * Changes to this list possibly need an XLOG_PAGE_MAGIC bump.
  */
 
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index 6e113ef53d..44e10c0a94 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -342,6 +342,9 @@ struct XLogRecordBuffer;
  * rm_mask takes as input a page modified by the resource manager and masks
  * out bits that shouldn't be flagged by wal_consistency_checking.
  *
+ * If a resource manager implements rm_decode function, rm_is_record_decodable
+ * function must be also implemented.
+ *
  * RmgrTable[] is indexed by RmgrId values (see rmgrlist.h). If rm_name is
  * NULL, the corresponding RmgrTable entry is considered invalid.
  */
@@ -356,7 +359,7 @@ typedef struct RmgrData
 	void		(*rm_mask) (char *pagedata, BlockNumber blkno);
 	void		(*rm_decode) (struct LogicalDecodingContext *ctx,
 							  struct XLogRecordBuffer *buf);
-	bool		(*rm_is_record_decodable) (uint8 type);
+	bool		(*rm_is_record_decodable) (uint8 info);
 } RmgrData;
 
 extern PGDLLIMPORT RmgrData RmgrTable[];
diff --git a/src/include/replication/decode.h b/src/include/replication/decode.h
index 3885ce671d..d8a912296c 100644
--- a/src/include/replication/decode.h
+++ b/src/include/replication/decode.h
@@ -21,6 +21,12 @@ typedef struct XLogRecordBuffer
 	XLogReaderState *record;
 } XLogRecordBuffer;
 
+/*
+ * Decode functions for resource managers.
+ *
+ * Note that if a rmgr has rm_decode function, it must have
+ * rm_is_record_decodable function as well.
+ */
 extern void xlog_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf);
 extern void heap_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf);
 extern void heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf);
@@ -28,6 +34,7 @@ extern void xact_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf);
 extern void standby_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf);
 extern void logicalmsg_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf);
 
+/* is_record_decodable functions */
 extern bool xlog_is_record_decodable(uint8 info);
 extern bool xact_is_record_decodable(uint8 info);
 extern bool standy_is_record_decodable(uint8 info);
diff --git a/src/test/modules/test_custom_rmgrs/test_custom_rmgrs.c b/src/test/modules/test_custom_rmgrs/test_custom_rmgrs.c
index a304ba54bb..7ac90633f4 100644
--- a/src/test/modules/test_custom_rmgrs/test_custom_rmgrs.c
+++ b/src/test/modules/test_custom_rmgrs/test_custom_rmgrs.c
@@ -10,7 +10,7 @@
  *		src/test/modules/test_custom_rmgrs/test_custom_rmgrs.c
  *
  * Custom WAL resource manager for records containing a simple textual
- * payload, no-op redo, and no decoding.
+ * payload, no-op redo and decode.
  *
  * -------------------------------------------------------------------------
  */
@@ -21,6 +21,7 @@
 #include "access/xlog_internal.h"
 #include "access/xloginsert.h"
 #include "fmgr.h"
+#include "replication/decode.h"
 #include "utils/pg_lsn.h"
 #include "varatt.h"
 
@@ -51,12 +52,17 @@ typedef struct xl_testcustomrmgrs_message
 void		testcustomrmgrs_redo(XLogReaderState *record);
 void		testcustomrmgrs_desc(StringInfo buf, XLogReaderState *record);
 const char *testcustomrmgrs_identify(uint8 info);
+void		testcustomrmgrs_decode(struct LogicalDecodingContext *ctx,
+								   struct XLogRecordBuffer *buf);
+bool		testcustomrmgrs_is_record_decodable(uint8 info);
 
 static const RmgrData testcustomrmgrs_rmgr = {
 	.rm_name = TESTCUSTOMRMGRS_NAME,
 	.rm_redo = testcustomrmgrs_redo,
 	.rm_desc = testcustomrmgrs_desc,
-	.rm_identify = testcustomrmgrs_identify
+	.rm_identify = testcustomrmgrs_identify,
+	.rm_decode = testcustomrmgrs_decode,
+	.rm_is_record_decodable = testcustomrmgrs_is_record_decodable
 };
 
 /*
@@ -111,6 +117,30 @@ testcustomrmgrs_identify(uint8 info)
 	return NULL;
 }
 
+void
+testcustomrmgrs_decode(struct LogicalDecodingContext *ctx,
+					   struct XLogRecordBuffer *buf)
+{
+	XLogReaderState *r = buf->record;
+	uint8		info = XLogRecGetInfo(r) & ~XLR_INFO_MASK;
+
+	if (info != XLOG_TEST_CUSTOM_RMGRS_MESSAGE)
+		elog(PANIC, "testcustomrmgrs_redo: unknown op code %u", info);
+}
+
+bool
+testcustomrmgrs_is_record_decodable(uint8 info)
+{
+	switch (info & ~XLR_INFO_MASK)
+	{
+		case XLOG_TEST_CUSTOM_RMGRS_MESSAGE:
+			return true;
+		default:
+			elog(ERROR, "unexpected RM_TESTCUSTOMRMGRS_ID record type: %u",
+				 info);
+	}
+}
+
 /*
  * SQL function for writing a simple message into WAL with the help of custom
  * WAL resource manager.
#299Amit Kapila
amit.kapila16@gmail.com
In reply to: Bharath Rupireddy (#294)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Tue, Oct 3, 2023 at 9:58 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Fri, Sep 29, 2023 at 5:27 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

Yeah, the approach enforces developers to check the decodability.
But the benefit seems smaller than required efforts for it because the function
would be used only by pg_upgrade. Could you tell me if you have another use case
in mind? We may able to adopt if we have...

I'm attaching 0002 patch (on top of v45) which implements the new
decodable callback approach that I have in mind. IMO, this new
approach is extensible, better than the current approach (hard-coding
of certain WAL records that may be generated during pg_upgrade) taken
by the patch, and helps deal with the issue that custom WAL resource
managers can have with the current approach taken by the patch.

Today, I discussed this problem with Andres at PGConf NYC and he
suggested as following. To verify, if there is any pending unexpected
WAL after shutdown, we can have an API like
pg_logical_replication_slot_advance() which will simply process
records without actually sending anything downstream. In this new API,
we will start with each slot's restart_lsn location and try to process
till the end of WAL, if we encounter any WAL that needs to be
processed (like we need to send the decoded WAL downstream) we can
return a false indicating that there is an unexpected WAL. The reason
to start with restart_lsn is that it is the location that we use to
start scanning the WAL anyway.
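
Just to make the shape of that concrete, pg_upgrade's check phase could then run
something like the following per database with logical slots; the function name
binary_upgrade_slot_has_caught_up() and the exact query are placeholders of mine,
an untested sketch rather than a settled design:

	res = executeQueryOrDie(conn,
							"SELECT slot_name FROM pg_catalog.pg_replication_slots "
							"WHERE slot_type = 'logical' AND temporary IS FALSE AND "
							"NOT pg_catalog.binary_upgrade_slot_has_caught_up(slot_name)");

	/* Any row here means some slot still has decodable WAL pending. */
	if (PQntuples(res) != 0)
		pg_fatal("logical replication slot \"%s\" has unconsumed WAL records",
				 PQgetvalue(res, 0, 0));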

Then, we should also try to create slots before invoking pg_resetwal.
The idea is that we can write a new binary mode function that will do
exactly what pg_resetwal does to compute the next segment and use that
location as a new location (restart_lsn) to create the slots in a new
node. Then, pass it pg_resetwal by using the existing option '-l
walfile'. As we don't have any API that takes restart_lsn as input, we
can write a new API probably for binary mode to create slots that do
take restart_lsn as input. This will ensure that there is no new WAL
inserted by background processes between resetwal and the creation of
slots.
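
And for the slot-creation side, a very rough, untested sketch of such a
binary-mode function; the name and the way the LSN is pinned are my guesses,
not from any posted patch:

	Datum
	binary_upgrade_create_logical_slot(PG_FUNCTION_ARGS)
	{
		Name		name = PG_GETARG_NAME(0);
		Name		plugin = PG_GETARG_NAME(1);
		bool		two_phase = PG_GETARG_BOOL(2);
		XLogRecPtr	restart_lsn = PG_GETARG_LSN(3);	/* computed by pg_upgrade */

		CHECK_IS_BINARY_UPGRADE;

		ReplicationSlotCreate(NameStr(*name), true, RS_PERSISTENT, two_phase);

		/* Pin the slot to the supplied LSN instead of reserving current WAL. */
		SpinLockAcquire(&MyReplicationSlot->mutex);
		namestrcpy(&MyReplicationSlot->data.plugin, NameStr(*plugin));
		MyReplicationSlot->data.restart_lsn = restart_lsn;
		MyReplicationSlot->data.confirmed_flush = restart_lsn;
		SpinLockRelease(&MyReplicationSlot->mutex);

		ReplicationSlotMarkDirty();
		ReplicationSlotSave();
		ReplicationSlotRelease();

		PG_RETURN_VOID();
	}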

The other potential problem Andres pointed out is that during shutdown
if due to some reason, the walreceiver goes down, we won't be able to
send the required WAL and users won't be able to ensure that because
even after restart the same situation can happen. The ideal way is to
have something that puts the system in READ ONLY state during shutdown
and then we can probably allow walreceivers to reconnect and receive
the required WALs. As we don't have such functionality available and
it won't be easy to achieve the same, we can leave this for now.

Thoughts?

--
With Regards,
Amit Kapila.

#300Dilip Kumar
dilipbalaut@gmail.com
In reply to: Amit Kapila (#299)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Thu, Oct 5, 2023 at 1:48 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Oct 3, 2023 at 9:58 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Fri, Sep 29, 2023 at 5:27 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

Yeah, the approach enforces developers to check the decodability.
But the benefit seems smaller than required efforts for it because the function
would be used only by pg_upgrade. Could you tell me if you have another use case
in mind? We may able to adopt if we have...

I'm attaching 0002 patch (on top of v45) which implements the new
decodable callback approach that I have in mind. IMO, this new
approach is extensible, better than the current approach (hard-coding
of certain WAL records that may be generated during pg_upgrade) taken
by the patch, and helps deal with the issue that custom WAL resource
managers can have with the current approach taken by the patch.

Today, I discussed this problem with Andres at PGConf NYC and he
suggested as following. To verify, if there is any pending unexpected
WAL after shutdown, we can have an API like
pg_logical_replication_slot_advance() which will simply process
records without actually sending anything downstream.

So I assume in each lower-level decode function (e.g. heap_decode() )
we will add the check that if we are checking the WAL for an upgrade
then from that level we will return true or false based on whether the
WAL is decodable or not. Is my understanding correct? At first
thought, this approach looks better and more generic.

In this new API,
we will start with each slot's restart_lsn location and try to process
till the end of WAL, if we encounter any WAL that needs to be
processed (like we need to send the decoded WAL downstream) we can
return a false indicating that there is an unexpected WAL. The reason
to start with restart_lsn is that it is the location that we use to
start scanning the WAL anyway.

Yeah, that makes sense.

Then, we should also try to create slots before invoking pg_resetwal.
The idea is that we can write a new binary mode function that will do
exactly what pg_resetwal does to compute the next segment and use that
location as a new location (restart_lsn) to create the slots in a new
node. Then, pass it pg_resetwal by using the existing option '-l
walfile'. As we don't have any API that takes restart_lsn as input, we
can write a new API probably for binary mode to create slots that do
take restart_lsn as input. This will ensure that there is no new WAL
inserted by background processes between resetwal and the creation of
slots.

Yeah, that looks cleaner IMHO.

The other potential problem Andres pointed out is that during shutdown
if due to some reason, the walreceiver goes down, we won't be able to
send the required WAL and users won't be able to ensure that because
even after restart the same situation can happen. The ideal way is to
have something that puts the system in READ ONLY state during shutdown
and then we can probably allow walreceivers to reconnect and receive
the required WALs. As we don't have such functionality available and
it won't be easy to achieve the same, we can leave this for now.

+1

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#301Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Amit Kapila (#299)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Amit, Andres,

Thank you for the decision! Basically I will follow your idea and make
a patch accordingly.

Today, I discussed this problem with Andres at PGConf NYC and he
suggested as following. To verify, if there is any pending unexpected
WAL after shutdown, we can have an API like
pg_logical_replication_slot_advance() which will simply process
records without actually sending anything downstream. In this new API,
we will start with each slot's restart_lsn location and try to process
till the end of WAL, if we encounter any WAL that needs to be
processed (like we need to send the decoded WAL downstream) we can
return a false indicating that there is an unexpected WAL. The reason
to start with restart_lsn is that it is the location that we use to
start scanning the WAL anyway.

I felt the approach seems similar to Hou-san's suggestion [1], but we can avoid
using test_decoding. I'm planning to have the upgrading function decode WALs
and check whether there are reorderbuffer changes.

Then, we should also try to create slots before invoking pg_resetwal.
The idea is that we can write a new binary mode function that will do
exactly what pg_resetwal does to compute the next segment and use that
location as a new location (restart_lsn) to create the slots in a new
node. Then, pass it pg_resetwal by using the existing option '-l
walfile'. As we don't have any API that takes restart_lsn as input, we
can write a new API probably for binary mode to create slots that do
take restart_lsn as input. This will ensure that there is no new WAL
inserted by background processes between resetwal and the creation of
slots.

It seems better because we can create every object before pg_resetwal.

I will handle the above two points and see how it works.

[1]: /messages/by-id/OS0PR01MB5716506A1A1B20EFBFA7B52994C1A@OS0PR01MB5716.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#302Amit Kapila
amit.kapila16@gmail.com
In reply to: Dilip Kumar (#300)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Thu, Oct 5, 2023 at 2:29 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Thu, Oct 5, 2023 at 1:48 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Oct 3, 2023 at 9:58 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Fri, Sep 29, 2023 at 5:27 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

Yeah, the approach enforces developers to check the decodability.
But the benefit seems smaller than required efforts for it because the function
would be used only by pg_upgrade. Could you tell me if you have another use case
in mind? We may able to adopt if we have...

I'm attaching 0002 patch (on top of v45) which implements the new
decodable callback approach that I have in mind. IMO, this new
approach is extensible, better than the current approach (hard-coding
of certain WAL records that may be generated during pg_upgrade) taken
by the patch, and helps deal with the issue that custom WAL resource
managers can have with the current approach taken by the patch.

Today, I discussed this problem with Andres at PGConf NYC and he
suggested as following. To verify, if there is any pending unexpected
WAL after shutdown, we can have an API like
pg_logical_replication_slot_advance() which will simply process
records without actually sending anything downstream.

So I assume in each lower-level decode function (e.g. heap_decode() )
we will add the check that if we are checking the WAL for an upgrade
then from that level we will return true or false based on whether the
WAL is decodable or not. Is my understanding correct?

Yes, this is one way to achieve it, but I think this will require changing
the return values of many APIs. Can we somehow just get this via
LogicalDecodingContext or some other way at the caller, by allowing some
variable to be set at the required places?
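
For example, something along these lines might be enough; the field names below
are placeholders of mine, not from any posted patch:

	/* new fields in LogicalDecodingContext, next to fast_forward */
	bool		silent_mode;			/* decode, but never call the output plugin */
	bool		processing_required;	/* set once a record would reach the plugin */

	/* and in a decode routine such as heap_decode(): */
		case XLOG_HEAP_INSERT:
			if (ctx->silent_mode)
			{
				ctx->processing_required = true;
				return;
			}
			if (SnapBuildProcessChange(builder, xid, buf->origptr))
				DecodeInsert(ctx, buf);
			break;

The caller would then just test ctx->processing_required after
LogicalDecodingProcessRecord() instead of changing the return type of every
decode function.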

#303Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Amit Kapila (#302)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Thu, Oct 5, 2023 at 4:24 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

Today, I discussed this problem with Andres at PGConf NYC and he
suggested as following. To verify, if there is any pending unexpected
WAL after shutdown, we can have an API like
pg_logical_replication_slot_advance() which will simply process
records without actually sending anything downstream.

+1 for this approach. It looks neat.

I think we also need to add TAP tests to generate decodable WAL
records (RUNNING_XACT, CHECKPOINT_ONLINE, XLOG_FPI_FOR_HINT,
XLOG_SWITCH, XLOG_PARAMETER_CHANGE, XLOG_HEAP2_PRUNE) during
pg_upgrade as described here
/messages/by-id/TYAPR01MB58660273EACEFC5BF256B133F50DA@TYAPR01MB5866.jpnprd01.prod.outlook.com.
Basically, these were the exceptional WAL records that may be
generated by pg_upgrade, so having tests for them is good.

So I assume in each lower-level decode function (e.g. heap_decode() )
we will add the check that if we are checking the WAL for an upgrade
then from that level we will return true or false based on whether the
WAL is decodable or not. Is my understanding correct?

Yes, this is one way to achieve it, but I think this will require changing the return values of many APIs. Can we somehow just get this via LogicalDecodingContext or some other way at the caller, by allowing some variable to be set at the required places?

+1 for adding the required flags to the decoding context similar to
fast_forward.

Another way without adding any new variables is to pass the WAL record
to LogicalDecodingProcessRecord, and upon return check the reorder
buffer if there's any decoded change generated for the xid associated
with the WAL record. If any decoded change related to the WAL record
xid is found, then that's the end for the new function. Here's what I
think [1], haven't tested it.

[1]:
change_found = false;
end_of_wal = false;
ctx = CreateDecodingContext();

XLogBeginRead(ctx->reader, MyReplicationSlot->data.restart_lsn);

while (!end_of_wal || !change_found)
{
    XLogRecord *record;
    TransactionId xid;
    ReorderBufferTXN *txn;

    record = XLogReadRecord(ctx->reader, &errm);

    if (record)
        LogicalDecodingProcessRecord(ctx, ctx->reader);

    xid = XLogRecGetXid(record);

    txn = ReorderBufferTXNByXid(ctx->reorder, xid, false, NULL,
                                InvalidXLogRecPtr, false);

    if (txn != NULL)
    {
        change_found = true;
        break;
    }

    CHECK_FOR_INTERRUPTS();
}

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#304Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Amit Kapila (#299)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Thu, Oct 5, 2023 at 1:48 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

Then, we should also try to create slots before invoking pg_resetwal.
The idea is that we can write a new binary mode function that will do
exactly what pg_resetwal does to compute the next segment and use that
location as a new location (restart_lsn) to create the slots in a new
node. Then, pass it pg_resetwal by using the existing option '-l
walfile'. As we don't have any API that takes restart_lsn as input, we
can write a new API probably for binary mode to create slots that do
take restart_lsn as input. This will ensure that there is no new WAL
inserted by background processes between resetwal and the creation of
slots.

+1. I think this approach makes it foolproof. pg_resetwal uses
FindEndOfXLOG and we need that to be in a binary-mode SQL-callable
function. FindEndOfXLOG ignores the TLI to compute the new WAL file name,
but that seems to be okay for the new binary-mode function because
pg_upgrade uses TLI 1 anyway and doesn't copy WAL files from the old
cluster.
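
For illustration, a rough, untested sketch of that binary-mode function; the
name is mine, and a backend version obviously cannot reuse the frontend
FindEndOfXLOG as-is:

	Datum
	binary_upgrade_next_wal_segment_start(PG_FUNCTION_ARGS)
	{
		XLogRecPtr	end_of_wal;
		XLogRecPtr	next_segstart;
		XLogSegNo	segno;

		CHECK_IS_BINARY_UPGRADE;

		/* End of WAL on the new cluster at this point of the upgrade. */
		end_of_wal = GetFlushRecPtr(NULL);

		/* Round up to the next segment boundary, as pg_resetwal -l would use. */
		XLByteToSeg(end_of_wal, segno, wal_segment_size);
		XLogSegNoOffsetToRecPtr(segno + 1, 0, wal_segment_size, next_segstart);

		PG_RETURN_LSN(next_segstart);
	}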

FWIW, pg_upgrade already uses -l in copy_xact_xlog_xid; I'm not sure whether
that has anything to do with the above proposed change.

The other potential problem Andres pointed out is that if, during shutdown,
the walreceiver goes down for some reason, we won't be able to send the
required WAL, and users won't be able to ensure that it was sent because
even after a restart the same situation can happen. The ideal way is to
have something that puts the system in READ ONLY state during shutdown
and then we can probably allow walreceivers to reconnect and receive
the required WALs. As we don't have such functionality available and
it won't be easy to achieve the same, we can leave this for now.

Thoughts?

You mean walreceiver for streaming replication? Or the apply workers
going down for logical replication? If there's yet-to-be-sent-out WAL,
pg_upgrade will fail, no? How is the above scenario a problem for
pg_upgrade of a cluster with just logical replication slots?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#305Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Hayato Kuroda (Fujitsu) (#301)
1 attachment(s)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear hackers,

Based on comments, I revised my patch. PSA the file.

Today, I discussed this problem with Andres at PGConf NYC and he
suggested as following. To verify, if there is any pending unexpected
WAL after shutdown, we can have an API like
pg_logical_replication_slot_advance() which will simply process
records without actually sending anything downstream. In this new API,
we will start with each slot's restart_lsn location and try to process
till the end of WAL, if we encounter any WAL that needs to be
processed (like we need to send the decoded WAL downstream) we can
return a false indicating that there is an unexpected WAL. The reason
to start with restart_lsn is that it is the location that we use to
start scanning the WAL anyway.

I implemented this by using the decoding context. The binary upgrade function
processes WAL records from confirmed_flush and returns false if some meaningful
changes are found.

Internally, I added a new decoding mode - DECODING_MODE_SILENT - and used it.
If the decoding context is in that mode, the output plugin is not loaded, but
all WAL records are still decoded without being skipped. Also, a new flag
"did_process" is added. This flag is set if the wrappers for output plugin
callbacks are invoked while in silent mode. The upgrade function checks both
the reorder buffer and the new flag, because both transactional and
non-transactional changes should be detected. If we only checked the reorder
buffer, we would miss the non-transactional ones.

fast_forward was changed to be a variant of the decoding mode.
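
For readers skimming the hunks, the decoding-mode plumbing presumably boils down to something like the following in replication/logical.h. This is a sketch inferred from the calls in the hunks; see the attached patch for the authoritative definition, as the exact comments and ordering may differ.

typedef enum
{
    DECODING_MODE_NORMAL,        /* load the output plugin and send changes */
    DECODING_MODE_FAST_FORWARD,  /* bypass the generation of logical changes */
    DECODING_MODE_SILENT         /* decode everything, but output nothing */
} DecodingMode;

/* in LogicalDecodingContext, replacing the old fast_forward flag */
DecodingMode decoding_mode;
bool         did_process;        /* silent mode only: set once a callback
                                  * wrapper sees a decodable change */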

Currently the function is called for all the valid slots. If the approach seems
good, we can refactor it like Bharath said [1].

Then, we should also try to create slots before invoking pg_resetwal.
The idea is that we can write a new binary mode function that will do
exactly what pg_resetwal does to compute the next segment and use that
location as a new location (restart_lsn) to create the slots in a new
node. Then, pass it to pg_resetwal by using the existing option '-l
walfile'. As we don't have any API that takes restart_lsn as input, we
can write a new API probably for binary mode to create slots that do
take restart_lsn as input. This will ensure that there is no new WAL
inserted by background processes between resetwal and the creation of
slots.

Based on that, I added another binary-upgrade function, binary_upgrade_create_logical_replication_slot().
This function is similar to pg_create_logical_replication_slot(), but the
restart_lsn and confirmed_flush are set to the start of the *next* WAL segment.
The corresponding WAL file name is returned and is then passed to the pg_resetwal command.

One consideration is that pg_log_standby_snapshot() must be executed before the
slots start consuming changes. The new cluster does not have any RUNNING_XACTS
records, so the decoding context on the new cluster cannot build a consistent
snapshot as-is. This may lead to changes being discarded when they are consumed
later. To prevent that, the function is called after the final pg_resetwal.

What do you think?

Acknowledgment: I would like to thank Hou for discussing with me.

[1]: /messages/by-id/CALj2ACWAdYxgzOpXrP=JMiOaWtAT2VjPiKw7ryGbipkSkocJ=g@mail.gmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:

v46-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patchapplication/octet-stream; name=v46-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patchDownload
From 3ef75e61300d9c0e768c969a2b5acc9fbdff2bf8 Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Tue, 4 Apr 2023 05:49:34 +0000
Subject: [PATCH v46] pg_upgrade: Allow to replicate logical replication slots
 to new node

This commit allows nodes with logical replication slots to be upgraded. While
reading information from the old cluster, a list of logical replication slots is
fetched. In a later part of the upgrade, pg_upgrade revisits the list and
restores the slots by executing binary_upgrade_create_logical_replication_slot()
on the new cluster. Migration of logical replication slots is only supported
when the old
cluster is version 17.0 or later.

If the old node has slots with the status 'lost' or with unconsumed WAL records,
pg_upgrade fails. These checks are needed to prevent data loss.

Note that the pg_resetwal command would remove WAL files, which are required for
restart_lsn. If WALs required by logical replication slots are removed, the
slots are unusable. Therefore, during the upgrade, slot restoration is done
after the final pg_resetwal command. The workflow ensures that required WALs are
retained.

The significant advantage of this commit is that it makes it easy to continue
logical replication even after upgrading the publisher node. Previously,
pg_upgrade allowed copying publications to a new node. With this new commit,
adjusting the connection string to the new publisher will cause the apply
worker on the subscriber to connect to the new publisher automatically. This
enables seamless continuation of logical replication, even after an upgrade.

Author: Hayato Kuroda
Co-authored-by: Hou Zhijie
Reviewed-by: Peter Smith, Julien Rouhaud, Vignesh C, Wang Wei, Masahiko Sawada,
             Dilip Kumar, Bharath Rupireddy
---
 doc/src/sgml/ref/pgupgrade.sgml               |  76 ++++-
 src/backend/replication/logical/decode.c      |  13 +-
 src/backend/replication/logical/logical.c     | 227 ++++++++++++---
 .../replication/logical/logicalfuncs.c        |   2 +-
 src/backend/replication/slot.c                |  12 +
 src/backend/replication/slotfuncs.c           |   4 +-
 src/backend/replication/walsender.c           |   2 +-
 src/backend/utils/adt/pg_upgrade_support.c    | 128 +++++++++
 src/bin/pg_upgrade/Makefile                   |   3 +
 src/bin/pg_upgrade/check.c                    | 202 ++++++++++++-
 src/bin/pg_upgrade/function.c                 |  31 +-
 src/bin/pg_upgrade/info.c                     | 168 ++++++++++-
 src/bin/pg_upgrade/meson.build                |   1 +
 src/bin/pg_upgrade/pg_upgrade.c               | 104 ++++++-
 src/bin/pg_upgrade/pg_upgrade.h               |  23 +-
 src/bin/pg_upgrade/server.c                   |  25 +-
 .../003_upgrade_logical_replication_slots.pl  | 266 ++++++++++++++++++
 src/include/catalog/pg_proc.dat               |  12 +
 src/include/replication/logical.h             |  37 ++-
 src/include/replication/slot.h                |   4 +
 src/tools/pgindent/typedefs.list              |   3 +
 21 files changed, 1275 insertions(+), 68 deletions(-)
 create mode 100644 src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index f17fdb1ba5..4d579e793d 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -383,6 +383,77 @@ make prefix=/usr/local/pgsql.new install
     </para>
    </step>
 
+   <step>
+    <title>Prepare for publisher upgrades</title>
+
+    <para>
+     <application>pg_upgrade</application> attempts to migrate logical
+     replication slots. This helps avoid the need for manually defining the
+     same replication slots on the new publisher. Migration of logical
+     replication slots is only supported when the old cluster is version 17.0
+     or later.
+    </para>
+
+    <para>
+     Before you start upgrading the publisher cluster, ensure that the
+     subscription is temporarily disabled, by executing
+     <link linkend="sql-altersubscription"><command>ALTER SUBSCRIPTION ... DISABLE</command></link>.
+     Re-enable the subscription after the upgrade.
+    </para>
+
+    <para>
+     There are some prerequisites for <application>pg_upgrade</application> to
+     be able to upgrade the replication slots. If these are not met, an error
+     will be reported.
+    </para>
+
+    <itemizedlist>
+     <listitem>
+      <para>
+       All slots on the old cluster must be usable, i.e., there are no slots
+       whose
+       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>conflicting</structfield>
+       is <literal>true</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The old cluster has replicated all the changes to subscribers.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The output plugins referenced by the slots on the old cluster must be
+       installed in the new PostgreSQL executable directory.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must not have permanent logical replication slots, i.e.,
+       there must be no slots where
+       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>temporary</structfield>
+       is <literal>false</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must have
+       <link linkend="guc-wal-level"><varname>wal_level</varname></link> as
+       <literal>logical</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must have
+       <link linkend="guc-max-replication-slots"><varname>max_replication_slots</varname></link>
+       configured to a value greater than or equal to the number of slots
+       present in the old cluster.
+      </para>
+     </listitem>
+    </itemizedlist>
+
+   </step>
+
    <step>
     <title>Stop both servers</title>
 
@@ -650,8 +721,9 @@ rsync --archive --delete --hard-links --size-only --no-inc-recursive /vol1/pg_tb
        Configure the servers for log shipping.  (You do not need to run
        <function>pg_backup_start()</function> and <function>pg_backup_stop()</function>
        or take a file system backup as the standbys are still synchronized
-       with the primary.)  Replication slots are not copied and must
-       be recreated.
+       with the primary.)  Only logical slots on the primary are copied to the
+       new standby, but other slots on the old standby are not copied, so
+       they must be recreated manually.
       </para>
      </step>
 
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 730061c9da..67c3a45166 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -295,7 +295,7 @@ xact_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 				 */
 				if (TransactionIdIsValid(xid))
 				{
-					if (!ctx->fast_forward)
+					if (ctx->decoding_mode != DECODING_MODE_FAST_FORWARD)
 						ReorderBufferAddInvalidations(reorder, xid,
 													  buf->origptr,
 													  invals->nmsgs,
@@ -303,7 +303,7 @@ xact_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 					ReorderBufferXidSetCatalogChanges(ctx->reorder, xid,
 													  buf->origptr);
 				}
-				else if ((!ctx->fast_forward))
+				else if (ctx->decoding_mode != DECODING_MODE_FAST_FORWARD)
 					ReorderBufferImmediateInvalidation(ctx->reorder,
 													   invals->nmsgs,
 													   invals->msgs);
@@ -416,7 +416,7 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 	 * point in decoding changes.
 	 */
 	if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT ||
-		ctx->fast_forward)
+		ctx->decoding_mode == DECODING_MODE_FAST_FORWARD)
 		return;
 
 	switch (info)
@@ -475,7 +475,7 @@ heap_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 	 * point in decoding data changes.
 	 */
 	if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT ||
-		ctx->fast_forward)
+		ctx->decoding_mode == DECODING_MODE_FAST_FORWARD)
 		return;
 
 	switch (info)
@@ -604,7 +604,7 @@ logicalmsg_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 	 * point in decoding messages.
 	 */
 	if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT ||
-		ctx->fast_forward)
+		ctx->decoding_mode == DECODING_MODE_FAST_FORWARD)
 		return;
 
 	message = (xl_logical_message *) XLogRecGetData(r);
@@ -1287,5 +1287,6 @@ DecodeTXNNeedSkip(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
 {
 	return (SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) ||
 			(txn_dbid != InvalidOid && txn_dbid != ctx->slot->data.database) ||
-			ctx->fast_forward || FilterByOrigin(ctx, origin_id));
+			ctx->decoding_mode == DECODING_MODE_FAST_FORWARD ||
+			FilterByOrigin(ctx, origin_id));
 }
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 41243d0187..2856a8c8c9 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -41,6 +41,7 @@
 #include "storage/proc.h"
 #include "storage/procarray.h"
 #include "utils/builtins.h"
+#include "utils/inval.h"
 #include "utils/memutils.h"
 
 /* data for errcontext callback */
@@ -150,7 +151,7 @@ StartupDecodingContext(List *output_plugin_options,
 					   XLogRecPtr start_lsn,
 					   TransactionId xmin_horizon,
 					   bool need_full_snapshot,
-					   bool fast_forward,
+					   DecodingMode decoding_mode,
 					   XLogReaderRoutine *xl_routine,
 					   LogicalOutputPluginWriterPrepareWrite prepare_write,
 					   LogicalOutputPluginWriterWrite do_write,
@@ -176,7 +177,7 @@ StartupDecodingContext(List *output_plugin_options,
 	 * (re-)load output plugins, so we detect a bad (removed) output plugin
 	 * now.
 	 */
-	if (!fast_forward)
+	if (decoding_mode == DECODING_MODE_NORMAL)
 		LoadOutputPlugin(&ctx->callbacks, NameStr(slot->data.plugin));
 
 	/*
@@ -294,7 +295,7 @@ StartupDecodingContext(List *output_plugin_options,
 
 	ctx->output_plugin_options = output_plugin_options;
 
-	ctx->fast_forward = fast_forward;
+	ctx->decoding_mode = decoding_mode;
 
 	MemoryContextSwitchTo(old_context);
 
@@ -437,7 +438,7 @@ CreateInitDecodingContext(const char *plugin,
 	ReplicationSlotSave();
 
 	ctx = StartupDecodingContext(NIL, restart_lsn, xmin_horizon,
-								 need_full_snapshot, false,
+								 need_full_snapshot, DECODING_MODE_NORMAL,
 								 xl_routine, prepare_write, do_write,
 								 update_progress);
 
@@ -473,8 +474,8 @@ CreateInitDecodingContext(const char *plugin,
  * output_plugin_options
  *		options passed to the output plugin.
  *
- * fast_forward
- *		bypass the generation of logical changes.
+ * decoding_mode
+ *		See the definition of DecodingMode for details.
  *
  * xl_routine
  *		XLogReaderRoutine used by underlying xlogreader
@@ -493,7 +494,7 @@ CreateInitDecodingContext(const char *plugin,
 LogicalDecodingContext *
 CreateDecodingContext(XLogRecPtr start_lsn,
 					  List *output_plugin_options,
-					  bool fast_forward,
+					  DecodingMode decoding_mode,
 					  XLogReaderRoutine *xl_routine,
 					  LogicalOutputPluginWriterPrepareWrite prepare_write,
 					  LogicalOutputPluginWriterWrite do_write,
@@ -573,8 +574,8 @@ CreateDecodingContext(XLogRecPtr start_lsn,
 
 	ctx = StartupDecodingContext(output_plugin_options,
 								 start_lsn, InvalidTransactionId, false,
-								 fast_forward, xl_routine, prepare_write,
-								 do_write, update_progress);
+								 decoding_mode, xl_routine,
+								 prepare_write, do_write, update_progress);
 
 	/* call output plugin initialization callback */
 	old_context = MemoryContextSwitchTo(ctx->context);
@@ -773,7 +774,14 @@ startup_cb_wrapper(LogicalDecodingContext *ctx, OutputPluginOptions *opt, bool i
 	LogicalErrorCallbackState state;
 	ErrorContextCallback errcallback;
 
-	Assert(!ctx->fast_forward);
+	Assert(ctx->decoding_mode != DECODING_MODE_FAST_FORWARD);
+
+	/* Quick exit if we are in the silent mode */
+	if (ctx->decoding_mode == DECODING_MODE_SILENT)
+	{
+		ctx->did_process = false;
+		return;
+	}
 
 	/* Push callback + info on the error context stack */
 	state.ctx = ctx;
@@ -801,7 +809,14 @@ shutdown_cb_wrapper(LogicalDecodingContext *ctx)
 	LogicalErrorCallbackState state;
 	ErrorContextCallback errcallback;
 
-	Assert(!ctx->fast_forward);
+	Assert(ctx->decoding_mode != DECODING_MODE_FAST_FORWARD);
+
+	/* Quick exit if we are in the silent mode */
+	if (ctx->decoding_mode == DECODING_MODE_SILENT)
+	{
+		ctx->did_process = false;
+		return;
+	}
 
 	/* Push callback + info on the error context stack */
 	state.ctx = ctx;
@@ -835,7 +850,14 @@ begin_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn)
 	LogicalErrorCallbackState state;
 	ErrorContextCallback errcallback;
 
-	Assert(!ctx->fast_forward);
+	Assert(ctx->decoding_mode != DECODING_MODE_FAST_FORWARD);
+
+	/* Quick exit if we are in the silent mode */
+	if (ctx->decoding_mode == DECODING_MODE_SILENT)
+	{
+		ctx->did_process = true;
+		return;
+	}
 
 	/* Push callback + info on the error context stack */
 	state.ctx = ctx;
@@ -867,7 +889,14 @@ commit_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
 	LogicalErrorCallbackState state;
 	ErrorContextCallback errcallback;
 
-	Assert(!ctx->fast_forward);
+	Assert(ctx->decoding_mode != DECODING_MODE_FAST_FORWARD);
+
+	/* Quick exit if we are in the silent mode */
+	if (ctx->decoding_mode == DECODING_MODE_SILENT)
+	{
+		ctx->did_process = true;
+		return;
+	}
 
 	/* Push callback + info on the error context stack */
 	state.ctx = ctx;
@@ -905,7 +934,11 @@ begin_prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn)
 	LogicalErrorCallbackState state;
 	ErrorContextCallback errcallback;
 
-	Assert(!ctx->fast_forward);
+	/*
+	 * In silent mode the two-phase callbacks are not set, so this wrapper
+	 * should not be called.
+	 */
+	Assert(ctx->decoding_mode == DECODING_MODE_NORMAL);
 
 	/* We're only supposed to call this when two-phase commits are supported */
 	Assert(ctx->twophase);
@@ -950,7 +983,11 @@ prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
 	LogicalErrorCallbackState state;
 	ErrorContextCallback errcallback;
 
-	Assert(!ctx->fast_forward);
+	/*
+	 * In silent mode the two-phase callbacks are not set, so this wrapper
+	 * should not be called.
+	 */
+	Assert(ctx->decoding_mode == DECODING_MODE_NORMAL);
 
 	/* We're only supposed to call this when two-phase commits are supported */
 	Assert(ctx->twophase);
@@ -995,7 +1032,11 @@ commit_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
 	LogicalErrorCallbackState state;
 	ErrorContextCallback errcallback;
 
-	Assert(!ctx->fast_forward);
+	/*
+	 * In silent mode the two-phase callbacks are not set, so this wrapper
+	 * should not be called.
+	 */
+	Assert(ctx->decoding_mode == DECODING_MODE_NORMAL);
 
 	/* We're only supposed to call this when two-phase commits are supported */
 	Assert(ctx->twophase);
@@ -1041,7 +1082,11 @@ rollback_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
 	LogicalErrorCallbackState state;
 	ErrorContextCallback errcallback;
 
-	Assert(!ctx->fast_forward);
+	/*
+	 * In silent mode the two-phase callbacks are not set, so this wrapper
+	 * should not be called.
+	 */
+	Assert(ctx->decoding_mode == DECODING_MODE_NORMAL);
 
 	/* We're only supposed to call this when two-phase commits are supported */
 	Assert(ctx->twophase);
@@ -1087,7 +1132,14 @@ change_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
 	LogicalErrorCallbackState state;
 	ErrorContextCallback errcallback;
 
-	Assert(!ctx->fast_forward);
+	Assert(ctx->decoding_mode != DECODING_MODE_FAST_FORWARD);
+
+	/* Quick exit if we are in the silent mode */
+	if (ctx->decoding_mode == DECODING_MODE_SILENT)
+	{
+		ctx->did_process = true;
+		return;
+	}
 
 	/* Push callback + info on the error context stack */
 	state.ctx = ctx;
@@ -1126,9 +1178,15 @@ truncate_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
 	LogicalErrorCallbackState state;
 	ErrorContextCallback errcallback;
 
-	Assert(!ctx->fast_forward);
+	Assert(ctx->decoding_mode != DECODING_MODE_FAST_FORWARD);
 
-	if (!ctx->callbacks.truncate_cb)
+	/* Quick exit if we are in the silent mode */
+	if (ctx->decoding_mode == DECODING_MODE_SILENT)
+	{
+		ctx->did_process = true;
+		return;
+	}
+	else if (!ctx->callbacks.truncate_cb)
 		return;
 
 	/* Push callback + info on the error context stack */
@@ -1168,7 +1226,14 @@ filter_prepare_cb_wrapper(LogicalDecodingContext *ctx, TransactionId xid,
 	ErrorContextCallback errcallback;
 	bool		ret;
 
-	Assert(!ctx->fast_forward);
+	Assert(ctx->decoding_mode != DECODING_MODE_FAST_FORWARD);
+
+	/* Quick exit if we are in the silent mode */
+	if (ctx->decoding_mode == DECODING_MODE_SILENT)
+	{
+		ctx->did_process = true;
+		return false;
+	}
 
 	/* Push callback + info on the error context stack */
 	state.ctx = ctx;
@@ -1199,7 +1264,14 @@ filter_by_origin_cb_wrapper(LogicalDecodingContext *ctx, RepOriginId origin_id)
 	ErrorContextCallback errcallback;
 	bool		ret;
 
-	Assert(!ctx->fast_forward);
+	Assert(ctx->decoding_mode != DECODING_MODE_FAST_FORWARD);
+
+	/* Quick exit if we are in the silent mode */
+	if (ctx->decoding_mode == DECODING_MODE_SILENT)
+	{
+		ctx->did_process = true;
+		return false;
+	}
 
 	/* Push callback + info on the error context stack */
 	state.ctx = ctx;
@@ -1232,9 +1304,15 @@ message_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
 	LogicalErrorCallbackState state;
 	ErrorContextCallback errcallback;
 
-	Assert(!ctx->fast_forward);
+	Assert(ctx->decoding_mode != DECODING_MODE_FAST_FORWARD);
 
-	if (ctx->callbacks.message_cb == NULL)
+	/* Quick exit if we are in the silent mode */
+	if (ctx->decoding_mode == DECODING_MODE_SILENT)
+	{
+		ctx->did_process = true;
+		return;
+	}
+	else if (ctx->callbacks.message_cb == NULL)
 		return;
 
 	/* Push callback + info on the error context stack */
@@ -1268,7 +1346,11 @@ stream_start_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
 	LogicalErrorCallbackState state;
 	ErrorContextCallback errcallback;
 
-	Assert(!ctx->fast_forward);
+	/*
+	 * In silent mode the streaming callbacks are not set, so this wrapper
+	 * should not be called.
+	 */
+	Assert(ctx->decoding_mode == DECODING_MODE_NORMAL);
 
 	/* We're only supposed to call this when streaming is supported. */
 	Assert(ctx->streaming);
@@ -1317,7 +1399,11 @@ stream_stop_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
 	LogicalErrorCallbackState state;
 	ErrorContextCallback errcallback;
 
-	Assert(!ctx->fast_forward);
+	/*
+	 * In silent mode the streaming callbacks are not set, so this wrapper
+	 * should not be called.
+	 */
+	Assert(ctx->decoding_mode == DECODING_MODE_NORMAL);
 
 	/* We're only supposed to call this when streaming is supported. */
 	Assert(ctx->streaming);
@@ -1366,7 +1452,11 @@ stream_abort_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
 	LogicalErrorCallbackState state;
 	ErrorContextCallback errcallback;
 
-	Assert(!ctx->fast_forward);
+	/*
+	 * In silent mode the streaming callbacks are not set, so this wrapper
+	 * should not be called.
+	 */
+	Assert(ctx->decoding_mode == DECODING_MODE_NORMAL);
 
 	/* We're only supposed to call this when streaming is supported. */
 	Assert(ctx->streaming);
@@ -1407,7 +1497,11 @@ stream_prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
 	LogicalErrorCallbackState state;
 	ErrorContextCallback errcallback;
 
-	Assert(!ctx->fast_forward);
+	/*
+	 * In silent mode the streaming callbacks are not set, so this wrapper
+	 * should not be called.
+	 */
+	Assert(ctx->decoding_mode == DECODING_MODE_NORMAL);
 
 	/*
 	 * We're only supposed to call this when streaming and two-phase commits
@@ -1452,7 +1546,11 @@ stream_commit_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
 	LogicalErrorCallbackState state;
 	ErrorContextCallback errcallback;
 
-	Assert(!ctx->fast_forward);
+	/*
+	 * In silent mode the streaming callbacks are not set, so this wrapper
+	 * should not be called.
+	 */
+	Assert(ctx->decoding_mode == DECODING_MODE_NORMAL);
 
 	/* We're only supposed to call this when streaming is supported. */
 	Assert(ctx->streaming);
@@ -1493,7 +1591,11 @@ stream_change_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
 	LogicalErrorCallbackState state;
 	ErrorContextCallback errcallback;
 
-	Assert(!ctx->fast_forward);
+	/*
+	 * In silent mode the streaming callbacks are not set, so this wrapper
+	 * should not be called.
+	 */
+	Assert(ctx->decoding_mode == DECODING_MODE_NORMAL);
 
 	/* We're only supposed to call this when streaming is supported. */
 	Assert(ctx->streaming);
@@ -1543,7 +1645,11 @@ stream_message_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
 	LogicalErrorCallbackState state;
 	ErrorContextCallback errcallback;
 
-	Assert(!ctx->fast_forward);
+	/*
+	 * In silent mode the streaming callbacks are not set, so this wrapper
+	 * should not be called.
+	 */
+	Assert(ctx->decoding_mode == DECODING_MODE_NORMAL);
 
 	/* We're only supposed to call this when streaming is supported. */
 	Assert(ctx->streaming);
@@ -1584,7 +1690,7 @@ stream_truncate_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
 	LogicalErrorCallbackState state;
 	ErrorContextCallback errcallback;
 
-	Assert(!ctx->fast_forward);
+	Assert(ctx->decoding_mode != DECODING_MODE_FAST_FORWARD);
 
 	/* We're only supposed to call this when streaming is supported. */
 	Assert(ctx->streaming);
@@ -1630,7 +1736,14 @@ update_progress_txn_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
 	LogicalErrorCallbackState state;
 	ErrorContextCallback errcallback;
 
-	Assert(!ctx->fast_forward);
+	Assert(ctx->decoding_mode != DECODING_MODE_FAST_FORWARD);
+
+	/* Quick exit if we are in the silent mode */
+	if (ctx->decoding_mode == DECODING_MODE_SILENT)
+	{
+		ctx->did_process = true;
+		return;
+	}
 
 	/* Push callback + info on the error context stack */
 	state.ctx = ctx;
@@ -1949,3 +2062,51 @@ UpdateDecodingStats(LogicalDecodingContext *ctx)
 	rb->totalTxns = 0;
 	rb->totalBytes = 0;
 }
+
+/*
+ * Read from the decoding slot, and return true when meaningful changes are
+ * found; otherwise return false.
+ *
+ * Currently the function is used only for the upgrade purpose, but there is
+ * no reason to restrict it, so IsBinaryUpgrade is not checked here.
+ */
+bool
+DecodingContextHasdecodedItems(LogicalDecodingContext *ctx,
+							   XLogRecPtr end_of_wal)
+{
+	bool		found = false;
+
+	Assert(MyReplicationSlot);
+
+	/*
+	 * Start reading at the slot's restart_lsn, which we know to point to a
+	 * valid record.
+	 */
+	XLogBeginRead(ctx->reader, MyReplicationSlot->data.restart_lsn);
+
+	/* invalidate non-timetravel entries */
+	InvalidateSystemCaches();
+
+	/* Loop until the end of WAL or some changes are found */
+	while (!found && ctx->reader->EndRecPtr < end_of_wal)
+	{
+		XLogRecord *record;
+		char	   *errm = NULL;
+
+		record = XLogReadRecord(ctx->reader, &errm);
+
+		if (errm)
+			elog(ERROR, "could not find record for logical decoding: %s", errm);
+
+		if (record != NULL)
+			LogicalDecodingProcessRecord(ctx, ctx->reader);
+
+		/* Check whether the meaningful change was found */
+		found = (ctx->reorder->by_txn_last_xid != InvalidTransactionId ||
+				 ctx->did_process);
+
+		CHECK_FOR_INTERRUPTS();
+	}
+
+	return found;
+}
diff --git a/src/backend/replication/logical/logicalfuncs.c b/src/backend/replication/logical/logicalfuncs.c
index 197169d6b0..d3f8e22bf6 100644
--- a/src/backend/replication/logical/logicalfuncs.c
+++ b/src/backend/replication/logical/logicalfuncs.c
@@ -207,7 +207,7 @@ pg_logical_slot_get_changes_guts(FunctionCallInfo fcinfo, bool confirm, bool bin
 		/* restart at slot's confirmed_flush */
 		ctx = CreateDecodingContext(InvalidXLogRecPtr,
 									options,
-									false,
+									DECODING_MODE_NORMAL,
 									XL_ROUTINE(.page_read = read_local_xlog_page,
 											   .segment_open = wal_segment_open,
 											   .segment_close = wal_segment_close),
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 7e5ec500d8..9980e2fd79 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1423,6 +1423,18 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause,
 
 		SpinLockRelease(&s->mutex);
 
+		/*
+		 * The logical replication slots shouldn't be invalidated as
+		 * max_slot_wal_keep_size GUC is set to -1 during the upgrade.
+		 *
+		 * The following is just a sanity check.
+		 */
+		if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade)
+		{
+			Assert(max_slot_wal_keep_size_mb == -1);
+			elog(ERROR, "replication slots must not be invalidated during the upgrade");
+		}
+
 		if (active_pid != 0)
 		{
 			/*
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index 6035cf4816..86cb112cdf 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -114,7 +114,7 @@ pg_create_physical_replication_slot(PG_FUNCTION_ARGS)
  * When find_startpoint is false, the slot's confirmed_flush is not set; it's
  * caller's responsibility to ensure it's set to something sensible.
  */
-static void
+void
 create_logical_replication_slot(char *name, char *plugin,
 								bool temporary, bool two_phase,
 								XLogRecPtr restart_lsn,
@@ -485,7 +485,7 @@ pg_logical_replication_slot_advance(XLogRecPtr moveto)
 		 */
 		ctx = CreateDecodingContext(InvalidXLogRecPtr,
 									NIL,
-									true,	/* fast_forward */
+									DECODING_MODE_FAST_FORWARD,
 									XL_ROUTINE(.page_read = read_local_xlog_page,
 											   .segment_open = wal_segment_open,
 											   .segment_close = wal_segment_close),
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index e250b0567e..b3b819d996 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -1283,7 +1283,7 @@ StartLogicalReplication(StartReplicationCmd *cmd)
 	 * are reported early.
 	 */
 	logical_decoding_ctx =
-		CreateDecodingContext(cmd->startpoint, cmd->options, false,
+		CreateDecodingContext(cmd->startpoint, cmd->options, DECODING_MODE_NORMAL,
 							  XL_ROUTINE(.page_read = logical_read_xlog_page,
 										 .segment_open = WalSndSegmentOpen,
 										 .segment_close = wal_segment_close),
diff --git a/src/backend/utils/adt/pg_upgrade_support.c b/src/backend/utils/adt/pg_upgrade_support.c
index 0186636d9f..e149fedee6 100644
--- a/src/backend/utils/adt/pg_upgrade_support.c
+++ b/src/backend/utils/adt/pg_upgrade_support.c
@@ -11,14 +11,20 @@
 
 #include "postgres.h"
 
+#include "access/xlogutils.h"
+#include "access/xlog_internal.h"
 #include "catalog/binary_upgrade.h"
 #include "catalog/heap.h"
 #include "catalog/namespace.h"
 #include "catalog/pg_type.h"
 #include "commands/extension.h"
+#include "funcapi.h"
 #include "miscadmin.h"
+#include "replication/logical.h"
+#include "replication/slot.h"
 #include "utils/array.h"
 #include "utils/builtins.h"
+#include "utils/pg_lsn.h"
 
 
 #define CHECK_IS_BINARY_UPGRADE									\
@@ -261,3 +267,125 @@ binary_upgrade_set_missing_value(PG_FUNCTION_ARGS)
 
 	PG_RETURN_VOID();
 }
+
+/*
+ * Return false if we found unexpected WAL records, otherwise true.
+ *
+ * This is a special purpose function to ensure that there are no WAL records
+ * pending to be decoded after the given LSN.
+ *
+ * It is used to ensure that there is no pending WAL to be consumed for
+ * the logical slots.
+ */
+Datum
+binary_upgrade_validate_wal_logical_end(PG_FUNCTION_ARGS)
+{
+	Name		slot_name;
+	XLogRecPtr	end_of_wal;
+	LogicalDecodingContext *ctx = NULL;
+	bool		has_record;
+
+	CHECK_IS_BINARY_UPGRADE;
+
+	/* Quick exit if the input is NULL */
+	if (PG_ARGISNULL(0))
+		PG_RETURN_BOOL(false);
+	else
+		slot_name = PG_GETARG_NAME(0);
+
+	/*
+	 * Acquire the given slot. An error should not happen here because the
+	 * caller has already checked that the slot exists.
+	 */
+	ReplicationSlotAcquire(NameStr(*slot_name), true);
+
+	/* XXX: Is PG_TRY/CATCH needed around here? */
+
+	/*
+	 * We use silent mode here to decode all changes without outputting them,
+	 * allowing us to detect all the records that could be sent downstream.
+	 */
+	ctx = CreateDecodingContext(InvalidXLogRecPtr,
+								NIL,
+								DECODING_MODE_SILENT,
+								XL_ROUTINE(.page_read = read_local_xlog_page,
+										   .segment_open = wal_segment_open,
+										   .segment_close = wal_segment_close),
+								NULL, NULL, NULL);
+
+	end_of_wal = GetFlushRecPtr(NULL);
+
+	has_record = DecodingContextHasdecodedItems(ctx, end_of_wal);
+
+	/* Clean up */
+	FreeDecodingContext(ctx);
+	ReplicationSlotRelease();
+
+	PG_RETURN_BOOL(!has_record);
+}
+
+/*
+ * SQL function for creating a new logical replication slot.
+ *
+ * This function is almost same as pg_create_logical_replication_slot(), but
+ * this can specify the restart_lsn.
+ */
+Datum
+binary_upgrade_create_logical_replication_slot(PG_FUNCTION_ARGS)
+{
+	Name		name = PG_GETARG_NAME(0);
+	Name		plugin = PG_GETARG_NAME(1);
+
+	/* Temporary slots are never handled in this function */
+	bool		two_phase = PG_GETARG_BOOL(2);
+	XLogSegNo	xlogsegno;
+	char		xlogfilename[MAXFNAMELEN];
+	XLogRecPtr	restart_lsn;
+
+	Datum		result;
+	TupleDesc	tupdesc;
+	HeapTuple	tuple;
+	Datum		values[3];
+	bool		nulls[3] = {0};
+
+	CheckSlotPermissions();
+
+	CheckLogicalDecodingRequirements();
+
+	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+		elog(ERROR, "return type must be a row type");
+
+	/* Calculate the next WAL segment and its LSN */
+	XLByteToPrevSeg(GetFlushRecPtr(NULL), xlogsegno, wal_segment_size);
+	XLogFileName(xlogfilename, (TimeLineID) 1, xlogsegno + 1,
+				 wal_segment_size);
+	XLogSegNoOffsetToRecPtr(xlogsegno + 1, 0, wal_segment_size, restart_lsn);
+
+	/*
+	 * Create a given replication slot. confirmed_flush is the same as
+	 * restart_lsn for now.
+	 */
+	create_logical_replication_slot(NameStr(*name),
+									NameStr(*plugin),
+									false,
+									two_phase,
+									restart_lsn,
+									false);
+
+	MyReplicationSlot->data.confirmed_flush = restart_lsn;
+
+	values[0] = NameGetDatum(&MyReplicationSlot->data.name);
+	values[1] = LSNGetDatum(MyReplicationSlot->data.confirmed_flush);
+	values[2] = CStringGetTextDatum(xlogfilename);
+
+	memset(nulls, 0, sizeof(nulls));
+
+	tuple = heap_form_tuple(tupdesc, values, nulls);
+	result = HeapTupleGetDatum(tuple);
+
+	/* ok, slot is now fully created, mark it as persistent */
+	ReplicationSlotPersist();
+	ReplicationSlotRelease();
+
+	PG_RETURN_DATUM(result);
+}
diff --git a/src/bin/pg_upgrade/Makefile b/src/bin/pg_upgrade/Makefile
index 5834513add..815d1a7ca1 100644
--- a/src/bin/pg_upgrade/Makefile
+++ b/src/bin/pg_upgrade/Makefile
@@ -3,6 +3,9 @@
 PGFILEDESC = "pg_upgrade - an in-place binary upgrade utility"
 PGAPPICON = win32
 
+# required for 003_upgrade_logical_replication_slots.pl
+EXTRA_INSTALL=contrib/test_decoding
+
 subdir = src/bin/pg_upgrade
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 21a0ff9e42..30ddf74b02 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -33,6 +33,9 @@ static void check_for_jsonb_9_4_usage(ClusterInfo *cluster);
 static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(void);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
+static void check_new_cluster_logical_replication_slots(void);
+static void check_old_cluster_for_valid_slots(bool live_check);
+static void create_consistent_snapshot(void);
 
 
 /*
@@ -89,8 +92,11 @@ check_and_dump_old_cluster(bool live_check)
 	if (!live_check)
 		start_postmaster(&old_cluster, true);
 
-	/* Extract a list of databases and tables from the old cluster */
-	get_db_and_rel_infos(&old_cluster);
+	/*
+	 * Extract a list of databases, tables, and logical replication slots from
+	 * the old cluster.
+	 */
+	get_db_rel_and_slot_infos(&old_cluster, live_check);
 
 	init_tablespaces();
 
@@ -107,6 +113,13 @@ check_and_dump_old_cluster(bool live_check)
 	check_for_reg_data_type_usage(&old_cluster);
 	check_for_isn_and_int8_passing_mismatch(&old_cluster);
 
+	/*
+	 * Logical replication slots can be migrated since PG17. See comments atop
+	 * get_old_cluster_logical_slot_infos().
+	 */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
+		check_old_cluster_for_valid_slots(live_check);
+
 	/*
 	 * PG 16 increased the size of the 'aclitem' type, which breaks the
 	 * on-disk format for existing data.
@@ -200,7 +213,7 @@ check_and_dump_old_cluster(bool live_check)
 void
 check_new_cluster(void)
 {
-	get_db_and_rel_infos(&new_cluster);
+	get_db_rel_and_slot_infos(&new_cluster, false);
 
 	check_new_cluster_is_empty();
 
@@ -223,6 +236,8 @@ check_new_cluster(void)
 	check_for_prepared_transactions(&new_cluster);
 
 	check_for_new_tablespace_dir();
+
+	check_new_cluster_logical_replication_slots();
 }
 
 
@@ -245,6 +260,27 @@ report_clusters_compatible(void)
 }
 
 
+/*
+ * Log the details of the current snapshot to the WAL, allowing the snapshot
+ * state to be reconstructed for logical decoding on the upgraded slots.
+ */
+static void
+create_consistent_snapshot(void)
+{
+	DbInfo	   *old_db = &old_cluster.dbarr.dbs[0];
+	PGconn	   *conn;
+
+	prep_status("Creating a consitent snapshot on new cluster");
+
+	conn = connectToServer(&new_cluster, old_db->db_name);
+
+	PQclear(executeQueryOrDie(conn, "SELECT pg_log_standby_snapshot();"));
+	PQfinish(conn);
+
+	check_ok();
+}
+
+
 void
 issue_warnings_and_set_wal_level(void)
 {
@@ -256,6 +292,14 @@ issue_warnings_and_set_wal_level(void)
 	 */
 	start_postmaster(&new_cluster, true);
 
+	/*
+	 * Also, we must execute pg_log_standby_snapshot() when logical replication
+	 * slots are migrated, because a RUNNING_XACTS record is required to create
+	 * a consistent snapshot.
+	 */
+	if (count_old_cluster_logical_slots())
+		create_consistent_snapshot();
+
 	/* Reindex hash indexes for old < 10.0 */
 	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 906)
 		old_9_6_invalidate_hash_indexes(&new_cluster, false);
@@ -1451,3 +1495,155 @@ check_for_user_defined_encoding_conversions(ClusterInfo *cluster)
 	else
 		check_ok();
 }
+
+/*
+ * check_new_cluster_logical_replication_slots()
+ *
+ * Verify that there are no logical replication slots on the new cluster and
+ * that the parameter settings necessary for creating slots are sufficient.
+ */
+static void
+check_new_cluster_logical_replication_slots(void)
+{
+	PGresult   *res;
+	PGconn	   *conn;
+	int			nslots_on_old;
+	int			nslots_on_new;
+	int			max_replication_slots;
+	char	   *wal_level;
+
+	/* Logical slots can be migrated since PG17. */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+		return;
+
+	nslots_on_old = count_old_cluster_logical_slots();
+
+	/* Quick return if there are no logical slots to be migrated. */
+	if (nslots_on_old == 0)
+		return;
+
+	conn = connectToServer(&new_cluster, "template1");
+
+	prep_status("Checking for new cluster logical replication slots");
+
+	res = executeQueryOrDie(conn, "SELECT count(*) "
+							"FROM pg_catalog.pg_replication_slots "
+							"WHERE slot_type = 'logical' AND "
+							"temporary IS FALSE;");
+
+	if (PQntuples(res) != 1)
+		pg_fatal("could not count the number of logical replication slots");
+
+	nslots_on_new = atoi(PQgetvalue(res, 0, 0));
+
+	if (nslots_on_new)
+		pg_fatal("Expected 0 logical replication slots but found %d.",
+				 nslots_on_new);
+
+	PQclear(res);
+
+	res = executeQueryOrDie(conn, "SELECT setting FROM pg_settings "
+							"WHERE name IN ('max_replication_slots', 'wal_level') "
+							"ORDER BY name DESC;");
+
+	if (PQntuples(res) != 2)
+		pg_fatal("could not determine parameter settings on new cluster");
+
+	wal_level = PQgetvalue(res, 0, 0);
+
+	if (strcmp(wal_level, "logical") != 0)
+		pg_fatal("wal_level must be \"logical\", but is set to \"%s\"",
+				 wal_level);
+
+	max_replication_slots = atoi(PQgetvalue(res, 1, 0));
+
+	if (nslots_on_old > max_replication_slots)
+		pg_fatal("max_replication_slots (%d) must be greater than or equal to the number of "
+				 "logical replication slots (%d) on the old cluster",
+				 max_replication_slots, nslots_on_old);
+
+	PQclear(res);
+	PQfinish(conn);
+
+	check_ok();
+}
+
+/*
+ * check_old_cluster_for_valid_slots()
+ *
+ * Verify that all the logical slots are usable and have consumed all the WAL
+ * before shutdown.
+ */
+static void
+check_old_cluster_for_valid_slots(bool live_check)
+{
+	int			dbnum;
+	char		output_path[MAXPGPATH];
+	FILE	   *script = NULL;
+
+	prep_status("Checking for valid logical replication slots");
+
+	snprintf(output_path, sizeof(output_path), "%s/%s",
+			 log_opts.basedir,
+			 "invalid_logical_relication_slots.txt");
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		int			slotnum;
+		LogicalSlotInfoArr *slot_arr = &old_cluster.dbarr.dbs[dbnum].slot_arr;
+
+		for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		{
+			LogicalSlotInfo *slot = &slot_arr->slots[slotnum];
+
+			/* Is the slot usable? */
+			if (slot->invalid)
+			{
+				if (script == NULL &&
+					(script = fopen_priv(output_path, "w")) == NULL)
+					pg_fatal("could not open file \"%s\": %s",
+							 output_path, strerror(errno));
+
+				fprintf(script, "The slot \"%s\" is invalid\n",
+						slot->slotname);
+
+				/* No need to check this slot, seek to new one */
+				continue;
+			}
+
+			/*
+			 * Do additional check to ensure that all logical replication
+			 * slots have consumed all the WAL before shutdown.
+			 *
+			 * Note: This can be satisfied only when the old cluster has been
+			 * shut down, so we skip this for live checks.
+			 */
+			if (!live_check && !slot->caught_up)
+			{
+				if (script == NULL &&
+					(script = fopen_priv(output_path, "w")) == NULL)
+					pg_fatal("could not open file \"%s\": %s",
+							 output_path, strerror(errno));
+
+				fprintf(script,
+						"The slot \"%s\" has not consumed the WAL yet\n",
+						slot->slotname);
+			}
+		}
+	}
+
+	if (script)
+	{
+		fclose(script);
+
+		pg_log(PG_REPORT, "fatal");
+		pg_fatal("Your installation contains logical replication slots that cannot be upgraded.\n"
+				 "These slots can't be copied, so this cluster cannot be upgraded.\n"
+				 "Consider removing invalid slots and/or consuming the pending WAL if any,\n"
+				 "and then restart the upgrade.\n"
+				 "A list of all such logical replication slots is in the file:\n"
+				 "    %s", output_path);
+	}
+
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/function.c b/src/bin/pg_upgrade/function.c
index dc8800c7cd..c0f5e58fa2 100644
--- a/src/bin/pg_upgrade/function.c
+++ b/src/bin/pg_upgrade/function.c
@@ -46,7 +46,9 @@ library_name_compare(const void *p1, const void *p2)
 /*
  * get_loadable_libraries()
  *
- *	Fetch the names of all old libraries containing C-language functions.
+ *	Fetch the names of all old libraries containing either C-language functions
+ *	or logical replication output plugins.
+ *
  *	We will later check that they all exist in the new installation.
  */
 void
@@ -55,6 +57,7 @@ get_loadable_libraries(void)
 	PGresult  **ress;
 	int			totaltups;
 	int			dbnum;
+	int			n_libinfos;
 
 	ress = (PGresult **) pg_malloc(old_cluster.dbarr.ndbs * sizeof(PGresult *));
 	totaltups = 0;
@@ -81,7 +84,12 @@ get_loadable_libraries(void)
 		PQfinish(conn);
 	}
 
-	os_info.libraries = (LibraryInfo *) pg_malloc(totaltups * sizeof(LibraryInfo));
+	/*
+	 * Allocate memory for required libraries and logical replication output
+	 * plugins.
+	 */
+	n_libinfos = totaltups + count_old_cluster_logical_slots();
+	os_info.libraries = (LibraryInfo *) pg_malloc(sizeof(LibraryInfo) * n_libinfos);
 	totaltups = 0;
 
 	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
@@ -89,6 +97,8 @@ get_loadable_libraries(void)
 		PGresult   *res = ress[dbnum];
 		int			ntups;
 		int			rowno;
+		int			slotno;
+		LogicalSlotInfoArr *slot_arr = &old_cluster.dbarr.dbs[dbnum].slot_arr;
 
 		ntups = PQntuples(res);
 		for (rowno = 0; rowno < ntups; rowno++)
@@ -101,6 +111,23 @@ get_loadable_libraries(void)
 			totaltups++;
 		}
 		PQclear(res);
+
+		/*
+		 * Store the names of output plugins as well. There is a possibility
+		 * that duplicated plugins are set, but the consumer function
+		 * check_loadable_libraries() will avoid checking the same library, so
+		 * we do not have to consider their uniqueness here.
+		 */
+		for (slotno = 0; slotno < slot_arr->nslots; slotno++)
+		{
+			if (slot_arr->slots[slotno].invalid)
+				continue;
+
+			os_info.libraries[totaltups].name = pg_strdup(slot_arr->slots[slotno].plugin);
+			os_info.libraries[totaltups].dbnum = dbnum;
+
+			totaltups++;
+		}
 	}
 
 	pg_free(ress);
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index aa5faca4d6..d975aed562 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -26,6 +26,8 @@ static void get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo);
 static void free_rel_infos(RelInfoArr *rel_arr);
 static void print_db_infos(DbInfoArr *db_arr);
 static void print_rel_infos(RelInfoArr *rel_arr);
+static void print_slot_infos(LogicalSlotInfoArr *slot_arr);
+static void get_old_cluster_logical_slot_infos(DbInfo *dbinfo, bool live_check);
 
 
 /*
@@ -266,13 +268,15 @@ report_unmatched_relation(const RelInfo *rel, const DbInfo *db, bool is_new_db)
 }
 
 /*
- * get_db_and_rel_infos()
+ * get_db_rel_and_slot_infos()
  *
  * higher level routine to generate dbinfos for the database running
  * on the given "port". Assumes that server is already running.
+ *
+ * live_check would be used only when the target is the old cluster.
  */
 void
-get_db_and_rel_infos(ClusterInfo *cluster)
+get_db_rel_and_slot_infos(ClusterInfo *cluster, bool live_check)
 {
 	int			dbnum;
 
@@ -283,7 +287,17 @@ get_db_and_rel_infos(ClusterInfo *cluster)
 	get_db_infos(cluster);
 
 	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
-		get_rel_infos(cluster, &cluster->dbarr.dbs[dbnum]);
+	{
+		DbInfo	   *pDbInfo = &cluster->dbarr.dbs[dbnum];
+
+		get_rel_infos(cluster, pDbInfo);
+
+		/*
+		 * Retrieve the logical replication slots infos for the old cluster.
+		 */
+		if (cluster == &old_cluster)
+			get_old_cluster_logical_slot_infos(pDbInfo, live_check);
+	}
 
 	if (cluster == &old_cluster)
 		pg_log(PG_VERBOSE, "\nsource databases:");
@@ -600,6 +614,125 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 	dbinfo->rel_arr.nrels = num_rels;
 }
 
+/*
+ * get_old_cluster_logical_slot_infos()
+ *
+ * gets the LogicalSlotInfos for all the logical replication slots of the
+ * database referred to by "dbinfo".
+ *
+ * Note: This function will not do anything if the old cluster is pre-PG17.
+ * This is because before that the logical slots are not saved at shutdown, so
+ * there is no guarantee that the latest confirmed_flush_lsn is saved to disk
+ * which can lead to data loss. It is still not guaranteed for manually created
+ * slots in PG17, so subsequent checks done in
+ * check_old_cluster_for_valid_slots() would raise a FATAL error if such slots
+ * are included.
+ */
+static void
+get_old_cluster_logical_slot_infos(DbInfo *dbinfo, bool live_check)
+{
+	PGconn	   *conn;
+	PGresult   *res;
+	LogicalSlotInfo *slotinfos = NULL;
+	int			num_slots = 0;
+
+	/* Logical slots can be migrated since PG17. */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+	{
+		dbinfo->slot_arr.slots = slotinfos;
+		dbinfo->slot_arr.nslots = num_slots;
+		return;
+	}
+
+	conn = connectToServer(&old_cluster, dbinfo->db_name);
+
+	/*
+	 * Fetch the logical replication slot information. The slot is considered
+	 * caught up if all the WAL is consumed except for records that could be
+	 * generated during the upgrade. See
+	 * binary_upgrade_validate_wal_logical_end().
+	 *
+	 * Note that we can't ensure whether the slot is caught up during
+	 * live_check as the new WAL records could be generated.
+	 *
+	 * We intentionally skip checking the WALs for invalidated slots as the
+	 * corresponding WALs could have been removed for such slots.
+	 *
+	 * The temporary slots are explicitly ignored while checking because such
+	 * slots cannot exist after the upgrade. During the upgrade, clusters are
+	 * started and stopped several times causing any temporary slots to be
+	 * removed.
+	 */
+	res = executeQueryOrDie(conn, "SELECT slot_name, plugin, two_phase, "
+							"%s as caught_up, conflicting as invalid "
+							"FROM pg_catalog.pg_replication_slots "
+							"WHERE slot_type = 'logical' AND "
+							"database = current_database() AND "
+							"temporary IS FALSE;",
+							live_check ? "FALSE" :
+							"(CASE WHEN conflicting THEN FALSE "
+							"ELSE (SELECT pg_catalog.binary_upgrade_validate_wal_logical_end(slot_name)) "
+							"END)");
+
+	num_slots = PQntuples(res);
+
+	if (num_slots)
+	{
+		int			slotnum;
+		int			i_slotname;
+		int			i_plugin;
+		int			i_twophase;
+		int			i_caught_up;
+		int			i_invalid;
+
+		slotinfos = (LogicalSlotInfo *) pg_malloc(sizeof(LogicalSlotInfo) * num_slots);
+
+		i_slotname = PQfnumber(res, "slot_name");
+		i_plugin = PQfnumber(res, "plugin");
+		i_twophase = PQfnumber(res, "two_phase");
+		i_caught_up = PQfnumber(res, "caught_up");
+		i_invalid = PQfnumber(res, "invalid");
+
+		for (slotnum = 0; slotnum < num_slots; slotnum++)
+		{
+			LogicalSlotInfo *curr = &slotinfos[slotnum];
+
+			curr->slotname = pg_strdup(PQgetvalue(res, slotnum, i_slotname));
+			curr->plugin = pg_strdup(PQgetvalue(res, slotnum, i_plugin));
+			curr->two_phase = (strcmp(PQgetvalue(res, slotnum, i_twophase), "t") == 0);
+			curr->caught_up = (strcmp(PQgetvalue(res, slotnum, i_caught_up), "t") == 0);
+			curr->invalid = (strcmp(PQgetvalue(res, slotnum, i_invalid), "t") == 0);
+		}
+	}
+
+	PQclear(res);
+	PQfinish(conn);
+
+	dbinfo->slot_arr.slots = slotinfos;
+	dbinfo->slot_arr.nslots = num_slots;
+}
+
+
+/*
+ * count_old_cluster_logical_slots()
+ *
+ * Returns the number of logical replication slots for all databases.
+ *
+ * Note: this function always returns 0 if the old_cluster is PG16 and prior
+ * because we gather slot information only for cluster versions greater than or
+ * equal to PG17. See get_old_cluster_logical_slot_infos().
+ */
+int
+count_old_cluster_logical_slots(void)
+{
+	int			dbnum;
+	int			slot_count = 0;
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+		slot_count += old_cluster.dbarr.dbs[dbnum].slot_arr.nslots;
+
+	return slot_count;
+}
 
 static void
 free_db_and_rel_infos(DbInfoArr *db_arr)
@@ -642,8 +775,11 @@ print_db_infos(DbInfoArr *db_arr)
 
 	for (dbnum = 0; dbnum < db_arr->ndbs; dbnum++)
 	{
-		pg_log(PG_VERBOSE, "Database: \"%s\"", db_arr->dbs[dbnum].db_name);
-		print_rel_infos(&db_arr->dbs[dbnum].rel_arr);
+		DbInfo	   *pDbInfo = &db_arr->dbs[dbnum];
+
+		pg_log(PG_VERBOSE, "Database: \"%s\"", pDbInfo->db_name);
+		print_rel_infos(&pDbInfo->rel_arr);
+		print_slot_infos(&pDbInfo->slot_arr);
 	}
 }
 
@@ -660,3 +796,25 @@ print_rel_infos(RelInfoArr *rel_arr)
 			   rel_arr->rels[relnum].reloid,
 			   rel_arr->rels[relnum].tablespace);
 }
+
+static void
+print_slot_infos(LogicalSlotInfoArr *slot_arr)
+{
+	int			slotnum;
+
+	/* Quick return if there are no logical slots. */
+	if (slot_arr->nslots == 0)
+		return;
+
+	pg_log(PG_VERBOSE, "Logical replication slots within the database:");
+
+	for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+	{
+		LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+
+		pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\", two_phase: %d",
+			   slot_info->slotname,
+			   slot_info->plugin,
+			   slot_info->two_phase);
+	}
+}
diff --git a/src/bin/pg_upgrade/meson.build b/src/bin/pg_upgrade/meson.build
index 12a97f84e2..2c4f38d865 100644
--- a/src/bin/pg_upgrade/meson.build
+++ b/src/bin/pg_upgrade/meson.build
@@ -42,6 +42,7 @@ tests += {
     'tests': [
       't/001_basic.pl',
       't/002_pg_upgrade.pl',
+      't/003_upgrade_logical_replication_slots.pl',
     ],
     'test_kwargs': {'priority': 40}, # pg_upgrade tests are slow
   },
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 96bfb67167..53ab00f238 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -59,6 +59,7 @@ static void copy_xact_xlog_xid(void);
 static void set_frozenxids(bool minmxid_only);
 static void make_outputdirs(char *pgdata);
 static void setup(char *argv0, bool *live_check);
+static char *create_logical_replication_slots(void);
 
 ClusterInfo old_cluster,
 			new_cluster;
@@ -81,6 +82,8 @@ main(int argc, char **argv)
 {
 	char	   *deletion_script_file_name = NULL;
 	bool		live_check = false;
+	char	   *xlogfilename = NULL;
+	PQExpBufferData resetwal_options;
 
 	/*
 	 * pg_upgrade doesn't currently use common/logging.c, but initialize it
@@ -175,6 +178,20 @@ main(int argc, char **argv)
 	transfer_all_new_tablespaces(&old_cluster.dbarr, &new_cluster.dbarr,
 								 old_cluster.pgdata, new_cluster.pgdata);
 
+	/*
+	 * If the old cluster has logical slots, migrate them to the new cluster.
+	 *
+	 * The function returns the next WAL segment file name, which must be
+	 * passed to the upcoming pg_resetwal command.
+	 */
+
+	if (count_old_cluster_logical_slots())
+	{
+		start_postmaster(&new_cluster, true);
+		xlogfilename = create_logical_replication_slots();
+		stop_postmaster(false);
+	}
+
 	/*
 	 * Assuming OIDs are only used in system tables, there is no need to
 	 * restore the OID counter because we have not transferred any OIDs from
@@ -182,10 +199,20 @@ main(int argc, char **argv)
 	 * because there is no need to have the schema load use new oids.
 	 */
 	prep_status("Setting next OID for new cluster");
+
+	initPQExpBuffer(&resetwal_options);
+	appendPQExpBuffer(&resetwal_options, "-o %u \"%s\"",
+					  old_cluster.controldata.chkpnt_nxtoid,
+					  new_cluster.pgdata);
+
+	/* If next wal segment is given, use it */
+	if (xlogfilename)
+		appendPQExpBuffer(&resetwal_options, " -l %s", xlogfilename);
+
 	exec_prog(UTILITY_LOG_FILE, NULL, true, true,
-			  "\"%s/pg_resetwal\" -o %u \"%s\"",
-			  new_cluster.bindir, old_cluster.controldata.chkpnt_nxtoid,
-			  new_cluster.pgdata);
+			  "\"%s/pg_resetwal\" %s",
+			  new_cluster.bindir, resetwal_options.data);
+
 	check_ok();
 
 	if (user_opts.do_sync)
@@ -593,7 +620,7 @@ create_new_objects(void)
 		set_frozenxids(true);
 
 	/* update new_cluster info now that we have objects in the databases */
-	get_db_and_rel_infos(&new_cluster);
+	get_db_rel_and_slot_infos(&new_cluster, false);
 }
 
 /*
@@ -862,3 +889,72 @@ set_frozenxids(bool minmxid_only)
 
 	check_ok();
 }
+
+/*
+ * create_logical_replication_slots()
+ *
+ * Similar to create_new_objects() but only restores logical replication slots.
+ */
+static char *
+create_logical_replication_slots(void)
+{
+	int			dbnum;
+	char	   *xlogfilename = NULL;
+
+	prep_status_progress("Restoring logical replication slots in the new cluster");
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		DbInfo	   *old_db = &old_cluster.dbarr.dbs[dbnum];
+		LogicalSlotInfoArr *slot_arr = &old_db->slot_arr;
+		PGconn	   *conn;
+		PQExpBuffer query;
+		int			slotnum;
+		char		log_file_name[MAXPGPATH];
+
+		/* Skip this database if there are no slots */
+		if (slot_arr->nslots == 0)
+			continue;
+
+		conn = connectToServer(&new_cluster, old_db->db_name);
+		query = createPQExpBuffer();
+
+		snprintf(log_file_name, sizeof(log_file_name),
+				 DB_DUMP_LOG_FILE_MASK, old_db->db_oid);
+
+		pg_log(PG_STATUS, "%s", old_db->db_name);
+
+		for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		{
+			PGresult   *res;
+			LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+
+			/* Constructs a query for creating logical replication slots. */
+			appendPQExpBuffer(query, "SELECT * FROM pg_catalog.binary_upgrade_create_logical_replication_slot(");
+			appendStringLiteralConn(query, slot_info->slotname, conn);
+			appendPQExpBuffer(query, ", ");
+			appendStringLiteralConn(query, slot_info->plugin, conn);
+			appendPQExpBuffer(query, ", %s);",
+							  slot_info->two_phase ? "true" : "false");
+
+			res = executeQueryOrDie(conn, "%s", query->data);
+
+			Assert(PQntuples(res) == 1 && PQnfields(res) == 3);
+
+			if (xlogfilename == NULL)
+				xlogfilename = pg_strdup(PQgetvalue(res, 0, 2));
+
+			PQclear(res);
+			resetPQExpBuffer(query);
+		}
+
+		PQfinish(conn);
+
+		destroyPQExpBuffer(query);
+	}
+
+	end_progress_output();
+	check_ok();
+
+	return xlogfilename;
+}
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 842f3b6cd3..a45a77bcb1 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -150,6 +150,25 @@ typedef struct
 	int			nrels;
 } RelInfoArr;
 
+/*
+ * Structure to store logical replication slot information.
+ */
+typedef struct
+{
+	char	   *slotname;		/* slot name */
+	char	   *plugin;			/* plugin */
+	bool		two_phase;		/* can the slot decode 2PC? */
+	bool		caught_up;		/* Is confirmed_flush_lsn the same as latest
+								 * checkpoint LSN? */
+	bool		invalid;		/* if true, the slot is unusable */
+} LogicalSlotInfo;
+
+typedef struct
+{
+	int			nslots;			/* number of logical slot infos */
+	LogicalSlotInfo *slots;		/* array of logical slot infos */
+} LogicalSlotInfoArr;
+
 /*
  * The following structure represents a relation mapping.
  */
@@ -176,6 +195,7 @@ typedef struct
 	char		db_tablespace[MAXPGPATH];	/* database default tablespace
 											 * path */
 	RelInfoArr	rel_arr;		/* array of all user relinfos */
+	LogicalSlotInfoArr slot_arr;	/* array of all LogicalSlotInfo */
 } DbInfo;
 
 /*
@@ -400,7 +420,8 @@ void		check_loadable_libraries(void);
 FileNameMap *gen_db_file_maps(DbInfo *old_db,
 							  DbInfo *new_db, int *nmaps, const char *old_pgdata,
 							  const char *new_pgdata);
-void		get_db_and_rel_infos(ClusterInfo *cluster);
+void		get_db_rel_and_slot_infos(ClusterInfo *cluster, bool live_check);
+int			count_old_cluster_logical_slots(void);
 
 /* option.c */
 
diff --git a/src/bin/pg_upgrade/server.c b/src/bin/pg_upgrade/server.c
index 0bc3d2806b..d7f6c268ef 100644
--- a/src/bin/pg_upgrade/server.c
+++ b/src/bin/pg_upgrade/server.c
@@ -201,6 +201,7 @@ start_postmaster(ClusterInfo *cluster, bool report_and_exit_on_error)
 	PGconn	   *conn;
 	bool		pg_ctl_return = false;
 	char		socket_string[MAXPGPATH + 200];
+	PQExpBufferData pgoptions;
 
 	static bool exit_hook_registered = false;
 
@@ -227,23 +228,41 @@ start_postmaster(ClusterInfo *cluster, bool report_and_exit_on_error)
 				 cluster->sockdir);
 #endif
 
+	initPQExpBuffer(&pgoptions);
+
 	/*
-	 * Use -b to disable autovacuum.
+	 * Construct a parameter string which is passed to the server process.
 	 *
 	 * Turn off durability requirements to improve object creation speed, and
 	 * we only modify the new cluster, so only use it there.  If there is a
 	 * crash, the new cluster has to be recreated anyway.  fsync=off is a big
 	 * win on ext4.
 	 */
+	if (cluster == &new_cluster)
+		appendPQExpBufferStr(&pgoptions, " -c synchronous_commit=off -c fsync=off -c full_page_writes=off");
+
+	/*
+	 * Set max_slot_wal_keep_size to -1 to prevent WAL removal by the
+	 * checkpointer process.  If WAL files required by logical replication
+	 * slots are removed, the slots become unusable.  This setting prevents
+	 * the invalidation of slots during the upgrade.  We set this option only
+	 * when the cluster is PG17 or later because logical replication slots
+	 * can only be migrated since then.  Besides, max_slot_wal_keep_size was
+	 * added in PG13.
+	 */
+	if (GET_MAJOR_VERSION(cluster->major_version) >= 1700)
+		appendPQExpBufferStr(&pgoptions, " -c max_slot_wal_keep_size=-1");
+
+	/* Use -b to disable autovacuum. */
 	snprintf(cmd, sizeof(cmd),
 			 "\"%s/pg_ctl\" -w -l \"%s/%s\" -D \"%s\" -o \"-p %d -b%s %s%s\" start",
 			 cluster->bindir,
 			 log_opts.logdir,
 			 SERVER_LOG_FILE, cluster->pgconfig, cluster->port,
-			 (cluster == &new_cluster) ?
-			 " -c synchronous_commit=off -c fsync=off -c full_page_writes=off" : "",
+			 pgoptions.data,
 			 cluster->pgopts ? cluster->pgopts : "", socket_string);
 
+	termPQExpBuffer(&pgoptions);
+
 	/*
 	 * Don't throw an error right away, let connecting throw the error because
 	 * it might supply a reason for the failure.
diff --git a/src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl
new file mode 100644
index 0000000000..270044d75e
--- /dev/null
+++ b/src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl
@@ -0,0 +1,266 @@
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+# Tests for upgrading replication slots
+
+use strict;
+use warnings;
+
+use File::Find qw(find);
+use File::Path qw(rmtree);
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Can be changed to test the other modes
+my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
+
+# Initialize old cluster
+my $old_publisher = PostgreSQL::Test::Cluster->new('old_publisher');
+$old_publisher->init(allows_streaming => 'logical');
+
+# Initialize new cluster
+my $new_publisher = PostgreSQL::Test::Cluster->new('new_publisher');
+$new_publisher->init(allows_streaming => 'replica');
+
+# Initialize subscriber cluster
+my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
+$subscriber->init(allows_streaming => 'logical');
+
+my $bindir = $new_publisher->config_data('--bindir');
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when new cluster wal_level is not 'logical'
+
+# Preparations for the subsequent test:
+# 1. Create a slot on the old cluster
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT pg_create_logical_replication_slot('test_slot1', 'test_decoding', false, true);"
+);
+$old_publisher->stop;
+
+# pg_upgrade will fail because the new cluster wal_level is 'replica'
+command_checks_all(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d', $old_publisher->data_dir,
+		'-D', $new_publisher->data_dir,
+		'-b', $bindir,
+		'-B', $bindir,
+		'-s', $new_publisher->host,
+		'-p', $old_publisher->port,
+		'-P', $new_publisher->port,
+		$mode,
+	],
+	1,
+	[qr/wal_level must be \"logical\", but is set to \"replica\"/],
+	[qr//],
+	'run of pg_upgrade where the new cluster has the wrong wal_level');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when max_replication_slots on a new cluster is
+#		too low
+
+# Preparations for the subsequent test:
+# 1. Create a second slot on the old cluster
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT pg_create_logical_replication_slot('test_slot2', 'test_decoding', false, true);"
+);
+
+# 2. Consume WAL records to avoid another type of upgrade failure. It will be
+#	 tested in subsequent cases.
+$old_publisher->safe_psql('postgres',
+	"SELECT count(*) FROM pg_logical_slot_get_changes('test_slot1', NULL, NULL);"
+);
+$old_publisher->stop;
+
+# 3. max_replication_slots is set to smaller than the number of slots (2)
+#	 present on the old cluster
+$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 1");
+
+# 4. wal_level is set correctly on the new cluster
+$new_publisher->append_conf('postgresql.conf', "wal_level = 'logical'");
+
+# pg_upgrade will fail because the new cluster has insufficient max_replication_slots
+command_checks_all(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d', $old_publisher->data_dir,
+		'-D', $new_publisher->data_dir,
+		'-b', $bindir,
+		'-B', $bindir,
+		'-s', $new_publisher->host,
+		'-p', $old_publisher->port,
+		'-P', $new_publisher->port,
+		$mode,
+	],
+	1,
+	[
+		qr/max_replication_slots \(1\) must be greater than or equal to the number of logical replication slots \(2\) on the old cluster/
+	],
+	[qr//],
+	'run of pg_upgrade where the new cluster has insufficient max_replication_slots'
+);
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when the slot still has unconsumed WAL records
+
+# Preparations for the subsequent test:
+# 1. Remove the slot 'test_slot2', leaving only 1 slot on the old cluster, so
+#    the new cluster config  max_replication_slots=1 will now be enough.
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT * FROM pg_drop_replication_slot('test_slot2');");
+
+# 2. Generate extra WAL records. Because these WAL records do not get consumed
+#	 it will cause the upcoming pg_upgrade test to fail.
+$old_publisher->safe_psql('postgres',
+	"CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;");
+$old_publisher->stop;
+
+# pg_upgrade will fail because the slot still has unconsumed WAL records
+command_checks_all(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d', $old_publisher->data_dir,
+		'-D', $new_publisher->data_dir,
+		'-b', $bindir,
+		'-B', $bindir,
+		'-s', $new_publisher->host,
+		'-p', $old_publisher->port,
+		'-P', $new_publisher->port,
+		$mode,
+	],
+	1,
+	[
+		qr/Your installation contains logical replication slots that cannot be upgraded./
+	],
+	[qr//],
+	'run of pg_upgrade of old cluster with idle replication slots');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Verify the reason why the logical replication slot cannot be upgraded
+my $log_path = $new_publisher->data_dir . "/pg_upgrade_output.d";
+my $slots_filename;
+
+# Find a txt file that contains a list of logical replication slots that cannot
+# be upgraded. We cannot predict the file's path because the output directory
+# contains a milliseconds timestamp. File::Find::find must be used.
+find(
+	sub {
+		if ($File::Find::name =~ m/invalid_logical_relication_slots\.txt/)
+		{
+			$slots_filename = $File::Find::name;
+		}
+	},
+	$new_publisher->data_dir . "/pg_upgrade_output.d");
+
+# And check the content. The failure should be because there are unconsumed
+# WALs after confirmed_flush_lsn of test_slot1.
+like(
+	slurp_file($slots_filename),
+	qr/The slot \"test_slot1\" has not consumed the WAL yet/m,
+	'the previous test failed due to unconsumed WALs');
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+# Remove the remaining slot
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT * FROM pg_drop_replication_slot('test_slot1');");
+
+# ------------------------------
+# TEST: Successful upgrade
+
+# Preparations for the subsequent test:
+# 1. Setup logical replication
+my $old_connstr = $old_publisher->connstr . ' dbname=postgres';
+$old_publisher->safe_psql('postgres',
+	"CREATE PUBLICATION pub FOR ALL TABLES;");
+$subscriber->start;
+$subscriber->safe_psql(
+	'postgres', qq[
+	CREATE TABLE tbl (a int);
+	CREATE SUBSCRIPTION regress_sub CONNECTION '$old_connstr' PUBLICATION pub WITH (two_phase = 'true')
+]);
+$subscriber->wait_for_subscription_sync($old_publisher, 'regress_sub');
+
+# 2. Temporarily disable the subscription
+$subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION regress_sub DISABLE");
+$old_publisher->stop;
+
+# Dry run, successful check is expected. This is not a live check, so a
+# shutdown checkpoint record would be inserted. We want to test that a
+# subsequent upgrade is successful by skipping such an expected WAL record.
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d', $old_publisher->data_dir,
+		'-D', $new_publisher->data_dir,
+		'-b', $bindir,
+		'-B', $bindir,
+		'-s', $new_publisher->host,
+		'-p', $old_publisher->port,
+		'-P', $new_publisher->port,
+		$mode, '--check'
+	],
+	'run of pg_upgrade of old cluster');
+ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ removed after pg_upgrade success");
+
+# Actual run, successful upgrade is expected
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d', $old_publisher->data_dir,
+		'-D', $new_publisher->data_dir,
+		'-b', $bindir,
+		'-B', $bindir,
+		'-s', $new_publisher->host,
+		'-p', $old_publisher->port,
+		'-P', $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade of old cluster');
+ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ removed after pg_upgrade success");
+
+# Check that the slot 'regress_sub' has migrated to the new cluster
+$new_publisher->start;
+my $result = $new_publisher->safe_psql('postgres',
+	"SELECT slot_name, two_phase FROM pg_replication_slots");
+is($result, qq(regress_sub|t), 'check the slot exists on new cluster');
+
+# Update the connection
+my $new_connstr = $new_publisher->connstr . ' dbname=postgres';
+$subscriber->safe_psql(
+	'postgres', qq[
+	ALTER SUBSCRIPTION regress_sub CONNECTION '$new_connstr';
+	ALTER SUBSCRIPTION regress_sub ENABLE;
+]);
+
+# Check whether changes on the new publisher get replicated to the subscriber
+$new_publisher->safe_psql('postgres',
+	"INSERT INTO tbl VALUES (generate_series(11, 20))");
+$new_publisher->wait_for_catchup('regress_sub');
+$result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
+is($result, qq(20), 'check changes are replicated to the subscriber');
+
+# Clean up
+$subscriber->stop();
+$new_publisher->stop();
+
+done_testing();
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index f0b7b9cbd8..eacd91eb67 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11370,6 +11370,18 @@
   proname => 'binary_upgrade_set_next_pg_tablespace_oid', provolatile => 'v',
   proparallel => 'u', prorettype => 'void', proargtypes => 'oid',
   prosrc => 'binary_upgrade_set_next_pg_tablespace_oid' },
+{ oid => '8046', descr => 'for use by pg_upgrade',
+  proname => 'binary_upgrade_validate_wal_logical_end', proisstrict => 'f',
+  provolatile => 'v', proparallel => 'u', prorettype => 'bool',
+  proargtypes => 'name',
+  prosrc => 'binary_upgrade_validate_wal_logical_end' },
+{ oid => '8047', descr => 'for use by pg_upgrade',
+  proname => 'binary_upgrade_create_logical_replication_slot', proisstrict => 'f',
+  provolatile => 'v', proparallel => 'u', prorettype => 'record',
+  proargtypes => 'name name bool',
+  proallargtypes => '{name,name,bool,name,pg_lsn,text}',
+  proargmodes => '{i,i,i,o,o,o}',
+  prosrc => 'binary_upgrade_create_logical_replication_slot' },
 
 # conversion functions
 { oid => '4302',
diff --git a/src/include/replication/logical.h b/src/include/replication/logical.h
index 5f49554ea0..7d40d7678c 100644
--- a/src/include/replication/logical.h
+++ b/src/include/replication/logical.h
@@ -30,6 +30,24 @@ typedef void (*LogicalOutputPluginWriterUpdateProgress) (struct LogicalDecodingC
 														 bool skipped_xact
 );
 
+typedef enum DecodingMode
+{
+	/* Decode and output the changes if needed using the output plugin */
+	DECODING_MODE_NORMAL,
+
+	/*
+	 * Fast-forward decoding mode: Skips loading the output plugin and
+	 * bypasses decoding most changes in a transaction.
+	 */
+	DECODING_MODE_FAST_FORWARD,
+
+	/*
+	 * Silent decoding mode: Skips loading the output plugin and decodes all
+	 * changes without emitting any output.
+	 */
+	DECODING_MODE_SILENT
+} DecodingMode;
+
 typedef struct LogicalDecodingContext
 {
 	/* memory context this is all allocated in */
@@ -44,11 +62,11 @@ typedef struct LogicalDecodingContext
 	struct SnapBuild *snapshot_builder;
 
 	/*
-	 * Marks the logical decoding context as fast forward decoding one. Such a
-	 * context does not have plugin loaded so most of the following properties
-	 * are unused.
+	 * For DECODING_MODE_FAST_FORWARD and DECODING_MODE_SILENT, the context
+	 * does not have plugin loaded so most of the following properties are
+	 * unused.
 	 */
-	bool		fast_forward;
+	DecodingMode decoding_mode;
 
 	OutputPluginCallbacks callbacks;
 	OutputPluginOptions options;
@@ -109,6 +127,12 @@ typedef struct LogicalDecodingContext
 	TransactionId write_xid;
 	/* Are we processing the end LSN of a transaction? */
 	bool		end_xact;
+
+	/*
+	 * Did the logical decoding context process any changes? This flag is used
+	 * only when the context is in the silent mode.
+	 */
+	bool		did_process;
 } LogicalDecodingContext;
 
 
@@ -124,7 +148,7 @@ extern LogicalDecodingContext *CreateInitDecodingContext(const char *plugin,
 														 LogicalOutputPluginWriterUpdateProgress update_progress);
 extern LogicalDecodingContext *CreateDecodingContext(XLogRecPtr start_lsn,
 													 List *output_plugin_options,
-													 bool fast_forward,
+													 DecodingMode decoding_mode,
 													 XLogReaderRoutine *xl_routine,
 													 LogicalOutputPluginWriterPrepareWrite prepare_write,
 													 LogicalOutputPluginWriterWrite do_write,
@@ -145,4 +169,7 @@ extern bool filter_by_origin_cb_wrapper(LogicalDecodingContext *ctx, RepOriginId
 extern void ResetLogicalStreamingState(void);
 extern void UpdateDecodingStats(LogicalDecodingContext *ctx);
 
+extern bool DecodingContextHasdecodedItems(LogicalDecodingContext *ctx,
+										   XLogRecPtr end_of_wal);
+
 #endif
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 758ca79a81..6559d3f014 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -227,6 +227,10 @@ extern void ReplicationSlotRelease(void);
 extern void ReplicationSlotCleanup(void);
 extern void ReplicationSlotSave(void);
 extern void ReplicationSlotMarkDirty(void);
+extern void create_logical_replication_slot(char *name, char *plugin,
+											bool temporary, bool two_phase,
+											XLogRecPtr restart_lsn,
+											bool find_startpoint);
 
 /* misc stuff */
 extern void ReplicationSlotInitialize(void);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 8de90c4958..b75a69f543 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -562,6 +562,7 @@ DeallocateStmt
 DeclareCursorStmt
 DecodedBkpBlock
 DecodedXLogRecord
+DecodingMode
 DecodingOutputState
 DefElem
 DefElemAction
@@ -1503,6 +1504,8 @@ LogicalRepTyp
 LogicalRepWorker
 LogicalRepWorkerType
 LogicalRewriteMappingData
+LogicalSlotInfo
+LogicalSlotInfoArr
 LogicalTape
 LogicalTapeSet
 LsnReadQueue
-- 
2.27.0

#306Amit Kapila
amit.kapila16@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#305)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Fri, Oct 6, 2023 at 6:30 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

Based on comments, I revised my patch. PSA the file.

Today, I discussed this problem with Andres at PGConf NYC and he
suggested as following. To verify, if there is any pending unexpected
WAL after shutdown, we can have an API like
pg_logical_replication_slot_advance() which will simply process
records without actually sending anything downstream. In this new API,
we will start with each slot's restart_lsn location and try to process
till the end of WAL, if we encounter any WAL that needs to be
processed (like we need to send the decoded WAL downstream) we can
return a false indicating that there is an unexpected WAL. The reason
to start with restart_lsn is that it is the location that we use to
start scanning the WAL anyway.

I implemented this by using decoding context. The binary upgrade function
processes WALs from the confirmed_flush, and returns false if some meaningful
changes are found.

Internally, I added a new decoding mode - DECODING_MODE_SILENT - and used it.
If the decoding context is in the mode, the output plugin is not loaded, but
any WALs are decoded without skipping.

I think it may be okay not to load the output plugin as we are not
going to process any record in this case but is that the only reason
or you have something else in mind as well?

Also, a new flag "did_process" is also
added. This flag is set if wrappers for output plugin callbacks are called during
the silent mode.

Isn't it sufficient to add a test for silent mode in
begin/stream_start/begin_prepare kind of APIs and set
ctx->did_process? In all other APIs, we can assert that did_process
shouldn't be set and we never reach there when decoding mode is
silent.

The upgrading function checks both reorder buffer and the new
flag because both (non-)transactional changes should be detected. If we only
check reorder buffer, we miss the non-transactional one.

+ /* Check whether the meaningful change was found */
+ found = (ctx->reorder->by_txn_last_xid != InvalidTransactionId ||
+ ctx->did_process);

Are you talking about this check in the patch? If so, can you please
explain when the first check helps?

fast_forward was changed as a variant of decoding mode.

Currently the function is called for all the valid slot. If the approach seems
good, we can refactor like Bharath said [1].

Then, we should also try to create slots before invoking pg_resetwal.
The idea is that we can write a new binary mode function that will do
exactly what pg_resetwal does to compute the next segment and use that
location as a new location (restart_lsn) to create the slots in a new
node. Then, pass it pg_resetwal by using the existing option '-l
walfile'. As we don't have any API that takes restart_lsn as input, we
can write a new API probably for binary mode to create slots that do
take restart_lsn as input. This will ensure that there is no new WAL
inserted by background processes between resetwal and the creation of
slots.

Based on that, I added another binary function binary_upgrade_create_logical_replication_slot().
This function is similar to pg_create_logical_replication_slot(), but the
restart_lsn and confirmed_flush are set to *next* WAL segment. The pointed
filename is returned and it is passed to pg_resetwal command.

I am not sure if it is a good idea that a
binary_upgrade_create_logical_replication_slot() API does the logfile
name calculation.
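
For reference, in this version of the patch the per-slot creation boils down
to a plain SQL call issued from create_logical_replication_slots() (shown
earlier in the thread). A minimal sketch, with an illustrative slot and
plugin name, and assuming the server has been started in binary-upgrade mode:

    SELECT *
      FROM pg_catalog.binary_upgrade_create_logical_replication_slot(
               'test_slot1', 'test_decoding', false);

Per the pg_proc.dat entry, the call has three output columns; pg_upgrade
takes the last one (the next WAL file name) and passes it to pg_resetwal's
-l option.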

One consideration is that pg_log_standby_snapshot() must be executed before
slots consuming changes. New cluster does not have RUNNING_XACTS records so that
decoding context on new cluster cannot be create a consistent snapshot as-is.
This may lead to discard changes during the upcoming consuming event. To
prevent it the function is called after the final pg_resetwal.

How do you think?

+ /*
+ * Also, we mu execute pg_log_standby_snapshot() when logical replication
+ * slots are migrated. Because RUNNING_XACTS record is required to create
+ * a consistent snapshot.
+ */
+ if (count_old_cluster_logical_slots())
+ create_consistent_snapshot();

We shouldn't do this separately. Instead
binary_upgrade_create_logical_replication_slot() should ensure that
corresponding WAL is reserved similar to what we do in
ReplicationSlotReserveWal() and then similarly invoke
LogStandbySnapshot() to ensure that we have enough information to
start.

Few minor comments:
==================
1. The commit message and other comments, like the one atop
get_old_cluster_logical_slot_infos(), need to be adjusted as per
recent changes.
2.
@@ -1268,7 +1346,11 @@ stream_start_cb_wrapper(ReorderBuffer *cache,
ReorderBufferTXN *txn,
LogicalErrorCallbackState state;
ErrorContextCallback errcallback;

- Assert(!ctx->fast_forward);
+ /*
+ * In silent mode all the two-phase callbacks are not set so that the
+ * wrapper should not be called.
+ */
+ Assert(ctx->decoding_mode == DECODING_MODE_NORMAL);

This and other similar comments don't seem to be consistent, as the
function names and comments do not match.

With Regards,
Amit Kapila.

#307Amit Kapila
amit.kapila16@gmail.com
In reply to: Bharath Rupireddy (#304)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Thu, Oct 5, 2023 at 6:43 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Thu, Oct 5, 2023 at 1:48 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

The other potential problem Andres pointed out is that during shutdown
if due to some reason, the walreceiver goes down, we won't be able to
send the required WAL and users won't be able to ensure that because
even after restart the same situation can happen. The ideal way is to
have something that puts the system in READ ONLY state during shutdown
and then we can probably allow walreceivers to reconnect and receive
the required WALs. As we don't have such functionality available and
it won't be easy to achieve the same, we can leave this for now.

Thoughts?

You mean walreceiver for streaming replication? Or the apply workers
going down for logical replication?

Apply workers.

If there's yet-to-be-sent-out WAL,
pg_upgrade will fail no? How does the above scenario a problem for
pg_upgrade of a cluster with just logical replication slots?

Even if there is WAL yet to be sent, the walsender will simply exit
because it will receive PqMsg_Terminate ('X') from the standby. See
ProcessRepliesIfAny(). After that, the shutdown checkpoint will finish. So,
in this case the upgrade can fail due to slots. But I think the server
should be able to succeed in consecutive runs. Does this make sense?

--
With Regards,
Amit Kapila.

#308vignesh C
vignesh21@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#305)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Fri, 6 Oct 2023 at 18:30, Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

Dear hackers,

Based on comments, I revised my patch. PSA the file.

Today, I discussed this problem with Andres at PGConf NYC and he
suggested as following. To verify, if there is any pending unexpected
WAL after shutdown, we can have an API like
pg_logical_replication_slot_advance() which will simply process
records without actually sending anything downstream. In this new API,
we will start with each slot's restart_lsn location and try to process
till the end of WAL, if we encounter any WAL that needs to be
processed (like we need to send the decoded WAL downstream) we can
return a false indicating that there is an unexpected WAL. The reason
to start with restart_lsn is that it is the location that we use to
start scanning the WAL anyway.

I implemented this by using decoding context. The binary upgrade function
processes WALs from the confirmed_flush, and returns false if some meaningful
changes are found.

Internally, I added a new decoding mode - DECODING_MODE_SILENT - and used it.
If the decoding context is in the mode, the output plugin is not loaded, but
any WALs are decoded without skipping. Also, a new flag "did_process" is also
added. This flag is set if wrappers for output plugin callbacks are called during
the silent mode. The upgrading function checks both reorder buffer and the new
flag because both (non-)transactional changes should be detected. If we only
check reorder buffer, we miss the non-transactional one.

fast_forward was changed as a variant of decoding mode.

Currently the function is called for all the valid slot. If the approach seems
good, we can refactor like Bharath said [1].

Then, we should also try to create slots before invoking pg_resetwal.
The idea is that we can write a new binary mode function that will do
exactly what pg_resetwal does to compute the next segment and use that
location as a new location (restart_lsn) to create the slots in a new
node. Then, pass it pg_resetwal by using the existing option '-l
walfile'. As we don't have any API that takes restart_lsn as input, we
can write a new API probably for binary mode to create slots that do
take restart_lsn as input. This will ensure that there is no new WAL
inserted by background processes between resetwal and the creation of
slots.

Based on that, I added another binary function binary_upgrade_create_logical_replication_slot().
This function is similar to pg_create_logical_replication_slot(), but the
restart_lsn and confirmed_flush are set to *next* WAL segment. The pointed
filename is returned and it is passed to pg_resetwal command.

One consideration is that pg_log_standby_snapshot() must be executed before
slots consuming changes. New cluster does not have RUNNING_XACTS records so that
decoding context on new cluster cannot be create a consistent snapshot as-is.
This may lead to discard changes during the upcoming consuming event. To
prevent it the function is called after the final pg_resetwal.

Few comments:
1) Should we add the binary upgrade check "CHECK_IS_BINARY_UPGRADE" for
this function too:
+binary_upgrade_create_logical_replication_slot(PG_FUNCTION_ARGS)
+{
+       Name            name = PG_GETARG_NAME(0);
+       Name            plugin = PG_GETARG_NAME(1);
+
+       /* Temporary slots is never handled in this function */
+       bool            two_phase = PG_GETARG_BOOL(2);
2) Generally we are specifying the slot name in this case; is the slot
name NULL check required:
+Datum
+binary_upgrade_validate_wal_logical_end(PG_FUNCTION_ARGS)
+{
+       Name            slot_name;
+       XLogRecPtr      end_of_wal;
+       LogicalDecodingContext *ctx = NULL;
+       bool            has_record;
+
+       CHECK_IS_BINARY_UPGRADE;
+
+       /* Quick exit if the input is NULL */
+       if (PG_ARGISNULL(0))
+               PG_RETURN_BOOL(false);
3) Since this is similar to pg_create_logical_replication_slot, can we
add a comment saying any change in pg_create_logical_replication_slot
would also need the same check to be added in
binary_upgrade_create_logical_replication_slot:
+/*
+ * SQL function for creating a new logical replication slot.
+ *
+ * This function is almost same as pg_create_logical_replication_slot(), but
+ * this can specify the restart_lsn.
+ */
+Datum
+binary_upgrade_create_logical_replication_slot(PG_FUNCTION_ARGS)
+{
+       Name            name = PG_GETARG_NAME(0);
+       Name            plugin = PG_GETARG_NAME(1);
+
+       /* Temporary slots is never handled in this function */
4) Any conclusion on this try/catch comment? Do you want to add which
setting you want to revert in the catch block? If try/catch is not
required, we can remove this comment:
+       ReplicationSlotAcquire(NameStr(*slot_name), true);
+
+       /* XXX: Is PG_TRY/CATCH needed around here? */
+
+       /*
+        * We use silent mode here to decode all changes without
outputting them,
+        * allowing us to detect all the records that could be sent downstream.
+        */
5) I felt these 2 comments can be combined as both are trying to say
the same thing:
+ * This is a special purpose function to ensure that there are no WAL records
+ * pending to be decoded after the given LSN.
+ *
+ * It is used to ensure that there is no pending WAL to be consumed for
+ * the logical slots.
6) I feel this memset is not required as we are initializing at the
beginning of the function; if you want to keep the memset, the
initialization can be removed:
+       values[2] = CStringGetTextDatum(xlogfilename);
+
+       memset(nulls, 0, sizeof(nulls));
+
+       tuple = heap_form_tuple(tupdesc, values, nulls);
7) looks like a typo, "mu" should be "must":
+       /*
+        * Also, we mu execute pg_log_standby_snapshot() when logical
replication
+        * slots are migrated. Because RUNNING_XACTS record is
required to create
+        * a consistent snapshot.
+        */
+       if (count_old_cluster_logical_slots())
+               create_consistent_snapshot();
8) consitent should be consistent:
+/*
+ * Log the details of the current snapshot to the WAL, allowing the snapshot
+ * state to be reconstructed for logical decoding on the upgraded slots.
+ */
+static void
+create_consistent_snapshot(void)
+{
+       DbInfo     *old_db = &old_cluster.dbarr.dbs[0];
+       PGconn     *conn;
+
+       prep_status("Creating a consitent snapshot on new cluster");

Regards,
Vignesh

#309Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#306)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Sat, Oct 7, 2023 at 3:46 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Oct 6, 2023 at 6:30 PM Hayato Kuroda (Fujitsu)

Based on that, I added another binary function binary_upgrade_create_logical_replication_slot().
This function is similar to pg_create_logical_replication_slot(), but the
restart_lsn and confirmed_flush are set to *next* WAL segment. The pointed
filename is returned and it is passed to pg_resetwal command.

I am not sure if it is a good idea that a
binary_upgrade_create_logical_replication_slot() API does the logfile
name calculation.

The other problem is that pg_resetwal removes all pre-existing WAL
files, which in this case could lead to the removal of the WAL file
corresponding to restart_lsn. This is because at least the shutdown
checkpoint record will be written after the creation of slots, and it
could be in the new file used for restart_lsn. Then, when we invoke
pg_resetwal, it can remove that file.

One idea to deal with this could be to do the reset WAL stuff
(FindEndOfXLOG(), KillExistingXLOG(), KillExistingArchiveStatus(),
WriteEmptyXLOG()) in a separate function (say in pg_upgrade) and then
create slots. If we do this, then we additionally need an option in
pg_resetwal which skips resetting the WAL as that would have been done
before creating the slots.

--
With Regards,
Amit Kapila.

#310Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Amit Kapila (#306)
1 attachment(s)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Amit,

Thank you for reviewing! PSA new version.

Internally, I added a new decoding mode - DECODING_MODE_SILENT - and

used it.

If the decoding context is in the mode, the output plugin is not loaded, but
any WALs are decoded without skipping.

I think it may be okay not to load the output plugin as we are not
going to process any record in this case but is that the only reason
or you have something else in mind as well?

My main concern was about skipping the setting of output plugin options. Even
for the pgoutput plugin, some options like protocol_version, publications, etc.
are required while loading the plugin. We cannot predict the requirements of
external plugins. Based on that, I thought output plugins should not be loaded
during decoding.
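
As an illustration of that point, even consuming changes from a pgoutput slot
through the built-in SQL interface requires plugin-specific options (the slot
name, publication name, and protocol version below are only illustrative):

    SELECT * FROM pg_logical_slot_peek_binary_changes(
        'test_slot1', NULL, NULL,
        'proto_version', '4',
        'publication_names', 'pub');

Equivalent options cannot be guessed for arbitrary output plugins, which is
why the silent mode does not load the plugin at all.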

Also, a new flag "did_process" is also
added. This flag is set if wrappers for output plugin callbacks are called during
the silent mode.

Isn't it sufficient to add a test for silent mode in
begin/stream_start/begin_prepare kind of APIs and set
ctx->did_process? In all other APIs, we can assert that did_process
shouldn't be set and we never reach there when decoding mode is
silent.

+ /* Check whether the meaningful change was found */
+ found = (ctx->reorder->by_txn_last_xid != InvalidTransactionId ||
+ ctx->did_process);

Are you talking about this check in the patch? If so, can you please
explain when the first check helps?

I changed this area, so let me describe it once again.

A flag (output_skipped) is set when the transaction is decoded till the end in
silent mode. It is done in DecodeTXNNeedSkip() because that function is the
common path for both committed and aborted transactions. Also,
DecodeTXNNeedSkip() returns true when the decoding context is in silent mode,
so the cb_wrapper functions are not called anymore.
DecodingContextHasdecodedItems() just returns output_skipped.

This approach needs to read WAL up to the end of each transaction before the
upgrading function returns, but the code looks simpler than in the previous
version.
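
To illustrate, for each logical slot found on the old cluster, pg_upgrade
(with the old server started in binary-upgrade mode) can issue something like
the following, where the slot name is illustrative and, as described above, a
false result means that meaningful changes remain to be decoded:

    SELECT binary_upgrade_validate_wal_logical_end('test_slot1');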

Based on that, I added another binary function

binary_upgrade_create_logical_replication_slot().

This function is similar to pg_create_logical_replication_slot(), but the
restart_lsn and confirmed_flush are set to *next* WAL segment. The pointed
filename is returned and it is passed to pg_resetwal command.

I am not sure if it is a good idea that a
binary_upgrade_create_logical_replication_slot() API does the logfile
name calculation.

One consideration is that pg_log_standby_snapshot() must be executed before
slots consuming changes. New cluster does not have RUNNING_XACTS records

so that

decoding context on new cluster cannot be create a consistent snapshot as-is.
This may lead to discard changes during the upcoming consuming event. To
prevent it the function is called after the final pg_resetwal.

How do you think?

+ /*
+ * Also, we mu execute pg_log_standby_snapshot() when logical replication
+ * slots are migrated. Because RUNNING_XACTS record is required to create
+ * a consistent snapshot.
+ */
+ if (count_old_cluster_logical_slots())
+ create_consistent_snapshot();

We shouldn't do this separately. Instead
binary_upgrade_create_logical_replication_slot() should ensure that
corresponding WAL is reserved similar to what we do in
ReplicationSlotReserveWal() and then similarly invoke
LogStandbySnapshot() to ensure that we have enough information to
start.

I did not handle these parts because they needed more analysis. Let's discuss
in later versions.
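
For the record, the separate step kept in this version (the
create_consistent_snapshot() call quoted above) essentially amounts to running
the following once on the new cluster after the final pg_resetwal, so that a
RUNNING_XACTS record exists and the upgraded slots can build a consistent
snapshot:

    SELECT pg_log_standby_snapshot();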

Few minor comments:
==================
1. The commit message and other comments, like the one atop
get_old_cluster_logical_slot_infos(), need to be adjusted as per
recent changes.

I revisited the comments and updated them.

2.
@@ -1268,7 +1346,11 @@ stream_start_cb_wrapper(ReorderBuffer *cache,
ReorderBufferTXN *txn,
LogicalErrorCallbackState state;
ErrorContextCallback errcallback;

- Assert(!ctx->fast_forward);
+ /*
+ * In silent mode all the two-phase callbacks are not set so that the
+ * wrapper should not be called.
+ */
+ Assert(ctx->decoding_mode == DECODING_MODE_NORMAL);

This and other similar comments don't seem to be consistent, as the
function names and comments do not match.

Fixed.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:

v47-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patchapplication/octet-stream; name=v47-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patchDownload
From 46d2b060421458b9c153cd665f03237f85e18070 Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Tue, 4 Apr 2023 05:49:34 +0000
Subject: [PATCH v47] pg_upgrade: Allow to replicate logical replication slots
 to new node

This commit allows nodes with logical replication slots to be upgraded. While
reading information from the old cluster, a list of logical replication slots is
fetched. At a later part of the upgrade, pg_upgrade revisits the list and
restores slots by executing a new binary upgrading function,
binary_upgrade_create_logical_replication_slot() on the new cluster. Migration
of logical replication slots is only supported when the old cluster is version
17.0 or later.

If the old node has slots with the status 'lost' or with unconsumed WAL records,
pg_upgrade fails. These checks are needed to prevent data loss.

Note that the pg_resetwal command removes WAL files, which may be required by
restart_lsn. If WAL files required by logical replication slots are removed, the
slots become unusable. Therefore, binary_upgrade_create_logical_replication_slot()
sets the startpoint to the next WAL segment, which will be created by pg_resetwal.

The significant advantage of this commit is that it makes it easy to continue
logical replication even after upgrading the publisher node. Previously,
pg_upgrade allowed copying publications to a new node. With this new commit,
adjusting the connection string to the new publisher will cause the apply
worker on the subscriber to connect to the new publisher automatically. This
enables seamless continuation of logical replication, even after an upgrade.

Author: Hayato Kuroda
Co-authored-by: Hou Zhijie
Reviewed-by: Peter Smith, Julien Rouhaud, Vignesh C, Wang Wei, Masahiko Sawada,
             Dilip Kumar, Bharath Rupireddy
---
 doc/src/sgml/ref/pgupgrade.sgml               |  76 +++-
 src/backend/replication/logical/decode.c      |  43 ++-
 src/backend/replication/logical/logical.c     | 161 +++++++--
 .../replication/logical/logicalfuncs.c        |   2 +-
 src/backend/replication/slot.c                |  12 +
 src/backend/replication/slotfuncs.c           |   7 +-
 src/backend/replication/walsender.c           |   2 +-
 src/backend/utils/adt/pg_upgrade_support.c    | 150 ++++++++
 src/bin/pg_upgrade/Makefile                   |   3 +
 src/bin/pg_upgrade/check.c                    | 205 ++++++++++-
 src/bin/pg_upgrade/function.c                 |  31 +-
 src/bin/pg_upgrade/info.c                     | 170 ++++++++-
 src/bin/pg_upgrade/meson.build                |   1 +
 src/bin/pg_upgrade/pg_upgrade.c               | 112 +++++-
 src/bin/pg_upgrade/pg_upgrade.h               |  22 +-
 src/bin/pg_upgrade/server.c                   |  25 +-
 .../003_upgrade_logical_replication_slots.pl  | 325 ++++++++++++++++++
 src/include/catalog/pg_proc.dat               |  12 +
 src/include/replication/logical.h             |  38 +-
 src/include/replication/slot.h                |   4 +
 src/tools/pgindent/typedefs.list              |   3 +
 21 files changed, 1335 insertions(+), 69 deletions(-)
 create mode 100644 src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index f17fdb1ba5..4d579e793d 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -383,6 +383,77 @@ make prefix=/usr/local/pgsql.new install
     </para>
    </step>
 
+   <step>
+    <title>Prepare for publisher upgrades</title>
+
+    <para>
+     <application>pg_upgrade</application> attempts to migrate logical
+     replication slots. This helps avoid the need for manually defining the
+     same replication slots on the new publisher. Migration of logical
+     replication slots is only supported when the old cluster is version 17.0
+     or later.
+    </para>
+
+    <para>
+     Before you start upgrading the publisher cluster, ensure that the
+     subscription is temporarily disabled, by executing
+     <link linkend="sql-altersubscription"><command>ALTER SUBSCRIPTION ... DISABLE</command></link>.
+     Re-enable the subscription after the upgrade.
+    </para>
+
+    <para>
+     There are some prerequisites for <application>pg_upgrade</application> to
+     be able to upgrade the replication slots. If these are not met, an error
+     will be reported.
+    </para>
+
+    <itemizedlist>
+     <listitem>
+      <para>
+       All slots on the old cluster must be usable, i.e., there are no slots
+       whose
+       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>conflicting</structfield>
+       is <literal>true</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The old cluster has replicated all the changes to subscribers.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The output plugins referenced by the slots on the old cluster must be
+       installed in the new PostgreSQL executable directory.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must not have permanent logical replication slots, i.e.,
+       there must be no slots where
+       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>temporary</structfield>
+       is <literal>false</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must have
+       <link linkend="guc-wal-level"><varname>wal_level</varname></link> as
+       <literal>logical</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must have
+       <link linkend="guc-max-replication-slots"><varname>max_replication_slots</varname></link>
+       configured to a value greater than or equal to the number of slots
+       present in the old cluster.
+      </para>
+     </listitem>
+    </itemizedlist>
+
+   </step>
+
    <step>
     <title>Stop both servers</title>
 
@@ -650,8 +721,9 @@ rsync --archive --delete --hard-links --size-only --no-inc-recursive /vol1/pg_tb
        Configure the servers for log shipping.  (You do not need to run
        <function>pg_backup_start()</function> and <function>pg_backup_stop()</function>
        or take a file system backup as the standbys are still synchronized
-       with the primary.)  Replication slots are not copied and must
-       be recreated.
+       with the primary.)  Only logical slots on the primary are copied to the
+       new standby, but other slots on the old standby are not copied and so
+       must be recreated manually.
       </para>
      </step>
 
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 730061c9da..6de54153f7 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -295,7 +295,7 @@ xact_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 				 */
 				if (TransactionIdIsValid(xid))
 				{
-					if (!ctx->fast_forward)
+					if (ctx->decoding_mode != DECODING_MODE_FAST_FORWARD)
 						ReorderBufferAddInvalidations(reorder, xid,
 													  buf->origptr,
 													  invals->nmsgs,
@@ -303,7 +303,7 @@ xact_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 					ReorderBufferXidSetCatalogChanges(ctx->reorder, xid,
 													  buf->origptr);
 				}
-				else if ((!ctx->fast_forward))
+				else if (ctx->decoding_mode != DECODING_MODE_FAST_FORWARD)
 					ReorderBufferImmediateInvalidation(ctx->reorder,
 													   invals->nmsgs,
 													   invals->msgs);
@@ -416,7 +416,7 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 	 * point in decoding changes.
 	 */
 	if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT ||
-		ctx->fast_forward)
+		ctx->decoding_mode == DECODING_MODE_FAST_FORWARD)
 		return;
 
 	switch (info)
@@ -475,7 +475,7 @@ heap_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 	 * point in decoding data changes.
 	 */
 	if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT ||
-		ctx->fast_forward)
+		ctx->decoding_mode == DECODING_MODE_FAST_FORWARD)
 		return;
 
 	switch (info)
@@ -604,7 +604,7 @@ logicalmsg_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 	 * point in decoding messages.
 	 */
 	if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT ||
-		ctx->fast_forward)
+		ctx->decoding_mode == DECODING_MODE_FAST_FORWARD)
 		return;
 
 	message = (xl_logical_message *) XLogRecGetData(r);
@@ -621,6 +621,20 @@ logicalmsg_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 			  SnapBuildXactNeedsSkip(builder, buf->origptr)))
 		return;
 
+	/*
+	 * Set the output_skipped flag to notify the caller of the message's
+	 * existence.  Usually, the flag is set when either the COMMIT or ABORT
+	 * record is decoded, but it must be turned on here because a
+	 * non-transactional logical message is decoded without waiting for
+	 * these records.
+	 */
+	if (ctx->decoding_mode == DECODING_MODE_SILENT &&
+		!message->transactional)
+	{
+		ctx->output_skipped = true;
+		return;
+	}
+
 	/*
 	 * If this is a non-transactional change, get the snapshot we're expected
 	 * to use. We only get here when the snapshot is consistent, and the
@@ -1279,13 +1293,24 @@ DecodeXLogTuple(char *data, Size len, ReorderBufferTupleBuf *tuple)
  *	  are restarting or if we haven't assembled a consistent snapshot yet.
  * 2) The transaction happened in another database.
  * 3) The output plugin is not interested in the origin.
- * 4) We are doing fast-forwarding
+ * 4) We are not in the normal decoding mode.
+ *
+ * Also, set the output_skipped flag if we are in silent mode.
  */
 static bool
 DecodeTXNNeedSkip(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
 				  Oid txn_dbid, RepOriginId origin_id)
 {
-	return (SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) ||
-			(txn_dbid != InvalidOid && txn_dbid != ctx->slot->data.database) ||
-			ctx->fast_forward || FilterByOrigin(ctx, origin_id));
+	bool		need_skip;
+
+	need_skip = (SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) ||
+				 (txn_dbid != InvalidOid && txn_dbid != ctx->slot->data.database) ||
+				 ctx->decoding_mode != DECODING_MODE_NORMAL ||
+				 FilterByOrigin(ctx, origin_id));
+
+	/* Set a flag if we are in silent mode */
+	if (ctx->decoding_mode == DECODING_MODE_SILENT)
+		ctx->output_skipped = true;
+
+	return need_skip;
 }
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 41243d0187..0f4b1c6323 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -41,6 +41,7 @@
 #include "storage/proc.h"
 #include "storage/procarray.h"
 #include "utils/builtins.h"
+#include "utils/inval.h"
 #include "utils/memutils.h"
 
 /* data for errcontext callback */
@@ -150,7 +151,7 @@ StartupDecodingContext(List *output_plugin_options,
 					   XLogRecPtr start_lsn,
 					   TransactionId xmin_horizon,
 					   bool need_full_snapshot,
-					   bool fast_forward,
+					   DecodingMode decoding_mode,
 					   XLogReaderRoutine *xl_routine,
 					   LogicalOutputPluginWriterPrepareWrite prepare_write,
 					   LogicalOutputPluginWriterWrite do_write,
@@ -176,7 +177,7 @@ StartupDecodingContext(List *output_plugin_options,
 	 * (re-)load output plugins, so we detect a bad (removed) output plugin
 	 * now.
 	 */
-	if (!fast_forward)
+	if (decoding_mode == DECODING_MODE_NORMAL)
 		LoadOutputPlugin(&ctx->callbacks, NameStr(slot->data.plugin));
 
 	/*
@@ -294,7 +295,7 @@ StartupDecodingContext(List *output_plugin_options,
 
 	ctx->output_plugin_options = output_plugin_options;
 
-	ctx->fast_forward = fast_forward;
+	ctx->decoding_mode = decoding_mode;
 
 	MemoryContextSwitchTo(old_context);
 
@@ -437,7 +438,7 @@ CreateInitDecodingContext(const char *plugin,
 	ReplicationSlotSave();
 
 	ctx = StartupDecodingContext(NIL, restart_lsn, xmin_horizon,
-								 need_full_snapshot, false,
+								 need_full_snapshot, DECODING_MODE_NORMAL,
 								 xl_routine, prepare_write, do_write,
 								 update_progress);
 
@@ -473,8 +474,8 @@ CreateInitDecodingContext(const char *plugin,
  * output_plugin_options
  *		options passed to the output plugin.
  *
- * fast_forward
- *		bypass the generation of logical changes.
+ * decoding_mode
+ *		See the definition of DecodingMode for details.
  *
  * xl_routine
  *		XLogReaderRoutine used by underlying xlogreader
@@ -493,7 +494,7 @@ CreateInitDecodingContext(const char *plugin,
 LogicalDecodingContext *
 CreateDecodingContext(XLogRecPtr start_lsn,
 					  List *output_plugin_options,
-					  bool fast_forward,
+					  DecodingMode decoding_mode,
 					  XLogReaderRoutine *xl_routine,
 					  LogicalOutputPluginWriterPrepareWrite prepare_write,
 					  LogicalOutputPluginWriterWrite do_write,
@@ -573,8 +574,8 @@ CreateDecodingContext(XLogRecPtr start_lsn,
 
 	ctx = StartupDecodingContext(output_plugin_options,
 								 start_lsn, InvalidTransactionId, false,
-								 fast_forward, xl_routine, prepare_write,
-								 do_write, update_progress);
+								 decoding_mode, xl_routine,
+								 prepare_write, do_write, update_progress);
 
 	/* call output plugin initialization callback */
 	old_context = MemoryContextSwitchTo(ctx->context);
@@ -773,7 +774,7 @@ startup_cb_wrapper(LogicalDecodingContext *ctx, OutputPluginOptions *opt, bool i
 	LogicalErrorCallbackState state;
 	ErrorContextCallback errcallback;
 
-	Assert(!ctx->fast_forward);
+	Assert(ctx->decoding_mode == DECODING_MODE_NORMAL);
 
 	/* Push callback + info on the error context stack */
 	state.ctx = ctx;
@@ -801,7 +802,7 @@ shutdown_cb_wrapper(LogicalDecodingContext *ctx)
 	LogicalErrorCallbackState state;
 	ErrorContextCallback errcallback;
 
-	Assert(!ctx->fast_forward);
+	Assert(ctx->decoding_mode == DECODING_MODE_NORMAL);
 
 	/* Push callback + info on the error context stack */
 	state.ctx = ctx;
@@ -835,7 +836,7 @@ begin_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn)
 	LogicalErrorCallbackState state;
 	ErrorContextCallback errcallback;
 
-	Assert(!ctx->fast_forward);
+	Assert(ctx->decoding_mode == DECODING_MODE_NORMAL);
 
 	/* Push callback + info on the error context stack */
 	state.ctx = ctx;
@@ -867,7 +868,7 @@ commit_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
 	LogicalErrorCallbackState state;
 	ErrorContextCallback errcallback;
 
-	Assert(!ctx->fast_forward);
+	Assert(ctx->decoding_mode == DECODING_MODE_NORMAL);
 
 	/* Push callback + info on the error context stack */
 	state.ctx = ctx;
@@ -905,7 +906,11 @@ begin_prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn)
 	LogicalErrorCallbackState state;
 	ErrorContextCallback errcallback;
 
-	Assert(!ctx->fast_forward);
+	/*
+	 * In both silent and fast-forward mode, all the two-phase callbacks are
+	 * not set, so the wrapper will not be called.
+	 */
+	Assert(ctx->decoding_mode == DECODING_MODE_NORMAL);
 
 	/* We're only supposed to call this when two-phase commits are supported */
 	Assert(ctx->twophase);
@@ -950,7 +955,11 @@ prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
 	LogicalErrorCallbackState state;
 	ErrorContextCallback errcallback;
 
-	Assert(!ctx->fast_forward);
+	/*
+	 * In both silent and fast-forward mode, all the two-phase callbacks are
+	 * not set, so the wrapper will not be called.
+	 */
+	Assert(ctx->decoding_mode == DECODING_MODE_NORMAL);
 
 	/* We're only supposed to call this when two-phase commits are supported */
 	Assert(ctx->twophase);
@@ -995,7 +1004,11 @@ commit_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
 	LogicalErrorCallbackState state;
 	ErrorContextCallback errcallback;
 
-	Assert(!ctx->fast_forward);
+	/*
+	 * In both silent and fast-forward mode, all the two-phase callbacks are
+	 * not set, so the wrapper will not be called.
+	 */
+	Assert(ctx->decoding_mode == DECODING_MODE_NORMAL);
 
 	/* We're only supposed to call this when two-phase commits are supported */
 	Assert(ctx->twophase);
@@ -1041,7 +1054,11 @@ rollback_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
 	LogicalErrorCallbackState state;
 	ErrorContextCallback errcallback;
 
-	Assert(!ctx->fast_forward);
+	/*
+	 * In both silent and fast-forward mode, all the two-phase callbacks are
+	 * not set, so the wrapper will not be called.
+	 */
+	Assert(ctx->decoding_mode == DECODING_MODE_NORMAL);
 
 	/* We're only supposed to call this when two-phase commits are supported */
 	Assert(ctx->twophase);
@@ -1087,7 +1104,7 @@ change_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
 	LogicalErrorCallbackState state;
 	ErrorContextCallback errcallback;
 
-	Assert(!ctx->fast_forward);
+	Assert(ctx->decoding_mode == DECODING_MODE_NORMAL);
 
 	/* Push callback + info on the error context stack */
 	state.ctx = ctx;
@@ -1126,7 +1143,7 @@ truncate_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
 	LogicalErrorCallbackState state;
 	ErrorContextCallback errcallback;
 
-	Assert(!ctx->fast_forward);
+	Assert(ctx->decoding_mode == DECODING_MODE_NORMAL);
 
 	if (!ctx->callbacks.truncate_cb)
 		return;
@@ -1168,7 +1185,11 @@ filter_prepare_cb_wrapper(LogicalDecodingContext *ctx, TransactionId xid,
 	ErrorContextCallback errcallback;
 	bool		ret;
 
-	Assert(!ctx->fast_forward);
+	/*
+	 * In both silent and fast-forward mode, the filter-prepare callback is
+	 * not set, so the wrapper will not be called.
+	 */
+	Assert(ctx->decoding_mode == DECODING_MODE_NORMAL);
 
 	/* Push callback + info on the error context stack */
 	state.ctx = ctx;
@@ -1199,7 +1220,11 @@ filter_by_origin_cb_wrapper(LogicalDecodingContext *ctx, RepOriginId origin_id)
 	ErrorContextCallback errcallback;
 	bool		ret;
 
-	Assert(!ctx->fast_forward);
+	/*
+	 * In both silent and fast-forward mode, the filter-by-origin callback is
+	 * not set, so the wrapper will not be called.
+	 */
+	Assert(ctx->decoding_mode == DECODING_MODE_NORMAL);
 
 	/* Push callback + info on the error context stack */
 	state.ctx = ctx;
@@ -1232,7 +1257,7 @@ message_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
 	LogicalErrorCallbackState state;
 	ErrorContextCallback errcallback;
 
-	Assert(!ctx->fast_forward);
+	Assert(ctx->decoding_mode == DECODING_MODE_NORMAL);
 
 	if (ctx->callbacks.message_cb == NULL)
 		return;
@@ -1268,7 +1293,11 @@ stream_start_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
 	LogicalErrorCallbackState state;
 	ErrorContextCallback errcallback;
 
-	Assert(!ctx->fast_forward);
+	/*
+	 * In both silent and fast-forward mode, all the stream callbacks are not
+	 * set, so the wrapper will not be called.
+	 */
+	Assert(ctx->decoding_mode == DECODING_MODE_NORMAL);
 
 	/* We're only supposed to call this when streaming is supported. */
 	Assert(ctx->streaming);
@@ -1317,7 +1346,11 @@ stream_stop_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
 	LogicalErrorCallbackState state;
 	ErrorContextCallback errcallback;
 
-	Assert(!ctx->fast_forward);
+	/*
+	 * In both silent and fast-forward mode, all the stream callbacks are not
+	 * set, so the wrapper will not be called.
+	 */
+	Assert(ctx->decoding_mode == DECODING_MODE_NORMAL);
 
 	/* We're only supposed to call this when streaming is supported. */
 	Assert(ctx->streaming);
@@ -1366,7 +1399,11 @@ stream_abort_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
 	LogicalErrorCallbackState state;
 	ErrorContextCallback errcallback;
 
-	Assert(!ctx->fast_forward);
+	/*
+	 * In both silent and fast-forward mode, all the stream callbacks are not
+	 * set, so the wrapper will not be called.
+	 */
+	Assert(ctx->decoding_mode == DECODING_MODE_NORMAL);
 
 	/* We're only supposed to call this when streaming is supported. */
 	Assert(ctx->streaming);
@@ -1407,7 +1444,11 @@ stream_prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
 	LogicalErrorCallbackState state;
 	ErrorContextCallback errcallback;
 
-	Assert(!ctx->fast_forward);
+	/*
+	 * In both silent and fast-forward mode, all the stream callbacks are not
+	 * set, so the wrapper will not be called.
+	 */
+	Assert(ctx->decoding_mode == DECODING_MODE_NORMAL);
 
 	/*
 	 * We're only supposed to call this when streaming and two-phase commits
@@ -1452,7 +1493,11 @@ stream_commit_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
 	LogicalErrorCallbackState state;
 	ErrorContextCallback errcallback;
 
-	Assert(!ctx->fast_forward);
+	/*
+	 * In both silent and fast-forward mode, all the stream callbacks are not
+	 * set, so the wrapper will not be called.
+	 */
+	Assert(ctx->decoding_mode == DECODING_MODE_NORMAL);
 
 	/* We're only supposed to call this when streaming is supported. */
 	Assert(ctx->streaming);
@@ -1493,7 +1538,11 @@ stream_change_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
 	LogicalErrorCallbackState state;
 	ErrorContextCallback errcallback;
 
-	Assert(!ctx->fast_forward);
+	/*
+	 * In both silent and fast-forward mode, all the stream callbacks are not
+	 * set, so the wrapper will not be called.
+	 */
+	Assert(ctx->decoding_mode == DECODING_MODE_NORMAL);
 
 	/* We're only supposed to call this when streaming is supported. */
 	Assert(ctx->streaming);
@@ -1543,7 +1592,11 @@ stream_message_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
 	LogicalErrorCallbackState state;
 	ErrorContextCallback errcallback;
 
-	Assert(!ctx->fast_forward);
+	/*
+	 * In both silent and fast-forward mode, all the stream callbacks are not
+	 * set, so the wrapper will not be called.
+	 */
+	Assert(ctx->decoding_mode == DECODING_MODE_NORMAL);
 
 	/* We're only supposed to call this when streaming is supported. */
 	Assert(ctx->streaming);
@@ -1584,7 +1637,11 @@ stream_truncate_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
 	LogicalErrorCallbackState state;
 	ErrorContextCallback errcallback;
 
-	Assert(!ctx->fast_forward);
+	/*
+	 * In both silent and fast-forward mode, all the stream callbacks are not
+	 * set, so the wrapper will not be called.
+	 */
+	Assert(ctx->decoding_mode == DECODING_MODE_NORMAL);
 
 	/* We're only supposed to call this when streaming is supported. */
 	Assert(ctx->streaming);
@@ -1630,7 +1687,7 @@ update_progress_txn_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
 	LogicalErrorCallbackState state;
 	ErrorContextCallback errcallback;
 
-	Assert(!ctx->fast_forward);
+	Assert(ctx->decoding_mode != DECODING_MODE_FAST_FORWARD);
 
 	/* Push callback + info on the error context stack */
 	state.ctx = ctx;
@@ -1949,3 +2006,45 @@ UpdateDecodingStats(LogicalDecodingContext *ctx)
 	rb->totalTxns = 0;
 	rb->totalBytes = 0;
 }
+
+/*
+ * Read WAL from the slot's restart_lsn and return true if any meaningful
+ * change is decoded; otherwise return false.
+ *
+ * Currently, the function is used only for upgrade purposes, but there is no
+ * reason to restrict it, so IsBinaryUpgrade is not checked here.
+ */
+bool
+DecodingContextHasdecodedItems(LogicalDecodingContext *ctx,
+							   XLogRecPtr end_of_wal)
+{
+	Assert(MyReplicationSlot);
+
+	/*
+	 * Start reading at the slot's restart_lsn, which we know to point to a
+	 * valid record.
+	 */
+	XLogBeginRead(ctx->reader, MyReplicationSlot->data.restart_lsn);
+
+	/* Invalidate non-timetravel entries */
+	InvalidateSystemCaches();
+
+	/* Loop until the end of WAL or some changes are processed */
+	while (!ctx->output_skipped && ctx->reader->EndRecPtr < end_of_wal)
+	{
+		XLogRecord *record;
+		char	   *errm = NULL;
+
+		record = XLogReadRecord(ctx->reader, &errm);
+
+		if (errm)
+			elog(ERROR, "could not find record for logical decoding: %s", errm);
+
+		if (record != NULL)
+			LogicalDecodingProcessRecord(ctx, ctx->reader);
+
+		CHECK_FOR_INTERRUPTS();
+	}
+
+	return ctx->output_skipped;
+}
diff --git a/src/backend/replication/logical/logicalfuncs.c b/src/backend/replication/logical/logicalfuncs.c
index 197169d6b0..d3f8e22bf6 100644
--- a/src/backend/replication/logical/logicalfuncs.c
+++ b/src/backend/replication/logical/logicalfuncs.c
@@ -207,7 +207,7 @@ pg_logical_slot_get_changes_guts(FunctionCallInfo fcinfo, bool confirm, bool bin
 		/* restart at slot's confirmed_flush */
 		ctx = CreateDecodingContext(InvalidXLogRecPtr,
 									options,
-									false,
+									DECODING_MODE_NORMAL,
 									XL_ROUTINE(.page_read = read_local_xlog_page,
 											   .segment_open = wal_segment_open,
 											   .segment_close = wal_segment_close),
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 7e5ec500d8..9980e2fd79 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1423,6 +1423,18 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause,
 
 		SpinLockRelease(&s->mutex);
 
+		/*
+		 * The logical replication slots shouldn't be invalidated as
+		 * max_slot_wal_keep_size GUC is set to -1 during the upgrade.
+		 *
+		 * The following is just a sanity check.
+		 */
+		if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade)
+		{
+			Assert(max_slot_wal_keep_size_mb == -1);
+			elog(ERROR, "replication slots must not be invalidated during the upgrade");
+		}
+
 		if (active_pid != 0)
 		{
 			/*
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index 6035cf4816..89b9d03d1a 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -114,7 +114,7 @@ pg_create_physical_replication_slot(PG_FUNCTION_ARGS)
  * When find_startpoint is false, the slot's confirmed_flush is not set; it's
  * caller's responsibility to ensure it's set to something sensible.
  */
-static void
+void
 create_logical_replication_slot(char *name, char *plugin,
 								bool temporary, bool two_phase,
 								XLogRecPtr restart_lsn,
@@ -163,6 +163,9 @@ create_logical_replication_slot(char *name, char *plugin,
 
 /*
  * SQL function for creating a new logical replication slot.
+ *
+ * If you change this function, please see
+ * binary_upgrade_create_logical_replication_slot as well.
  */
 Datum
 pg_create_logical_replication_slot(PG_FUNCTION_ARGS)
@@ -485,7 +488,7 @@ pg_logical_replication_slot_advance(XLogRecPtr moveto)
 		 */
 		ctx = CreateDecodingContext(InvalidXLogRecPtr,
 									NIL,
-									true,	/* fast_forward */
+									DECODING_MODE_FAST_FORWARD,
 									XL_ROUTINE(.page_read = read_local_xlog_page,
 											   .segment_open = wal_segment_open,
 											   .segment_close = wal_segment_close),
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index e250b0567e..b3b819d996 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -1283,7 +1283,7 @@ StartLogicalReplication(StartReplicationCmd *cmd)
 	 * are reported early.
 	 */
 	logical_decoding_ctx =
-		CreateDecodingContext(cmd->startpoint, cmd->options, false,
+		CreateDecodingContext(cmd->startpoint, cmd->options, DECODING_MODE_NORMAL,
 							  XL_ROUTINE(.page_read = logical_read_xlog_page,
 										 .segment_open = WalSndSegmentOpen,
 										 .segment_close = wal_segment_close),
diff --git a/src/backend/utils/adt/pg_upgrade_support.c b/src/backend/utils/adt/pg_upgrade_support.c
index 0186636d9f..9ccf530ed3 100644
--- a/src/backend/utils/adt/pg_upgrade_support.c
+++ b/src/backend/utils/adt/pg_upgrade_support.c
@@ -11,14 +11,20 @@
 
 #include "postgres.h"
 
+#include "access/xlogutils.h"
+#include "access/xlog_internal.h"
 #include "catalog/binary_upgrade.h"
 #include "catalog/heap.h"
 #include "catalog/namespace.h"
 #include "catalog/pg_type.h"
 #include "commands/extension.h"
+#include "funcapi.h"
 #include "miscadmin.h"
+#include "replication/logical.h"
+#include "replication/slot.h"
 #include "utils/array.h"
 #include "utils/builtins.h"
+#include "utils/pg_lsn.h"
 
 
 #define CHECK_IS_BINARY_UPGRADE									\
@@ -261,3 +267,147 @@ binary_upgrade_set_missing_value(PG_FUNCTION_ARGS)
 
 	PG_RETURN_VOID();
 }
+
+/*
+ * Verify that all changes for the given slot have already been consumed.
+ *
+ * Returns true if there are no changes after the confirmed_flush_lsn.
+ * Otherwise false.
+ *
+ * This is a special purpose function to ensure the given slot can be upgraded
+ * without data loss.
+ */
+Datum
+binary_upgrade_validate_wal_logical_end(PG_FUNCTION_ARGS)
+{
+	Name		slot_name;
+	XLogRecPtr	end_of_wal;
+	LogicalDecodingContext *ctx = NULL;
+	bool		has_record;
+
+	CHECK_IS_BINARY_UPGRADE;
+
+	/* Quick exit if the input is NULL */
+	if (PG_ARGISNULL(0))
+		PG_RETURN_BOOL(false);
+	else
+		slot_name = PG_GETARG_NAME(0);
+
+	/*
+	 * Acquire the given slot. No error is expected here because the caller
+	 * has already checked that the slot exists.
+	 */
+	ReplicationSlotAcquire(NameStr(*slot_name), true);
+
+	/*
+	 * It's the caller's responsibility to check the health of the slot.  The
+	 * functions called below assume that restart_lsn points to a valid record.
+	 */
+	Assert(MyReplicationSlot->data.invalidated == RS_INVAL_NONE);
+
+	/*
+	 * We use silent mode here to decode all changes without outputting them,
+	 * allowing us to detect all the records that could be sent downstream.
+	 */
+	ctx = CreateDecodingContext(InvalidXLogRecPtr,
+								NIL,
+								DECODING_MODE_SILENT,
+								XL_ROUTINE(.page_read = read_local_xlog_page,
+										   .segment_open = wal_segment_open,
+										   .segment_close = wal_segment_close),
+								NULL, NULL, NULL);
+
+	end_of_wal = GetFlushRecPtr(NULL);
+
+	has_record = DecodingContextHasdecodedItems(ctx, end_of_wal);
+
+	/* Clean up */
+	FreeDecodingContext(ctx);
+	ReplicationSlotRelease();
+
+	PG_RETURN_BOOL(!has_record);
+}
+
+/*
+ * SQL function for creating a new logical replication slot.
+ *
+ * This function is almost the same as pg_create_logical_replication_slot(),
+ * but the restart_lsn is set to the start of the next WAL segment.
+ *
+ * It returns the slot name, the confirmed_flush_lsn, and the name of the WAL
+ * file that restart_lsn points to.
+ */
+Datum
+binary_upgrade_create_logical_replication_slot(PG_FUNCTION_ARGS)
+{
+	Name		name;
+	Name		plugin;
+
+	/* Temporary slots are never handled in this function */
+	bool		two_phase;
+	XLogSegNo	xlogsegno;
+	char		xlogfilename[MAXFNAMELEN];
+	XLogRecPtr	restart_lsn;
+
+	Datum		result;
+	TupleDesc	tupdesc;
+	HeapTuple	tuple;
+	Datum		values[3];
+	bool		nulls[3];
+
+	CHECK_IS_BINARY_UPGRADE;
+
+	if (PG_ARGISNULL(0) ||
+		PG_ARGISNULL(1) ||
+		PG_ARGISNULL(2))
+		elog(ERROR,
+			 "null argument to binary_upgrade_create_logical_replication_slot is not allowed");
+
+	CheckSlotPermissions();
+
+	CheckLogicalDecodingRequirements();
+
+	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+		elog(ERROR, "return type must be a row type");
+
+	/* Conditions seem OK, accept arguments */
+	name = PG_GETARG_NAME(0);
+	plugin = PG_GETARG_NAME(1);
+	two_phase = PG_GETARG_BOOL(2);
+
+	/* Calculate the next WAL segment and its LSN */
+	XLByteToPrevSeg(GetFlushRecPtr(NULL), xlogsegno, wal_segment_size);
+	XLogFileName(xlogfilename, (TimeLineID) 1, xlogsegno + 1,
+				 wal_segment_size);
+
+	/* And use the startpoint as restart_lsn */
+	XLogSegNoOffsetToRecPtr(xlogsegno + 1, 0, wal_segment_size, restart_lsn);
+
+	/*
+	 * Create a required logical replication slot. confirmed_flush is the same
+	 * as restart_lsn for now.
+	 */
+	create_logical_replication_slot(NameStr(*name),
+									NameStr(*plugin),
+									false,
+									two_phase,
+									restart_lsn,
+									false);
+
+	MyReplicationSlot->data.confirmed_flush = restart_lsn;
+
+	values[0] = NameGetDatum(&MyReplicationSlot->data.name);
+	values[1] = LSNGetDatum(MyReplicationSlot->data.confirmed_flush);
+	values[2] = CStringGetTextDatum(xlogfilename);
+
+	memset(nulls, 0, sizeof(nulls));
+
+	tuple = heap_form_tuple(tupdesc, values, nulls);
+	result = HeapTupleGetDatum(tuple);
+
+	/* ok, slot is now fully created, mark it as persistent */
+	ReplicationSlotPersist();
+	ReplicationSlotRelease();
+
+	PG_RETURN_DATUM(result);
+}
diff --git a/src/bin/pg_upgrade/Makefile b/src/bin/pg_upgrade/Makefile
index 5834513add..815d1a7ca1 100644
--- a/src/bin/pg_upgrade/Makefile
+++ b/src/bin/pg_upgrade/Makefile
@@ -3,6 +3,9 @@
 PGFILEDESC = "pg_upgrade - an in-place binary upgrade utility"
 PGAPPICON = win32
 
+# required for 003_upgrade_logical_replication_slots.pl
+EXTRA_INSTALL=contrib/test_decoding
+
 subdir = src/bin/pg_upgrade
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 21a0ff9e42..5d35646481 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -33,6 +33,9 @@ static void check_for_jsonb_9_4_usage(ClusterInfo *cluster);
 static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(void);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
+static void check_new_cluster_logical_replication_slots(void);
+static void check_old_cluster_for_valid_slots(bool live_check);
+static void create_consistent_snapshot(void);
 
 
 /*
@@ -89,8 +92,11 @@ check_and_dump_old_cluster(bool live_check)
 	if (!live_check)
 		start_postmaster(&old_cluster, true);
 
-	/* Extract a list of databases and tables from the old cluster */
-	get_db_and_rel_infos(&old_cluster);
+	/*
+	 * Extract a list of databases, tables, and logical replication slots from
+	 * the old cluster.
+	 */
+	get_db_rel_and_slot_infos(&old_cluster, live_check);
 
 	init_tablespaces();
 
@@ -107,6 +113,13 @@ check_and_dump_old_cluster(bool live_check)
 	check_for_reg_data_type_usage(&old_cluster);
 	check_for_isn_and_int8_passing_mismatch(&old_cluster);
 
+	/*
+	 * Logical replication slots can be migrated since PG17. See comments atop
+	 * get_old_cluster_logical_slot_infos().
+	 */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
+		check_old_cluster_for_valid_slots(live_check);
+
 	/*
 	 * PG 16 increased the size of the 'aclitem' type, which breaks the
 	 * on-disk format for existing data.
@@ -200,7 +213,7 @@ check_and_dump_old_cluster(bool live_check)
 void
 check_new_cluster(void)
 {
-	get_db_and_rel_infos(&new_cluster);
+	get_db_rel_and_slot_infos(&new_cluster, false);
 
 	check_new_cluster_is_empty();
 
@@ -223,6 +236,8 @@ check_new_cluster(void)
 	check_for_prepared_transactions(&new_cluster);
 
 	check_for_new_tablespace_dir();
+
+	check_new_cluster_logical_replication_slots();
 }
 
 
@@ -245,6 +260,27 @@ report_clusters_compatible(void)
 }
 
 
+/*
+ * Log the details of the current snapshot to the WAL, allowing the snapshot
+ * state to be reconstructed for logical decoding on the upgraded slots.
+ */
+static void
+create_consistent_snapshot(void)
+{
+	DbInfo	   *db = &new_cluster.dbarr.dbs[0];
+	PGconn	   *conn;
+
+	prep_status("Creating a consistent snapshot on new cluster");
+
+	conn = connectToServer(&new_cluster, db->db_name);
+
+	PQclear(executeQueryOrDie(conn, "SELECT pg_log_standby_snapshot();"));
+	PQfinish(conn);
+
+	check_ok();
+}
+
+
 void
 issue_warnings_and_set_wal_level(void)
 {
@@ -256,6 +292,14 @@ issue_warnings_and_set_wal_level(void)
 	 */
 	start_postmaster(&new_cluster, true);
 
+	/*
+	 * Also, we must execute pg_log_standby_snapshot() when logical replication
+	 * slots are migrated, because a RUNNING_XACTS record is required to create
+	 * a consistent snapshot.
+	 */
+	if (count_old_cluster_logical_slots())
+		create_consistent_snapshot();
+
 	/* Reindex hash indexes for old < 10.0 */
 	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 906)
 		old_9_6_invalidate_hash_indexes(&new_cluster, false);
@@ -1451,3 +1495,158 @@ check_for_user_defined_encoding_conversions(ClusterInfo *cluster)
 	else
 		check_ok();
 }
+
+/*
+ * check_new_cluster_logical_replication_slots()
+ *
+ * Verify that there are no logical replication slots on the new cluster and
+ * that the parameter settings necessary for creating slots are sufficient.
+ */
+static void
+check_new_cluster_logical_replication_slots(void)
+{
+	PGresult   *res;
+	PGconn	   *conn;
+	int			nslots_on_old;
+	int			nslots_on_new;
+	int			max_replication_slots;
+	char	   *wal_level;
+
+	/* Logical slots can be migrated since PG17. */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+		return;
+
+	nslots_on_old = count_old_cluster_logical_slots();
+
+	/* Quick return if there are no logical slots to be migrated. */
+	if (nslots_on_old == 0)
+		return;
+
+	conn = connectToServer(&new_cluster, "template1");
+
+	prep_status("Checking for new cluster logical replication slots");
+
+	res = executeQueryOrDie(conn, "SELECT count(*) "
+							"FROM pg_catalog.pg_replication_slots "
+							"WHERE slot_type = 'logical' AND "
+							"temporary IS FALSE;");
+
+	if (PQntuples(res) != 1)
+		pg_fatal("could not count the number of logical replication slots");
+
+	nslots_on_new = atoi(PQgetvalue(res, 0, 0));
+
+	if (nslots_on_new)
+		pg_fatal("Expected 0 logical replication slots but found %d.",
+				 nslots_on_new);
+
+	PQclear(res);
+
+	res = executeQueryOrDie(conn, "SELECT setting FROM pg_settings "
+							"WHERE name IN ('max_replication_slots', 'wal_level') "
+							"ORDER BY name DESC;");
+
+	if (PQntuples(res) != 2)
+		pg_fatal("could not determine parameter settings on new cluster");
+
+	wal_level = PQgetvalue(res, 0, 0);
+
+	if (strcmp(wal_level, "logical") != 0)
+		pg_fatal("wal_level must be \"logical\", but is set to \"%s\"",
+				 wal_level);
+
+	max_replication_slots = atoi(PQgetvalue(res, 1, 0));
+
+	if (nslots_on_old > max_replication_slots)
+		pg_fatal("max_replication_slots (%d) must be greater than or equal to the number of "
+				 "logical replication slots (%d) on the old cluster",
+				 max_replication_slots, nslots_on_old);
+
+	PQclear(res);
+	PQfinish(conn);
+
+	check_ok();
+}
+
+/*
+ * check_old_cluster_for_valid_slots()
+ *
+ * Verify that all the logical slots are usable and have consumed all the WAL
+ * before shutdown. The check has already been done in
+ * get_old_cluster_logical_slot_infos(), so this function reads the result and
+ * reports to the user.
+ */
+static void
+check_old_cluster_for_valid_slots(bool live_check)
+{
+	int			dbnum;
+	char		output_path[MAXPGPATH];
+	FILE	   *script = NULL;
+
+	prep_status("Checking for valid logical replication slots");
+
+	/* Path of the file that lists the problematic slots, if any */
+	snprintf(output_path, sizeof(output_path), "%s/%s",
+			 log_opts.basedir,
+			 "invalid_logical_replication_slots.txt");
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		int			slotnum;
+		LogicalSlotInfoArr *slot_arr = &old_cluster.dbarr.dbs[dbnum].slot_arr;
+
+		for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		{
+			LogicalSlotInfo *slot = &slot_arr->slots[slotnum];
+
+			/* Is the slot usable? */
+			if (slot->invalid)
+			{
+				if (script == NULL &&
+					(script = fopen_priv(output_path, "w")) == NULL)
+					pg_fatal("could not open file \"%s\": %s",
+							 output_path, strerror(errno));
+
+				fprintf(script, "The slot \"%s\" is invalid\n",
+						slot->slotname);
+
+				/* No need to check this slot further; move on to the next one */
+				continue;
+			}
+
+			/*
+			 * Do additional check to ensure that all logical replication
+			 * slots have consumed all the WAL before shutdown.
+			 *
+			 * Note: This can be satisfied only when the old cluster has been
+			 * shut down, so we skip this for live checks.
+			 */
+			if (!live_check && !slot->caught_up)
+			{
+				if (script == NULL &&
+					(script = fopen_priv(output_path, "w")) == NULL)
+					pg_fatal("could not open file \"%s\": %s",
+							 output_path, strerror(errno));
+
+				fprintf(script,
+						"The slot \"%s\" has not consumed the WAL yet\n",
+						slot->slotname);
+			}
+		}
+	}
+
+	if (script)
+	{
+		fclose(script);
+
+		pg_log(PG_REPORT, "fatal");
+		pg_fatal("Your installation contains logical replication slots that cannot be upgraded.\n"
+				 "These slots can't be copied, so this cluster cannot be upgraded.\n"
+				 "Consider removing invalid slots and/or consuming the pending WAL if any,\n"
+				 "and then restart the upgrade.\n"
+				 "A list of all such logical replication slots is in the file:\n"
+				 "    %s", output_path);
+	}
+
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/function.c b/src/bin/pg_upgrade/function.c
index dc8800c7cd..c0f5e58fa2 100644
--- a/src/bin/pg_upgrade/function.c
+++ b/src/bin/pg_upgrade/function.c
@@ -46,7 +46,9 @@ library_name_compare(const void *p1, const void *p2)
 /*
  * get_loadable_libraries()
  *
- *	Fetch the names of all old libraries containing C-language functions.
+ *	Fetch the names of all old libraries containing C-language functions, as
+ *	well as the libraries corresponding to logical replication output plugins.
+ *
  *	We will later check that they all exist in the new installation.
  */
 void
@@ -55,6 +57,7 @@ get_loadable_libraries(void)
 	PGresult  **ress;
 	int			totaltups;
 	int			dbnum;
+	int			n_libinfos;
 
 	ress = (PGresult **) pg_malloc(old_cluster.dbarr.ndbs * sizeof(PGresult *));
 	totaltups = 0;
@@ -81,7 +84,12 @@ get_loadable_libraries(void)
 		PQfinish(conn);
 	}
 
-	os_info.libraries = (LibraryInfo *) pg_malloc(totaltups * sizeof(LibraryInfo));
+	/*
+	 * Allocate memory for required libraries and logical replication output
+	 * plugins.
+	 */
+	n_libinfos = totaltups + count_old_cluster_logical_slots();
+	os_info.libraries = (LibraryInfo *) pg_malloc(sizeof(LibraryInfo) * n_libinfos);
 	totaltups = 0;
 
 	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
@@ -89,6 +97,8 @@ get_loadable_libraries(void)
 		PGresult   *res = ress[dbnum];
 		int			ntups;
 		int			rowno;
+		int			slotno;
+		LogicalSlotInfoArr *slot_arr = &old_cluster.dbarr.dbs[dbnum].slot_arr;
 
 		ntups = PQntuples(res);
 		for (rowno = 0; rowno < ntups; rowno++)
@@ -101,6 +111,23 @@ get_loadable_libraries(void)
 			totaltups++;
 		}
 		PQclear(res);
+
+		/*
+		 * Store the names of output plugins as well. The same plugin name may
+		 * appear more than once, but the consumer function
+		 * check_loadable_libraries() avoids checking the same library twice,
+		 * so we do not have to worry about uniqueness here.
+		 */
+		for (slotno = 0; slotno < slot_arr->nslots; slotno++)
+		{
+			if (slot_arr->slots[slotno].invalid)
+				continue;
+
+			os_info.libraries[totaltups].name = pg_strdup(slot_arr->slots[slotno].plugin);
+			os_info.libraries[totaltups].dbnum = dbnum;
+
+			totaltups++;
+		}
 	}
 
 	pg_free(ress);
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index aa5faca4d6..755ea3bde7 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -26,6 +26,8 @@ static void get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo);
 static void free_rel_infos(RelInfoArr *rel_arr);
 static void print_db_infos(DbInfoArr *db_arr);
 static void print_rel_infos(RelInfoArr *rel_arr);
+static void print_slot_infos(LogicalSlotInfoArr *slot_arr);
+static void get_old_cluster_logical_slot_infos(DbInfo *dbinfo, bool live_check);
 
 
 /*
@@ -266,13 +268,15 @@ report_unmatched_relation(const RelInfo *rel, const DbInfo *db, bool is_new_db)
 }
 
 /*
- * get_db_and_rel_infos()
+ * get_db_rel_and_slot_infos()
  *
  * higher level routine to generate dbinfos for the database running
  * on the given "port". Assumes that server is already running.
+ *
+ * live_check is used only when the target is the old cluster.
  */
 void
-get_db_and_rel_infos(ClusterInfo *cluster)
+get_db_rel_and_slot_infos(ClusterInfo *cluster, bool live_check)
 {
 	int			dbnum;
 
@@ -283,7 +287,17 @@ get_db_and_rel_infos(ClusterInfo *cluster)
 	get_db_infos(cluster);
 
 	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
-		get_rel_infos(cluster, &cluster->dbarr.dbs[dbnum]);
+	{
+		DbInfo	   *pDbInfo = &cluster->dbarr.dbs[dbnum];
+
+		get_rel_infos(cluster, pDbInfo);
+
+		/*
+		 * Retrieve the logical replication slot information for the old cluster.
+		 */
+		if (cluster == &old_cluster)
+			get_old_cluster_logical_slot_infos(pDbInfo, live_check);
+	}
 
 	if (cluster == &old_cluster)
 		pg_log(PG_VERBOSE, "\nsource databases:");
@@ -600,6 +614,127 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 	dbinfo->rel_arr.nrels = num_rels;
 }
 
+/*
+ * get_old_cluster_logical_slot_infos()
+ *
+ * gets the LogicalSlotInfos for all the logical replication slots of the
+ * database referred to by "dbinfo". The status of each logical slot is
+ * fetched here, but it is only used later, during the checking phase. See
+ * check_old_cluster_for_valid_slots().
+ *
+ * Note: This function will not do anything if the old cluster is pre-PG17.
+ * This is because, before PG17, logical slots were not saved at shutdown, so
+ * there was no guarantee that the latest confirmed_flush_lsn had been written
+ * to disk, which could lead to data loss. It is still not guaranteed for manually created
+ * slots in PG17, so subsequent checks done in
+ * check_old_cluster_for_valid_slots() would raise a FATAL error if such slots
+ * are included.
+ */
+static void
+get_old_cluster_logical_slot_infos(DbInfo *dbinfo, bool live_check)
+{
+	PGconn	   *conn;
+	PGresult   *res;
+	LogicalSlotInfo *slotinfos = NULL;
+	int			num_slots = 0;
+
+	/* Logical slots can be migrated since PG17. */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+	{
+		dbinfo->slot_arr.slots = slotinfos;
+		dbinfo->slot_arr.nslots = num_slots;
+		return;
+	}
+
+	conn = connectToServer(&old_cluster, dbinfo->db_name);
+
+	/*
+	 * Fetch the logical replication slot information. Whether a slot is
+	 * considered caught up is determined by an upgrade support function,
+	 * which regards the slot as caught up if no decodable changes are found.
+	 * See binary_upgrade_validate_wal_logical_end().
+	 *
+	 * Note that we can't determine whether the slot is caught up during a
+	 * live check, as new WAL records could still be generated.
+	 *
+	 * We intentionally skip checking the WALs for invalidated slots as the
+	 * corresponding WALs could have been removed for such slots.
+	 *
+	 * The temporary slots are explicitly ignored while checking because such
+	 * slots cannot exist after the upgrade. During the upgrade, clusters are
+	 * started and stopped several times causing any temporary slots to be
+	 * removed.
+	 */
+	res = executeQueryOrDie(conn, "SELECT slot_name, plugin, two_phase, "
+							"%s as caught_up, conflicting as invalid "
+							"FROM pg_catalog.pg_replication_slots "
+							"WHERE slot_type = 'logical' AND "
+							"database = current_database() AND "
+							"temporary IS FALSE;",
+							live_check ? "FALSE" :
+							"(CASE WHEN conflicting THEN FALSE "
+							"ELSE (SELECT pg_catalog.binary_upgrade_validate_wal_logical_end(slot_name)) "
+							"END)");
+
+	num_slots = PQntuples(res);
+
+	if (num_slots)
+	{
+		int			slotnum;
+		int			i_slotname;
+		int			i_plugin;
+		int			i_twophase;
+		int			i_caught_up;
+		int			i_invalid;
+
+		slotinfos = (LogicalSlotInfo *) pg_malloc(sizeof(LogicalSlotInfo) * num_slots);
+
+		i_slotname = PQfnumber(res, "slot_name");
+		i_plugin = PQfnumber(res, "plugin");
+		i_twophase = PQfnumber(res, "two_phase");
+		i_caught_up = PQfnumber(res, "caught_up");
+		i_invalid = PQfnumber(res, "invalid");
+
+		for (slotnum = 0; slotnum < num_slots; slotnum++)
+		{
+			LogicalSlotInfo *curr = &slotinfos[slotnum];
+
+			curr->slotname = pg_strdup(PQgetvalue(res, slotnum, i_slotname));
+			curr->plugin = pg_strdup(PQgetvalue(res, slotnum, i_plugin));
+			curr->two_phase = (strcmp(PQgetvalue(res, slotnum, i_twophase), "t") == 0);
+			curr->caught_up = (strcmp(PQgetvalue(res, slotnum, i_caught_up), "t") == 0);
+			curr->invalid = (strcmp(PQgetvalue(res, slotnum, i_invalid), "t") == 0);
+		}
+	}
+
+	PQclear(res);
+	PQfinish(conn);
+
+	dbinfo->slot_arr.slots = slotinfos;
+	dbinfo->slot_arr.nslots = num_slots;
+}
+
+
+/*
+ * count_old_cluster_logical_slots()
+ *
+ * Returns the number of logical replication slots for all databases.
+ *
+ * Note: this function always returns 0 if the old_cluster is PG16 or prior
+ * because we gather slot information only for cluster versions greater than or
+ * equal to PG17. See get_old_cluster_logical_slot_infos().
+ */
+int
+count_old_cluster_logical_slots(void)
+{
+	int			dbnum;
+	int			slot_count = 0;
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+		slot_count += old_cluster.dbarr.dbs[dbnum].slot_arr.nslots;
+
+	return slot_count;
+}
 
 static void
 free_db_and_rel_infos(DbInfoArr *db_arr)
@@ -642,8 +777,11 @@ print_db_infos(DbInfoArr *db_arr)
 
 	for (dbnum = 0; dbnum < db_arr->ndbs; dbnum++)
 	{
-		pg_log(PG_VERBOSE, "Database: \"%s\"", db_arr->dbs[dbnum].db_name);
-		print_rel_infos(&db_arr->dbs[dbnum].rel_arr);
+		DbInfo	   *pDbInfo = &db_arr->dbs[dbnum];
+
+		pg_log(PG_VERBOSE, "Database: \"%s\"", pDbInfo->db_name);
+		print_rel_infos(&pDbInfo->rel_arr);
+		print_slot_infos(&pDbInfo->slot_arr);
 	}
 }
 
@@ -660,3 +798,25 @@ print_rel_infos(RelInfoArr *rel_arr)
 			   rel_arr->rels[relnum].reloid,
 			   rel_arr->rels[relnum].tablespace);
 }
+
+static void
+print_slot_infos(LogicalSlotInfoArr *slot_arr)
+{
+	int			slotnum;
+
+	/* Quick return if there are no logical slots. */
+	if (slot_arr->nslots == 0)
+		return;
+
+	pg_log(PG_VERBOSE, "Logical replication slots within the database:");
+
+	for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+	{
+		LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+
+		pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\", two_phase: %d",
+			   slot_info->slotname,
+			   slot_info->plugin,
+			   slot_info->two_phase);
+	}
+}
diff --git a/src/bin/pg_upgrade/meson.build b/src/bin/pg_upgrade/meson.build
index 12a97f84e2..2c4f38d865 100644
--- a/src/bin/pg_upgrade/meson.build
+++ b/src/bin/pg_upgrade/meson.build
@@ -42,6 +42,7 @@ tests += {
     'tests': [
       't/001_basic.pl',
       't/002_pg_upgrade.pl',
+      't/003_upgrade_logical_replication_slots.pl',
     ],
     'test_kwargs': {'priority': 40}, # pg_upgrade tests are slow
   },
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 96bfb67167..03c5f5909f 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -59,6 +59,7 @@ static void copy_xact_xlog_xid(void);
 static void set_frozenxids(bool minmxid_only);
 static void make_outputdirs(char *pgdata);
 static void setup(char *argv0, bool *live_check);
+static char *create_logical_replication_slots(void);
 
 ClusterInfo old_cluster,
 			new_cluster;
@@ -81,6 +82,8 @@ main(int argc, char **argv)
 {
 	char	   *deletion_script_file_name = NULL;
 	bool		live_check = false;
+	char	   *xlogfilename = NULL;
+	PQExpBufferData resetwal_options;
 
 	/*
 	 * pg_upgrade doesn't currently use common/logging.c, but initialize it
@@ -175,6 +178,20 @@ main(int argc, char **argv)
 	transfer_all_new_tablespaces(&old_cluster.dbarr, &new_cluster.dbarr,
 								 old_cluster.pgdata, new_cluster.pgdata);
 
+	/*
+	 * If the old cluster has logical slots, migrate them to the new cluster.
+	 *
+	 * The function returns the name of the next WAL segment file, which must
+	 * be passed to the upcoming pg_resetwal command.
+	 */
+
+	if (count_old_cluster_logical_slots())
+	{
+		start_postmaster(&new_cluster, true);
+		xlogfilename = create_logical_replication_slots();
+		stop_postmaster(false);
+	}
+
 	/*
 	 * Assuming OIDs are only used in system tables, there is no need to
 	 * restore the OID counter because we have not transferred any OIDs from
@@ -182,10 +199,23 @@ main(int argc, char **argv)
 	 * because there is no need to have the schema load use new oids.
 	 */
 	prep_status("Setting next OID for new cluster");
+
+	/*
+	 * Construct the pg_resetwal option string. If the next WAL segment file
+	 * name is known, also pass it via -l.
+	 */
+	initPQExpBuffer(&resetwal_options);
+	appendPQExpBuffer(&resetwal_options, "-o %u \"%s\"",
+					  old_cluster.controldata.chkpnt_nxtoid,
+					  new_cluster.pgdata);
+	if (xlogfilename)
+		appendPQExpBuffer(&resetwal_options, " -l %s", xlogfilename);
+
 	exec_prog(UTILITY_LOG_FILE, NULL, true, true,
-			  "\"%s/pg_resetwal\" -o %u \"%s\"",
-			  new_cluster.bindir, old_cluster.controldata.chkpnt_nxtoid,
-			  new_cluster.pgdata);
+			  "\"%s/pg_resetwal\" %s",
+			  new_cluster.bindir, resetwal_options.data);
+
+	termPQExpBuffer(&resetwal_options);
 	check_ok();
 
 	if (user_opts.do_sync)
@@ -593,7 +623,7 @@ create_new_objects(void)
 		set_frozenxids(true);
 
 	/* update new_cluster info now that we have objects in the databases */
-	get_db_and_rel_infos(&new_cluster);
+	get_db_rel_and_slot_infos(&new_cluster, false);
 }
 
 /*
@@ -862,3 +892,77 @@ set_frozenxids(bool minmxid_only)
 
 	check_ok();
 }
+
+/*
+ * create_logical_replication_slots()
+ *
+ * Similar to create_new_objects() but only restores logical replication slots.
+ */
+static char *
+create_logical_replication_slots(void)
+{
+	int			dbnum;
+	char	   *xlogfilename = NULL;
+
+	prep_status_progress("Restoring logical replication slots in the new cluster");
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		DbInfo	   *old_db = &old_cluster.dbarr.dbs[dbnum];
+		LogicalSlotInfoArr *slot_arr = &old_db->slot_arr;
+		PGconn	   *conn;
+		PQExpBuffer query;
+		int			slotnum;
+		char		log_file_name[MAXPGPATH];
+
+		/* Skip this database if there are no slots */
+		if (slot_arr->nslots == 0)
+			continue;
+
+		conn = connectToServer(&new_cluster, old_db->db_name);
+		query = createPQExpBuffer();
+
+		snprintf(log_file_name, sizeof(log_file_name),
+				 DB_DUMP_LOG_FILE_MASK, old_db->db_oid);
+
+		pg_log(PG_STATUS, "%s", old_db->db_name);
+
+		for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		{
+			PGresult   *res;
+			LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+
+			/*
+			 * Construct a query that creates the logical replication slot.
+			 * An upgrade support function sets restart_lsn appropriately.
+			 */
+			appendPQExpBuffer(query,
+							  "SELECT * FROM "
+							  "pg_catalog.binary_upgrade_create_logical_replication_slot(");
+			appendStringLiteralConn(query, slot_info->slotname, conn);
+			appendPQExpBuffer(query, ", ");
+			appendStringLiteralConn(query, slot_info->plugin, conn);
+			appendPQExpBuffer(query, ", %s);",
+							  slot_info->two_phase ? "true" : "false");
+
+			res = executeQueryOrDie(conn, "%s", query->data);
+
+			Assert(PQntuples(res) == 1 && PQnfields(res) == 3);
+
+			if (xlogfilename == NULL)
+				xlogfilename = pg_strdup(PQgetvalue(res, 0, 2));
+
+			PQclear(res);
+			resetPQExpBuffer(query);
+		}
+
+		PQfinish(conn);
+
+		destroyPQExpBuffer(query);
+	}
+
+	end_progress_output();
+	check_ok();
+
+	return xlogfilename;
+}
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 842f3b6cd3..69b0eb5346 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -150,6 +150,24 @@ typedef struct
 	int			nrels;
 } RelInfoArr;
 
+/*
+ * Structure to store logical replication slot information.
+ */
+typedef struct
+{
+	char	   *slotname;		/* slot name */
+	char	   *plugin;			/* plugin */
+	bool		two_phase;		/* can the slot decode 2PC? */
+	bool		caught_up;		/* Is the slot caught up to latest changes? */
+	bool		invalid;		/* if true, the slot is unusable */
+} LogicalSlotInfo;
+
+typedef struct
+{
+	int			nslots;			/* number of logical slot infos */
+	LogicalSlotInfo *slots;		/* array of logical slot infos */
+} LogicalSlotInfoArr;
+
 /*
  * The following structure represents a relation mapping.
  */
@@ -176,6 +194,7 @@ typedef struct
 	char		db_tablespace[MAXPGPATH];	/* database default tablespace
 											 * path */
 	RelInfoArr	rel_arr;		/* array of all user relinfos */
+	LogicalSlotInfoArr slot_arr;	/* array of all LogicalSlotInfo */
 } DbInfo;
 
 /*
@@ -400,7 +419,8 @@ void		check_loadable_libraries(void);
 FileNameMap *gen_db_file_maps(DbInfo *old_db,
 							  DbInfo *new_db, int *nmaps, const char *old_pgdata,
 							  const char *new_pgdata);
-void		get_db_and_rel_infos(ClusterInfo *cluster);
+void		get_db_rel_and_slot_infos(ClusterInfo *cluster, bool live_check);
+int			count_old_cluster_logical_slots(void);
 
 /* option.c */
 
diff --git a/src/bin/pg_upgrade/server.c b/src/bin/pg_upgrade/server.c
index 0bc3d2806b..d7f6c268ef 100644
--- a/src/bin/pg_upgrade/server.c
+++ b/src/bin/pg_upgrade/server.c
@@ -201,6 +201,7 @@ start_postmaster(ClusterInfo *cluster, bool report_and_exit_on_error)
 	PGconn	   *conn;
 	bool		pg_ctl_return = false;
 	char		socket_string[MAXPGPATH + 200];
+	PQExpBufferData pgoptions;
 
 	static bool exit_hook_registered = false;
 
@@ -227,23 +228,41 @@ start_postmaster(ClusterInfo *cluster, bool report_and_exit_on_error)
 				 cluster->sockdir);
 #endif
 
+	initPQExpBuffer(&pgoptions);
+
 	/*
-	 * Use -b to disable autovacuum.
+	 * Construct a parameter string which is passed to the server process.
 	 *
 	 * Turn off durability requirements to improve object creation speed, and
 	 * we only modify the new cluster, so only use it there.  If there is a
 	 * crash, the new cluster has to be recreated anyway.  fsync=off is a big
 	 * win on ext4.
 	 */
+	if (cluster == &new_cluster)
+		appendPQExpBufferStr(&pgoptions, " -c synchronous_commit=off -c fsync=off -c full_page_writes=off");
+
+	/*
+	 * Set max_slot_wal_keep_size to -1 to prevent WAL removal by the
+	 * checkpointer process.  If the WAL required by logical replication slots
+	 * is removed, the slots become unusable, so this setting prevents slot
+	 * invalidation during the upgrade.  We set this option only when the
+	 * cluster is PG17 or later, because logical replication slots can only be
+	 * migrated since then.  (max_slot_wal_keep_size itself was added in PG13.)
+	 */
+	if (GET_MAJOR_VERSION(cluster->major_version) >= 1700)
+		appendPQExpBufferStr(&pgoptions, " -c max_slot_wal_keep_size=-1");
+
+	/* Use -b to disable autovacuum. */
 	snprintf(cmd, sizeof(cmd),
 			 "\"%s/pg_ctl\" -w -l \"%s/%s\" -D \"%s\" -o \"-p %d -b%s %s%s\" start",
 			 cluster->bindir,
 			 log_opts.logdir,
 			 SERVER_LOG_FILE, cluster->pgconfig, cluster->port,
-			 (cluster == &new_cluster) ?
-			 " -c synchronous_commit=off -c fsync=off -c full_page_writes=off" : "",
+			 pgoptions.data,
 			 cluster->pgopts ? cluster->pgopts : "", socket_string);
 
+	termPQExpBuffer(&pgoptions);
+
 	/*
 	 * Don't throw an error right away, let connecting throw the error because
 	 * it might supply a reason for the failure.
diff --git a/src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl
new file mode 100644
index 0000000000..c90edc12de
--- /dev/null
+++ b/src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl
@@ -0,0 +1,325 @@
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+# Tests for upgrading replication slots
+
+use strict;
+use warnings;
+
+use File::Find qw(find);
+use File::Path qw(rmtree);
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Can be changed to test the other modes
+my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
+
+# Initialize old cluster
+my $old_publisher = PostgreSQL::Test::Cluster->new('old_publisher');
+$old_publisher->init(allows_streaming => 'logical');
+
+# Initialize new cluster
+my $new_publisher = PostgreSQL::Test::Cluster->new('new_publisher');
+$new_publisher->init(allows_streaming => 'replica');
+
+# Initialize subscriber cluster
+my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
+$subscriber->init(allows_streaming => 'logical');
+
+my $bindir = $new_publisher->config_data('--bindir');
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when new cluster wal_level is not 'logical'
+
+# Preparations for the subsequent test:
+# 1. Create a slot on the old cluster
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT pg_create_logical_replication_slot('test_slot1', 'test_decoding', false, true);"
+);
+$old_publisher->stop;
+
+# pg_upgrade will fail because the new cluster wal_level is 'replica'
+command_checks_all(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d', $old_publisher->data_dir,
+		'-D', $new_publisher->data_dir,
+		'-b', $bindir,
+		'-B', $bindir,
+		'-s', $new_publisher->host,
+		'-p', $old_publisher->port,
+		'-P', $new_publisher->port,
+		$mode,
+	],
+	1,
+	[qr/wal_level must be \"logical\", but is set to \"replica\"/],
+	[qr//],
+	'run of pg_upgrade where the new cluster has the wrong wal_level');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when max_replication_slots on a new cluster is
+#		too low
+
+# Preparations for the subsequent test:
+# 1. Create a second slot on the old cluster
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT pg_create_logical_replication_slot('test_slot2', 'test_decoding', false, true);"
+);
+
+# 2. Consume WAL records to avoid another type of upgrade failure. It will be
+#	 tested in subsequent cases.
+$old_publisher->safe_psql('postgres',
+	"SELECT count(*) FROM pg_logical_slot_get_changes('test_slot1', NULL, NULL);"
+);
+$old_publisher->stop;
+
+# 3. max_replication_slots is set to smaller than the number of slots (2)
+#	 present on the old cluster
+$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 1");
+
+# 4. wal_level is set correctly on the new cluster
+$new_publisher->append_conf('postgresql.conf', "wal_level = 'logical'");
+
+# pg_upgrade will fail because the new cluster has insufficient max_replication_slots
+command_checks_all(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d', $old_publisher->data_dir,
+		'-D', $new_publisher->data_dir,
+		'-b', $bindir,
+		'-B', $bindir,
+		'-s', $new_publisher->host,
+		'-p', $old_publisher->port,
+		'-P', $new_publisher->port,
+		$mode,
+	],
+	1,
+	[
+		qr/max_replication_slots \(1\) must be greater than or equal to the number of logical replication slots \(2\) on the old cluster/
+	],
+	[qr//],
+	'run of pg_upgrade where the new cluster has insufficient max_replication_slots'
+);
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when the slot still has unconsumed WAL records
+
+# Preparations for the subsequent test:
+# 1. Remove the slot 'test_slot2', leaving only 1 slot on the old cluster, so
+#    the new cluster config  max_replication_slots=1 will now be enough.
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT * FROM pg_drop_replication_slot('test_slot2');");
+
+# 2. Generate extra WAL records. Because these WAL records do not get consumed,
+#	 the upcoming pg_upgrade test will fail.
+$old_publisher->safe_psql('postgres',
+	"CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;");
+$old_publisher->stop;
+
+# pg_upgrade will fail because the slot still has unconsumed WAL records
+command_checks_all(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d', $old_publisher->data_dir,
+		'-D', $new_publisher->data_dir,
+		'-b', $bindir,
+		'-B', $bindir,
+		'-s', $new_publisher->host,
+		'-p', $old_publisher->port,
+		'-P', $new_publisher->port,
+		$mode,
+	],
+	1,
+	[
+		qr/Your installation contains logical replication slots that cannot be upgraded./
+	],
+	[qr//],
+	'run of pg_upgrade of old cluster with idle replication slots');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Verify the reason why the logical replication slot cannot be upgraded
+my $log_path = $new_publisher->data_dir . "/pg_upgrade_output.d";
+my $slots_filename;
+
+# Find a txt file that contains a list of logical replication slots that cannot
+# be upgraded. We cannot predict the file's path because the output directory
+# contains a milliseconds timestamp. File::Find::find must be used.
+find(
+	sub {
+		if ($File::Find::name =~ m/invalid_logical_replication_slots\.txt/)
+		{
+			$slots_filename = $File::Find::name;
+		}
+	},
+	$new_publisher->data_dir . "/pg_upgrade_output.d");
+
+# And check the content. The failure should be because there are unconsumed
+# WALs after confirmed_flush_lsn of test_slot1.
+like(
+	slurp_file($slots_filename),
+	qr/The slot \"test_slot1\" has not consumed the WAL yet/m,
+	'the previous test failed due to unconsumed WALs');
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+# Consume the remaining WAL records
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT count(*) FROM pg_logical_slot_get_changes('test_slot1', NULL, NULL);");
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when there are non-transactional changes
+
+# Preparations for the subsequent test:
+# 1. Emit a non-transactional message
+$old_publisher->safe_psql('postgres',
+	"SELECT count(*) FROM pg_logical_emit_message('false', 'prefix', 'This is a non-transactional message');");
+$old_publisher->stop;
+
+# pg_upgrade will fail because there is a non-transactional change
+command_checks_all(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d', $old_publisher->data_dir,
+		'-D', $new_publisher->data_dir,
+		'-b', $bindir,
+		'-B', $bindir,
+		'-s', $new_publisher->host,
+		'-p', $old_publisher->port,
+		'-P', $new_publisher->port,
+		$mode,
+	],
+	1,
+	[
+		qr/Your installation contains logical replication slots that cannot be upgraded./
+	],
+	[qr//],
+	'run of pg_upgrade of old cluster with unconsumed non-transactional changes');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Verify the reason why the logical replication slot cannot be upgraded
+$log_path = $new_publisher->data_dir . "/pg_upgrade_output.d";
+$slots_filename = undef;
+
+# Find a txt file that contains a list of logical replication slots that cannot
+# be upgraded.
+find(
+	sub {
+		if ($File::Find::name =~ m/invalid_logical_replication_slots\.txt/)
+		{
+			$slots_filename = $File::Find::name;
+		}
+	},
+	$new_publisher->data_dir . "/pg_upgrade_output.d");
+
+# And check the content. The content of the file is the same as in the previous test.
+like(
+	slurp_file($slots_filename),
+	qr/The slot \"test_slot1\" has not consumed the WAL yet/m,
+	'the previous test failed due to unconsumed WALs');
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+# Remove the remaining slot
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT * FROM pg_drop_replication_slot('test_slot1');");
+
+# ------------------------------
+# TEST: Successful upgrade
+
+# Preparations for the subsequent test:
+# 1. Setup logical replication
+my $old_connstr = $old_publisher->connstr . ' dbname=postgres';
+$old_publisher->safe_psql('postgres',
+	"CREATE PUBLICATION pub FOR ALL TABLES;");
+$subscriber->start;
+$subscriber->safe_psql(
+	'postgres', qq[
+	CREATE TABLE tbl (a int);
+	CREATE SUBSCRIPTION regress_sub CONNECTION '$old_connstr' PUBLICATION pub WITH (two_phase = 'true')
+]);
+$subscriber->wait_for_subscription_sync($old_publisher, 'regress_sub');
+
+# 2. Temporarily disable the subscription
+$subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION regress_sub DISABLE");
+$old_publisher->stop;
+
+# Dry run, successful check is expected. This is not a live check, so a
+# shutdown checkpoint record will have been written. We want to test that a
+# subsequent upgrade succeeds because such an expected WAL record is skipped.
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d', $old_publisher->data_dir,
+		'-D', $new_publisher->data_dir,
+		'-b', $bindir,
+		'-B', $bindir,
+		'-s', $new_publisher->host,
+		'-p', $old_publisher->port,
+		'-P', $new_publisher->port,
+		$mode, '--check'
+	],
+	'run of pg_upgrade of old cluster');
+ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ removed after pg_upgrade success");
+
+# Actual run, successful upgrade is expected
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d', $old_publisher->data_dir,
+		'-D', $new_publisher->data_dir,
+		'-b', $bindir,
+		'-B', $bindir,
+		'-s', $new_publisher->host,
+		'-p', $old_publisher->port,
+		'-P', $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade of old cluster');
+ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ removed after pg_upgrade success");
+
+# Check that the slot 'regress_sub' has migrated to the new cluster
+$new_publisher->start;
+my $result = $new_publisher->safe_psql('postgres',
+	"SELECT slot_name, two_phase FROM pg_replication_slots");
+is($result, qq(regress_sub|t), 'check the slot exists on new cluster');
+
+# Update the connection
+my $new_connstr = $new_publisher->connstr . ' dbname=postgres';
+$subscriber->safe_psql(
+	'postgres', qq[
+	ALTER SUBSCRIPTION regress_sub CONNECTION '$new_connstr';
+	ALTER SUBSCRIPTION regress_sub ENABLE;
+]);
+
+# Check whether changes on the new publisher get replicated to the subscriber
+$new_publisher->safe_psql('postgres',
+	"INSERT INTO tbl VALUES (generate_series(11, 20))");
+$new_publisher->wait_for_catchup('regress_sub');
+$result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
+is($result, qq(20), 'check changes are replicated to the subscriber');
+
+# Clean up
+$subscriber->stop();
+$new_publisher->stop();
+
+done_testing();
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index f0b7b9cbd8..eacd91eb67 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11370,6 +11370,18 @@
   proname => 'binary_upgrade_set_next_pg_tablespace_oid', provolatile => 'v',
   proparallel => 'u', prorettype => 'void', proargtypes => 'oid',
   prosrc => 'binary_upgrade_set_next_pg_tablespace_oid' },
+{ oid => '8046', descr => 'for use by pg_upgrade',
+  proname => 'binary_upgrade_validate_wal_logical_end', proisstrict => 'f',
+  provolatile => 'v', proparallel => 'u', prorettype => 'bool',
+  proargtypes => 'name',
+  prosrc => 'binary_upgrade_validate_wal_logical_end' },
+{ oid => '8047', descr => 'for use by pg_upgrade',
+  proname => 'binary_upgrade_create_logical_replication_slot', proisstrict => 'f',
+  provolatile => 'v', proparallel => 'u', prorettype => 'record',
+  proargtypes => 'name name bool',
+  proallargtypes => '{name,name,bool,name,pg_lsn,text}',
+  proargmodes => '{i,i,i,o,o,o}',
+  prosrc => 'binary_upgrade_create_logical_replication_slot' },
 
 # conversion functions
 { oid => '4302',
diff --git a/src/include/replication/logical.h b/src/include/replication/logical.h
index 5f49554ea0..d0f9dda6c5 100644
--- a/src/include/replication/logical.h
+++ b/src/include/replication/logical.h
@@ -30,6 +30,24 @@ typedef void (*LogicalOutputPluginWriterUpdateProgress) (struct LogicalDecodingC
 														 bool skipped_xact
 );
 
+typedef enum DecodingMode
+{
+	/* Decode and output the changes if needed using the output plugin */
+	DECODING_MODE_NORMAL,
+
+	/*
+	 * Fast-forward decoding mode: Skips loading the output plugin and
+	 * bypasses decoding most changes in a transaction.
+	 */
+	DECODING_MODE_FAST_FORWARD,
+
+	/*
+	 * Silent decoding mode: Skips loading the output plugin and decodes all
+	 * changes without emitting any output.
+	 */
+	DECODING_MODE_SILENT
+} DecodingMode;
+
 typedef struct LogicalDecodingContext
 {
 	/* memory context this is all allocated in */
@@ -44,11 +62,11 @@ typedef struct LogicalDecodingContext
 	struct SnapBuild *snapshot_builder;
 
 	/*
-	 * Marks the logical decoding context as fast forward decoding one. Such a
-	 * context does not have plugin loaded so most of the following properties
-	 * are unused.
+	 * For DECODING_MODE_FAST_FORWARD and DECODING_MODE_SILENT, the context
+	 * does not have the output plugin loaded, so most of the following
+	 * properties are unused.
 	 */
-	bool		fast_forward;
+	DecodingMode decoding_mode;
 
 	OutputPluginCallbacks callbacks;
 	OutputPluginOptions options;
@@ -109,6 +127,13 @@ typedef struct LogicalDecodingContext
 	TransactionId write_xid;
 	/* Are we processing the end LSN of a transaction? */
 	bool		end_xact;
+
+	/*
+	 * Did the logical decoding context skip outputting any changes?
+	 *
+	 * This flag is used only when the context is in the silent mode.
+	 */
+	bool		output_skipped;
 } LogicalDecodingContext;
 
 
@@ -124,7 +149,7 @@ extern LogicalDecodingContext *CreateInitDecodingContext(const char *plugin,
 														 LogicalOutputPluginWriterUpdateProgress update_progress);
 extern LogicalDecodingContext *CreateDecodingContext(XLogRecPtr start_lsn,
 													 List *output_plugin_options,
-													 bool fast_forward,
+													 DecodingMode decoding_mode,
 													 XLogReaderRoutine *xl_routine,
 													 LogicalOutputPluginWriterPrepareWrite prepare_write,
 													 LogicalOutputPluginWriterWrite do_write,
@@ -145,4 +170,7 @@ extern bool filter_by_origin_cb_wrapper(LogicalDecodingContext *ctx, RepOriginId
 extern void ResetLogicalStreamingState(void);
 extern void UpdateDecodingStats(LogicalDecodingContext *ctx);
 
+extern bool DecodingContextHasdecodedItems(LogicalDecodingContext *ctx,
+										   XLogRecPtr end_of_wal);
+
 #endif
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 758ca79a81..6559d3f014 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -227,6 +227,10 @@ extern void ReplicationSlotRelease(void);
 extern void ReplicationSlotCleanup(void);
 extern void ReplicationSlotSave(void);
 extern void ReplicationSlotMarkDirty(void);
+extern void create_logical_replication_slot(char *name, char *plugin,
+											bool temporary, bool two_phase,
+											XLogRecPtr restart_lsn,
+											bool find_startpoint);
 
 /* misc stuff */
 extern void ReplicationSlotInitialize(void);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 8de90c4958..b75a69f543 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -562,6 +562,7 @@ DeallocateStmt
 DeclareCursorStmt
 DecodedBkpBlock
 DecodedXLogRecord
+DecodingMode
 DecodingOutputState
 DefElem
 DefElemAction
@@ -1503,6 +1504,8 @@ LogicalRepTyp
 LogicalRepWorker
 LogicalRepWorkerType
 LogicalRewriteMappingData
+LogicalSlotInfo
+LogicalSlotInfoArr
 LogicalTape
 LogicalTapeSet
 LsnReadQueue
-- 
2.27.0

#311Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: vignesh C (#308)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Vignesh,

Thanks for reviewing! A new version is available in [1]/messages/by-id/TYAPR01MB5866068CB6591C8AE1F9690BF5CDA@TYAPR01MB5866.jpnprd01.prod.outlook.com.

Few comments:
1)  Should we add binary upgrade check "CHECK_IS_BINARY_UPGRADE" for
this funcion too:
+binary_upgrade_create_logical_replication_slot(PG_FUNCTION_ARGS)
+{
+       Name            name = PG_GETARG_NAME(0);
+       Name            plugin = PG_GETARG_NAME(1);
+
+       /* Temporary slots is never handled in this function */
+       bool            two_phase = PG_GETARG_BOOL(2);

Yeah, it is needed. I had left it out for testing purposes, but it should be there.
Added.

2) Generally we are specifying the slot name in this case, is slot
name null check required:
+Datum
+binary_upgrade_validate_wal_logical_end(PG_FUNCTION_ARGS)
+{
+       Name            slot_name;
+       XLogRecPtr      end_of_wal;
+       LogicalDecodingContext *ctx = NULL;
+       bool            has_record;
+
+       CHECK_IS_BINARY_UPGRADE;
+
+       /* Quick exit if the input is NULL */
+       if (PG_ARGISNULL(0))
+               PG_RETURN_BOOL(false);

A NULL check was added. I felt that we should raise an ERROR here.
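
As a sketch, the two shapes being discussed would look like this (the error
message wording below is only illustrative, not taken from the patch):

	/* Option A: quietly treat a NULL slot name as "nothing to validate" */
	if (PG_ARGISNULL(0))
		PG_RETURN_BOOL(false);

	/* Option B: reject a NULL slot name with an ERROR */
	if (PG_ARGISNULL(0))
		ereport(ERROR,
				(errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
				 errmsg("replication slot name must not be null")));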

3) Since this is similar to pg_create_logical_replication_slot, can we
add a comment saying any change in pg_create_logical_replication_slot
would also need the same check to be added in
binary_upgrade_create_logical_replication_slot:
+/*
+ * SQL function for creating a new logical replication slot.
+ *
+ * This function is almost same as pg_create_logical_replication_slot(), but
+ * this can specify the restart_lsn.
+ */
+Datum
+binary_upgrade_create_logical_replication_slot(PG_FUNCTION_ARGS)
+{
+       Name            name = PG_GETARG_NAME(0);
+       Name            plugin = PG_GETARG_NAME(1);
+
+       /* Temporary slots is never handled in this function */

Added.

4) Any conclusion on this try catch comment, do you want to add which
setting you want to revert in catch, if try/catch is not required we
can remove this comment:
+       ReplicationSlotAcquire(NameStr(*slot_name), true);
+
+       /* XXX: Is PG_TRY/CATCH needed around here? */
+
+       /*
+        * We use silent mode here to decode all changes without
outputting them,
+        * allowing us to detect all the records that could be sent downstream.
+        */

After considering it more, it's OK to raise an ERROR because the caller can detect it.
Also, there are no settings to be reverted, so the comment has been removed.

5) I felt these 2 comments can be combined as both are trying to say
the same thing:
+ * This is a special purpose function to ensure that there are no WAL records
+ * pending to be decoded after the given LSN.
+ *
+ * It is used to ensure that there is no pending WAL to be consumed for
+ * the logical slots.

The latter part was removed.

6) I feel this memset is not required as we are initializing at the
beginning of the function; if you want to keep the memset, the
initialization can be removed:
+       values[2] = CStringGetTextDatum(xlogfilename);
+
+       memset(nulls, 0, sizeof(nulls));
+
+       tuple = heap_form_tuple(tupdesc, values, nulls);

The initialization was removed to follow pg_create_logical_replication_slot.
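
So the remaining code is just the usual heap_form_tuple() pattern, roughly
(a sketch of the relevant lines only):

	Datum		values[3];
	bool		nulls[3];
	HeapTuple	tuple;

	/* values[0..2] are filled in above; no output column is NULL */
	memset(nulls, 0, sizeof(nulls));

	tuple = heap_form_tuple(tupdesc, values, nulls);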

7) looks like a typo, "mu" should be "must":
+       /*
+        * Also, we mu execute pg_log_standby_snapshot() when logical
replication
+        * slots are migrated. Because RUNNING_XACTS record is
required to create
+        * a consistent snapshot.
+        */
+       if (count_old_cluster_logical_slots())
+               create_consistent_snapshot();

Fixed.

8) consitent should be consistent:
+/*
+ * Log the details of the current snapshot to the WAL, allowing the snapshot
+ * state to be reconstructed for logical decoding on the upgraded slots.
+ */
+static void
+create_consistent_snapshot(void)
+{
+       DbInfo     *old_db = &old_cluster.dbarr.dbs[0];
+       PGconn     *conn;
+
+       prep_status("Creating a consitent snapshot on new cluster");

Fixed.

[1]: /messages/by-id/TYAPR01MB5866068CB6591C8AE1F9690BF5CDA@TYAPR01MB5866.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#312Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Bharath Rupireddy (#303)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Bharath,

Thanks for the comments, and apologies for the late reply.
The new version is available in [1].

+1 for this approach. It looks neat.

I think we also need to add TAP tests to generate decodable WAL
records (RUNNING_XACT, CHECKPOINT_ONLINE, XLOG_FPI_FOR_HINT,
XLOG_SWITCH, XLOG_PARAMETER_CHANGE, XLOG_HEAP2_PRUNE) during
pg_upgrade as described here
/messages/by-id/TYAPR01MB58660273EACEFC5BF256B133F50DA%40TYAPR01MB5866.jpnprd01.prod.outlook.com.
Basically, these were the exceptional WAL records that may be
generated by pg_upgrade, so having tests for them is good.

Hmm, I'm not sure that is really worthwhile. If we add such a test, we may have to
add further tests in the future whenever a new WAL record type can be generated
during an upgrade. Also, we currently do not have an if-statement for each WAL
record type, so I thought it would not improve coverage. Another concern is that
I'm not sure how we can simply and reliably generate XLOG_HEAP2_PRUNE.

Based on the above, I did not add the test case for now.

[1]: /messages/by-id/TYAPR01MB5866068CB6591C8AE1F9690BF5CDA@TYAPR01MB5866.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#313Amit Kapila
amit.kapila16@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#310)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Tue, Oct 10, 2023 at 4:51 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

Isn't it sufficient to add a test for silent mode in
begin/stream_start/begin_prepare kind of APIs and set
ctx->did_process? In all other APIs, we can assert that did_process
shouldn't be set and we never reach there when decoding mode is
silent.

+ /* Check whether the meaningful change was found */
+ found = (ctx->reorder->by_txn_last_xid != InvalidTransactionId ||
+ ctx->did_process);

Are you talking about this check in the patch? If so, can you please
explain when the first check helps?

I changed this area, so let me describe it once again.

A flag (output_skipped) is set when a transaction is decoded to its end in
silent mode. It is done in DecodeTXNNeedSkip() because that function is the common
path for both committed and aborted transactions. Also, DecodeTXNNeedSkip() returns
true when the decoding context is in silent mode, so the cb_wrapper functions are
never called. DecodingContextHasdecodedItems() just returns output_skipped.

This approach needs to read WAL up to the end of the transactions before the
upgrade function returns, but the code looks simpler than the previous version.

 DecodeTXNNeedSkip(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
    Oid txn_dbid, RepOriginId origin_id)
 {
- return (SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) ||
- (txn_dbid != InvalidOid && txn_dbid != ctx->slot->data.database) ||
- ctx->fast_forward || FilterByOrigin(ctx, origin_id));
+ bool need_skip;
+
+ need_skip = (SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) ||
+ (txn_dbid != InvalidOid && txn_dbid != ctx->slot->data.database) ||
+ ctx->decoding_mode != DECODING_MODE_NORMAL ||
+ FilterByOrigin(ctx, origin_id));
+
+ /* Set a flag if we are in the slient mode */
+ if (ctx->decoding_mode == DECODING_MODE_SILENT)
+ ctx->output_skipped = true;
+
+ return need_skip;

I think you need to set the new flag only when we are not skipping the
transaction or in other words when we decide to process the
transaction. Otherwise, how will you distinguish the case where the
xact is already decoded and sent to client?

--
With Regards,
Amit Kapila

#314Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#313)
1 attachment(s)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Tue, Oct 10, 2023 at 6:17 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

DecodeTXNNeedSkip(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
Oid txn_dbid, RepOriginId origin_id)
{
- return (SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) ||
- (txn_dbid != InvalidOid && txn_dbid != ctx->slot->data.database) ||
- ctx->fast_forward || FilterByOrigin(ctx, origin_id));
+ bool need_skip;
+
+ need_skip = (SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) ||
+ (txn_dbid != InvalidOid && txn_dbid != ctx->slot->data.database) ||
+ ctx->decoding_mode != DECODING_MODE_NORMAL ||
+ FilterByOrigin(ctx, origin_id));
+
+ /* Set a flag if we are in the slient mode */
+ if (ctx->decoding_mode == DECODING_MODE_SILENT)
+ ctx->output_skipped = true;
+
+ return need_skip;

I think you need to set the new flag only when we are not skipping the
transaction or in other words when we decide to process the
transaction. Otherwise, how will you distinguish the case where the
xact is already decoded and sent to client?

In the attached patch atop your v47*, I have changed it to show you
what I have in mind.

A few more comments:
=================
1.
+
+ /*
+ * Did the logical decoding context skip outputting any changes?
+ *
+ * This flag is used only when the context is in the silent mode.
+ */
+ bool output_skipped;
 } LogicalDecodingContext;

This doesn't seem to convey the meaning to the caller. How about
processing_required? BTW, I have made this change as well in the
patch.

2.
@@ -295,7 +295,7 @@ xact_decode(LogicalDecodingContext *ctx,
XLogRecordBuffer *buf)
*/
if (TransactionIdIsValid(xid))
{
- if (!ctx->fast_forward)
+ if (ctx->decoding_mode != DECODING_MODE_FAST_FORWARD)
ReorderBufferAddInvalidations(reorder, xid,
  buf->origptr,
  invals->nmsgs,
@@ -303,7 +303,7 @@ xact_decode(LogicalDecodingContext *ctx,
XLogRecordBuffer *buf)
ReorderBufferXidSetCatalogChanges(ctx->reorder, xid,
  buf->origptr);
}
- else if ((!ctx->fast_forward))
+ else if (ctx->decoding_mode != DECODING_MODE_FAST_FORWARD)
ReorderBufferImmediateInvalidation(ctx->reorder,
   invals->nmsgs,
   invals->msgs);

We don't need to execute the invalidations even in silent mode. Looking at
this and other changes in the patch related to silent mode, I wonder
whether we really need to introduce 'silent_mode'. Can't we simply set
processing_required when 'fast_forward' mode is true and then let the
caller decide whether it needs to further process the WAL?

--
With Regards,
Amit Kapila.

Attachments:

v47_changes_amit_1.patch.txt (text/plain, charset=US-ASCII)
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 6de54153f7..f3c561d8ed 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -631,7 +631,7 @@ logicalmsg_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 	if (ctx->decoding_mode == DECODING_MODE_SILENT &&
 		!message->transactional)
 	{
-		ctx->output_skipped = true;
+		ctx->processing_required = true;
 		return;
 	}
 
@@ -1294,8 +1294,6 @@ DecodeXLogTuple(char *data, Size len, ReorderBufferTupleBuf *tuple)
  * 2) The transaction happened in another database.
  * 3) The output plugin is not interested in the origin.
  * 4) We are not in the normal decoding mode.
- *
- * Also, set output_skipped flag if we are in the slient mode.
  */
 static bool
 DecodeTXNNeedSkip(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
@@ -1308,9 +1306,15 @@ DecodeTXNNeedSkip(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
 				 ctx->decoding_mode != DECODING_MODE_NORMAL ||
 				 FilterByOrigin(ctx, origin_id));
 
-	/* Set a flag if we are in the slient mode */
+	if (need_skip)
+		return true;
+
+	/*
+	 * We don't need to process the transaction in silent mode. Indicate the
+	 * same via LogicalDecodingContext, so that the caller can skip processing.
+	 */
 	if (ctx->decoding_mode == DECODING_MODE_SILENT)
-		ctx->output_skipped = true;
+		ctx->processing_required = true;
 
-	return need_skip;
+	return true;
 }
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 0f4b1c6323..e47f2ebd7c 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -2030,7 +2030,7 @@ DecodingContextHasdecodedItems(LogicalDecodingContext *ctx,
 	InvalidateSystemCaches();
 
 	/* Loop until the end of WAL or some changes are processed */
-	while (!ctx->output_skipped && ctx->reader->EndRecPtr < end_of_wal)
+	while (!ctx->processing_required && ctx->reader->EndRecPtr < end_of_wal)
 	{
 		XLogRecord *record;
 		char	   *errm = NULL;
@@ -2046,5 +2046,5 @@ DecodingContextHasdecodedItems(LogicalDecodingContext *ctx,
 		CHECK_FOR_INTERRUPTS();
 	}
 
-	return ctx->output_skipped;
+	return ctx->processing_required;
 }
diff --git a/src/include/replication/logical.h b/src/include/replication/logical.h
index d0f9dda6c5..94cc631a5b 100644
--- a/src/include/replication/logical.h
+++ b/src/include/replication/logical.h
@@ -128,12 +128,8 @@ typedef struct LogicalDecodingContext
 	/* Are we processing the end LSN of a transaction? */
 	bool		end_xact;
 
-	/*
-	 * Did the logical decoding context skip outputting any changes?
-	 *
-	 * This flag is used only when the context is in the silent mode.
-	 */
-	bool		output_skipped;
+	/* Do we need to process any change in silent decoding mode? */
+	bool		processing_required;
 } LogicalDecodingContext;
 
 
#315Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Amit Kapila (#314)
1 attachment(s)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Amit,

Thank you for reviewing! PSA new version.

I think you need to set the new flag only when we are not skipping the
transaction or in other words when we decide to process the
transaction. Otherwise, how will you distinguish the case where the
xact is already decoded and sent to client?

Actually, I had wondered what the right condition should be, but I followed your
suggestion. Indeed, we should avoid the case in which the xact has already been
sent. But I was not sure about other conditions, like transactions for another
database - IIUC the previous version regarded them as not acceptable.

Now I reconsider: these cases can be ignored because they would not be sent to the
subscriber. The consistency between pub/sub would not be broken even if these
WALs remain.

In the attached patch atop your v47*, I have changed it to show you
what I have in mind.

Thanks, it was included.

A few more comments:
=================
1.
+
+ /*
+ * Did the logical decoding context skip outputting any changes?
+ *
+ * This flag is used only when the context is in the silent mode.
+ */
+ bool output_skipped;
} LogicalDecodingContext;

This doesn't seem to convey the meaning to the caller. How about
processing_required? BTW, I have made this change as well in the
patch.

LGTM, changed like that.

2.
@@ -295,7 +295,7 @@ xact_decode(LogicalDecodingContext *ctx,
XLogRecordBuffer *buf)
*/
if (TransactionIdIsValid(xid))
{
- if (!ctx->fast_forward)
+ if (ctx->decoding_mode != DECODING_MODE_FAST_FORWARD)
ReorderBufferAddInvalidations(reorder, xid,
buf->origptr,
invals->nmsgs,
@@ -303,7 +303,7 @@ xact_decode(LogicalDecodingContext *ctx,
XLogRecordBuffer *buf)
ReorderBufferXidSetCatalogChanges(ctx->reorder, xid,
buf->origptr);
}
- else if ((!ctx->fast_forward))
+ else if (ctx->decoding_mode != DECODING_MODE_FAST_FORWARD)
ReorderBufferImmediateInvalidation(ctx->reorder,
invals->nmsgs,
invals->msgs);

We don't need to execute the invalidations even in silent mode. Looking at
this and other changes in the patch related to silent mode, I wonder
whether we really need to introduce 'silent_mode'. Can't we simply set
processing_required when 'fast_forward' mode is true and then let the
caller decide whether it needs to further process the WAL?

After considering it again, I agreed to remove the silent mode. Initially, it was
introduced because the did_process flag was set in the XXX_cb_wrapper functions and
in the reorderbuffer layer. Now, processing_required is set in
DecodeCommit()->DecodeTXNNeedSkip(), which means the individual records do not need
to be decoded. Based on that, I removed the silent mode and used the
fast-forwarding mode instead.

Also, some parts (mostly code comments) were modified.

Acknowledgement: Thanks to Peter and Hou for discussing this with me.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:

v48-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patch (application/octet-stream)
From 059f7749b17604578a9d5e02f1fa8fde7c9c6207 Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Tue, 4 Apr 2023 05:49:34 +0000
Subject: [PATCH v48] pg_upgrade: Allow to replicate logical replication slots
 to new node

This commit allows nodes with logical replication slots to be upgraded. While
reading information from the old cluster, a list of logical replication slots is
fetched. At the later part of upgrading, pg_upgrade revisits the list and
restores slots by executing a new binary upgrading function,
binary_upgrade_create_logical_replication_slot() on the new cluster. Migration
of logical replication slots is only supported when the old cluster is version
17.0 or later.

If the old node has slots with the status 'lost' or with unconsumed WAL records,
the pg_upgrade fails. These checks are needed to prevent data loss.

Note that the pg_resetwal command would remove WAL files, which are required for
restart_lsn. If WALs required by logical replication slots are removed, the
slots are unusable. Therefore, binary_upgrade_create_logical_replication_slot()
sets a startpoint of next wal segment which will be created by pg_resetwal.

The significant advantage of this commit is that it makes it easy to continue
logical replication even after upgrading the publisher node. Previously,
pg_upgrade allowed copying publications to a new node. With this new commit,
adjusting the connection string to the new publisher will cause the apply
worker on the subscriber to connect to the new publisher automatically. This
enables seamless continuation of logical replication, even after an upgrade.

Author: Hayato Kuroda
Co-authored-by: Hou Zhijie
Reviewed-by: Peter Smith, Julien Rouhaud, Vignesh C, Wang Wei, Masahiko Sawada,
             Dilip Kumar, Bharath Rupireddy
---
 doc/src/sgml/ref/pgupgrade.sgml               |  76 +++-
 src/backend/replication/logical/decode.c      |  46 ++-
 src/backend/replication/logical/logical.c     |  44 +++
 src/backend/replication/slot.c                |  12 +
 src/backend/replication/slotfuncs.c           |   5 +-
 src/backend/utils/adt/pg_upgrade_support.c    | 154 +++++++++
 src/bin/pg_upgrade/Makefile                   |   3 +
 src/bin/pg_upgrade/check.c                    | 202 ++++++++++-
 src/bin/pg_upgrade/function.c                 |  38 +-
 src/bin/pg_upgrade/info.c                     | 174 +++++++++-
 src/bin/pg_upgrade/meson.build                |   1 +
 src/bin/pg_upgrade/pg_upgrade.c               | 110 +++++-
 src/bin/pg_upgrade/pg_upgrade.h               |  22 +-
 src/bin/pg_upgrade/server.c                   |  25 +-
 .../003_upgrade_logical_replication_slots.pl  | 325 ++++++++++++++++++
 src/include/catalog/pg_proc.dat               |  12 +
 src/include/replication/logical.h             |  10 +
 src/include/replication/slot.h                |   4 +
 src/tools/pgindent/typedefs.list              |   2 +
 19 files changed, 1224 insertions(+), 41 deletions(-)
 create mode 100644 src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index f17fdb1ba5..4d579e793d 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -383,6 +383,77 @@ make prefix=/usr/local/pgsql.new install
     </para>
    </step>
 
+   <step>
+    <title>Prepare for publisher upgrades</title>
+
+    <para>
+     <application>pg_upgrade</application> attempts to migrate logical
+     replication slots. This helps avoid the need for manually defining the
+     same replication slots on the new publisher. Migration of logical
+     replication slots is only supported when the old cluster is version 17.0
+     or later.
+    </para>
+
+    <para>
+     Before you start upgrading the publisher cluster, ensure that the
+     subscription is temporarily disabled, by executing
+     <link linkend="sql-altersubscription"><command>ALTER SUBSCRIPTION ... DISABLE</command></link>.
+     Re-enable the subscription after the upgrade.
+    </para>
+
+    <para>
+     There are some prerequisites for <application>pg_upgrade</application> to
+     be able to upgrade the replication slots. If these are not met an error
+     will be reported.
+    </para>
+
+    <itemizedlist>
+     <listitem>
+      <para>
+       All slots on the old cluster must be usable, i.e., there are no slots
+       whose
+       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>conflicting</structfield>
+       is <literal>true</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The old cluster has replicated all the changes to subscribers.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The output plugins referenced by the slots on the old cluster must be
+       installed in the new PostgreSQL executable directory.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must not have permanent logical replication slots, i.e.,
+       there must be no slots where
+       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>temporary</structfield>
+       is <literal>false</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must have
+       <link linkend="guc-wal-level"><varname>wal_level</varname></link> as
+       <literal>logical</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must have
+       <link linkend="guc-max-replication-slots"><varname>max_replication_slots</varname></link>
+       configured to a value greater than or equal to the number of slots
+       present in the old cluster.
+      </para>
+     </listitem>
+    </itemizedlist>
+
+   </step>
+
    <step>
     <title>Stop both servers</title>
 
@@ -650,8 +721,9 @@ rsync --archive --delete --hard-links --size-only --no-inc-recursive /vol1/pg_tb
        Configure the servers for log shipping.  (You do not need to run
        <function>pg_backup_start()</function> and <function>pg_backup_stop()</function>
        or take a file system backup as the standbys are still synchronized
-       with the primary.)  Replication slots are not copied and must
-       be recreated.
+       with the primary.)  Only logical slots on the primary are copied to the
+       new standby, but other slots on the old standby are not copied so must
+       be recreated manually.
       </para>
      </step>
 
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 730061c9da..55fabc429f 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -599,12 +599,8 @@ logicalmsg_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 
 	ReorderBufferProcessXid(ctx->reorder, XLogRecGetXid(r), buf->origptr);
 
-	/*
-	 * If we don't have snapshot or we are just fast-forwarding, there is no
-	 * point in decoding messages.
-	 */
-	if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT ||
-		ctx->fast_forward)
+	/* If we don't have snapshot, there is no point in decoding messages */
+	if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT)
 		return;
 
 	message = (xl_logical_message *) XLogRecGetData(r);
@@ -621,6 +617,24 @@ logicalmsg_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 			  SnapBuildXactNeedsSkip(builder, buf->origptr)))
 		return;
 
+	/* Skip decoding if we are just fast-forwarding */
+	if (ctx->fast_forward)
+	{
+		/*
+		 * Also, set processing_required flag if the message is not
+		 * transactional. It is needed to notify the message's existence to
+		 * the caller side. Usually, the flag is set when either the COMMIT or
+		 * ABORT records are decoded, but this must be turned on here because
+		 * the non-transactional logical message is decoded without waiting
+		 * for these records.
+		 */
+		if (!message->transactional)
+			ctx->processing_required = true;
+
+		return;
+	}
+
+
 	/*
 	 * If this is a non-transactional change, get the snapshot we're expected
 	 * to use. We only get here when the snapshot is consistent, and the
@@ -1285,7 +1299,21 @@ static bool
 DecodeTXNNeedSkip(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
 				  Oid txn_dbid, RepOriginId origin_id)
 {
-	return (SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) ||
-			(txn_dbid != InvalidOid && txn_dbid != ctx->slot->data.database) ||
-			ctx->fast_forward || FilterByOrigin(ctx, origin_id));
+	if (SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) ||
+		(txn_dbid != InvalidOid && txn_dbid != ctx->slot->data.database) ||
+		FilterByOrigin(ctx, origin_id))
+		return true;
+
+	/*
+	 * We don't need to process the transaction in fast-forward mode. Indicate
+	 * the same via LogicalDecodingContext, so that the caller can skip
+	 * processing.
+	 */
+	else if (ctx->fast_forward)
+	{
+		ctx->processing_required = true;
+		return true;
+	}
+
+	return false;
 }
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 41243d0187..0d3ee9421d 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -41,6 +41,7 @@
 #include "storage/proc.h"
 #include "storage/procarray.h"
 #include "utils/builtins.h"
+#include "utils/inval.h"
 #include "utils/memutils.h"
 
 /* data for errcontext callback */
@@ -1949,3 +1950,46 @@ UpdateDecodingStats(LogicalDecodingContext *ctx)
 	rb->totalTxns = 0;
 	rb->totalBytes = 0;
 }
+
+/*
+ * Read to end of WAL starting from the decoding slot's restart_lsn. Return
+ * true if any meaningful/decodable WAL records are encountered, otherwise
+ * false.
+ *
+ * Although this function is currently used only during pg_upgrade, there are
+ * no reasons to restrict it, so IsBinaryUpgrade is not checked here.
+ */
+bool
+DecodingContextHasDecodedItems(LogicalDecodingContext *ctx,
+							   XLogRecPtr end_of_wal)
+{
+	Assert(MyReplicationSlot);
+
+	/*
+	 * Start reading at the slot's restart_lsn, which we know to point to a
+	 * valid record.
+	 */
+	XLogBeginRead(ctx->reader, MyReplicationSlot->data.restart_lsn);
+
+	/* Invalidate non-timetravel entries */
+	InvalidateSystemCaches();
+
+	/* Loop until the end of WAL or some changes are processed */
+	while (!ctx->processing_required && ctx->reader->EndRecPtr < end_of_wal)
+	{
+		XLogRecord *record;
+		char	   *errm = NULL;
+
+		record = XLogReadRecord(ctx->reader, &errm);
+
+		if (errm)
+			elog(ERROR, "could not find record for logical decoding: %s", errm);
+
+		if (record != NULL)
+			LogicalDecodingProcessRecord(ctx, ctx->reader);
+
+		CHECK_FOR_INTERRUPTS();
+	}
+
+	return ctx->processing_required;
+}
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 7e5ec500d8..9980e2fd79 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1423,6 +1423,18 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause,
 
 		SpinLockRelease(&s->mutex);
 
+		/*
+		 * The logical replication slots shouldn't be invalidated as
+		 * max_slot_wal_keep_size GUC is set to -1 during the upgrade.
+		 *
+		 * The following is just a sanity check.
+		 */
+		if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade)
+		{
+			Assert(max_slot_wal_keep_size_mb == -1);
+			elog(ERROR, "replication slots must not be invalidated during the upgrade");
+		}
+
 		if (active_pid != 0)
 		{
 			/*
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index 6035cf4816..1bcdbcb4eb 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -114,7 +114,7 @@ pg_create_physical_replication_slot(PG_FUNCTION_ARGS)
  * When find_startpoint is false, the slot's confirmed_flush is not set; it's
  * caller's responsibility to ensure it's set to something sensible.
  */
-static void
+void
 create_logical_replication_slot(char *name, char *plugin,
 								bool temporary, bool two_phase,
 								XLogRecPtr restart_lsn,
@@ -163,6 +163,9 @@ create_logical_replication_slot(char *name, char *plugin,
 
 /*
  * SQL function for creating a new logical replication slot.
+ *
+ * If you change this function, please see
+ * binary_upgrade_create_logical_replication_slot as well.
  */
 Datum
 pg_create_logical_replication_slot(PG_FUNCTION_ARGS)
diff --git a/src/backend/utils/adt/pg_upgrade_support.c b/src/backend/utils/adt/pg_upgrade_support.c
index 0186636d9f..e8ffb1abff 100644
--- a/src/backend/utils/adt/pg_upgrade_support.c
+++ b/src/backend/utils/adt/pg_upgrade_support.c
@@ -11,14 +11,20 @@
 
 #include "postgres.h"
 
+#include "access/xlogutils.h"
+#include "access/xlog_internal.h"
 #include "catalog/binary_upgrade.h"
 #include "catalog/heap.h"
 #include "catalog/namespace.h"
 #include "catalog/pg_type.h"
 #include "commands/extension.h"
+#include "funcapi.h"
 #include "miscadmin.h"
+#include "replication/logical.h"
+#include "replication/slot.h"
 #include "utils/array.h"
 #include "utils/builtins.h"
+#include "utils/pg_lsn.h"
 
 
 #define CHECK_IS_BINARY_UPGRADE									\
@@ -261,3 +267,151 @@ binary_upgrade_set_missing_value(PG_FUNCTION_ARGS)
 
 	PG_RETURN_VOID();
 }
+
+/*
+ * Verify the given slot has already been consumed with all the changes.
+ *
+ * Returns true if there are no changes after the confirmed_flush_lsn.
+ * Otherwise false.
+ *
+ * This is a special purpose function to ensure the given slot can be upgraded
+ * without data loss.
+ */
+Datum
+binary_upgrade_validate_wal_logical_end(PG_FUNCTION_ARGS)
+{
+	Name		slot_name;
+	XLogRecPtr	end_of_wal;
+	LogicalDecodingContext *ctx = NULL;
+	bool		found_items;
+
+	CHECK_IS_BINARY_UPGRADE;
+
+	/* Quick exit if the input is NULL */
+	if (PG_ARGISNULL(0))
+		PG_RETURN_BOOL(false);
+
+	slot_name = PG_GETARG_NAME(0);
+
+	/*
+	 * Acquire the given slot. There should be no error because the caller has
+	 * already checked the slot exists.
+	 */
+	ReplicationSlotAcquire(NameStr(*slot_name), true);
+
+	/*
+	 * It's caller's responsibility to check the health of the slot.  Upcoming
+	 * functions assume the restart_lsn points to a valid record.
+	 */
+	Assert(MyReplicationSlot->data.invalidated == RS_INVAL_NONE);
+
+	/*
+	 * Create our decoding context in fast_forward mode, passing start_lsn as
+	 * InvalidXLogRecPtr, so that we start processing from my slot's
+	 * confirmed_flush.
+	 */
+	ctx = CreateDecodingContext(InvalidXLogRecPtr,
+								NIL,
+								true,	/* fast_forward */
+								XL_ROUTINE(.page_read = read_local_xlog_page,
+										   .segment_open = wal_segment_open,
+										   .segment_close = wal_segment_close),
+								NULL, NULL, NULL);
+
+	end_of_wal = GetFlushRecPtr(NULL);
+
+	/*
+	 * Discover if there are any decodable WAL records beyond the slot's
+	 * confirmed_flush_lsn.
+	 */
+	found_items = DecodingContextHasDecodedItems(ctx, end_of_wal);
+
+	/* Clean up */
+	FreeDecodingContext(ctx);
+	ReplicationSlotRelease();
+
+	PG_RETURN_BOOL(!found_items);
+}
+
+/*
+ * SQL function for creating a new logical replication slot.
+ *
+ * This function is almost same as pg_create_logical_replication_slot(), but
+ * the restart_lsn is set to the startpoint of next wal segment. Also,
+ * temporary slots are never handled in this function.
+ *
+ * This function returns the slot name, confirmed_flush_lsn, and the filename,
+ * which are pointed by restart_lsn.
+ */
+Datum
+binary_upgrade_create_logical_replication_slot(PG_FUNCTION_ARGS)
+{
+	Name		name;
+	Name		plugin;
+	bool		two_phase;
+	XLogSegNo	xlogsegno;
+	char		xlogfilename[MAXFNAMELEN];
+	XLogRecPtr	restart_lsn;
+
+	Datum		result;
+	TupleDesc	tupdesc;
+	HeapTuple	tuple;
+	Datum		values[3];
+	bool		nulls[3];
+
+	CHECK_IS_BINARY_UPGRADE;
+
+	if (PG_ARGISNULL(0) ||
+		PG_ARGISNULL(1) ||
+		PG_ARGISNULL(2))
+		elog(ERROR,
+			 "null argument to binary_upgrade_create_logical_replication_slot is not allowed");
+
+	CheckSlotPermissions();
+
+	CheckLogicalDecodingRequirements();
+
+	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+		elog(ERROR, "return type must be a row type");
+
+	/* Conditions seem OK, accept arguments */
+	name = PG_GETARG_NAME(0);
+	plugin = PG_GETARG_NAME(1);
+	two_phase = PG_GETARG_BOOL(2);
+
+	/* Calculate the next WAL segment and its LSN */
+	XLByteToPrevSeg(GetFlushRecPtr(NULL), xlogsegno, wal_segment_size);
+	XLogFileName(xlogfilename, (TimeLineID) 1, xlogsegno + 1,
+				 wal_segment_size);
+
+	/* And use the startpoint as restart_lsn */
+	XLogSegNoOffsetToRecPtr(xlogsegno + 1, 0, wal_segment_size, restart_lsn);
+
+	/*
+	 * Create a required logical replication slot. confirmed_flush is the same
+	 * as restart_lsn for now.
+	 */
+	create_logical_replication_slot(NameStr(*name),
+									NameStr(*plugin),
+									false,
+									two_phase,
+									restart_lsn,
+									false);
+
+	MyReplicationSlot->data.confirmed_flush = restart_lsn;
+
+	values[0] = NameGetDatum(&MyReplicationSlot->data.name);
+	values[1] = LSNGetDatum(MyReplicationSlot->data.confirmed_flush);
+	values[2] = CStringGetTextDatum(xlogfilename);
+
+	memset(nulls, 0, sizeof(nulls));
+
+	tuple = heap_form_tuple(tupdesc, values, nulls);
+	result = HeapTupleGetDatum(tuple);
+
+	/* ok, slot is now fully created, mark it as persistent */
+	ReplicationSlotPersist();
+	ReplicationSlotRelease();
+
+	PG_RETURN_DATUM(result);
+}
diff --git a/src/bin/pg_upgrade/Makefile b/src/bin/pg_upgrade/Makefile
index 5834513add..815d1a7ca1 100644
--- a/src/bin/pg_upgrade/Makefile
+++ b/src/bin/pg_upgrade/Makefile
@@ -3,6 +3,9 @@
 PGFILEDESC = "pg_upgrade - an in-place binary upgrade utility"
 PGAPPICON = win32
 
+# required for 003_logical_replication_slots.pl
+EXTRA_INSTALL=contrib/test_decoding
+
 subdir = src/bin/pg_upgrade
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 21a0ff9e42..0d70948208 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -33,6 +33,9 @@ static void check_for_jsonb_9_4_usage(ClusterInfo *cluster);
 static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(void);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
+static void check_new_cluster_logical_replication_slots(void);
+static void check_old_cluster_for_valid_slots(bool live_check);
+static void create_consistent_snapshot(void);
 
 
 /*
@@ -89,8 +92,11 @@ check_and_dump_old_cluster(bool live_check)
 	if (!live_check)
 		start_postmaster(&old_cluster, true);
 
-	/* Extract a list of databases and tables from the old cluster */
-	get_db_and_rel_infos(&old_cluster);
+	/*
+	 * Extract a list of databases, tables, and logical replication slots from
+	 * the old cluster.
+	 */
+	get_db_rel_and_slot_infos(&old_cluster, live_check);
 
 	init_tablespaces();
 
@@ -107,6 +113,13 @@ check_and_dump_old_cluster(bool live_check)
 	check_for_reg_data_type_usage(&old_cluster);
 	check_for_isn_and_int8_passing_mismatch(&old_cluster);
 
+	/*
+	 * Logical replication slots can be migrated since PG17. See comments atop
+	 * get_old_cluster_logical_slot_infos().
+	 */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
+		check_old_cluster_for_valid_slots(live_check);
+
 	/*
 	 * PG 16 increased the size of the 'aclitem' type, which breaks the
 	 * on-disk format for existing data.
@@ -200,7 +213,7 @@ check_and_dump_old_cluster(bool live_check)
 void
 check_new_cluster(void)
 {
-	get_db_and_rel_infos(&new_cluster);
+	get_db_rel_and_slot_infos(&new_cluster, false);
 
 	check_new_cluster_is_empty();
 
@@ -223,6 +236,8 @@ check_new_cluster(void)
 	check_for_prepared_transactions(&new_cluster);
 
 	check_for_new_tablespace_dir();
+
+	check_new_cluster_logical_replication_slots();
 }
 
 
@@ -245,6 +260,27 @@ report_clusters_compatible(void)
 }
 
 
+/*
+ * Log the details of the current snapshot to the WAL, allowing the snapshot
+ * state to be reconstructed for logical decoding on the upgraded slots.
+ */
+static void
+create_consistent_snapshot(void)
+{
+	DbInfo	   *db = &new_cluster.dbarr.dbs[0];
+	PGconn	   *conn;
+
+	prep_status("Creating a consistent snapshot on new cluster");
+
+	conn = connectToServer(&new_cluster, db->db_name);
+
+	PQclear(executeQueryOrDie(conn, "SELECT pg_log_standby_snapshot();"));
+	PQfinish(conn);
+
+	check_ok();
+}
+
+
 void
 issue_warnings_and_set_wal_level(void)
 {
@@ -256,6 +292,14 @@ issue_warnings_and_set_wal_level(void)
 	 */
 	start_postmaster(&new_cluster, true);
 
+	/*
+	 * Also, we must execute pg_log_standby_snapshot() when logical
+	 * replication slots are migrated. Because RUNNING_XACTS record is
+	 * required to create a consistent snapshot.
+	 */
+	if (count_old_cluster_logical_slots())
+		create_consistent_snapshot();
+
 	/* Reindex hash indexes for old < 10.0 */
 	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 906)
 		old_9_6_invalidate_hash_indexes(&new_cluster, false);
@@ -1451,3 +1495,155 @@ check_for_user_defined_encoding_conversions(ClusterInfo *cluster)
 	else
 		check_ok();
 }
+
+/*
+ * check_new_cluster_logical_replication_slots()
+ *
+ * Verify that there are no logical replication slots on the new cluster and
+ * that the parameter settings necessary for creating slots are sufficient.
+ */
+static void
+check_new_cluster_logical_replication_slots(void)
+{
+	PGresult   *res;
+	PGconn	   *conn;
+	int			nslots_on_old;
+	int			nslots_on_new;
+	int			max_replication_slots;
+	char	   *wal_level;
+
+	/* Logical slots can be migrated since PG17. */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+		return;
+
+	nslots_on_old = count_old_cluster_logical_slots();
+
+	/* Quick return if there are no logical slots to be migrated. */
+	if (nslots_on_old == 0)
+		return;
+
+	conn = connectToServer(&new_cluster, "template1");
+
+	prep_status("Checking for new cluster logical replication slots");
+
+	res = executeQueryOrDie(conn, "SELECT count(*) "
+							"FROM pg_catalog.pg_replication_slots "
+							"WHERE slot_type = 'logical' AND "
+							"temporary IS FALSE;");
+
+	if (PQntuples(res) != 1)
+		pg_fatal("could not count the number of logical replication slots");
+
+	nslots_on_new = atoi(PQgetvalue(res, 0, 0));
+
+	if (nslots_on_new)
+		pg_fatal("Expected 0 logical replication slots but found %d.",
+				 nslots_on_new);
+
+	PQclear(res);
+
+	res = executeQueryOrDie(conn, "SELECT setting FROM pg_settings "
+							"WHERE name IN ('wal_level', 'max_replication_slots') "
+							"ORDER BY name DESC;");
+
+	if (PQntuples(res) != 2)
+		pg_fatal("could not determine parameter settings on new cluster");
+
+	wal_level = PQgetvalue(res, 0, 0);
+
+	if (strcmp(wal_level, "logical") != 0)
+		pg_fatal("wal_level must be \"logical\", but is set to \"%s\"",
+				 wal_level);
+
+	max_replication_slots = atoi(PQgetvalue(res, 1, 0));
+
+	if (nslots_on_old > max_replication_slots)
+		pg_fatal("max_replication_slots (%d) must be greater than or equal to the number of "
+				 "logical replication slots (%d) on the old cluster",
+				 max_replication_slots, nslots_on_old);
+
+	PQclear(res);
+	PQfinish(conn);
+
+	check_ok();
+}
+
+/*
+ * check_old_cluster_for_valid_slots()
+ *
+ * Verify that all the logical slots are usable and have consumed all the WAL
+ * before shutdown. The check has already been done in
+ * get_old_cluster_logical_slot_infos(), so this function reads the result and
+ * reports to the user.
+ */
+static void
+check_old_cluster_for_valid_slots(bool live_check)
+{
+	char		output_path[MAXPGPATH];
+	FILE	   *script = NULL;
+
+	prep_status("Checking for valid logical replication slots");
+
+	snprintf(output_path, sizeof(output_path), "%s/%s",
+			 log_opts.basedir,
+			 "invalid_logical_relication_slots.txt");
+
+	for (int dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		LogicalSlotInfoArr *slot_arr = &old_cluster.dbarr.dbs[dbnum].slot_arr;
+
+		for (int slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		{
+			LogicalSlotInfo *slot = &slot_arr->slots[slotnum];
+
+			/* Is the slot usable? */
+			if (slot->invalid)
+			{
+				if (script == NULL &&
+					(script = fopen_priv(output_path, "w")) == NULL)
+					pg_fatal("could not open file \"%s\": %s",
+							 output_path, strerror(errno));
+
+				fprintf(script, "The slot \"%s\" is invalid\n",
+						slot->slotname);
+
+				/* No need to check this slot, seek to new one */
+				continue;
+			}
+
+			/*
+			 * Do additional check to ensure that all logical replication
+			 * slots have consumed all the WAL before shutdown.
+			 *
+			 * Note: This can be satisfied only when the old cluster has been
+			 * shut down, so we skip this for live checks.
+			 */
+			if (!live_check && !slot->caught_up)
+			{
+				if (script == NULL &&
+					(script = fopen_priv(output_path, "w")) == NULL)
+					pg_fatal("could not open file \"%s\": %s",
+							 output_path, strerror(errno));
+
+				fprintf(script,
+						"The slot \"%s\" has not consumed the WAL yet\n",
+						slot->slotname);
+			}
+		}
+	}
+
+	if (script)
+	{
+		fclose(script);
+
+		pg_log(PG_REPORT, "fatal");
+		pg_fatal("Your installation contains logical replication slots that cannot be upgraded.\n"
+				 "These slots can't be copied, so this cluster cannot be upgraded.\n"
+				 "Consider removing invalid slots and/or consuming the pending WAL if any,\n"
+				 "and then restart the upgrade.\n"
+				 "A list of all such logical replication slots is in the file:\n"
+				 "    %s", output_path);
+	}
+
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/function.c b/src/bin/pg_upgrade/function.c
index dc8800c7cd..fc4657c344 100644
--- a/src/bin/pg_upgrade/function.c
+++ b/src/bin/pg_upgrade/function.c
@@ -46,7 +46,9 @@ library_name_compare(const void *p1, const void *p2)
 /*
  * get_loadable_libraries()
  *
- *	Fetch the names of all old libraries containing C-language functions.
+ *	Fetch the names of all old libraries containing either C-language functions
+ *	or are corresponding to logical replication output plugins.
+ *
  *	We will later check that they all exist in the new installation.
  */
 void
@@ -54,13 +56,13 @@ get_loadable_libraries(void)
 {
 	PGresult  **ress;
 	int			totaltups;
-	int			dbnum;
+	int			n_libinfos;
 
 	ress = (PGresult **) pg_malloc(old_cluster.dbarr.ndbs * sizeof(PGresult *));
 	totaltups = 0;
 
 	/* Fetch all library names, removing duplicates within each DB */
-	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	for (int dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
 	{
 		DbInfo	   *active_db = &old_cluster.dbarr.dbs[dbnum];
 		PGconn	   *conn = connectToServer(&old_cluster, active_db->db_name);
@@ -81,17 +83,22 @@ get_loadable_libraries(void)
 		PQfinish(conn);
 	}
 
-	os_info.libraries = (LibraryInfo *) pg_malloc(totaltups * sizeof(LibraryInfo));
+	/*
+	 * Allocate memory for required libraries and logical replication output
+	 * plugins.
+	 */
+	n_libinfos = totaltups + count_old_cluster_logical_slots();
+	os_info.libraries = (LibraryInfo *) pg_malloc(sizeof(LibraryInfo) * n_libinfos);
 	totaltups = 0;
 
-	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	for (int dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
 	{
 		PGresult   *res = ress[dbnum];
 		int			ntups;
-		int			rowno;
+		LogicalSlotInfoArr *slot_arr = &old_cluster.dbarr.dbs[dbnum].slot_arr;
 
 		ntups = PQntuples(res);
-		for (rowno = 0; rowno < ntups; rowno++)
+		for (int rowno = 0; rowno < ntups; rowno++)
 		{
 			char	   *lib = PQgetvalue(res, rowno, 0);
 
@@ -101,6 +108,23 @@ get_loadable_libraries(void)
 			totaltups++;
 		}
 		PQclear(res);
+
+		/*
+		 * Store the names of output plugins as well. There is a possibility
+		 * that duplicated plugins are set, but the consumer function
+		 * check_loadable_libraries() will avoid checking the same library, so
+		 * we do not have to consider their uniqueness here.
+		 */
+		for (int slotno = 0; slotno < slot_arr->nslots; slotno++)
+		{
+			if (slot_arr->slots[slotno].invalid)
+				continue;
+
+			os_info.libraries[totaltups].name = pg_strdup(slot_arr->slots[slotno].plugin);
+			os_info.libraries[totaltups].dbnum = dbnum;
+
+			totaltups++;
+		}
 	}
 
 	pg_free(ress);
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index aa5faca4d6..1f9463443d 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -26,6 +26,8 @@ static void get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo);
 static void free_rel_infos(RelInfoArr *rel_arr);
 static void print_db_infos(DbInfoArr *db_arr);
 static void print_rel_infos(RelInfoArr *rel_arr);
+static void print_slot_infos(LogicalSlotInfoArr *slot_arr);
+static void get_old_cluster_logical_slot_infos(DbInfo *dbinfo, bool live_check);
 
 
 /*
@@ -266,24 +268,34 @@ report_unmatched_relation(const RelInfo *rel, const DbInfo *db, bool is_new_db)
 }
 
 /*
- * get_db_and_rel_infos()
+ * get_db_rel_and_slot_infos()
  *
  * higher level routine to generate dbinfos for the database running
  * on the given "port". Assumes that server is already running.
+ *
+ * live_check would be used only when the target is the old cluster.
  */
 void
-get_db_and_rel_infos(ClusterInfo *cluster)
+get_db_rel_and_slot_infos(ClusterInfo *cluster, bool live_check)
 {
-	int			dbnum;
-
 	if (cluster->dbarr.dbs != NULL)
 		free_db_and_rel_infos(&cluster->dbarr);
 
 	get_template0_info(cluster);
 	get_db_infos(cluster);
 
-	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
-		get_rel_infos(cluster, &cluster->dbarr.dbs[dbnum]);
+	for (int dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
+	{
+		DbInfo	   *pDbInfo = &cluster->dbarr.dbs[dbnum];
+
+		get_rel_infos(cluster, pDbInfo);
+
+		/*
+		 * Retrieve the logical replication slots infos for the old cluster.
+		 */
+		if (cluster == &old_cluster)
+			get_old_cluster_logical_slot_infos(pDbInfo, live_check);
+	}
 
 	if (cluster == &old_cluster)
 		pg_log(PG_VERBOSE, "\nsource databases:");
@@ -600,6 +612,125 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 	dbinfo->rel_arr.nrels = num_rels;
 }
 
+/*
+ * get_old_cluster_logical_slot_infos()
+ *
+ * gets the LogicalSlotInfos for all the logical replication slots of the
+ * database referred to by "dbinfo". The status of each logical slot is gotten
+ * here, but they are used at the checking phase. See
+ * check_old_cluster_for_valid_slots().
+ *
+ * Note: This function will not do anything if the old cluster is pre-PG17.
+ * This is because before that the logical slots are not saved at shutdown, so
+ * there is no guarantee that the latest confirmed_flush_lsn is saved to disk
+ * which can lead to data loss. It is still not guaranteed for manually created
+ * slots in PG17, so subsequent checks done in
+ * check_old_cluster_for_valid_slots() would raise a FATAL error if such slots
+ * are included.
+ */
+static void
+get_old_cluster_logical_slot_infos(DbInfo *dbinfo, bool live_check)
+{
+	PGconn	   *conn;
+	PGresult   *res;
+	LogicalSlotInfo *slotinfos = NULL;
+	int			num_slots = 0;
+
+	/* Logical slots can be migrated since PG17. */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+	{
+		dbinfo->slot_arr.slots = slotinfos;
+		dbinfo->slot_arr.nslots = num_slots;
+		return;
+	}
+
+	conn = connectToServer(&old_cluster, dbinfo->db_name);
+
+	/*
+	 * Fetch the logical replication slot information. The check whether the
+	 * slot is considered caught up is done by an upgrade function. This
+	 * regards the slot is caught up if any changes are not found while
+	 * decoding. See binary_upgrade_validate_wal_logical_end().
+	 *
+	 * Note that we can't ensure whether the slot is caught up during
+	 * live_check as the new WAL records could be generated.
+	 *
+	 * We intentionally skip checking the WALs for invalidated slots as the
+	 * corresponding WALs could have been removed for such slots.
+	 *
+	 * The temporary slots are explicitly ignored while checking because such
+	 * slots cannot exist after the upgrade. During the upgrade, clusters are
+	 * started and stopped several times causing any temporary slots to be
+	 * removed.
+	 */
+	res = executeQueryOrDie(conn, "SELECT slot_name, plugin, two_phase, "
+							"%s as caught_up, conflicting as invalid "
+							"FROM pg_catalog.pg_replication_slots "
+							"WHERE slot_type = 'logical' AND "
+							"database = current_database() AND "
+							"temporary IS FALSE;",
+							live_check ? "FALSE" :
+							"(CASE WHEN conflicting THEN FALSE "
+							"ELSE (SELECT pg_catalog.binary_upgrade_validate_wal_logical_end(slot_name)) "
+							"END)");
+
+	num_slots = PQntuples(res);
+
+	if (num_slots)
+	{
+		int			i_slotname;
+		int			i_plugin;
+		int			i_twophase;
+		int			i_caught_up;
+		int			i_invalid;
+
+		slotinfos = (LogicalSlotInfo *) pg_malloc(sizeof(LogicalSlotInfo) * num_slots);
+
+		i_slotname = PQfnumber(res, "slot_name");
+		i_plugin = PQfnumber(res, "plugin");
+		i_twophase = PQfnumber(res, "two_phase");
+		i_caught_up = PQfnumber(res, "caught_up");
+		i_invalid = PQfnumber(res, "invalid");
+
+		for (int slotnum = 0; slotnum < num_slots; slotnum++)
+		{
+			LogicalSlotInfo *curr = &slotinfos[slotnum];
+
+			curr->slotname = pg_strdup(PQgetvalue(res, slotnum, i_slotname));
+			curr->plugin = pg_strdup(PQgetvalue(res, slotnum, i_plugin));
+			curr->two_phase = (strcmp(PQgetvalue(res, slotnum, i_twophase), "t") == 0);
+			curr->caught_up = (strcmp(PQgetvalue(res, slotnum, i_caught_up), "t") == 0);
+			curr->invalid = (strcmp(PQgetvalue(res, slotnum, i_invalid), "t") == 0);
+		}
+	}
+
+	PQclear(res);
+	PQfinish(conn);
+
+	dbinfo->slot_arr.slots = slotinfos;
+	dbinfo->slot_arr.nslots = num_slots;
+}
+
+
+/*
+ * count_old_cluster_logical_slots()
+ *
+ * Returns the number of logical replication slots for all databases.
+ *
+ * Note: this function always returns 0 if the old_cluster is PG16 and prior
+ * because we gather slot information only for cluster versions greater than or
+ * equal to PG17. See get_old_cluster_logical_slot_infos().
+ */
+int
+count_old_cluster_logical_slots(void)
+{
+	int			slot_count = 0;
+
+	for (int dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+		slot_count += old_cluster.dbarr.dbs[dbnum].slot_arr.nslots;
+
+	return slot_count;
+}
 
 static void
 free_db_and_rel_infos(DbInfoArr *db_arr)
@@ -638,12 +769,13 @@ free_rel_infos(RelInfoArr *rel_arr)
 static void
 print_db_infos(DbInfoArr *db_arr)
 {
-	int			dbnum;
-
-	for (dbnum = 0; dbnum < db_arr->ndbs; dbnum++)
+	for (int dbnum = 0; dbnum < db_arr->ndbs; dbnum++)
 	{
-		pg_log(PG_VERBOSE, "Database: \"%s\"", db_arr->dbs[dbnum].db_name);
-		print_rel_infos(&db_arr->dbs[dbnum].rel_arr);
+		DbInfo	   *pDbInfo = &db_arr->dbs[dbnum];
+
+		pg_log(PG_VERBOSE, "Database: \"%s\"", pDbInfo->db_name);
+		print_rel_infos(&pDbInfo->rel_arr);
+		print_slot_infos(&pDbInfo->slot_arr);
 	}
 }
 
@@ -660,3 +792,23 @@ print_rel_infos(RelInfoArr *rel_arr)
 			   rel_arr->rels[relnum].reloid,
 			   rel_arr->rels[relnum].tablespace);
 }
+
+static void
+print_slot_infos(LogicalSlotInfoArr *slot_arr)
+{
+	/* Quick return if there are no logical slots. */
+	if (slot_arr->nslots == 0)
+		return;
+
+	pg_log(PG_VERBOSE, "Logical replication slots within the database:");
+
+	for (int slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+	{
+		LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+
+		pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\", two_phase: %d",
+			   slot_info->slotname,
+			   slot_info->plugin,
+			   slot_info->two_phase);
+	}
+}
diff --git a/src/bin/pg_upgrade/meson.build b/src/bin/pg_upgrade/meson.build
index 12a97f84e2..2c4f38d865 100644
--- a/src/bin/pg_upgrade/meson.build
+++ b/src/bin/pg_upgrade/meson.build
@@ -42,6 +42,7 @@ tests += {
     'tests': [
       't/001_basic.pl',
       't/002_pg_upgrade.pl',
+      't/003_upgrade_logical_replication_slots.pl',
     ],
     'test_kwargs': {'priority': 40}, # pg_upgrade tests are slow
   },
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 96bfb67167..d771623d3e 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -59,6 +59,7 @@ static void copy_xact_xlog_xid(void);
 static void set_frozenxids(bool minmxid_only);
 static void make_outputdirs(char *pgdata);
 static void setup(char *argv0, bool *live_check);
+static char *create_logical_replication_slots(void);
 
 ClusterInfo old_cluster,
 			new_cluster;
@@ -81,6 +82,8 @@ main(int argc, char **argv)
 {
 	char	   *deletion_script_file_name = NULL;
 	bool		live_check = false;
+	char	   *xlogfilename = NULL;
+	PQExpBufferData resetwal_options;
 
 	/*
 	 * pg_upgrade doesn't currently use common/logging.c, but initialize it
@@ -175,6 +178,20 @@ main(int argc, char **argv)
 	transfer_all_new_tablespaces(&old_cluster.dbarr, &new_cluster.dbarr,
 								 old_cluster.pgdata, new_cluster.pgdata);
 
+	/*
+	 * If the old cluster has logical slots, migrate them to a new cluster.
+	 *
+	 * The function returns the next wal segment file which must be passed to
+	 * upcoming pg_resetwal command.
+	 */
+
+	if (count_old_cluster_logical_slots())
+	{
+		start_postmaster(&new_cluster, true);
+		xlogfilename = create_logical_replication_slots();
+		stop_postmaster(false);
+	}
+
 	/*
 	 * Assuming OIDs are only used in system tables, there is no need to
 	 * restore the OID counter because we have not transferred any OIDs from
@@ -182,10 +199,23 @@ main(int argc, char **argv)
 	 * because there is no need to have the schema load use new oids.
 	 */
 	prep_status("Setting next OID for new cluster");
+
+	/*
+	 * Construct an option string. If the next WAL segment filename is known,
+	 * specify it.
+	 */
+	initPQExpBuffer(&resetwal_options);
+	appendPQExpBuffer(&resetwal_options, "-o %u \"%s\"",
+					  old_cluster.controldata.chkpnt_nxtoid,
+					  new_cluster.pgdata);
+	if (xlogfilename)
+		appendPQExpBuffer(&resetwal_options, " -l %s", xlogfilename);
+
 	exec_prog(UTILITY_LOG_FILE, NULL, true, true,
-			  "\"%s/pg_resetwal\" -o %u \"%s\"",
-			  new_cluster.bindir, old_cluster.controldata.chkpnt_nxtoid,
-			  new_cluster.pgdata);
+			  "\"%s/pg_resetwal\" %s",
+			  new_cluster.bindir, resetwal_options.data);
+
+	termPQExpBuffer(&resetwal_options);
 	check_ok();
 
 	if (user_opts.do_sync)
@@ -593,7 +623,7 @@ create_new_objects(void)
 		set_frozenxids(true);
 
 	/* update new_cluster info now that we have objects in the databases */
-	get_db_and_rel_infos(&new_cluster);
+	get_db_rel_and_slot_infos(&new_cluster, false);
 }
 
 /*
@@ -862,3 +892,75 @@ set_frozenxids(bool minmxid_only)
 
 	check_ok();
 }
+
+/*
+ * create_logical_replication_slots()
+ *
+ * Similar to create_new_objects() but only restores logical replication slots.
+ */
+static char *
+create_logical_replication_slots(void)
+{
+	char	   *xlogfilename = NULL;
+
+	prep_status_progress("Restoring logical replication slots in the new cluster");
+
+	for (int dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		DbInfo	   *old_db = &old_cluster.dbarr.dbs[dbnum];
+		LogicalSlotInfoArr *slot_arr = &old_db->slot_arr;
+		PGconn	   *conn;
+		PQExpBuffer query;
+		char		log_file_name[MAXPGPATH];
+
+		/* Skip this database if there are no slots */
+		if (slot_arr->nslots == 0)
+			continue;
+
+		conn = connectToServer(&new_cluster, old_db->db_name);
+		query = createPQExpBuffer();
+
+		snprintf(log_file_name, sizeof(log_file_name),
+				 DB_DUMP_LOG_FILE_MASK, old_db->db_oid);
+
+		pg_log(PG_STATUS, "%s", old_db->db_name);
+
+		for (int slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		{
+			PGresult   *res;
+			LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+
+			/*
+			 * Constructs a query for creating logical replication slots. An
+			 * upgrade function is used to set restart_lsn appropriately.
+			 */
+			appendPQExpBuffer(query,
+							  "SELECT * FROM "
+							  "pg_catalog.binary_upgrade_create_logical_replication_slot(");
+			appendStringLiteralConn(query, slot_info->slotname, conn);
+			appendPQExpBuffer(query, ", ");
+			appendStringLiteralConn(query, slot_info->plugin, conn);
+			appendPQExpBuffer(query, ", %s);",
+							  slot_info->two_phase ? "true" : "false");
+
+			res = executeQueryOrDie(conn, "%s", query->data);
+
+			Assert(PQntuples(res) == 1 && PQnfields(res) == 3);
+
+			if (xlogfilename == NULL)
+				xlogfilename = pg_strdup(PQgetvalue(res, 0, 2));
+
+			PQclear(res);
+			resetPQExpBuffer(query);
+		}
+
+		PQfinish(conn);
+
+		destroyPQExpBuffer(query);
+	}
+
+	end_progress_output();
+	check_ok();
+
+	return xlogfilename;
+}
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 842f3b6cd3..ba8129d135 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -150,6 +150,24 @@ typedef struct
 	int			nrels;
 } RelInfoArr;
 
+/*
+ * Structure to store logical replication slot information.
+ */
+typedef struct
+{
+	char	   *slotname;		/* slot name */
+	char	   *plugin;			/* plugin */
+	bool		two_phase;		/* can the slot decode 2PC? */
+	bool		caught_up;		/* has the slot caught up to latest changes? */
+	bool		invalid;		/* if true, the slot is unusable */
+} LogicalSlotInfo;
+
+typedef struct
+{
+	int			nslots;			/* number of logical slot infos */
+	LogicalSlotInfo *slots;		/* array of logical slot infos */
+} LogicalSlotInfoArr;
+
 /*
  * The following structure represents a relation mapping.
  */
@@ -176,6 +194,7 @@ typedef struct
 	char		db_tablespace[MAXPGPATH];	/* database default tablespace
 											 * path */
 	RelInfoArr	rel_arr;		/* array of all user relinfos */
+	LogicalSlotInfoArr slot_arr;	/* array of all LogicalSlotInfo */
 } DbInfo;
 
 /*
@@ -400,7 +419,8 @@ void		check_loadable_libraries(void);
 FileNameMap *gen_db_file_maps(DbInfo *old_db,
 							  DbInfo *new_db, int *nmaps, const char *old_pgdata,
 							  const char *new_pgdata);
-void		get_db_and_rel_infos(ClusterInfo *cluster);
+void		get_db_rel_and_slot_infos(ClusterInfo *cluster, bool live_check);
+int			count_old_cluster_logical_slots(void);
 
 /* option.c */
 
diff --git a/src/bin/pg_upgrade/server.c b/src/bin/pg_upgrade/server.c
index 0bc3d2806b..d7f6c268ef 100644
--- a/src/bin/pg_upgrade/server.c
+++ b/src/bin/pg_upgrade/server.c
@@ -201,6 +201,7 @@ start_postmaster(ClusterInfo *cluster, bool report_and_exit_on_error)
 	PGconn	   *conn;
 	bool		pg_ctl_return = false;
 	char		socket_string[MAXPGPATH + 200];
+	PQExpBufferData pgoptions;
 
 	static bool exit_hook_registered = false;
 
@@ -227,23 +228,41 @@ start_postmaster(ClusterInfo *cluster, bool report_and_exit_on_error)
 				 cluster->sockdir);
 #endif
 
+	initPQExpBuffer(&pgoptions);
+
 	/*
-	 * Use -b to disable autovacuum.
+	 * Construct a parameter string which is passed to the server process.
 	 *
 	 * Turn off durability requirements to improve object creation speed, and
 	 * we only modify the new cluster, so only use it there.  If there is a
 	 * crash, the new cluster has to be recreated anyway.  fsync=off is a big
 	 * win on ext4.
 	 */
+	if (cluster == &new_cluster)
+		appendPQExpBufferStr(&pgoptions, " -c synchronous_commit=off -c fsync=off -c full_page_writes=off");
+
+	/*
+	 * Set max_slot_wal_keep_size to -1 to prevent WAL removal by the
+	 * checkpointer process.  If WALs required by logical replication slots
+	 * are removed, the slots become unusable.  This setting prevents slot
+	 * invalidation during the upgrade.  We set this option only when the
+	 * cluster is PG17 or later, because logical replication slots can only
+	 * be migrated since then.  (max_slot_wal_keep_size was added in PG13.)
+	 */
+	if (GET_MAJOR_VERSION(cluster->major_version) >= 1700)
+		appendPQExpBufferStr(&pgoptions, " -c max_slot_wal_keep_size=-1");
+
+	/* Use -b to disable autovacuum. */
 	snprintf(cmd, sizeof(cmd),
 			 "\"%s/pg_ctl\" -w -l \"%s/%s\" -D \"%s\" -o \"-p %d -b%s %s%s\" start",
 			 cluster->bindir,
 			 log_opts.logdir,
 			 SERVER_LOG_FILE, cluster->pgconfig, cluster->port,
-			 (cluster == &new_cluster) ?
-			 " -c synchronous_commit=off -c fsync=off -c full_page_writes=off" : "",
+			 pgoptions.data,
 			 cluster->pgopts ? cluster->pgopts : "", socket_string);
 
+	termPQExpBuffer(&pgoptions);
+
 	/*
 	 * Don't throw an error right away, let connecting throw the error because
 	 * it might supply a reason for the failure.
diff --git a/src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl
new file mode 100644
index 0000000000..fc6bf3020a
--- /dev/null
+++ b/src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl
@@ -0,0 +1,325 @@
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+# Tests for upgrading replication slots
+
+use strict;
+use warnings;
+
+use File::Find qw(find);
+use File::Path qw(rmtree);
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Can be changed to test the other modes
+my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
+
+# Initialize old cluster
+my $old_publisher = PostgreSQL::Test::Cluster->new('old_publisher');
+$old_publisher->init(allows_streaming => 'logical');
+
+# Initialize new cluster
+my $new_publisher = PostgreSQL::Test::Cluster->new('new_publisher');
+$new_publisher->init(allows_streaming => 'replica');
+
+# Initialize subscriber cluster
+my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
+$subscriber->init(allows_streaming => 'logical');
+
+my $bindir = $new_publisher->config_data('--bindir');
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when new cluster wal_level is not 'logical'
+
+# Preparations for the subsequent test:
+# 1. Create a slot on the old cluster
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT pg_create_logical_replication_slot('test_slot1', 'test_decoding', false, true);"
+);
+$old_publisher->stop;
+
+# pg_upgrade will fail because the new cluster wal_level is 'replica'
+command_checks_all(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d', $old_publisher->data_dir,
+		'-D', $new_publisher->data_dir,
+		'-b', $bindir,
+		'-B', $bindir,
+		'-s', $new_publisher->host,
+		'-p', $old_publisher->port,
+		'-P', $new_publisher->port,
+		$mode,
+	],
+	1,
+	[qr/wal_level must be \"logical\", but is set to \"replica\"/],
+	[qr//],
+	'run of pg_upgrade where the new cluster has the wrong wal_level');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when max_replication_slots on a new cluster is
+#		too low
+
+# Preparations for the subsequent test:
+# 1. Create a second slot on the old cluster
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT pg_create_logical_replication_slot('test_slot2', 'test_decoding', false, true);"
+);
+
+# 2. Consume WAL records to avoid another type of upgrade failure. It will be
+#	 tested in subsequent cases.
+$old_publisher->safe_psql('postgres',
+	"SELECT count(*) FROM pg_logical_slot_get_changes('test_slot1', NULL, NULL);"
+);
+$old_publisher->stop;
+
+# 3. max_replication_slots is set to smaller than the number of slots (2)
+#	 present on the old cluster
+$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 1");
+
+# 4. wal_level is set correctly on the new cluster
+$new_publisher->append_conf('postgresql.conf', "wal_level = 'logical'");
+
+# pg_upgrade will fail because the new cluster has insufficient max_replication_slots
+command_checks_all(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d', $old_publisher->data_dir,
+		'-D', $new_publisher->data_dir,
+		'-b', $bindir,
+		'-B', $bindir,
+		'-s', $new_publisher->host,
+		'-p', $old_publisher->port,
+		'-P', $new_publisher->port,
+		$mode,
+	],
+	1,
+	[
+		qr/max_replication_slots \(1\) must be greater than or equal to the number of logical replication slots \(2\) on the old cluster/
+	],
+	[qr//],
+	'run of pg_upgrade where the new cluster has insufficient max_replication_slots'
+);
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when the slot still has unconsumed WAL records
+
+# Preparations for the subsequent test:
+# 1. Remove the slot 'test_slot2', leaving only 1 slot on the old cluster, so
+#    the new cluster config  max_replication_slots=1 will now be enough.
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT * FROM pg_drop_replication_slot('test_slot2');");
+
+# 2. Generate extra WAL records. Because these WAL records do not get consumed
+#	 it will cause the upcoming pg_upgrade test to fail.
+$old_publisher->safe_psql('postgres',
+	"CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;");
+$old_publisher->stop;
+
+# pg_upgrade will fail because the slot still has unconsumed WAL records
+command_checks_all(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d', $old_publisher->data_dir,
+		'-D', $new_publisher->data_dir,
+		'-b', $bindir,
+		'-B', $bindir,
+		'-s', $new_publisher->host,
+		'-p', $old_publisher->port,
+		'-P', $new_publisher->port,
+		$mode,
+	],
+	1,
+	[
+		qr/Your installation contains logical replication slots that cannot be upgraded./
+	],
+	[qr//],
+	'run of pg_upgrade of old cluster with idle replication slots');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Verify the reason why the logical replication slot cannot be upgraded
+my $log_path = $new_publisher->data_dir . "/pg_upgrade_output.d";
+my $slots_filename;
+
+# Find a txt file that contains a list of logical replication slots that cannot
+# be upgraded. We cannot predict the file's path because the output directory
+# contains a milliseconds timestamp. File::Find::find must be used.
+find(
+	sub {
+		if ($File::Find::name =~ m/invalid_logical_relication_slots\.txt/)
+		{
+			$slots_filename = $File::Find::name;
+		}
+	},
+	$new_publisher->data_dir . "/pg_upgrade_output.d");
+
+# And check the content. The failure should be because there are unconsumed
+# WALs after confirmed_flush_lsn of test_slot1.
+like(
+	slurp_file($slots_filename),
+	qr/The slot \"test_slot1\" has not consumed the WAL yet/m,
+	'the previous test failed due to unconsumed WALs');
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+# Consume remained WAL records
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT count(*) FROM pg_logical_slot_get_changes('test_slot1', NULL, NULL);");
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when there are non-transactional changes
+
+# Preparations for the subsequent test:
+# 1. Emit a non-transactional message
+$old_publisher->safe_psql('postgres',
+	"SELECT count(*) FROM pg_logical_emit_message('false', 'prefix', 'This is a non-transactional message');");
+$old_publisher->stop;
+
+# pg_upgrade will fail because there is a non-transactional change
+command_checks_all(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d', $old_publisher->data_dir,
+		'-D', $new_publisher->data_dir,
+		'-b', $bindir,
+		'-B', $bindir,
+		'-s', $new_publisher->host,
+		'-p', $old_publisher->port,
+		'-P', $new_publisher->port,
+		$mode,
+	],
+	1,
+	[
+		qr/Your installation contains logical replication slots that cannot be upgraded./
+	],
+	[qr//],
+	'run of pg_upgrade of old cluster with unconsumed non-transactional changes');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Verify the reason why the logical replication slot cannot be upgraded
+$log_path = $new_publisher->data_dir . "/pg_upgrade_output.d";
+$slots_filename = undef;
+
+# Find a txt file that contains a list of logical replication slots that cannot
+# be upgraded.
+find(
+	sub {
+		if ($File::Find::name =~ m/invalid_logical_relication_slots\.txt/)
+		{
+			$slots_filename = $File::Find::name;
+		}
+	},
+	$new_publisher->data_dir . "/pg_upgrade_output.d");
+
+# And check the content. The content of the file is the same as in the previous test.
+like(
+	slurp_file($slots_filename),
+	qr/The slot \"test_slot1\" has not consumed the WAL yet/m,
+	'the previous test failed due to unconsumed WALs');
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+# Remove the remaining slot
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT * FROM pg_drop_replication_slot('test_slot1');");
+
+# ------------------------------
+# TEST: Successful upgrade
+
+# Preparations for the subsequent test:
+# 1. Setup logical replication
+my $old_connstr = $old_publisher->connstr . ' dbname=postgres';
+$old_publisher->safe_psql('postgres',
+	"CREATE PUBLICATION regress_pub FOR ALL TABLES;");
+$subscriber->start;
+$subscriber->safe_psql(
+	'postgres', qq[
+	CREATE TABLE tbl (a int);
+	CREATE SUBSCRIPTION regress_sub CONNECTION '$old_connstr' PUBLICATION regress_pub WITH (two_phase = 'true')
+]);
+$subscriber->wait_for_subscription_sync($old_publisher, 'regress_sub');
+
+# 2. Temporarily disable the subscription
+$subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION regress_sub DISABLE");
+$old_publisher->stop;
+
+# Dry run, successful check is expected. This is not a live check, so a
+# shutdown checkpoint record would be inserted. We want to test that a
+# subsequent upgrade is successful by skipping such an expected WAL record.
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d', $old_publisher->data_dir,
+		'-D', $new_publisher->data_dir,
+		'-b', $bindir,
+		'-B', $bindir,
+		'-s', $new_publisher->host,
+		'-p', $old_publisher->port,
+		'-P', $new_publisher->port,
+		$mode, '--check'
+	],
+	'run of pg_upgrade of old cluster');
+ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ removed after pg_upgrade success");
+
+# Actual run, successful upgrade is expected
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d', $old_publisher->data_dir,
+		'-D', $new_publisher->data_dir,
+		'-b', $bindir,
+		'-B', $bindir,
+		'-s', $new_publisher->host,
+		'-p', $old_publisher->port,
+		'-P', $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade of old cluster');
+ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ removed after pg_upgrade success");
+
+# Check that the slot 'regress_sub' has migrated to the new cluster
+$new_publisher->start;
+my $result = $new_publisher->safe_psql('postgres',
+	"SELECT slot_name, two_phase FROM pg_replication_slots");
+is($result, qq(regress_sub|t), 'check the slot exists on new cluster');
+
+# Update the connection
+my $new_connstr = $new_publisher->connstr . ' dbname=postgres';
+$subscriber->safe_psql(
+	'postgres', qq[
+	ALTER SUBSCRIPTION regress_sub CONNECTION '$new_connstr';
+	ALTER SUBSCRIPTION regress_sub ENABLE;
+]);
+
+# Check whether changes on the new publisher get replicated to the subscriber
+$new_publisher->safe_psql('postgres',
+	"INSERT INTO tbl VALUES (generate_series(11, 20))");
+$new_publisher->wait_for_catchup('regress_sub');
+$result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
+is($result, qq(20), 'check changes are replicated to the subscriber');
+
+# Clean up
+$subscriber->stop();
+$new_publisher->stop();
+
+done_testing();
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index f0b7b9cbd8..eacd91eb67 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11370,6 +11370,18 @@
   proname => 'binary_upgrade_set_next_pg_tablespace_oid', provolatile => 'v',
   proparallel => 'u', prorettype => 'void', proargtypes => 'oid',
   prosrc => 'binary_upgrade_set_next_pg_tablespace_oid' },
+{ oid => '8046', descr => 'for use by pg_upgrade',
+  proname => 'binary_upgrade_validate_wal_logical_end', proisstrict => 'f',
+  provolatile => 'v', proparallel => 'u', prorettype => 'bool',
+  proargtypes => 'name',
+  prosrc => 'binary_upgrade_validate_wal_logical_end' },
+{ oid => '8047', descr => 'for use by pg_upgrade',
+  proname => 'binary_upgrade_create_logical_replication_slot', proisstrict => 'f',
+  provolatile => 'v', proparallel => 'u', prorettype => 'record',
+  proargtypes => 'name name bool',
+  proallargtypes => '{name,name,bool,name,pg_lsn,text}',
+  proargmodes => '{i,i,i,o,o,o}',
+  prosrc => 'binary_upgrade_create_logical_replication_slot' },
 
 # conversion functions
 { oid => '4302',
diff --git a/src/include/replication/logical.h b/src/include/replication/logical.h
index 5f49554ea0..04e05cda1a 100644
--- a/src/include/replication/logical.h
+++ b/src/include/replication/logical.h
@@ -109,6 +109,13 @@ typedef struct LogicalDecodingContext
 	TransactionId write_xid;
 	/* Are we processing the end LSN of a transaction? */
 	bool		end_xact;
+
+	/*
+	 * Did the logical decoding context require processing WALs?
+	 *
+	 * This flag is used only in fast_forward mode.
+	 */
+	bool		processing_required;
 } LogicalDecodingContext;
 
 
@@ -145,4 +152,7 @@ extern bool filter_by_origin_cb_wrapper(LogicalDecodingContext *ctx, RepOriginId
 extern void ResetLogicalStreamingState(void);
 extern void UpdateDecodingStats(LogicalDecodingContext *ctx);
 
+extern bool DecodingContextHasDecodedItems(LogicalDecodingContext *ctx,
+										   XLogRecPtr end_of_wal);
+
 #endif
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 758ca79a81..6559d3f014 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -227,6 +227,10 @@ extern void ReplicationSlotRelease(void);
 extern void ReplicationSlotCleanup(void);
 extern void ReplicationSlotSave(void);
 extern void ReplicationSlotMarkDirty(void);
+extern void create_logical_replication_slot(char *name, char *plugin,
+											bool temporary, bool two_phase,
+											XLogRecPtr restart_lsn,
+											bool find_startpoint);
 
 /* misc stuff */
 extern void ReplicationSlotInitialize(void);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 8de90c4958..ce3731224c 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1503,6 +1503,8 @@ LogicalRepTyp
 LogicalRepWorker
 LogicalRepWorkerType
 LogicalRewriteMappingData
+LogicalSlotInfo
+LogicalSlotInfoArr
 LogicalTape
 LogicalTapeSet
 LsnReadQueue
-- 
2.27.0

#316Amit Kapila
amit.kapila16@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#315)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Wed, Oct 11, 2023 at 4:27 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

Thank you for reviewing! PSA new version.

Some more comments:
1. Let's restructure binary_upgrade_validate_wal_logical_end() a bit.
First, let's change its name to binary_upgrade_slot_has_pending_wal()
or something like that. Then move the context creation and free
related code into DecodingContextHasDecodedItems(). We can rename
DecodingContextHasDecodedItems() as
pg_logical_replication_slot_has_pending_wal() and place it in
slotfuncs.c. This will make the code structure similar to other slot
functions like pg_replication_slot_advance().
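
To make the suggested restructuring concrete, here is a rough sketch of the
intended shape (names as suggested above; this is only an illustration, not
code taken from any posted patch):

/* slotfuncs.c: runs the fast-forward decoding loop for the acquired slot */
extern bool pg_logical_replication_slot_has_pending_wal(XLogRecPtr end_of_wal);

/* pg_upgrade_support.c: thin SQL-callable wrapper used only by pg_upgrade */
Datum
binary_upgrade_slot_has_pending_wal(PG_FUNCTION_ARGS)
{
	Name		slot_name = PG_GETARG_NAME(0);
	bool		has_pending;

	CHECK_IS_BINARY_UPGRADE;
	CheckSlotPermissions();		/* see also point 3 below */

	/* The caller is expected to have verified that the slot exists. */
	ReplicationSlotAcquire(NameStr(*slot_name), true);

	has_pending =
		pg_logical_replication_slot_has_pending_wal(GetFlushRecPtr(NULL));

	ReplicationSlotRelease();

	PG_RETURN_BOOL(has_pending);
}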

2. + * Returns true if there are no changes after the confirmed_flush_lsn.

How about something like: "Returns true if there are no decodable WAL
records after the confirmed_flush_lsn."?

3. Shouldn't we need to call CheckSlotPermissions() in
binary_upgrade_validate_wal_logical_end?

4.
+ /*
+ * Also, set processing_required flag if the message is not
+ * transactional. It is needed to notify the message's existence to
+ * the caller side. Usually, the flag is set when either the COMMIT or
+ * ABORT records are decoded, but this must be turned on here because
+ * the non-transactional logical message is decoded without waiting
+ * for these records.
+ */

The first sentence of the comments doesn't seem to be required as that
just says what the code does. So, let's slightly change it to: "We
need to set processing_required flag to notify the message's existence
to the caller side. Usually, the flag is set when either the COMMIT or
ABORT records are decoded, but this must be turned on here because the
non-transactional logical message is decoded without waiting for these
records."

--
With Regards,
Amit Kapila.

#317Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Amit Kapila (#309)
1 attachment(s)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Amit,

Thanks for your suggestion! PSA new version.

The other problem is that pg_resetwal removes all pre-existing WAL
files which in this case could lead to the removal of the WAL file
corresponding to restart_lsn. This is because at least the shutdown
checkpoint record will be written after the creation of slots which
could be in the new file used for restart_lsn. Then when we invoke
pg_resetwal, it can remove that file.

One idea to deal with this could be to do the reset WAL stuff
(FindEndOfXLOG(), KillExistingXLOG(), KillExistingArchiveStatus(),
WriteEmptyXLOG()) in a separate function (say in pg_upgrade) and then
create slots. If we do this, then we additionally need an option in
pg_resetwal which skips resetting the WAL as that would have been done
before creating the slots.

Based on the above idea, I made a new version of the patch in which some of the
functionality is exported from pg_resetwal. In this approach, pg_upgrade itself
removes the old WAL files and then creates the logical slots; after that,
pg_resetwal is called with a new option, --no-switch, which avoids switching to
a new WAL segment file. The option exists only for the upgrade, so it is not
mentioned in the documentation or in usage(). This option would not be needed if
pg_resetwal -o stopped discarding WAL records; please see the fork thread [1].
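
As a rough illustration of that ordering (the helper and option names are taken
from this patch set, but the exact call sequence below is only a sketch, not the
final code), the relevant part of pg_upgrade would look something like:

	/*
	 * Sketch only: at this point pg_upgrade has already removed the old WAL
	 * files and chosen the new starting segment, using the routines exported
	 * from pg_resetwal into fe_utils.
	 */

	/* Restore the logical replication slots on the new cluster. */
	create_logical_replication_slots();

	/*
	 * Rewrite pg_control, but pass --no-switch so that pg_resetwal neither
	 * removes nor switches WAL segments; the restart_lsn of the freshly
	 * created slots therefore stays valid.
	 */
	exec_prog(UTILITY_LOG_FILE, NULL, true, true,
			  "\"%s/pg_resetwal\" --no-switch -o %u \"%s\"",
			  new_cluster.bindir, old_cluster.controldata.chkpnt_nxtoid,
			  new_cluster.pgdata);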

We no longer have to reserve a future restart_lsn while creating a slot, so the
binary-upgrade function binary_upgrade_create_logical_replication_slot() was removed.

Another advantage of this approach is that it avoids calling pg_log_standby_snapshot()
after pg_resetwal. That call was needed for two reasons, both of which are now
resolved automatically:
1) pg_resetwal removes all WAL files.
2) Logical slots require a RUNNING_XACTS record for building a snapshot.

[1]: /messages/by-id/CAA4eK1KRyPMiY4fW98qFofsYrPd87Oc83zDNxSeHfTYh_asdBg@mail.gmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:

v49-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patchapplication/octet-stream; name=v49-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patchDownload
From 205b8dbfe96f1b9806036586729bd050008d9649 Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Tue, 4 Apr 2023 05:49:34 +0000
Subject: [PATCH v49] pg_upgrade: Allow to replicate logical replication slots
 to new node

This commit allows nodes with logical replication slots to be upgraded. While
reading information from the old cluster, a list of logical replication slots is
fetched. At the later part of upgrading, pg_upgrade revisits the list and
restores slots by executing a new binary upgrading function,
binary_upgrade_create_logical_replication_slot() on the new cluster. Migration
of logical replication slots is only supported when the old cluster is version
17.0 or later.

If the old node has slots with the status 'lost' or with unconsumed WAL records,
the pg_upgrade fails. These checks are needed to prevent data loss.

Note that the pg_resetwal command would remove WAL files, which are required for
restart_lsn. If WALs required by logical replication slots are removed, the
slots are unusable. Therefore, binary_upgrade_create_logical_replication_slot()
sets a startpoint of next WAL segment which will be created by pg_resetwal.

The significant advantage of this commit is that it makes it easy to continue
logical replication even after upgrading the publisher node. Previously,
pg_upgrade allowed copying publications to a new node. With this new commit,
adjusting the connection string to the new publisher will cause the apply
worker on the subscriber to connect to the new publisher automatically. This
enables seamless continuation of logical replication, even after an upgrade.

Author: Hayato Kuroda
Co-authored-by: Hou Zhijie
Reviewed-by: Peter Smith, Julien Rouhaud, Vignesh C, Wang Wei, Masahiko Sawada,
             Dilip Kumar, Bharath Rupireddy
---
 doc/src/sgml/ref/pgupgrade.sgml               |  76 +++-
 src/backend/replication/logical/decode.c      |  48 ++-
 src/backend/replication/logical/logical.c     |  65 ++++
 src/backend/replication/slot.c                |  12 +
 src/backend/replication/slotfuncs.c           |   2 +-
 src/backend/utils/adt/pg_upgrade_support.c    |  53 +++
 src/bin/pg_resetwal/pg_resetwal.c             | 307 ++--------------
 src/bin/pg_upgrade/Makefile                   |   3 +
 src/bin/pg_upgrade/check.c                    | 172 ++++++++-
 src/bin/pg_upgrade/function.c                 |  30 +-
 src/bin/pg_upgrade/info.c                     | 166 ++++++++-
 src/bin/pg_upgrade/meson.build                |   1 +
 src/bin/pg_upgrade/pg_upgrade.c               | 104 +++++-
 src/bin/pg_upgrade/pg_upgrade.h               |  22 +-
 src/bin/pg_upgrade/server.c                   |  25 +-
 .../003_upgrade_logical_replication_slots.pl  | 325 +++++++++++++++++
 src/fe_utils/Makefile                         |   3 +-
 src/fe_utils/meson.build                      |   1 +
 src/fe_utils/wal.c                            | 329 ++++++++++++++++++
 src/include/access/xlog.h                     |   8 -
 src/include/access/xlogdefs.h                 |   8 +
 src/include/catalog/pg_proc.dat               |   5 +
 src/include/fe_utils/wal.h                    |  29 ++
 src/include/replication/logical.h             |   9 +
 src/include/replication/slot.h                |   4 +
 src/tools/pgindent/typedefs.list              |   2 +
 26 files changed, 1487 insertions(+), 322 deletions(-)
 create mode 100644 src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl
 create mode 100644 src/fe_utils/wal.c
 create mode 100644 src/include/fe_utils/wal.h

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index f17fdb1ba5..4d579e793d 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -383,6 +383,77 @@ make prefix=/usr/local/pgsql.new install
     </para>
    </step>
 
+   <step>
+    <title>Prepare for publisher upgrades</title>
+
+    <para>
+     <application>pg_upgrade</application> attempts to migrate logical
+     replication slots. This helps avoid the need for manually defining the
+     same replication slots on the new publisher. Migration of logical
+     replication slots is only supported when the old cluster is version 17.0
+     or later.
+    </para>
+
+    <para>
+     Before you start upgrading the publisher cluster, ensure that the
+     subscription is temporarily disabled, by executing
+     <link linkend="sql-altersubscription"><command>ALTER SUBSCRIPTION ... DISABLE</command></link>.
+     Re-enable the subscription after the upgrade.
+    </para>
+
+    <para>
+     There are some prerequisites for <application>pg_upgrade</application> to
+     be able to upgrade the replication slots. If these are not met an error
+     will be reported.
+    </para>
+
+    <itemizedlist>
+     <listitem>
+      <para>
+       All slots on the old cluster must be usable, i.e., there are no slots
+       whose
+       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>conflicting</structfield>
+       is <literal>true</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The old cluster has replicated all the changes to subscribers.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The output plugins referenced by the slots on the old cluster must be
+       installed in the new PostgreSQL executable directory.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must not have permanent logical replication slots, i.e.,
+       there must be no slots where
+       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>temporary</structfield>
+       is <literal>false</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must have
+       <link linkend="guc-wal-level"><varname>wal_level</varname></link> as
+       <literal>logical</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must have
+       <link linkend="guc-max-replication-slots"><varname>max_replication_slots</varname></link>
+       configured to a value greater than or equal to the number of slots
+       present in the old cluster.
+      </para>
+     </listitem>
+    </itemizedlist>
+
+   </step>
+
    <step>
     <title>Stop both servers</title>
 
@@ -650,8 +721,9 @@ rsync --archive --delete --hard-links --size-only --no-inc-recursive /vol1/pg_tb
        Configure the servers for log shipping.  (You do not need to run
        <function>pg_backup_start()</function> and <function>pg_backup_stop()</function>
        or take a file system backup as the standbys are still synchronized
-       with the primary.)  Replication slots are not copied and must
-       be recreated.
+       with the primary.)  Only logical slots on the primary are copied to the
+       new standby, but other slots on the old standby are not copied so must
+       be recreated manually.
       </para>
      </step>
 
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 730061c9da..4144a43afd 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -599,12 +599,8 @@ logicalmsg_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 
 	ReorderBufferProcessXid(ctx->reorder, XLogRecGetXid(r), buf->origptr);
 
-	/*
-	 * If we don't have snapshot or we are just fast-forwarding, there is no
-	 * point in decoding messages.
-	 */
-	if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT ||
-		ctx->fast_forward)
+	/* If we don't have snapshot, there is no point in decoding messages */
+	if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT)
 		return;
 
 	message = (xl_logical_message *) XLogRecGetData(r);
@@ -621,6 +617,26 @@ logicalmsg_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 			  SnapBuildXactNeedsSkip(builder, buf->origptr)))
 		return;
 
+	/*
+	 * We can also skip decoding when in 'fast_forward' mode. This check must
+	 * be last because we don't want to set that processing_required flag
+	 * unnecessarily.
+	 */
+	if (ctx->fast_forward)
+	{
+		/*
+		 * We need to set processing_required flag to notify the message's
+		 * existence to the caller side. Usually, the flag is set when either
+		 * the COMMIT or ABORT records are decoded, but this must be turned on
+		 * here because the non-transactional logical message is decoded
+		 * without waiting for these records.
+		 */
+		if (!message->transactional)
+			ctx->processing_required = true;
+
+		return;
+	}
+
 	/*
 	 * If this is a non-transactional change, get the snapshot we're expected
 	 * to use. We only get here when the snapshot is consistent, and the
@@ -1285,7 +1301,21 @@ static bool
 DecodeTXNNeedSkip(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
 				  Oid txn_dbid, RepOriginId origin_id)
 {
-	return (SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) ||
-			(txn_dbid != InvalidOid && txn_dbid != ctx->slot->data.database) ||
-			ctx->fast_forward || FilterByOrigin(ctx, origin_id));
+	if (SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) ||
+		(txn_dbid != InvalidOid && txn_dbid != ctx->slot->data.database) ||
+		FilterByOrigin(ctx, origin_id))
+		return true;
+
+	/*
+	 * We can also skip decoding when in 'fast_forward' mode. In passing set
+	 * the 'processing_required' flag to indicate, were it not for this mode,
+	 * processing *would* have been required.
+	 */
+	if (ctx->fast_forward)
+	{
+		ctx->processing_required = true;
+		return true;
+	}
+
+	return false;
 }
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 41243d0187..9f4c347c32 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -29,6 +29,7 @@
 #include "postgres.h"
 
 #include "access/xact.h"
+#include "access/xlogutils.h"
 #include "access/xlog_internal.h"
 #include "fmgr.h"
 #include "miscadmin.h"
@@ -41,6 +42,7 @@
 #include "storage/proc.h"
 #include "storage/procarray.h"
 #include "utils/builtins.h"
+#include "utils/inval.h"
 #include "utils/memutils.h"
 
 /* data for errcontext callback */
@@ -1949,3 +1951,66 @@ UpdateDecodingStats(LogicalDecodingContext *ctx)
 	rb->totalTxns = 0;
 	rb->totalBytes = 0;
 }
+
+/*
+ * Read to end of WAL starting from the decoding slot's restart_lsn. Return
+ * true if any meaningful/decodable WAL records are encountered, otherwise
+ * false.
+ *
+ * Although this function is currently used only during pg_upgrade, there are
+ * no reasons to restrict it, so IsBinaryUpgrade is not checked here.
+ */
+bool
+pg_logical_replication_slot_has_pending_wal(XLogRecPtr end_of_wal)
+{
+	LogicalDecodingContext *ctx = NULL;
+	bool		processing_required = false;
+
+	Assert(MyReplicationSlot);
+
+	/*
+	 * Create our decoding context in fast_forward mode, passing start_lsn as
+	 * InvalidXLogRecPtr, so that we start processing from the slot's
+	 * confirmed_flush.
+	 */
+	ctx = CreateDecodingContext(InvalidXLogRecPtr,
+								NIL,
+								true,	/* fast_forward */
+								XL_ROUTINE(.page_read = read_local_xlog_page,
+										   .segment_open = wal_segment_open,
+										   .segment_close = wal_segment_close),
+								NULL, NULL, NULL);
+
+	/*
+	 * Start reading at the slot's restart_lsn, which we know points to a
+	 * valid record.
+	 */
+	XLogBeginRead(ctx->reader, MyReplicationSlot->data.restart_lsn);
+
+	/* Invalidate non-timetravel entries */
+	InvalidateSystemCaches();
+
+	/* Loop until the end of WAL or some changes are processed */
+	while (!processing_required && ctx->reader->EndRecPtr < end_of_wal)
+	{
+		XLogRecord *record;
+		char	   *errm = NULL;
+
+		record = XLogReadRecord(ctx->reader, &errm);
+
+		if (errm)
+			elog(ERROR, "could not find record for logical decoding: %s", errm);
+
+		if (record != NULL)
+			LogicalDecodingProcessRecord(ctx, ctx->reader);
+
+		processing_required = ctx->processing_required;
+
+		CHECK_FOR_INTERRUPTS();
+	}
+
+	/* Clean up */
+	FreeDecodingContext(ctx);
+
+	return processing_required;
+}
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 7e5ec500d8..9980e2fd79 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1423,6 +1423,18 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause,
 
 		SpinLockRelease(&s->mutex);
 
+		/*
+		 * The logical replication slots shouldn't be invalidated as
+		 * max_slot_wal_keep_size GUC is set to -1 during the upgrade.
+		 *
+		 * The following is just a sanity check.
+		 */
+		if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade)
+		{
+			Assert(max_slot_wal_keep_size_mb == -1);
+			elog(ERROR, "replication slots must not be invalidated during the upgrade");
+		}
+
 		if (active_pid != 0)
 		{
 			/*
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index 6035cf4816..92eb1fbed0 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -114,7 +114,7 @@ pg_create_physical_replication_slot(PG_FUNCTION_ARGS)
  * When find_startpoint is false, the slot's confirmed_flush is not set; it's
  * caller's responsibility to ensure it's set to something sensible.
  */
-static void
+void
 create_logical_replication_slot(char *name, char *plugin,
 								bool temporary, bool two_phase,
 								XLogRecPtr restart_lsn,
diff --git a/src/backend/utils/adt/pg_upgrade_support.c b/src/backend/utils/adt/pg_upgrade_support.c
index 0186636d9f..8a53b7d41e 100644
--- a/src/backend/utils/adt/pg_upgrade_support.c
+++ b/src/backend/utils/adt/pg_upgrade_support.c
@@ -11,14 +11,20 @@
 
 #include "postgres.h"
 
+#include "access/xlogutils.h"
+#include "access/xlog_internal.h"
 #include "catalog/binary_upgrade.h"
 #include "catalog/heap.h"
 #include "catalog/namespace.h"
 #include "catalog/pg_type.h"
 #include "commands/extension.h"
+#include "funcapi.h"
 #include "miscadmin.h"
+#include "replication/logical.h"
+#include "replication/slot.h"
 #include "utils/array.h"
 #include "utils/builtins.h"
+#include "utils/pg_lsn.h"
 
 
 #define CHECK_IS_BINARY_UPGRADE									\
@@ -261,3 +267,50 @@ binary_upgrade_set_missing_value(PG_FUNCTION_ARGS)
 
 	PG_RETURN_VOID();
 }
+
+/*
+ * Verify the given slot has already consumed all the WAL changes.
+ *
+ * Returns true if there are no decodable WAL records after the
+ * confirmed_flush_lsn. Otherwise false.
+ *
+ * This is a special purpose function to ensure the given slot can be upgraded
+ * without data loss.
+ */
+Datum
+binary_upgrade_slot_has_pending_wal(PG_FUNCTION_ARGS)
+{
+	Name		slot_name;
+	XLogRecPtr	end_of_wal;
+	bool		found_items;
+
+	CHECK_IS_BINARY_UPGRADE;
+
+	/* Quick exit if the input is NULL */
+	if (PG_ARGISNULL(0))
+		PG_RETURN_BOOL(false);
+
+	CheckSlotPermissions();
+
+	slot_name = PG_GETARG_NAME(0);
+
+	/*
+	 * Acquire the given slot. There should be no error because the caller has
+	 * already checked the slot exists.
+	 */
+	ReplicationSlotAcquire(NameStr(*slot_name), true);
+
+	/*
+	 * It's caller's responsibility to check the health of the slot.  Upcoming
+	 * functions assume the restart_lsn points to a valid record.
+	 */
+	Assert(MyReplicationSlot->data.invalidated == RS_INVAL_NONE);
+
+	end_of_wal = GetFlushRecPtr(NULL);
+	found_items = pg_logical_replication_slot_has_pending_wal(end_of_wal);
+
+	/* Clean up */
+	ReplicationSlotRelease();
+
+	PG_RETURN_BOOL(!found_items);
+}
diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
index 04567f349d..c9a5200ea4 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -55,6 +55,7 @@
 #include "common/restricted_token.h"
 #include "common/string.h"
 #include "fe_utils/option_utils.h"
+#include "fe_utils/wal.h"
 #include "getopt_long.h"
 #include "pg_getopt.h"
 #include "storage/large_object.h"
@@ -81,11 +82,6 @@ static bool read_controlfile(void);
 static void GuessControlValues(void);
 static void PrintControlValues(bool guessed);
 static void PrintNewControlValues(void);
-static void RewriteControlFile(void);
-static void FindEndOfXLOG(void);
-static void KillExistingXLOG(void);
-static void KillExistingArchiveStatus(void);
-static void WriteEmptyXLOG(void);
 static void usage(void);
 
 
@@ -105,6 +101,7 @@ main(int argc, char *argv[])
 		{"oldest-transaction-id", required_argument, NULL, 'u'},
 		{"next-transaction-id", required_argument, NULL, 'x'},
 		{"wal-segsize", required_argument, NULL, 1},
+		{"no-switch", no_argument, NULL, 2},
 		{NULL, 0, NULL, 0}
 	};
 
@@ -118,6 +115,8 @@ main(int argc, char *argv[])
 	char	   *log_fname = NULL;
 	int			fd;
 
+	bool		noswitch = false;
+
 	pg_logging_init(argv[0]);
 	set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_resetwal"));
 	progname = get_progname(argv[0]);
@@ -153,6 +152,10 @@ main(int argc, char *argv[])
 				noupdate = true;
 				break;
 
+			case 2:
+				noswitch = true;
+				break;
+
 			case 'e':
 				errno = 0;
 				set_xid_epoch = strtoul(optarg, &endptr, 0);
@@ -396,7 +399,9 @@ main(int argc, char *argv[])
 	/*
 	 * Also look at existing segment files to set up newXlogSegNo
 	 */
-	FindEndOfXLOG();
+	newXlogSegNo = FindEndOfXLOG(ControlFile.checkPointCopy.redo,
+								 ControlFile.xlog_seg_size, WalSegSz,
+								 ".");
 
 	/*
 	 * If we're not going to proceed with the reset, print the current control
@@ -490,12 +495,18 @@ main(int argc, char *argv[])
 	/*
 	 * Else, do the dirty deed.
 	 */
-	RewriteControlFile();
-	KillExistingXLOG();
-	KillExistingArchiveStatus();
-	WriteEmptyXLOG();
+	RewriteControlFile(noswitch, newXlogSegNo, WalSegSz, &ControlFile,
+					   ".");
+
+	if (!noswitch)
+	{
+		KillExistingXLOG(".");
+		KillExistingArchiveStatus(".");
+		WriteEmptyXLOG(newXlogSegNo, WalSegSz, &ControlFile, ".");
+
+		printf(_("Write-ahead log reset\n"));
+	}
 
-	printf(_("Write-ahead log reset\n"));
 	return 0;
 }
 
@@ -852,280 +863,6 @@ PrintNewControlValues(void)
 }
 
 
-/*
- * Write out the new pg_control file.
- */
-static void
-RewriteControlFile(void)
-{
-	/*
-	 * Adjust fields as needed to force an empty XLOG starting at
-	 * newXlogSegNo.
-	 */
-	XLogSegNoOffsetToRecPtr(newXlogSegNo, SizeOfXLogLongPHD, WalSegSz,
-							ControlFile.checkPointCopy.redo);
-	ControlFile.checkPointCopy.time = (pg_time_t) time(NULL);
-
-	ControlFile.state = DB_SHUTDOWNED;
-	ControlFile.checkPoint = ControlFile.checkPointCopy.redo;
-	ControlFile.minRecoveryPoint = 0;
-	ControlFile.minRecoveryPointTLI = 0;
-	ControlFile.backupStartPoint = 0;
-	ControlFile.backupEndPoint = 0;
-	ControlFile.backupEndRequired = false;
-
-	/*
-	 * Force the defaults for max_* settings. The values don't really matter
-	 * as long as wal_level='minimal'; the postmaster will reset these fields
-	 * anyway at startup.
-	 */
-	ControlFile.wal_level = WAL_LEVEL_MINIMAL;
-	ControlFile.wal_log_hints = false;
-	ControlFile.track_commit_timestamp = false;
-	ControlFile.MaxConnections = 100;
-	ControlFile.max_wal_senders = 10;
-	ControlFile.max_worker_processes = 8;
-	ControlFile.max_prepared_xacts = 0;
-	ControlFile.max_locks_per_xact = 64;
-
-	/* The control file gets flushed here. */
-	update_controlfile(".", &ControlFile, true);
-}
-
-
-/*
- * Scan existing XLOG files and determine the highest existing WAL address
- *
- * On entry, ControlFile.checkPointCopy.redo and ControlFile.xlog_seg_size
- * are assumed valid (note that we allow the old xlog seg size to differ
- * from what we're using).  On exit, newXlogSegNo is set to suitable
- * value for the beginning of replacement WAL (in our seg size).
- */
-static void
-FindEndOfXLOG(void)
-{
-	DIR		   *xldir;
-	struct dirent *xlde;
-	uint64		xlogbytepos;
-
-	/*
-	 * Initialize the max() computation using the last checkpoint address from
-	 * old pg_control.  Note that for the moment we are working with segment
-	 * numbering according to the old xlog seg size.
-	 */
-	XLByteToSeg(ControlFile.checkPointCopy.redo, newXlogSegNo,
-				ControlFile.xlog_seg_size);
-
-	/*
-	 * Scan the pg_wal directory to find existing WAL segment files. We assume
-	 * any present have been used; in most scenarios this should be
-	 * conservative, because of xlog.c's attempts to pre-create files.
-	 */
-	xldir = opendir(XLOGDIR);
-	if (xldir == NULL)
-		pg_fatal("could not open directory \"%s\": %m", XLOGDIR);
-
-	while (errno = 0, (xlde = readdir(xldir)) != NULL)
-	{
-		if (IsXLogFileName(xlde->d_name) ||
-			IsPartialXLogFileName(xlde->d_name))
-		{
-			TimeLineID	tli;
-			XLogSegNo	segno;
-
-			/* Use the segment size from the control file */
-			XLogFromFileName(xlde->d_name, &tli, &segno,
-							 ControlFile.xlog_seg_size);
-
-			/*
-			 * Note: we take the max of all files found, regardless of their
-			 * timelines.  Another possibility would be to ignore files of
-			 * timelines other than the target TLI, but this seems safer.
-			 * Better too large a result than too small...
-			 */
-			if (segno > newXlogSegNo)
-				newXlogSegNo = segno;
-		}
-	}
-
-	if (errno)
-		pg_fatal("could not read directory \"%s\": %m", XLOGDIR);
-
-	if (closedir(xldir))
-		pg_fatal("could not close directory \"%s\": %m", XLOGDIR);
-
-	/*
-	 * Finally, convert to new xlog seg size, and advance by one to ensure we
-	 * are in virgin territory.
-	 */
-	xlogbytepos = newXlogSegNo * ControlFile.xlog_seg_size;
-	newXlogSegNo = (xlogbytepos + ControlFile.xlog_seg_size - 1) / WalSegSz;
-	newXlogSegNo++;
-}
-
-
-/*
- * Remove existing XLOG files
- */
-static void
-KillExistingXLOG(void)
-{
-	DIR		   *xldir;
-	struct dirent *xlde;
-	char		path[MAXPGPATH + sizeof(XLOGDIR)];
-
-	xldir = opendir(XLOGDIR);
-	if (xldir == NULL)
-		pg_fatal("could not open directory \"%s\": %m", XLOGDIR);
-
-	while (errno = 0, (xlde = readdir(xldir)) != NULL)
-	{
-		if (IsXLogFileName(xlde->d_name) ||
-			IsPartialXLogFileName(xlde->d_name))
-		{
-			snprintf(path, sizeof(path), "%s/%s", XLOGDIR, xlde->d_name);
-			if (unlink(path) < 0)
-				pg_fatal("could not delete file \"%s\": %m", path);
-		}
-	}
-
-	if (errno)
-		pg_fatal("could not read directory \"%s\": %m", XLOGDIR);
-
-	if (closedir(xldir))
-		pg_fatal("could not close directory \"%s\": %m", XLOGDIR);
-}
-
-
-/*
- * Remove existing archive status files
- */
-static void
-KillExistingArchiveStatus(void)
-{
-#define ARCHSTATDIR XLOGDIR "/archive_status"
-
-	DIR		   *xldir;
-	struct dirent *xlde;
-	char		path[MAXPGPATH + sizeof(ARCHSTATDIR)];
-
-	xldir = opendir(ARCHSTATDIR);
-	if (xldir == NULL)
-		pg_fatal("could not open directory \"%s\": %m", ARCHSTATDIR);
-
-	while (errno = 0, (xlde = readdir(xldir)) != NULL)
-	{
-		if (strspn(xlde->d_name, "0123456789ABCDEF") == XLOG_FNAME_LEN &&
-			(strcmp(xlde->d_name + XLOG_FNAME_LEN, ".ready") == 0 ||
-			 strcmp(xlde->d_name + XLOG_FNAME_LEN, ".done") == 0 ||
-			 strcmp(xlde->d_name + XLOG_FNAME_LEN, ".partial.ready") == 0 ||
-			 strcmp(xlde->d_name + XLOG_FNAME_LEN, ".partial.done") == 0))
-		{
-			snprintf(path, sizeof(path), "%s/%s", ARCHSTATDIR, xlde->d_name);
-			if (unlink(path) < 0)
-				pg_fatal("could not delete file \"%s\": %m", path);
-		}
-	}
-
-	if (errno)
-		pg_fatal("could not read directory \"%s\": %m", ARCHSTATDIR);
-
-	if (closedir(xldir))
-		pg_fatal("could not close directory \"%s\": %m", ARCHSTATDIR);
-}
-
-
-/*
- * Write an empty XLOG file, containing only the checkpoint record
- * already set up in ControlFile.
- */
-static void
-WriteEmptyXLOG(void)
-{
-	PGAlignedXLogBlock buffer;
-	XLogPageHeader page;
-	XLogLongPageHeader longpage;
-	XLogRecord *record;
-	pg_crc32c	crc;
-	char		path[MAXPGPATH];
-	int			fd;
-	int			nbytes;
-	char	   *recptr;
-
-	memset(buffer.data, 0, XLOG_BLCKSZ);
-
-	/* Set up the XLOG page header */
-	page = (XLogPageHeader) buffer.data;
-	page->xlp_magic = XLOG_PAGE_MAGIC;
-	page->xlp_info = XLP_LONG_HEADER;
-	page->xlp_tli = ControlFile.checkPointCopy.ThisTimeLineID;
-	page->xlp_pageaddr = ControlFile.checkPointCopy.redo - SizeOfXLogLongPHD;
-	longpage = (XLogLongPageHeader) page;
-	longpage->xlp_sysid = ControlFile.system_identifier;
-	longpage->xlp_seg_size = WalSegSz;
-	longpage->xlp_xlog_blcksz = XLOG_BLCKSZ;
-
-	/* Insert the initial checkpoint record */
-	recptr = (char *) page + SizeOfXLogLongPHD;
-	record = (XLogRecord *) recptr;
-	record->xl_prev = 0;
-	record->xl_xid = InvalidTransactionId;
-	record->xl_tot_len = SizeOfXLogRecord + SizeOfXLogRecordDataHeaderShort + sizeof(CheckPoint);
-	record->xl_info = XLOG_CHECKPOINT_SHUTDOWN;
-	record->xl_rmid = RM_XLOG_ID;
-
-	recptr += SizeOfXLogRecord;
-	*(recptr++) = (char) XLR_BLOCK_ID_DATA_SHORT;
-	*(recptr++) = sizeof(CheckPoint);
-	memcpy(recptr, &ControlFile.checkPointCopy,
-		   sizeof(CheckPoint));
-
-	INIT_CRC32C(crc);
-	COMP_CRC32C(crc, ((char *) record) + SizeOfXLogRecord, record->xl_tot_len - SizeOfXLogRecord);
-	COMP_CRC32C(crc, (char *) record, offsetof(XLogRecord, xl_crc));
-	FIN_CRC32C(crc);
-	record->xl_crc = crc;
-
-	/* Write the first page */
-	XLogFilePath(path, ControlFile.checkPointCopy.ThisTimeLineID,
-				 newXlogSegNo, WalSegSz);
-
-	unlink(path);
-
-	fd = open(path, O_RDWR | O_CREAT | O_EXCL | PG_BINARY,
-			  pg_file_create_mode);
-	if (fd < 0)
-		pg_fatal("could not open file \"%s\": %m", path);
-
-	errno = 0;
-	if (write(fd, buffer.data, XLOG_BLCKSZ) != XLOG_BLCKSZ)
-	{
-		/* if write didn't set errno, assume problem is no disk space */
-		if (errno == 0)
-			errno = ENOSPC;
-		pg_fatal("could not write file \"%s\": %m", path);
-	}
-
-	/* Fill the rest of the file with zeroes */
-	memset(buffer.data, 0, XLOG_BLCKSZ);
-	for (nbytes = XLOG_BLCKSZ; nbytes < WalSegSz; nbytes += XLOG_BLCKSZ)
-	{
-		errno = 0;
-		if (write(fd, buffer.data, XLOG_BLCKSZ) != XLOG_BLCKSZ)
-		{
-			if (errno == 0)
-				errno = ENOSPC;
-			pg_fatal("could not write file \"%s\": %m", path);
-		}
-	}
-
-	if (fsync(fd) != 0)
-		pg_fatal("fsync error: %m");
-
-	close(fd);
-}
-
-
 static void
 usage(void)
 {
diff --git a/src/bin/pg_upgrade/Makefile b/src/bin/pg_upgrade/Makefile
index 5834513add..815d1a7ca1 100644
--- a/src/bin/pg_upgrade/Makefile
+++ b/src/bin/pg_upgrade/Makefile
@@ -3,6 +3,9 @@
 PGFILEDESC = "pg_upgrade - an in-place binary upgrade utility"
 PGAPPICON = win32
 
+# required for 003_logical_replication_slots.pl
+EXTRA_INSTALL=contrib/test_decoding
+
 subdir = src/bin/pg_upgrade
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 21a0ff9e42..123f47a81f 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -33,6 +33,8 @@ static void check_for_jsonb_9_4_usage(ClusterInfo *cluster);
 static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(void);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
+static void check_new_cluster_logical_replication_slots(void);
+static void check_old_cluster_for_valid_slots(bool live_check);
 
 
 /*
@@ -89,8 +91,11 @@ check_and_dump_old_cluster(bool live_check)
 	if (!live_check)
 		start_postmaster(&old_cluster, true);
 
-	/* Extract a list of databases and tables from the old cluster */
-	get_db_and_rel_infos(&old_cluster);
+	/*
+	 * Extract a list of databases, tables, and logical replication slots from
+	 * the old cluster.
+	 */
+	get_db_rel_and_slot_infos(&old_cluster, live_check);
 
 	init_tablespaces();
 
@@ -107,6 +112,13 @@ check_and_dump_old_cluster(bool live_check)
 	check_for_reg_data_type_usage(&old_cluster);
 	check_for_isn_and_int8_passing_mismatch(&old_cluster);
 
+	/*
+	 * Logical replication slots can be migrated since PG17. See comments atop
+	 * get_old_cluster_logical_slot_infos().
+	 */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
+		check_old_cluster_for_valid_slots(live_check);
+
 	/*
 	 * PG 16 increased the size of the 'aclitem' type, which breaks the
 	 * on-disk format for existing data.
@@ -200,7 +212,7 @@ check_and_dump_old_cluster(bool live_check)
 void
 check_new_cluster(void)
 {
-	get_db_and_rel_infos(&new_cluster);
+	get_db_rel_and_slot_infos(&new_cluster, false);
 
 	check_new_cluster_is_empty();
 
@@ -223,6 +235,8 @@ check_new_cluster(void)
 	check_for_prepared_transactions(&new_cluster);
 
 	check_for_new_tablespace_dir();
+
+	check_new_cluster_logical_replication_slots();
 }
 
 
@@ -1451,3 +1465,155 @@ check_for_user_defined_encoding_conversions(ClusterInfo *cluster)
 	else
 		check_ok();
 }
+
+/*
+ * check_new_cluster_logical_replication_slots()
+ *
+ * Verify that there are no logical replication slots on the new cluster and
+ * that the parameter settings necessary for creating slots are sufficient.
+ */
+static void
+check_new_cluster_logical_replication_slots(void)
+{
+	PGresult   *res;
+	PGconn	   *conn;
+	int			nslots_on_old;
+	int			nslots_on_new;
+	int			max_replication_slots;
+	char	   *wal_level;
+
+	/* Logical slots can be migrated since PG17. */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+		return;
+
+	nslots_on_old = count_old_cluster_logical_slots();
+
+	/* Quick return if there are no logical slots to be migrated. */
+	if (nslots_on_old == 0)
+		return;
+
+	conn = connectToServer(&new_cluster, "template1");
+
+	prep_status("Checking for new cluster logical replication slots");
+
+	res = executeQueryOrDie(conn, "SELECT count(*) "
+							"FROM pg_catalog.pg_replication_slots "
+							"WHERE slot_type = 'logical' AND "
+							"temporary IS FALSE;");
+
+	if (PQntuples(res) != 1)
+		pg_fatal("could not count the number of logical replication slots");
+
+	nslots_on_new = atoi(PQgetvalue(res, 0, 0));
+
+	if (nslots_on_new)
+		pg_fatal("Expected 0 logical replication slots but found %d.",
+				 nslots_on_new);
+
+	PQclear(res);
+
+	res = executeQueryOrDie(conn, "SELECT setting FROM pg_settings "
+							"WHERE name IN ('wal_level', 'max_replication_slots') "
+							"ORDER BY name DESC;");
+
+	if (PQntuples(res) != 2)
+		pg_fatal("could not determine parameter settings on new cluster");
+
+	wal_level = PQgetvalue(res, 0, 0);
+
+	if (strcmp(wal_level, "logical") != 0)
+		pg_fatal("wal_level must be \"logical\", but is set to \"%s\"",
+				 wal_level);
+
+	max_replication_slots = atoi(PQgetvalue(res, 1, 0));
+
+	if (nslots_on_old > max_replication_slots)
+		pg_fatal("max_replication_slots (%d) must be greater than or equal to the number of "
+				 "logical replication slots (%d) on the old cluster",
+				 max_replication_slots, nslots_on_old);
+
+	PQclear(res);
+	PQfinish(conn);
+
+	check_ok();
+}
+
+/*
+ * check_old_cluster_for_valid_slots()
+ *
+ * Verify that all the logical slots are usable and have consumed all the WAL
+ * before shutdown. The check has already been done in
+ * get_old_cluster_logical_slot_infos(), so this function reads the result and
+ * reports to the user.
+ */
+static void
+check_old_cluster_for_valid_slots(bool live_check)
+{
+	char		output_path[MAXPGPATH];
+	FILE	   *script = NULL;
+
+	prep_status("Checking for valid logical replication slots");
+
+	snprintf(output_path, sizeof(output_path), "%s/%s",
+			 log_opts.basedir,
+			 "invalid_logical_relication_slots.txt");
+
+	for (int dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		LogicalSlotInfoArr *slot_arr = &old_cluster.dbarr.dbs[dbnum].slot_arr;
+
+		for (int slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		{
+			LogicalSlotInfo *slot = &slot_arr->slots[slotnum];
+
+			/* Is the slot usable? */
+			if (slot->invalid)
+			{
+				if (script == NULL &&
+					(script = fopen_priv(output_path, "w")) == NULL)
+					pg_fatal("could not open file \"%s\": %s",
+							 output_path, strerror(errno));
+
+				fprintf(script, "The slot \"%s\" is invalid\n",
+						slot->slotname);
+
+				/* No need to check this slot, seek to new one */
+				continue;
+			}
+
+			/*
+			 * Do an additional check to ensure that all logical replication
+			 * slots have consumed all the WAL before shutdown.
+			 *
+			 * Note: This can be satisfied only when the old cluster has been
+			 * shut down, so we skip this for live checks.
+			 */
+			if (!live_check && !slot->caught_up)
+			{
+				if (script == NULL &&
+					(script = fopen_priv(output_path, "w")) == NULL)
+					pg_fatal("could not open file \"%s\": %s",
+							 output_path, strerror(errno));
+
+				fprintf(script,
+						"The slot \"%s\" has not consumed the WAL yet\n",
+						slot->slotname);
+			}
+		}
+	}
+
+	if (script)
+	{
+		fclose(script);
+
+		pg_log(PG_REPORT, "fatal");
+		pg_fatal("Your installation contains logical replication slots that cannot be upgraded.\n"
+				 "These slots can't be copied, so this cluster cannot be upgraded.\n"
+				 "Consider removing invalid slots and/or consuming the pending WAL if any,\n"
+				 "and then restart the upgrade.\n"
+				 "A list of all such logical replication slots is in the file:\n"
+				 "    %s", output_path);
+	}
+
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/function.c b/src/bin/pg_upgrade/function.c
index dc8800c7cd..5af936bd45 100644
--- a/src/bin/pg_upgrade/function.c
+++ b/src/bin/pg_upgrade/function.c
@@ -46,7 +46,9 @@ library_name_compare(const void *p1, const void *p2)
 /*
  * get_loadable_libraries()
  *
- *	Fetch the names of all old libraries containing C-language functions.
+ *	Fetch the names of all old libraries containing C-language functions, as
+ *	well as the libraries corresponding to logical replication output plugins.
+ *
  *	We will later check that they all exist in the new installation.
  */
 void
@@ -55,6 +57,7 @@ get_loadable_libraries(void)
 	PGresult  **ress;
 	int			totaltups;
 	int			dbnum;
+	int			n_libinfos;
 
 	ress = (PGresult **) pg_malloc(old_cluster.dbarr.ndbs * sizeof(PGresult *));
 	totaltups = 0;
@@ -81,7 +84,12 @@ get_loadable_libraries(void)
 		PQfinish(conn);
 	}
 
-	os_info.libraries = (LibraryInfo *) pg_malloc(totaltups * sizeof(LibraryInfo));
+	/*
+	 * Allocate memory for required libraries and logical replication output
+	 * plugins.
+	 */
+	n_libinfos = totaltups + count_old_cluster_logical_slots();
+	os_info.libraries = (LibraryInfo *) pg_malloc(sizeof(LibraryInfo) * n_libinfos);
 	totaltups = 0;
 
 	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
@@ -89,6 +97,7 @@ get_loadable_libraries(void)
 		PGresult   *res = ress[dbnum];
 		int			ntups;
 		int			rowno;
+		LogicalSlotInfoArr *slot_arr = &old_cluster.dbarr.dbs[dbnum].slot_arr;
 
 		ntups = PQntuples(res);
 		for (rowno = 0; rowno < ntups; rowno++)
@@ -101,6 +110,23 @@ get_loadable_libraries(void)
 			totaltups++;
 		}
 		PQclear(res);
+
+		/*
+		 * Store the names of output plugins as well. The same plugin name may
+		 * appear more than once, but the consumer function
+		 * check_loadable_libraries() avoids checking the same library twice,
+		 * so we do not have to ensure uniqueness here.
+		 */
+		for (int slotno = 0; slotno < slot_arr->nslots; slotno++)
+		{
+			if (slot_arr->slots[slotno].invalid)
+				continue;
+
+			os_info.libraries[totaltups].name = pg_strdup(slot_arr->slots[slotno].plugin);
+			os_info.libraries[totaltups].dbnum = dbnum;
+
+			totaltups++;
+		}
 	}
 
 	pg_free(ress);
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index aa5faca4d6..c56769fe54 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -26,6 +26,8 @@ static void get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo);
 static void free_rel_infos(RelInfoArr *rel_arr);
 static void print_db_infos(DbInfoArr *db_arr);
 static void print_rel_infos(RelInfoArr *rel_arr);
+static void print_slot_infos(LogicalSlotInfoArr *slot_arr);
+static void get_old_cluster_logical_slot_infos(DbInfo *dbinfo, bool live_check);
 
 
 /*
@@ -266,13 +268,15 @@ report_unmatched_relation(const RelInfo *rel, const DbInfo *db, bool is_new_db)
 }
 
 /*
- * get_db_and_rel_infos()
+ * get_db_rel_and_slot_infos()
  *
  * higher level routine to generate dbinfos for the database running
  * on the given "port". Assumes that server is already running.
+ *
+ * live_check is used only when the target is the old cluster.
  */
 void
-get_db_and_rel_infos(ClusterInfo *cluster)
+get_db_rel_and_slot_infos(ClusterInfo *cluster, bool live_check)
 {
 	int			dbnum;
 
@@ -283,7 +287,17 @@ get_db_and_rel_infos(ClusterInfo *cluster)
 	get_db_infos(cluster);
 
 	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
-		get_rel_infos(cluster, &cluster->dbarr.dbs[dbnum]);
+	{
+		DbInfo	   *pDbInfo = &cluster->dbarr.dbs[dbnum];
+
+		get_rel_infos(cluster, pDbInfo);
+
+		/*
+		 * Retrieve the logical replication slot infos for the old cluster.
+		 */
+		if (cluster == &old_cluster)
+			get_old_cluster_logical_slot_infos(pDbInfo, live_check);
+	}
 
 	if (cluster == &old_cluster)
 		pg_log(PG_VERBOSE, "\nsource databases:");
@@ -600,6 +614,125 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 	dbinfo->rel_arr.nrels = num_rels;
 }
 
+/*
+ * get_old_cluster_logical_slot_infos()
+ *
+ * gets the LogicalSlotInfos for all the logical replication slots of the
+ * database referred to by "dbinfo". The status of each logical slot is
+ * gathered here, but it is only verified later, at the checking phase. See
+ * check_old_cluster_for_valid_slots().
+ *
+ * Note: This function will not do anything if the old cluster is pre-PG17.
+ * This is because before that the logical slots are not saved at shutdown, so
+ * there is no guarantee that the latest confirmed_flush_lsn is saved to disk
+ * which can lead to data loss. It is still not guaranteed for manually created
+ * slots in PG17, so subsequent checks done in
+ * check_old_cluster_for_valid_slots() would raise a FATAL error if such slots
+ * are included.
+ */
+static void
+get_old_cluster_logical_slot_infos(DbInfo *dbinfo, bool live_check)
+{
+	PGconn	   *conn;
+	PGresult   *res;
+	LogicalSlotInfo *slotinfos = NULL;
+	int			num_slots = 0;
+
+	/* Logical slots can be migrated since PG17. */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+	{
+		dbinfo->slot_arr.slots = slotinfos;
+		dbinfo->slot_arr.nslots = num_slots;
+		return;
+	}
+
+	conn = connectToServer(&old_cluster, dbinfo->db_name);
+
+	/*
+	 * Fetch the logical replication slot information. Whether the slot is
+	 * considered caught up is determined by an upgrade support function, which
+	 * regards the slot as caught up if no decodable changes are found while
+	 * decoding. See binary_upgrade_slot_has_pending_wal().
+	 *
+	 * Note that we can't determine whether the slot has caught up during
+	 * live_check, as new WAL records could still be generated.
+	 *
+	 * We intentionally skip checking the WALs for invalidated slots as the
+	 * corresponding WALs could have been removed for such slots.
+	 *
+	 * The temporary slots are explicitly ignored while checking because such
+	 * slots cannot exist after the upgrade. During the upgrade, clusters are
+	 * started and stopped several times causing any temporary slots to be
+	 * removed.
+	 */
+	res = executeQueryOrDie(conn, "SELECT slot_name, plugin, two_phase, "
+							"%s as caught_up, conflicting as invalid "
+							"FROM pg_catalog.pg_replication_slots "
+							"WHERE slot_type = 'logical' AND "
+							"database = current_database() AND "
+							"temporary IS FALSE;",
+							live_check ? "FALSE" :
+							"(CASE WHEN conflicting THEN FALSE "
+							"ELSE (SELECT pg_catalog.binary_upgrade_slot_has_pending_wal(slot_name)) "
+							"END)");
+
+	num_slots = PQntuples(res);
+
+	if (num_slots)
+	{
+		int			i_slotname;
+		int			i_plugin;
+		int			i_twophase;
+		int			i_caught_up;
+		int			i_invalid;
+
+		slotinfos = (LogicalSlotInfo *) pg_malloc(sizeof(LogicalSlotInfo) * num_slots);
+
+		i_slotname = PQfnumber(res, "slot_name");
+		i_plugin = PQfnumber(res, "plugin");
+		i_twophase = PQfnumber(res, "two_phase");
+		i_caught_up = PQfnumber(res, "caught_up");
+		i_invalid = PQfnumber(res, "invalid");
+
+		for (int slotnum = 0; slotnum < num_slots; slotnum++)
+		{
+			LogicalSlotInfo *curr = &slotinfos[slotnum];
+
+			curr->slotname = pg_strdup(PQgetvalue(res, slotnum, i_slotname));
+			curr->plugin = pg_strdup(PQgetvalue(res, slotnum, i_plugin));
+			curr->two_phase = (strcmp(PQgetvalue(res, slotnum, i_twophase), "t") == 0);
+			curr->caught_up = (strcmp(PQgetvalue(res, slotnum, i_caught_up), "t") == 0);
+			curr->invalid = (strcmp(PQgetvalue(res, slotnum, i_invalid), "t") == 0);
+		}
+	}
+
+	PQclear(res);
+	PQfinish(conn);
+
+	dbinfo->slot_arr.slots = slotinfos;
+	dbinfo->slot_arr.nslots = num_slots;
+}
+
+
+/*
+ * count_old_cluster_logical_slots()
+ *
+ * Returns the number of logical replication slots for all databases.
+ *
+ * Note: this function always returns 0 if the old_cluster is PG16 and prior
+ * because we gather slot information only for cluster versions greater than or
+ * equal to PG17. See get_old_cluster_logical_slot_infos().
+ */
+int
+count_old_cluster_logical_slots(void)
+{
+	int			slot_count = 0;
+
+	for (int dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+		slot_count += old_cluster.dbarr.dbs[dbnum].slot_arr.nslots;
+
+	return slot_count;
+}
 
 static void
 free_db_and_rel_infos(DbInfoArr *db_arr)
@@ -642,8 +775,11 @@ print_db_infos(DbInfoArr *db_arr)
 
 	for (dbnum = 0; dbnum < db_arr->ndbs; dbnum++)
 	{
-		pg_log(PG_VERBOSE, "Database: \"%s\"", db_arr->dbs[dbnum].db_name);
-		print_rel_infos(&db_arr->dbs[dbnum].rel_arr);
+		DbInfo	   *pDbInfo = &db_arr->dbs[dbnum];
+
+		pg_log(PG_VERBOSE, "Database: \"%s\"", pDbInfo->db_name);
+		print_rel_infos(&pDbInfo->rel_arr);
+		print_slot_infos(&pDbInfo->slot_arr);
 	}
 }
 
@@ -660,3 +796,23 @@ print_rel_infos(RelInfoArr *rel_arr)
 			   rel_arr->rels[relnum].reloid,
 			   rel_arr->rels[relnum].tablespace);
 }
+
+static void
+print_slot_infos(LogicalSlotInfoArr *slot_arr)
+{
+	/* Quick return if there are no logical slots. */
+	if (slot_arr->nslots == 0)
+		return;
+
+	pg_log(PG_VERBOSE, "Logical replication slots within the database:");
+
+	for (int slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+	{
+		LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+
+		pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\", two_phase: %d",
+			   slot_info->slotname,
+			   slot_info->plugin,
+			   slot_info->two_phase);
+	}
+}
diff --git a/src/bin/pg_upgrade/meson.build b/src/bin/pg_upgrade/meson.build
index 12a97f84e2..2c4f38d865 100644
--- a/src/bin/pg_upgrade/meson.build
+++ b/src/bin/pg_upgrade/meson.build
@@ -42,6 +42,7 @@ tests += {
     'tests': [
       't/001_basic.pl',
       't/002_pg_upgrade.pl',
+      't/003_upgrade_logical_replication_slots.pl',
     ],
     'test_kwargs': {'priority': 40}, # pg_upgrade tests are slow
   },
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 96bfb67167..cf7d6383ec 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -45,10 +45,12 @@
 #endif
 
 #include "catalog/pg_class_d.h"
+#include "common/controldata_utils.h"
 #include "common/file_perm.h"
 #include "common/logging.h"
 #include "common/restricted_token.h"
 #include "fe_utils/string_utils.h"
+#include "fe_utils/wal.h"
 #include "pg_upgrade.h"
 
 static void set_locale_and_encoding(void);
@@ -59,6 +61,7 @@ static void copy_xact_xlog_xid(void);
 static void set_frozenxids(bool minmxid_only);
 static void make_outputdirs(char *pgdata);
 static void setup(char *argv0, bool *live_check);
+static void create_logical_replication_slots(void);
 
 ClusterInfo old_cluster,
 			new_cluster;
@@ -81,6 +84,9 @@ main(int argc, char **argv)
 {
 	char	   *deletion_script_file_name = NULL;
 	bool		live_check = false;
+	XLogSegNo	newXlogSegNo;
+	bool		crc_ok;
+	ControlFileData *controlfile;
 
 	/*
 	 * pg_upgrade doesn't currently use common/logging.c, but initialize it
@@ -175,15 +181,49 @@ main(int argc, char **argv)
 	transfer_all_new_tablespaces(&old_cluster.dbarr, &new_cluster.dbarr,
 								 old_cluster.pgdata, new_cluster.pgdata);
 
+	prep_status("Resetting WAL and the control file");
+
+	controlfile = get_controlfile(new_cluster.pgdata, &crc_ok);
+	if (!crc_ok)
+		pg_fatal("pg_control CRC value is incorrect");
+
+	/* Look at existing segment files to set up newXlogSegNo */
+	newXlogSegNo = FindEndOfXLOG(controlfile->checkPointCopy.redo,
+								 controlfile->xlog_seg_size,
+								 controlfile->xlog_seg_size,
+								 new_cluster.pgdata);
+
+	RewriteControlFile(false, newXlogSegNo, controlfile->xlog_seg_size,
+					   controlfile, new_cluster.pgdata);
+	KillExistingXLOG(new_cluster.pgdata);
+	KillExistingArchiveStatus(new_cluster.pgdata);
+	WriteEmptyXLOG(newXlogSegNo, controlfile->xlog_seg_size,
+				   controlfile, new_cluster.pgdata);
+
+	check_ok();
+
+	/*
+	 * If the old cluster has logical slots, migrate them to the new cluster.
+	 */
+	if (count_old_cluster_logical_slots())
+	{
+		start_postmaster(&new_cluster, true);
+		create_logical_replication_slots();
+		stop_postmaster(false);
+	}
+
 	/*
 	 * Assuming OIDs are only used in system tables, there is no need to
 	 * restore the OID counter because we have not transferred any OIDs from
 	 * the old system, but we do it anyway just in case.  We do it late here
 	 * because there is no need to have the schema load use new oids.
+	 *
+	 * Since we have already cleaned up the WAL log, there is no need to do it
+	 * again, so we pass the --no-switch option.
 	 */
 	prep_status("Setting next OID for new cluster");
 	exec_prog(UTILITY_LOG_FILE, NULL, true, true,
-			  "\"%s/pg_resetwal\" -o %u \"%s\"",
+			  "\"%s/pg_resetwal\" -o %u \"%s\" --no-switch",
 			  new_cluster.bindir, old_cluster.controldata.chkpnt_nxtoid,
 			  new_cluster.pgdata);
 	check_ok();
@@ -593,7 +633,7 @@ create_new_objects(void)
 		set_frozenxids(true);
 
 	/* update new_cluster info now that we have objects in the databases */
-	get_db_and_rel_infos(&new_cluster);
+	get_db_rel_and_slot_infos(&new_cluster, false);
 }
 
 /*
@@ -862,3 +902,63 @@ set_frozenxids(bool minmxid_only)
 
 	check_ok();
 }
+
+/*
+ * create_logical_replication_slots()
+ *
+ * Similar to create_new_objects() but only restores logical replication slots.
+ */
+static void
+create_logical_replication_slots(void)
+{
+	prep_status_progress("Restoring logical replication slots in the new cluster");
+
+	for (int dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		DbInfo	   *old_db = &old_cluster.dbarr.dbs[dbnum];
+		LogicalSlotInfoArr *slot_arr = &old_db->slot_arr;
+		PGconn	   *conn;
+		PQExpBuffer query;
+		char		log_file_name[MAXPGPATH];
+
+		/* Skip this database if there are no slots */
+		if (slot_arr->nslots == 0)
+			continue;
+
+		conn = connectToServer(&new_cluster, old_db->db_name);
+		query = createPQExpBuffer();
+
+		snprintf(log_file_name, sizeof(log_file_name),
+				 DB_DUMP_LOG_FILE_MASK, old_db->db_oid);
+
+		pg_log(PG_STATUS, "%s", old_db->db_name);
+
+		for (int slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		{
+			LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+
+			/* Construct a query to create the logical replication slot */
+			appendPQExpBuffer(query,
+							  "SELECT * FROM "
+							  "pg_catalog.pg_create_logical_replication_slot(");
+			appendStringLiteralConn(query, slot_info->slotname, conn);
+			appendPQExpBuffer(query, ", ");
+			appendStringLiteralConn(query, slot_info->plugin, conn);
+			appendPQExpBuffer(query, ", false, %s);",
+							  slot_info->two_phase ? "true" : "false");
+
+			PQclear(executeQueryOrDie(conn, "%s", query->data));
+
+			resetPQExpBuffer(query);
+		}
+
+		PQfinish(conn);
+
+		destroyPQExpBuffer(query);
+	}
+
+	end_progress_output();
+	check_ok();
+
+	return;
+}
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 842f3b6cd3..ba8129d135 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -150,6 +150,24 @@ typedef struct
 	int			nrels;
 } RelInfoArr;
 
+/*
+ * Structure to store logical replication slot information.
+ */
+typedef struct
+{
+	char	   *slotname;		/* slot name */
+	char	   *plugin;			/* plugin */
+	bool		two_phase;		/* can the slot decode 2PC? */
+	bool		caught_up;		/* has the slot caught up to latest changes? */
+	bool		invalid;		/* if true, the slot is unusable */
+} LogicalSlotInfo;
+
+typedef struct
+{
+	int			nslots;			/* number of logical slot infos */
+	LogicalSlotInfo *slots;		/* array of logical slot infos */
+} LogicalSlotInfoArr;
+
 /*
  * The following structure represents a relation mapping.
  */
@@ -176,6 +194,7 @@ typedef struct
 	char		db_tablespace[MAXPGPATH];	/* database default tablespace
 											 * path */
 	RelInfoArr	rel_arr;		/* array of all user relinfos */
+	LogicalSlotInfoArr slot_arr;	/* array of all LogicalSlotInfo */
 } DbInfo;
 
 /*
@@ -400,7 +419,8 @@ void		check_loadable_libraries(void);
 FileNameMap *gen_db_file_maps(DbInfo *old_db,
 							  DbInfo *new_db, int *nmaps, const char *old_pgdata,
 							  const char *new_pgdata);
-void		get_db_and_rel_infos(ClusterInfo *cluster);
+void		get_db_rel_and_slot_infos(ClusterInfo *cluster, bool live_check);
+int			count_old_cluster_logical_slots(void);
 
 /* option.c */
 
diff --git a/src/bin/pg_upgrade/server.c b/src/bin/pg_upgrade/server.c
index 0bc3d2806b..d7f6c268ef 100644
--- a/src/bin/pg_upgrade/server.c
+++ b/src/bin/pg_upgrade/server.c
@@ -201,6 +201,7 @@ start_postmaster(ClusterInfo *cluster, bool report_and_exit_on_error)
 	PGconn	   *conn;
 	bool		pg_ctl_return = false;
 	char		socket_string[MAXPGPATH + 200];
+	PQExpBufferData pgoptions;
 
 	static bool exit_hook_registered = false;
 
@@ -227,23 +228,41 @@ start_postmaster(ClusterInfo *cluster, bool report_and_exit_on_error)
 				 cluster->sockdir);
 #endif
 
+	initPQExpBuffer(&pgoptions);
+
 	/*
-	 * Use -b to disable autovacuum.
+	 * Construct a parameter string which is passed to the server process.
 	 *
 	 * Turn off durability requirements to improve object creation speed, and
 	 * we only modify the new cluster, so only use it there.  If there is a
 	 * crash, the new cluster has to be recreated anyway.  fsync=off is a big
 	 * win on ext4.
 	 */
+	if (cluster == &new_cluster)
+		appendPQExpBufferStr(&pgoptions, " -c synchronous_commit=off -c fsync=off -c full_page_writes=off");
+
+	/*
+	 * Set max_slot_wal_keep_size to -1 to prevent WAL removal by the
+	 * checkpointer process.  If WALs required by logical replication slots
+	 * are removed, the slots become unusable.  This setting prevents the
+	 * invalidation of slots during the upgrade.  We set this option only when
+	 * the cluster is PG17 or later because logical replication slots can only
+	 * be migrated since then.  Also, max_slot_wal_keep_size was added in PG13.
+	 */
+	if (GET_MAJOR_VERSION(cluster->major_version) >= 1700)
+		appendPQExpBufferStr(&pgoptions, " -c max_slot_wal_keep_size=-1");
+
+	/* Use -b to disable autovacuum. */
 	snprintf(cmd, sizeof(cmd),
 			 "\"%s/pg_ctl\" -w -l \"%s/%s\" -D \"%s\" -o \"-p %d -b%s %s%s\" start",
 			 cluster->bindir,
 			 log_opts.logdir,
 			 SERVER_LOG_FILE, cluster->pgconfig, cluster->port,
-			 (cluster == &new_cluster) ?
-			 " -c synchronous_commit=off -c fsync=off -c full_page_writes=off" : "",
+			 pgoptions.data,
 			 cluster->pgopts ? cluster->pgopts : "", socket_string);
 
+	termPQExpBuffer(&pgoptions);
+
 	/*
 	 * Don't throw an error right away, let connecting throw the error because
 	 * it might supply a reason for the failure.
diff --git a/src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl
new file mode 100644
index 0000000000..fc6bf3020a
--- /dev/null
+++ b/src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl
@@ -0,0 +1,325 @@
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+# Tests for upgrading replication slots
+
+use strict;
+use warnings;
+
+use File::Find qw(find);
+use File::Path qw(rmtree);
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Can be changed to test the other modes
+my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
+
+# Initialize old cluster
+my $old_publisher = PostgreSQL::Test::Cluster->new('old_publisher');
+$old_publisher->init(allows_streaming => 'logical');
+
+# Initialize new cluster
+my $new_publisher = PostgreSQL::Test::Cluster->new('new_publisher');
+$new_publisher->init(allows_streaming => 'replica');
+
+# Initialize subscriber cluster
+my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
+$subscriber->init(allows_streaming => 'logical');
+
+my $bindir = $new_publisher->config_data('--bindir');
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when new cluster wal_level is not 'logical'
+
+# Preparations for the subsequent test:
+# 1. Create a slot on the old cluster
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT pg_create_logical_replication_slot('test_slot1', 'test_decoding', false, true);"
+);
+$old_publisher->stop;
+
+# pg_upgrade will fail because the new cluster wal_level is 'replica'
+command_checks_all(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d', $old_publisher->data_dir,
+		'-D', $new_publisher->data_dir,
+		'-b', $bindir,
+		'-B', $bindir,
+		'-s', $new_publisher->host,
+		'-p', $old_publisher->port,
+		'-P', $new_publisher->port,
+		$mode,
+	],
+	1,
+	[qr/wal_level must be \"logical\", but is set to \"replica\"/],
+	[qr//],
+	'run of pg_upgrade where the new cluster has the wrong wal_level');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when max_replication_slots on a new cluster is
+#		too low
+
+# Preparations for the subsequent test:
+# 1. Create a second slot on the old cluster
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT pg_create_logical_replication_slot('test_slot2', 'test_decoding', false, true);"
+);
+
+# 2. Consume WAL records to avoid another type of upgrade failure. It will be
+#	 tested in subsequent cases.
+$old_publisher->safe_psql('postgres',
+	"SELECT count(*) FROM pg_logical_slot_get_changes('test_slot1', NULL, NULL);"
+);
+$old_publisher->stop;
+
+# 3. max_replication_slots is set to smaller than the number of slots (2)
+#	 present on the old cluster
+$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 1");
+
+# 4. wal_level is set correctly on the new cluster
+$new_publisher->append_conf('postgresql.conf', "wal_level = 'logical'");
+
+# pg_upgrade will fail because the new cluster has insufficient max_replication_slots
+command_checks_all(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d', $old_publisher->data_dir,
+		'-D', $new_publisher->data_dir,
+		'-b', $bindir,
+		'-B', $bindir,
+		'-s', $new_publisher->host,
+		'-p', $old_publisher->port,
+		'-P', $new_publisher->port,
+		$mode,
+	],
+	1,
+	[
+		qr/max_replication_slots \(1\) must be greater than or equal to the number of logical replication slots \(2\) on the old cluster/
+	],
+	[qr//],
+	'run of pg_upgrade where the new cluster has insufficient max_replication_slots'
+);
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when the slot still has unconsumed WAL records
+
+# Preparations for the subsequent test:
+# 1. Remove the slot 'test_slot2', leaving only 1 slot on the old cluster, so
+#    the new cluster config max_replication_slots=1 will now be enough.
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT * FROM pg_drop_replication_slot('test_slot2');");
+
+# 2. Generate extra WAL records. Because these WAL records do not get consumed,
+#	 the upcoming pg_upgrade test will fail.
+$old_publisher->safe_psql('postgres',
+	"CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;");
+$old_publisher->stop;
+
+# pg_upgrade will fail because the slot still has unconsumed WAL records
+command_checks_all(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d', $old_publisher->data_dir,
+		'-D', $new_publisher->data_dir,
+		'-b', $bindir,
+		'-B', $bindir,
+		'-s', $new_publisher->host,
+		'-p', $old_publisher->port,
+		'-P', $new_publisher->port,
+		$mode,
+	],
+	1,
+	[
+		qr/Your installation contains logical replication slots that cannot be upgraded./
+	],
+	[qr//],
+	'run of pg_upgrade of old cluster with idle replication slots');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Verify the reason why the logical replication slot cannot be upgraded
+my $log_path = $new_publisher->data_dir . "/pg_upgrade_output.d";
+my $slots_filename;
+
+# Find a txt file that contains a list of logical replication slots that cannot
+# be upgraded. We cannot predict the file's path because the output directory
+# contains a milliseconds timestamp. File::Find::find must be used.
+find(
+	sub {
+		if ($File::Find::name =~ m/invalid_logical_relication_slots\.txt/)
+		{
+			$slots_filename = $File::Find::name;
+		}
+	},
+	$new_publisher->data_dir . "/pg_upgrade_output.d");
+
+# And check the content. The failure should be because there are unconsumed
+# WALs after confirmed_flush_lsn of test_slot1.
+like(
+	slurp_file($slots_filename),
+	qr/The slot \"test_slot1\" has not consumed the WAL yet/m,
+	'the previous test failed due to unconsumed WALs');
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+# Consume the remaining WAL records
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT count(*) FROM pg_logical_slot_get_changes('test_slot1', NULL, NULL);");
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when there are non-transactional changes
+
+# Preparations for the subsequent test:
+# 1. Emit a non-transactional message
+$old_publisher->safe_psql('postgres',
+	"SELECT count(*) FROM pg_logical_emit_message('false', 'prefix', 'This is a non-transactional message');");
+$old_publisher->stop;
+
+# pg_upgrade will fail because there is a non-transactional change
+command_checks_all(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d', $old_publisher->data_dir,
+		'-D', $new_publisher->data_dir,
+		'-b', $bindir,
+		'-B', $bindir,
+		'-s', $new_publisher->host,
+		'-p', $old_publisher->port,
+		'-P', $new_publisher->port,
+		$mode,
+	],
+	1,
+	[
+		qr/Your installation contains logical replication slots that cannot be upgraded./
+	],
+	[qr//],
+	'run of pg_upgrade of old cluster with an unconsumed non-transactional message');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Verify the reason why the logical replication slot cannot be upgraded
+$log_path = $new_publisher->data_dir . "/pg_upgrade_output.d";
+$slots_filename = undef;
+
+# Find a txt file that contains a list of logical replication slots that cannot
+# be upgraded.
+find(
+	sub {
+		if ($File::Find::name =~ m/invalid_logical_relication_slots\.txt/)
+		{
+			$slots_filename = $File::Find::name;
+		}
+	},
+	$new_publisher->data_dir . "/pg_upgrade_output.d");
+
+# And check the content. The content of the file is the same as in the previous test.
+like(
+	slurp_file($slots_filename),
+	qr/The slot \"test_slot1\" has not consumed the WAL yet/m,
+	'the previous test failed due to unconsumed WALs');
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+# Remove the remaining slot
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT * FROM pg_drop_replication_slot('test_slot1');");
+
+# ------------------------------
+# TEST: Successful upgrade
+
+# Preparations for the subsequent test:
+# 1. Setup logical replication
+my $old_connstr = $old_publisher->connstr . ' dbname=postgres';
+$old_publisher->safe_psql('postgres',
+	"CREATE PUBLICATION regress_pub FOR ALL TABLES;");
+$subscriber->start;
+$subscriber->safe_psql(
+	'postgres', qq[
+	CREATE TABLE tbl (a int);
+	CREATE SUBSCRIPTION regress_sub CONNECTION '$old_connstr' PUBLICATION regress_pub WITH (two_phase = 'true')
+]);
+$subscriber->wait_for_subscription_sync($old_publisher, 'regress_sub');
+
+# 2. Temporarily disable the subscription
+$subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION regress_sub DISABLE");
+$old_publisher->stop;
+
+# Dry run, successful check is expected. This is not a live check, so a
+# shutdown checkpoint record would be inserted. We want to test that a
+# subsequent upgrade is successful by skipping such an expected WAL record.
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d', $old_publisher->data_dir,
+		'-D', $new_publisher->data_dir,
+		'-b', $bindir,
+		'-B', $bindir,
+		'-s', $new_publisher->host,
+		'-p', $old_publisher->port,
+		'-P', $new_publisher->port,
+		$mode, '--check'
+	],
+	'run of pg_upgrade of old cluster');
+ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ removed after pg_upgrade success");
+
+# Actual run, successful upgrade is expected
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d', $old_publisher->data_dir,
+		'-D', $new_publisher->data_dir,
+		'-b', $bindir,
+		'-B', $bindir,
+		'-s', $new_publisher->host,
+		'-p', $old_publisher->port,
+		'-P', $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade of old cluster');
+ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ removed after pg_upgrade success");
+
+# Check that the slot 'regress_sub' has migrated to the new cluster
+$new_publisher->start;
+my $result = $new_publisher->safe_psql('postgres',
+	"SELECT slot_name, two_phase FROM pg_replication_slots");
+is($result, qq(regress_sub|t), 'check the slot exists on new cluster');
+
+# Update the connection
+my $new_connstr = $new_publisher->connstr . ' dbname=postgres';
+$subscriber->safe_psql(
+	'postgres', qq[
+	ALTER SUBSCRIPTION regress_sub CONNECTION '$new_connstr';
+	ALTER SUBSCRIPTION regress_sub ENABLE;
+]);
+
+# Check whether changes on the new publisher get replicated to the subscriber
+$new_publisher->safe_psql('postgres',
+	"INSERT INTO tbl VALUES (generate_series(11, 20))");
+$new_publisher->wait_for_catchup('regress_sub');
+$result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
+is($result, qq(20), 'check changes are replicated to the subscriber');
+
+# Clean up
+$subscriber->stop();
+$new_publisher->stop();
+
+done_testing();
diff --git a/src/fe_utils/Makefile b/src/fe_utils/Makefile
index 456d6dd390..0c49e12a5d 100644
--- a/src/fe_utils/Makefile
+++ b/src/fe_utils/Makefile
@@ -32,7 +32,8 @@ OBJS = \
 	query_utils.o \
 	recovery_gen.o \
 	simple_list.o \
-	string_utils.o
+	string_utils.o \
+	wal.o
 
 ifeq ($(PORTNAME), win32)
 override CPPFLAGS += -DFD_SETSIZE=1024
diff --git a/src/fe_utils/meson.build b/src/fe_utils/meson.build
index ea96e862ad..c17cfa791d 100644
--- a/src/fe_utils/meson.build
+++ b/src/fe_utils/meson.build
@@ -13,6 +13,7 @@ fe_utils_sources = files(
   'recovery_gen.c',
   'simple_list.c',
   'string_utils.c',
+  'wal.c',
 )
 
 psqlscan = custom_target('psqlscan',
diff --git a/src/fe_utils/wal.c b/src/fe_utils/wal.c
new file mode 100644
index 0000000000..fb186f800f
--- /dev/null
+++ b/src/fe_utils/wal.c
@@ -0,0 +1,329 @@
+/*-------------------------------------------------------------------------
+ *
+ * wal.c
+ *	  Routines to access WAL log from frontend
+ *
+ * Copyright (c) 2023, PostgreSQL Global Development Group
+ *
+ *
+ * IDENTIFICATION
+ *	  src/fe_utils/wal.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres_fe.h"
+
+#include <dirent.h>
+#include <unistd.h>
+#include <time.h>
+
+#include "access/transam.h"
+#include "access/xlog_internal.h"
+#include "common/controldata_utils.h"
+#include "common/file_perm.h"
+#include "common/logging.h"
+#include "fe_utils/wal.h"
+
+/*
+ * Scan existing XLOG files and determine the highest existing WAL address
+ *
+ * On entry, lastRecPtr, oldXlogSegSize and newXlogSegSize are assumed valid
+ * (note that we allow the old xlog seg size to differ from what we're using).
+ * On exit, return the number of the first WAL segment to use for the
+ * replacement WAL (based on the new seg size).
+ */
+XLogSegNo
+FindEndOfXLOG(XLogRecPtr lastRecPtr, uint32 oldXlogSegSize,
+			  uint32 newXlogSegSize, const char *pgdata)
+{
+	DIR		   *xldir;
+	struct dirent *xlde;
+	XLogSegNo	newXlogSegNo;
+	XLogRecPtr	xlogbytepos;
+	char		walpath[MAXPGPATH + sizeof(XLOGDIR)];
+
+	snprintf(walpath, sizeof(walpath), "%s/%s", pgdata, XLOGDIR);
+
+	/*
+	 * Initialize the max() computation using the lastRecPtr.  Note that for
+	 * the moment we are working with segment numbering according to the old
+	 * xlog seg size.
+	 */
+	XLByteToSeg(lastRecPtr, newXlogSegNo, oldXlogSegSize);
+
+	/*
+	 * Scan the pg_wal directory to find existing WAL segment files. We assume
+	 * any present have been used; in most scenarios this should be
+	 * conservative, because of xlog.c's attempts to pre-create files.
+	 */
+	xldir = opendir(walpath);
+	if (xldir == NULL)
+		pg_fatal("could not open directory \"%s\": %m", walpath);
+
+	while (errno = 0, (xlde = readdir(xldir)) != NULL)
+	{
+		if (IsXLogFileName(xlde->d_name) ||
+			IsPartialXLogFileName(xlde->d_name))
+		{
+			TimeLineID	tli;
+			XLogSegNo	segno;
+
+			/* Use the segment size from the control file */
+			XLogFromFileName(xlde->d_name, &tli, &segno,
+							 oldXlogSegSize);
+
+			/*
+			 * Note: we take the max of all files found, regardless of their
+			 * timelines.  Another possibility would be to ignore files of
+			 * timelines other than the target TLI, but this seems safer.
+			 * Better too large a result than too small...
+			 */
+			if (segno > newXlogSegNo)
+				newXlogSegNo = segno;
+		}
+	}
+
+	if (errno)
+		pg_fatal("could not read directory \"%s\": %m", walpath);
+
+	if (closedir(xldir))
+		pg_fatal("could not close directory \"%s\": %m", walpath);
+
+	/*
+	 * Finally, convert to new xlog seg size, and advance by one to ensure we
+	 * are in virgin territory.
+	 */
+	xlogbytepos = newXlogSegNo * oldXlogSegSize;
+	newXlogSegNo = (xlogbytepos + oldXlogSegSize - 1) / newXlogSegSize;
+	newXlogSegNo++;
+
+	return newXlogSegNo;
+}
+
+
+/*
+ * Remove existing XLOG files
+ */
+void
+KillExistingXLOG(const char *pgdata)
+{
+	DIR		   *xldir;
+	struct dirent *xlde;
+	char		walpath[MAXPGPATH + sizeof(XLOGDIR)];
+	char		path[MAXPGPATH + sizeof(XLOGDIR)];
+
+	snprintf(walpath, sizeof(walpath), "%s/%s", pgdata, XLOGDIR);
+
+	xldir = opendir(walpath);
+	if (xldir == NULL)
+		pg_fatal("could not open directory \"%s\": %m", walpath);
+
+	while (errno = 0, (xlde = readdir(xldir)) != NULL)
+	{
+		if (IsXLogFileName(xlde->d_name) ||
+			IsPartialXLogFileName(xlde->d_name))
+		{
+			snprintf(path, sizeof(path), "%s/%s", walpath, xlde->d_name);
+			if (unlink(path) < 0)
+				pg_fatal("could not delete file \"%s\": %m", path);
+		}
+	}
+
+	if (errno)
+		pg_fatal("could not read directory \"%s\": %m", walpath);
+
+	if (closedir(xldir))
+		pg_fatal("could not close directory \"%s\": %m", walpath);
+}
+
+
+/*
+ * Remove existing archive status files
+ */
+void
+KillExistingArchiveStatus(const char *pgdata)
+{
+#define ARCHSTATDIR XLOGDIR "/archive_status"
+
+	DIR		   *xldir;
+	struct dirent *xlde;
+	char		archpath[MAXPGPATH + sizeof(ARCHSTATDIR)];
+	char		path[MAXPGPATH + sizeof(ARCHSTATDIR)];
+
+	snprintf(archpath, sizeof(archpath), "%s/%s", pgdata, ARCHSTATDIR);
+
+	xldir = opendir(archpath);
+	if (xldir == NULL)
+		pg_fatal("could not open directory \"%s\": %m", archpath);
+
+	while (errno = 0, (xlde = readdir(xldir)) != NULL)
+	{
+		if (strspn(xlde->d_name, "0123456789ABCDEF") == XLOG_FNAME_LEN &&
+			(strcmp(xlde->d_name + XLOG_FNAME_LEN, ".ready") == 0 ||
+			 strcmp(xlde->d_name + XLOG_FNAME_LEN, ".done") == 0 ||
+			 strcmp(xlde->d_name + XLOG_FNAME_LEN, ".partial.ready") == 0 ||
+			 strcmp(xlde->d_name + XLOG_FNAME_LEN, ".partial.done") == 0))
+		{
+			snprintf(path, sizeof(path), "%s/%s", archpath, xlde->d_name);
+			if (unlink(path) < 0)
+				pg_fatal("could not delete file \"%s\": %m", path);
+		}
+	}
+
+	if (errno)
+		pg_fatal("could not read directory \"%s\": %m", archpath);
+
+	if (closedir(xldir))
+		pg_fatal("could not close directory \"%s\": %m", archpath);
+}
+
+
+/*
+ * Write an empty XLOG file, containing only the checkpoint record
+ * already set up in ControlFile.
+ */
+void
+WriteEmptyXLOG(XLogSegNo newXlogSegNo, uint32 walSegSz,
+			   ControlFileData *controlFile, const char *pgdata)
+{
+	PGAlignedXLogBlock buffer;
+	XLogPageHeader page;
+	XLogLongPageHeader longpage;
+	XLogRecord *record;
+	pg_crc32c	crc;
+	char		walpath[MAXPGPATH];
+	char		path[MAXPGPATH];
+	int			fd;
+	int			nbytes;
+	char	   *recptr;
+
+	memset(buffer.data, 0, XLOG_BLCKSZ);
+
+	/* Set up the XLOG page header */
+	page = (XLogPageHeader) buffer.data;
+	page->xlp_magic = XLOG_PAGE_MAGIC;
+	page->xlp_info = XLP_LONG_HEADER;
+	page->xlp_tli = controlFile->checkPointCopy.ThisTimeLineID;
+	page->xlp_pageaddr = controlFile->checkPointCopy.redo - SizeOfXLogLongPHD;
+	longpage = (XLogLongPageHeader) page;
+	longpage->xlp_sysid = controlFile->system_identifier;
+	longpage->xlp_seg_size = walSegSz;
+	longpage->xlp_xlog_blcksz = XLOG_BLCKSZ;
+
+	/* Insert the initial checkpoint record */
+	recptr = (char *) page + SizeOfXLogLongPHD;
+	record = (XLogRecord *) recptr;
+	record->xl_prev = 0;
+	record->xl_xid = InvalidTransactionId;
+	record->xl_tot_len = SizeOfXLogRecord + SizeOfXLogRecordDataHeaderShort + sizeof(CheckPoint);
+	record->xl_info = XLOG_CHECKPOINT_SHUTDOWN;
+	record->xl_rmid = RM_XLOG_ID;
+
+	recptr += SizeOfXLogRecord;
+	*(recptr++) = (char) XLR_BLOCK_ID_DATA_SHORT;
+	*(recptr++) = sizeof(CheckPoint);
+	memcpy(recptr, &controlFile->checkPointCopy,
+		   sizeof(CheckPoint));
+
+	INIT_CRC32C(crc);
+	COMP_CRC32C(crc, ((char *) record) + SizeOfXLogRecord, record->xl_tot_len - SizeOfXLogRecord);
+	COMP_CRC32C(crc, (char *) record, offsetof(XLogRecord, xl_crc));
+	FIN_CRC32C(crc);
+	record->xl_crc = crc;
+
+	/* Write the first page */
+	XLogFilePath(path, controlFile->checkPointCopy.ThisTimeLineID,
+				 newXlogSegNo, walSegSz);
+
+	snprintf(walpath, sizeof(walpath), "%s/%s", pgdata, path);
+
+	unlink(walpath);
+
+	fd = open(walpath, O_RDWR | O_CREAT | O_EXCL | PG_BINARY,
+			  pg_file_create_mode);
+	if (fd < 0)
+		pg_fatal("could not open file \"%s\": %m", walpath);
+
+	errno = 0;
+	if (write(fd, buffer.data, XLOG_BLCKSZ) != XLOG_BLCKSZ)
+	{
+		/* if write didn't set errno, assume problem is no disk space */
+		if (errno == 0)
+			errno = ENOSPC;
+		pg_fatal("could not write file \"%s\": %m", walpath);
+	}
+
+	/* Fill the rest of the file with zeroes */
+	memset(buffer.data, 0, XLOG_BLCKSZ);
+	for (nbytes = XLOG_BLCKSZ; nbytes < walSegSz; nbytes += XLOG_BLCKSZ)
+	{
+		errno = 0;
+		if (write(fd, buffer.data, XLOG_BLCKSZ) != XLOG_BLCKSZ)
+		{
+			if (errno == 0)
+				errno = ENOSPC;
+			pg_fatal("could not write file \"%s\": %m", walpath);
+		}
+	}
+
+	if (fsync(fd) != 0)
+		pg_fatal("fsync error: %m");
+
+	close(fd);
+}
+
+/*
+ * Write out the new pg_control file.
+ *
+ * In no-switch mode, the existing WAL is not discarded by subsequent
+ * operations, so the attributes related to the WAL location must be left
+ * untouched.
+ */
+void
+RewriteControlFile(bool noswitch, XLogSegNo newXlogSegNo,
+				   uint32 walSegSz, ControlFileData *controlFile,
+				   char *DataDir)
+{
+	controlFile->checkPointCopy.time = (pg_time_t) time(NULL);
+	controlFile->state = DB_SHUTDOWNED;
+
+	/*
+	 * Skip updating the checkpoint location and related WAL fields when we
+	 * are in no-switch mode.
+	 */
+	if (!noswitch)
+	{
+		/*
+		 * Adjust fields as needed to force an empty XLOG starting at
+		 * newXlogSegNo.
+		 */
+		XLogSegNoOffsetToRecPtr(newXlogSegNo, SizeOfXLogLongPHD, walSegSz,
+								controlFile->checkPointCopy.redo);
+
+		controlFile->checkPoint = controlFile->checkPointCopy.redo;
+
+		controlFile->minRecoveryPoint = 0;
+		controlFile->minRecoveryPointTLI = 0;
+		controlFile->backupStartPoint = 0;
+		controlFile->backupEndPoint = 0;
+		controlFile->backupEndRequired = false;
+	}
+
+	/*
+	 * Force the defaults for max_* settings. The values don't really matter
+	 * as long as wal_level='minimal'; the postmaster will reset these fields
+	 * anyway at startup.
+	 */
+	controlFile->wal_level = WAL_LEVEL_MINIMAL;
+	controlFile->wal_log_hints = false;
+	controlFile->track_commit_timestamp = false;
+	controlFile->MaxConnections = 100;
+	controlFile->max_wal_senders = 10;
+	controlFile->max_worker_processes = 8;
+	controlFile->max_prepared_xacts = 0;
+	controlFile->max_locks_per_xact = 64;
+
+	/* The control file gets flushed here. */
+	update_controlfile(DataDir, controlFile, true);
+}
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 48ca852381..78ef2fd5a6 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -63,14 +63,6 @@ typedef enum ArchiveMode
 } ArchiveMode;
 extern PGDLLIMPORT int XLogArchiveMode;
 
-/* WAL levels */
-typedef enum WalLevel
-{
-	WAL_LEVEL_MINIMAL = 0,
-	WAL_LEVEL_REPLICA,
-	WAL_LEVEL_LOGICAL
-} WalLevel;
-
 /* Compression algorithms for WAL */
 typedef enum WalCompression
 {
diff --git a/src/include/access/xlogdefs.h b/src/include/access/xlogdefs.h
index fe794c7740..a364b416f1 100644
--- a/src/include/access/xlogdefs.h
+++ b/src/include/access/xlogdefs.h
@@ -14,6 +14,14 @@
 
 #include <fcntl.h>				/* need open() flags */
 
+/* WAL levels */
+typedef enum WalLevel
+{
+	WAL_LEVEL_MINIMAL = 0,
+	WAL_LEVEL_REPLICA,
+	WAL_LEVEL_LOGICAL
+} WalLevel;
+
 /*
  * Pointer to a location in the XLOG.  These pointers are 64 bits wide,
  * because we don't want them ever to overflow.
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index f0b7b9cbd8..bc9aa9bd1b 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11370,6 +11370,11 @@
   proname => 'binary_upgrade_set_next_pg_tablespace_oid', provolatile => 'v',
   proparallel => 'u', prorettype => 'void', proargtypes => 'oid',
   prosrc => 'binary_upgrade_set_next_pg_tablespace_oid' },
+{ oid => '8046', descr => 'for use by pg_upgrade',
+  proname => 'binary_upgrade_slot_has_pending_wal', proisstrict => 'f',
+  provolatile => 'v', proparallel => 'u', prorettype => 'bool',
+  proargtypes => 'name',
+  prosrc => 'binary_upgrade_slot_has_pending_wal' },
 
 # conversion functions
 { oid => '4302',
diff --git a/src/include/fe_utils/wal.h b/src/include/fe_utils/wal.h
new file mode 100644
index 0000000000..511ccc896d
--- /dev/null
+++ b/src/include/fe_utils/wal.h
@@ -0,0 +1,29 @@
+/*-------------------------------------------------------------------------
+ *
+ * wal.h
+ *	  Routines to access WAL log from frontend
+ *
+ * Copyright (c) 2023, PostgreSQL Global Development Group
+ *
+ * src/include/fe_utils/wal.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef FE_WAL_H
+#define FE_WAL_H
+
+#include "access/xlogdefs.h"
+#include "catalog/pg_control.h"
+
+extern XLogSegNo FindEndOfXLOG(XLogRecPtr lastRecPtr, uint32 oldXlogSegSize,
+							   uint32 newXlogSegSize, const char *pgdata);
+extern void KillExistingXLOG(const char *pgdata);
+extern void KillExistingArchiveStatus(const char *pgdata);
+extern void WriteEmptyXLOG(XLogSegNo newXlogSegNo, uint32 walSegSz,
+						   ControlFileData *controlFile, const char *pgdata);
+
+extern void RewriteControlFile(bool noswitch, XLogSegNo newXlogSegNo,
+							   uint32 walSegSz, ControlFileData *controlFile,
+							   char *DataDir);
+
+#endif							/* FE_WAL_H */
diff --git a/src/include/replication/logical.h b/src/include/replication/logical.h
index 5f49554ea0..2ab5619c02 100644
--- a/src/include/replication/logical.h
+++ b/src/include/replication/logical.h
@@ -109,6 +109,13 @@ typedef struct LogicalDecodingContext
 	TransactionId write_xid;
 	/* Are we processing the end LSN of a transaction? */
 	bool		end_xact;
+
+	/*
+	 * Did the logical decoding context require processing WALs?
+	 *
+	 * This flag is used only when in 'fast_forward' mode.
+	 */
+	bool		processing_required;
 } LogicalDecodingContext;
 
 
@@ -145,4 +152,6 @@ extern bool filter_by_origin_cb_wrapper(LogicalDecodingContext *ctx, RepOriginId
 extern void ResetLogicalStreamingState(void);
 extern void UpdateDecodingStats(LogicalDecodingContext *ctx);
 
+extern bool pg_logical_replication_slot_has_pending_wal(XLogRecPtr end_of_wal);
+
 #endif
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 758ca79a81..6559d3f014 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -227,6 +227,10 @@ extern void ReplicationSlotRelease(void);
 extern void ReplicationSlotCleanup(void);
 extern void ReplicationSlotSave(void);
 extern void ReplicationSlotMarkDirty(void);
+extern void create_logical_replication_slot(char *name, char *plugin,
+											bool temporary, bool two_phase,
+											XLogRecPtr restart_lsn,
+											bool find_startpoint);
 
 /* misc stuff */
 extern void ReplicationSlotInitialize(void);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 8de90c4958..ce3731224c 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1503,6 +1503,8 @@ LogicalRepTyp
 LogicalRepWorker
 LogicalRepWorkerType
 LogicalRewriteMappingData
+LogicalSlotInfo
+LogicalSlotInfoArr
 LogicalTape
 LogicalTapeSet
 LsnReadQueue
-- 
2.27.0

#318Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Amit Kapila (#316)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Amit,

Thanks for reviewing! New patch is available at [1].

Some more comments:
1. Let's restruture binary_upgrade_validate_wal_logical_end() a bit.
First, let's change its name to binary_upgrade_slot_has_pending_wal()
or something like that. Then move the context creation and free
related code into DecodingContextHasDecodedItems(). We can rename
DecodingContextHasDecodedItems() as
pg_logical_replication_slot_has_pending_wal() and place it in
slotfuncs.c. This will make the code structure similar to other slot
functions like pg_replication_slot_advance().

Seems clearer than mine. Fixed.

2. + * Returns true if there are no changes after the confirmed_flush_lsn.

How about something like: "Returns true if there are no decodable WAL
records after the confirmed_flush_lsn."?

Fixed.

3. Shouldn't we need to call CheckSlotPermissions() in
binary_upgrade_validate_wal_logical_end?

Added, but actually it is not needed, because only superusers can connect
to the server while upgrading. Please see the code below in InitPostgres().

```
if (IsBinaryUpgrade && !am_superuser)
{
ereport(FATAL,
(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
errmsg("must be superuser to connect in binary upgrade mode")));
}
```
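
So even without the permission check, a call from a normal (non-binary-upgrade)
session is rejected up front by CHECK_IS_BINARY_UPGRADE, which runs before
CheckSlotPermissions() in the function. A rough illustration of a manual call on
a server that was not started by pg_upgrade (the error wording follows the other
binary-upgrade support functions):

```
-- Hypothetical manual invocation outside of pg_upgrade; slot name is illustrative
SELECT pg_catalog.binary_upgrade_slot_has_pending_wal('test_slot1');
-- ERROR:  function can only be called when server is in binary upgrade mode
```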

4.
+ /*
+ * Also, set processing_required flag if the message is not
+ * transactional. It is needed to notify the message's existence to
+ * the caller side. Usually, the flag is set when either the COMMIT or
+ * ABORT records are decoded, but this must be turned on here because
+ * the non-transactional logical message is decoded without waiting
+ * for these records.
+ */

The first sentence of the comments doesn't seem to be required as that
just says what the code does. So, let's slightly change it to: "We
need to set processing_required flag to notify the message's existence
to the caller side. Usually, the flag is set when either the COMMIT or
ABORT records are decoded, but this must be turned on here because the
non-transactional logical message is decoded without waiting for these
records."

Fixed.
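
To make the scenario concrete, the attached TAP test covers exactly this case: a
non-transactional message emitted just before shutdown is decodable WAL that no
COMMIT or ABORT record follows, so fast-forward decoding must flag it and the
pre-upgrade check must reject the cluster. A rough psql sketch of that scenario
(the slot and prefix names are simply the ones used in the test):

```
-- On the old publisher, with an existing logical slot 'test_slot1':
SELECT pg_logical_emit_message(false, 'prefix', 'This is a non-transactional message');

-- After shutting down the old cluster and running pg_upgrade, the check reports:
--   The slot "test_slot1" has not consumed the WAL yet
```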

[1]: /messages/by-id/TYAPR01MB5866B0614F80CE9F5EF051BDF5D3A@TYAPR01MB5866.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#319Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Hayato Kuroda (Fujitsu) (#318)
1 attachment(s)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear hackers,

Here is a new patch.

Previously I wrote:

Based on the above idea, I made a new version of the patch in which some
functionality was exported from pg_resetwal. In this approach, pg_upgrade itself
removes WALs and then creates logical slots, and pg_resetwal is then called with
a new option --no-switch, which avoids switching to a new WAL segment file. The
option is only used for the upgrade, so it is not mentioned in the docs or in
usage(). This option would not be required if pg_resetwal -o did not discard WAL
records. Please see the fork thread [1].

But for now, these changes have been reverted because changing the pg_resetwal -o
behavior may be a bit risky: that code has been in place for more than ten years,
so we should be more careful about modifying it.
Also, I cannot come up with any problems if slots are created after pg_resetwal.
Background processes would not generate decodable changes (listed in [1]), and
bgworkers started by extensions can be ignored [2].
If the approach is accepted based on the discussion in the forked thread [3], we
will apply these changes again.

Also, some comments and a function name were improved.

[1]: /messages/by-id/TYAPR01MB58660273EACEFC5BF256B133F50DA@TYAPR01MB5866.jpnprd01.prod.outlook.com
[2]: /messages/by-id/CAA4eK1L4JB+KH_4EQryDEhyaLBPW6V20LqjdzOxCWyL7rbxqsA@mail.gmail.com
[3]: /messages/by-id/CAA4eK1KRyPMiY4fW98qFofsYrPd87Oc83zDNxSeHfTYh_asdBg@mail.gmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:

v50-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patchapplication/octet-stream; name=v50-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patchDownload
From 1d1f582ba7b624656624bc399e8e12c2915c05d8 Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Tue, 4 Apr 2023 05:49:34 +0000
Subject: [PATCH v50] pg_upgrade: Allow to replicate logical replication slots
 to new node

This commit allows nodes with logical replication slots to be upgraded. While
reading information from the old cluster, a list of logical replication slots is
fetched. At a later part of the upgrade, pg_upgrade revisits the list and
restores slots by executing pg_create_logical_replication_slot() on the new
cluster. Migration of logical replication slots is only supported when the old
cluster is version 17.0 or later.

If the old node has slots with the status 'lost' or with unconsumed WAL records,
pg_upgrade fails. These checks are needed to prevent data loss.

Note that the pg_resetwal command would remove WAL files, which are required for
restart_lsn. If WALs required by logical replication slots are removed, the
slots are unusable. Therefore, during the upgrade, slot restoration is done
after the final pg_resetwal command. The workflow ensures that required WALs are
retained.

The significant advantage of this commit is that it makes it easy to continue
logical replication even after upgrading the publisher node. Previously,
pg_upgrade allowed copying publications to a new node. With this new commit,
adjusting the connection string to the new publisher will cause the apply
worker on the subscriber to connect to the new publisher automatically. This
enables seamless continuation of logical replication, even after an upgrade.

Author: Hayato Kuroda
Co-authored-by: Hou Zhijie
Reviewed-by: Peter Smith, Julien Rouhaud, Vignesh C, Wang Wei, Masahiko Sawada,
             Dilip Kumar, Bharath Rupireddy
---
 doc/src/sgml/ref/pgupgrade.sgml               |  76 +++-
 src/backend/replication/logical/decode.c      |  48 ++-
 src/backend/replication/logical/logical.c     |  65 ++++
 src/backend/replication/slot.c                |  12 +
 src/backend/utils/adt/pg_upgrade_support.c    |  53 +++
 src/bin/pg_upgrade/Makefile                   |   3 +
 src/bin/pg_upgrade/check.c                    | 172 ++++++++-
 src/bin/pg_upgrade/function.c                 |  30 +-
 src/bin/pg_upgrade/info.c                     | 166 ++++++++-
 src/bin/pg_upgrade/meson.build                |   1 +
 src/bin/pg_upgrade/pg_upgrade.c               |  73 +++-
 src/bin/pg_upgrade/pg_upgrade.h               |  22 +-
 src/bin/pg_upgrade/server.c                   |  25 +-
 .../003_upgrade_logical_replication_slots.pl  | 325 ++++++++++++++++++
 src/include/catalog/pg_proc.dat               |   5 +
 src/include/replication/logical.h             |   9 +
 src/tools/pgindent/typedefs.list              |   2 +
 17 files changed, 1061 insertions(+), 26 deletions(-)
 create mode 100644 src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 608193b307..14dee835fb 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -383,6 +383,77 @@ make prefix=/usr/local/pgsql.new install
     </para>
    </step>
 
+   <step>
+    <title>Prepare for publisher upgrades</title>
+
+    <para>
+     <application>pg_upgrade</application> attempts to migrate logical
+     replication slots. This helps avoid the need for manually defining the
+     same replication slots on the new publisher. Migration of logical
+     replication slots is only supported when the old cluster is version 17.0
+     or later.
+    </para>
+
+    <para>
+     Before you start upgrading the publisher cluster, ensure that the
+     subscription is temporarily disabled, by executing
+     <link linkend="sql-altersubscription"><command>ALTER SUBSCRIPTION ... DISABLE</command></link>.
+     Re-enable the subscription after the upgrade.
+    </para>
+
+    <para>
+     There are some prerequisites for <application>pg_upgrade</application> to
+     be able to upgrade the replication slots. If these are not met an error
+     be able to upgrade the replication slots. If these are not met, an error
+    </para>
+
+    <itemizedlist>
+     <listitem>
+      <para>
+       All slots on the old cluster must be usable, i.e., there are no slots
+       whose
+       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>conflicting</structfield>
+       is <literal>true</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The old cluster has replicated all the changes to subscribers.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The output plugins referenced by the slots on the old cluster must be
+       installed in the new PostgreSQL executable directory.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must not have permanent logical replication slots, i.e.,
+       there must be no slots where
+       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>temporary</structfield>
+       is <literal>false</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must have
+       <link linkend="guc-wal-level"><varname>wal_level</varname></link> as
+       <literal>logical</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must have
+       <link linkend="guc-max-replication-slots"><varname>max_replication_slots</varname></link>
+       configured to a value greater than or equal to the number of slots
+       present in the old cluster.
+      </para>
+     </listitem>
+    </itemizedlist>
+
+   </step>
+
    <step>
     <title>Stop both servers</title>
 
@@ -650,8 +721,9 @@ rsync --archive --delete --hard-links --size-only --no-inc-recursive /vol1/pg_tb
        Configure the servers for log shipping.  (You do not need to run
        <function>pg_backup_start()</function> and <function>pg_backup_stop()</function>
        or take a file system backup as the standbys are still synchronized
-       with the primary.)  Replication slots are not copied and must
-       be recreated.
+       with the primary.)  Only logical slots on the primary are copied to the
+       new standby; any other slots on the old standby are not copied and must
+       be recreated manually.
       </para>
      </step>
 
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 730061c9da..4144a43afd 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -599,12 +599,8 @@ logicalmsg_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 
 	ReorderBufferProcessXid(ctx->reorder, XLogRecGetXid(r), buf->origptr);
 
-	/*
-	 * If we don't have snapshot or we are just fast-forwarding, there is no
-	 * point in decoding messages.
-	 */
-	if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT ||
-		ctx->fast_forward)
+	/* If we don't have snapshot, there is no point in decoding messages */
+	if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT)
 		return;
 
 	message = (xl_logical_message *) XLogRecGetData(r);
@@ -621,6 +617,26 @@ logicalmsg_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 			  SnapBuildXactNeedsSkip(builder, buf->origptr)))
 		return;
 
+	/*
+	 * We can also skip decoding when in 'fast_forward' mode. This check must
+	 * be last because we don't want to set the processing_required flag
+	 * unnecessarily.
+	 */
+	if (ctx->fast_forward)
+	{
+		/*
+		 * We need to set processing_required flag to notify the message's
+		 * existence to the caller side. Usually, the flag is set when either
+		 * the COMMIT or ABORT records are decoded, but this must be turned on
+		 * here because the non-transactional logical message is decoded
+		 * without waiting for these records.
+		 */
+		if (!message->transactional)
+			ctx->processing_required = true;
+
+		return;
+	}
+
 	/*
 	 * If this is a non-transactional change, get the snapshot we're expected
 	 * to use. We only get here when the snapshot is consistent, and the
@@ -1285,7 +1301,21 @@ static bool
 DecodeTXNNeedSkip(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
 				  Oid txn_dbid, RepOriginId origin_id)
 {
-	return (SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) ||
-			(txn_dbid != InvalidOid && txn_dbid != ctx->slot->data.database) ||
-			ctx->fast_forward || FilterByOrigin(ctx, origin_id));
+	if (SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) ||
+		(txn_dbid != InvalidOid && txn_dbid != ctx->slot->data.database) ||
+		FilterByOrigin(ctx, origin_id))
+		return true;
+
+	/*
+	 * We can also skip decoding when in 'fast_forward' mode. In passing set
+	 * the 'processing_required' flag to indicate, were it not for this mode,
+	 * processing *would* have been required.
+	 */
+	if (ctx->fast_forward)
+	{
+		ctx->processing_required = true;
+		return true;
+	}
+
+	return false;
 }
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 41243d0187..32869a75ab 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -29,6 +29,7 @@
 #include "postgres.h"
 
 #include "access/xact.h"
+#include "access/xlogutils.h"
 #include "access/xlog_internal.h"
 #include "fmgr.h"
 #include "miscadmin.h"
@@ -41,6 +42,7 @@
 #include "storage/proc.h"
 #include "storage/procarray.h"
 #include "utils/builtins.h"
+#include "utils/inval.h"
 #include "utils/memutils.h"
 
 /* data for errcontext callback */
@@ -1949,3 +1951,66 @@ UpdateDecodingStats(LogicalDecodingContext *ctx)
 	rb->totalTxns = 0;
 	rb->totalBytes = 0;
 }
+
+/*
+ * Read to end of WAL starting from the decoding slot's restart_lsn. Return
+ * true if any meaningful/decodable WAL records are encountered, otherwise
+ * false.
+ *
+ * Although this function is currently used only during pg_upgrade, there are
+ * no reasons to restrict it, so IsBinaryUpgrade is not checked here.
+ */
+bool
+LogicalReplicationSlotHasPendingWal(XLogRecPtr end_of_wal)
+{
+	LogicalDecodingContext *ctx;
+	bool		has_pending_wal = false;
+
+	Assert(MyReplicationSlot);
+
+	/*
+	 * Create our decoding context in fast_forward mode, passing start_lsn as
+	 * InvalidXLogRecPtr, so that we start processing from the slot's
+	 * confirmed_flush.
+	 */
+	ctx = CreateDecodingContext(InvalidXLogRecPtr,
+								NIL,
+								true,	/* fast_forward */
+								XL_ROUTINE(.page_read = read_local_xlog_page,
+										   .segment_open = wal_segment_open,
+										   .segment_close = wal_segment_close),
+								NULL, NULL, NULL);
+
+	/*
+	 * Start reading at the slot's restart_lsn, which we know points to a
+	 * valid record.
+	 */
+	XLogBeginRead(ctx->reader, MyReplicationSlot->data.restart_lsn);
+
+	/* Invalidate non-timetravel entries */
+	InvalidateSystemCaches();
+
+	/* Loop until the end of WAL or some changes are processed */
+	while (!has_pending_wal && ctx->reader->EndRecPtr < end_of_wal)
+	{
+		XLogRecord *record;
+		char	   *errm = NULL;
+
+		record = XLogReadRecord(ctx->reader, &errm);
+
+		if (errm)
+			elog(ERROR, "could not find record for logical decoding: %s", errm);
+
+		if (record != NULL)
+			LogicalDecodingProcessRecord(ctx, ctx->reader);
+
+		has_pending_wal = ctx->processing_required;
+
+		CHECK_FOR_INTERRUPTS();
+	}
+
+	/* Clean up */
+	FreeDecodingContext(ctx);
+
+	return has_pending_wal;
+}
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 7e5ec500d8..9980e2fd79 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1423,6 +1423,18 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause,
 
 		SpinLockRelease(&s->mutex);
 
+		/*
+		 * The logical replication slots shouldn't be invalidated as
+		 * max_slot_wal_keep_size GUC is set to -1 during the upgrade.
+		 *
+		 * The following is just a sanity check.
+		 */
+		if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade)
+		{
+			Assert(max_slot_wal_keep_size_mb == -1);
+			elog(ERROR, "replication slots must not be invalidated during the upgrade");
+		}
+
 		if (active_pid != 0)
 		{
 			/*
diff --git a/src/backend/utils/adt/pg_upgrade_support.c b/src/backend/utils/adt/pg_upgrade_support.c
index 0186636d9f..2a831bc397 100644
--- a/src/backend/utils/adt/pg_upgrade_support.c
+++ b/src/backend/utils/adt/pg_upgrade_support.c
@@ -11,14 +11,20 @@
 
 #include "postgres.h"
 
+#include "access/xlogutils.h"
+#include "access/xlog_internal.h"
 #include "catalog/binary_upgrade.h"
 #include "catalog/heap.h"
 #include "catalog/namespace.h"
 #include "catalog/pg_type.h"
 #include "commands/extension.h"
+#include "funcapi.h"
 #include "miscadmin.h"
+#include "replication/logical.h"
+#include "replication/slot.h"
 #include "utils/array.h"
 #include "utils/builtins.h"
+#include "utils/pg_lsn.h"
 
 
 #define CHECK_IS_BINARY_UPGRADE									\
@@ -261,3 +267,50 @@ binary_upgrade_set_missing_value(PG_FUNCTION_ARGS)
 
 	PG_RETURN_VOID();
 }
+
+/*
+ * Verify the given slot has already consumed all the WAL changes.
+ *
+ * Returns true if there are no decodable WAL records after the
+ * confirmed_flush_lsn. Otherwise false.
+ *
+ * This is a special purpose function to ensure the given slot can be upgraded
+ * without data loss.
+ */
+Datum
+binary_upgrade_slot_has_pending_wal(PG_FUNCTION_ARGS)
+{
+	Name		slot_name;
+	XLogRecPtr	end_of_wal;
+	bool		found_pending_wal;
+
+	CHECK_IS_BINARY_UPGRADE;
+
+	/* Quick exit if the input is NULL */
+	if (PG_ARGISNULL(0))
+		PG_RETURN_BOOL(false);
+
+	CheckSlotPermissions();
+
+	slot_name = PG_GETARG_NAME(0);
+
+	/*
+	 * Acquire the given slot. There should be no error because the caller has
+	 * already checked the slot exists.
+	 */
+	ReplicationSlotAcquire(NameStr(*slot_name), true);
+
+	/*
+	 * It's caller's responsibility to check the health of the slot.  Upcoming
+	 * functions assume the restart_lsn points to a valid record.
+	 */
+	Assert(MyReplicationSlot->data.invalidated == RS_INVAL_NONE);
+
+	end_of_wal = GetFlushRecPtr(NULL);
+	found_pending_wal = LogicalReplicationSlotHasPendingWal(end_of_wal);
+
+	/* Clean up */
+	ReplicationSlotRelease();
+
+	PG_RETURN_BOOL(!found_pending_wal);
+}
diff --git a/src/bin/pg_upgrade/Makefile b/src/bin/pg_upgrade/Makefile
index 5834513add..815d1a7ca1 100644
--- a/src/bin/pg_upgrade/Makefile
+++ b/src/bin/pg_upgrade/Makefile
@@ -3,6 +3,9 @@
 PGFILEDESC = "pg_upgrade - an in-place binary upgrade utility"
 PGAPPICON = win32
 
+# required for 003_logical_replication_slots.pl
+EXTRA_INSTALL=contrib/test_decoding
+
 subdir = src/bin/pg_upgrade
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 21a0ff9e42..123f47a81f 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -33,6 +33,8 @@ static void check_for_jsonb_9_4_usage(ClusterInfo *cluster);
 static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(void);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
+static void check_new_cluster_logical_replication_slots(void);
+static void check_old_cluster_for_valid_slots(bool live_check);
 
 
 /*
@@ -89,8 +91,11 @@ check_and_dump_old_cluster(bool live_check)
 	if (!live_check)
 		start_postmaster(&old_cluster, true);
 
-	/* Extract a list of databases and tables from the old cluster */
-	get_db_and_rel_infos(&old_cluster);
+	/*
+	 * Extract a list of databases, tables, and logical replication slots from
+	 * the old cluster.
+	 */
+	get_db_rel_and_slot_infos(&old_cluster, live_check);
 
 	init_tablespaces();
 
@@ -107,6 +112,13 @@ check_and_dump_old_cluster(bool live_check)
 	check_for_reg_data_type_usage(&old_cluster);
 	check_for_isn_and_int8_passing_mismatch(&old_cluster);
 
+	/*
+	 * Logical replication slots can be migrated since PG17. See comments atop
+	 * get_old_cluster_logical_slot_infos().
+	 */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
+		check_old_cluster_for_valid_slots(live_check);
+
 	/*
 	 * PG 16 increased the size of the 'aclitem' type, which breaks the
 	 * on-disk format for existing data.
@@ -200,7 +212,7 @@ check_and_dump_old_cluster(bool live_check)
 void
 check_new_cluster(void)
 {
-	get_db_and_rel_infos(&new_cluster);
+	get_db_rel_and_slot_infos(&new_cluster, false);
 
 	check_new_cluster_is_empty();
 
@@ -223,6 +235,8 @@ check_new_cluster(void)
 	check_for_prepared_transactions(&new_cluster);
 
 	check_for_new_tablespace_dir();
+
+	check_new_cluster_logical_replication_slots();
 }
 
 
@@ -1451,3 +1465,155 @@ check_for_user_defined_encoding_conversions(ClusterInfo *cluster)
 	else
 		check_ok();
 }
+
+/*
+ * check_new_cluster_logical_replication_slots()
+ *
+ * Verify that there are no logical replication slots on the new cluster and
+ * that the parameter settings necessary for creating slots are sufficient.
+ */
+static void
+check_new_cluster_logical_replication_slots(void)
+{
+	PGresult   *res;
+	PGconn	   *conn;
+	int			nslots_on_old;
+	int			nslots_on_new;
+	int			max_replication_slots;
+	char	   *wal_level;
+
+	/* Logical slots can be migrated since PG17. */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+		return;
+
+	nslots_on_old = count_old_cluster_logical_slots();
+
+	/* Quick return if there are no logical slots to be migrated. */
+	if (nslots_on_old == 0)
+		return;
+
+	conn = connectToServer(&new_cluster, "template1");
+
+	prep_status("Checking for new cluster logical replication slots");
+
+	res = executeQueryOrDie(conn, "SELECT count(*) "
+							"FROM pg_catalog.pg_replication_slots "
+							"WHERE slot_type = 'logical' AND "
+							"temporary IS FALSE;");
+
+	if (PQntuples(res) != 1)
+		pg_fatal("could not count the number of logical replication slots");
+
+	nslots_on_new = atoi(PQgetvalue(res, 0, 0));
+
+	if (nslots_on_new)
+		pg_fatal("Expected 0 logical replication slots but found %d.",
+				 nslots_on_new);
+
+	PQclear(res);
+
+	res = executeQueryOrDie(conn, "SELECT setting FROM pg_settings "
+							"WHERE name IN ('wal_level', 'max_replication_slots') "
+							"ORDER BY name DESC;");
+
+	if (PQntuples(res) != 2)
+		pg_fatal("could not determine parameter settings on new cluster");
+
+	wal_level = PQgetvalue(res, 0, 0);
+
+	if (strcmp(wal_level, "logical") != 0)
+		pg_fatal("wal_level must be \"logical\", but is set to \"%s\"",
+				 wal_level);
+
+	max_replication_slots = atoi(PQgetvalue(res, 1, 0));
+
+	if (nslots_on_old > max_replication_slots)
+		pg_fatal("max_replication_slots (%d) must be greater than or equal to the number of "
+				 "logical replication slots (%d) on the old cluster",
+				 max_replication_slots, nslots_on_old);
+
+	PQclear(res);
+	PQfinish(conn);
+
+	check_ok();
+}
+
+/*
+ * check_old_cluster_for_valid_slots()
+ *
+ * Verify that all the logical slots are usable and have consumed all the WAL
+ * before shutdown. The check has already been done in
+ * get_old_cluster_logical_slot_infos(), so this function reads the result and
+ * reports to the user.
+ */
+static void
+check_old_cluster_for_valid_slots(bool live_check)
+{
+	char		output_path[MAXPGPATH];
+	FILE	   *script = NULL;
+
+	prep_status("Checking for valid logical replication slots");
+
+	snprintf(output_path, sizeof(output_path), "%s/%s",
+			 log_opts.basedir,
+			 "invalid_logical_replication_slots.txt");
+
+	for (int dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		LogicalSlotInfoArr *slot_arr = &old_cluster.dbarr.dbs[dbnum].slot_arr;
+
+		for (int slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		{
+			LogicalSlotInfo *slot = &slot_arr->slots[slotnum];
+
+			/* Is the slot usable? */
+			if (slot->invalid)
+			{
+				if (script == NULL &&
+					(script = fopen_priv(output_path, "w")) == NULL)
+					pg_fatal("could not open file \"%s\": %s",
+							 output_path, strerror(errno));
+
+				fprintf(script, "The slot \"%s\" is invalid\n",
+						slot->slotname);
+
+				/* No need to check this slot, move on to the next one */
+				continue;
+			}
+
+			/*
+			 * Do additional check to ensure that all logical replication
+			 * slots have consumed all the WAL before shutdown.
+			 *
+			 * Note: This can be satisfied only when the old cluster has been
+			 * shut down, so we skip this for live checks.
+			 */
+			if (!live_check && !slot->caught_up)
+			{
+				if (script == NULL &&
+					(script = fopen_priv(output_path, "w")) == NULL)
+					pg_fatal("could not open file \"%s\": %s",
+							 output_path, strerror(errno));
+
+				fprintf(script,
+						"The slot \"%s\" has not consumed the WAL yet\n",
+						slot->slotname);
+			}
+		}
+	}
+
+	if (script)
+	{
+		fclose(script);
+
+		pg_log(PG_REPORT, "fatal");
+		pg_fatal("Your installation contains logical replication slots that cannot be upgraded.\n"
+				 "These slots can't be copied, so this cluster cannot be upgraded.\n"
+				 "Consider removing invalid slots and/or consuming the pending WAL if any,\n"
+				 "and then restart the upgrade.\n"
+				 "A list of all such logical replication slots is in the file:\n"
+				 "    %s", output_path);
+	}
+
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/function.c b/src/bin/pg_upgrade/function.c
index dc8800c7cd..5af936bd45 100644
--- a/src/bin/pg_upgrade/function.c
+++ b/src/bin/pg_upgrade/function.c
@@ -46,7 +46,9 @@ library_name_compare(const void *p1, const void *p2)
 /*
  * get_loadable_libraries()
  *
- *	Fetch the names of all old libraries containing C-language functions.
+ *	Fetch the names of all old libraries containing C-language functions, as
+ *	well as the libraries of logical replication output plugins.
+ *
  *	We will later check that they all exist in the new installation.
  */
 void
@@ -55,6 +57,7 @@ get_loadable_libraries(void)
 	PGresult  **ress;
 	int			totaltups;
 	int			dbnum;
+	int			n_libinfos;
 
 	ress = (PGresult **) pg_malloc(old_cluster.dbarr.ndbs * sizeof(PGresult *));
 	totaltups = 0;
@@ -81,7 +84,12 @@ get_loadable_libraries(void)
 		PQfinish(conn);
 	}
 
-	os_info.libraries = (LibraryInfo *) pg_malloc(totaltups * sizeof(LibraryInfo));
+	/*
+	 * Allocate memory for required libraries and logical replication output
+	 * plugins.
+	 */
+	n_libinfos = totaltups + count_old_cluster_logical_slots();
+	os_info.libraries = (LibraryInfo *) pg_malloc(sizeof(LibraryInfo) * n_libinfos);
 	totaltups = 0;
 
 	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
@@ -89,6 +97,7 @@ get_loadable_libraries(void)
 		PGresult   *res = ress[dbnum];
 		int			ntups;
 		int			rowno;
+		LogicalSlotInfoArr *slot_arr = &old_cluster.dbarr.dbs[dbnum].slot_arr;
 
 		ntups = PQntuples(res);
 		for (rowno = 0; rowno < ntups; rowno++)
@@ -101,6 +110,23 @@ get_loadable_libraries(void)
 			totaltups++;
 		}
 		PQclear(res);
+
+		/*
+		 * Store the names of output plugins as well. There is a possibility
+		 * that duplicated plugins are set, but the consumer function
+		 * check_loadable_libraries() will avoid checking the same library, so
+		 * we do not have to consider their uniqueness here.
+		 */
+		for (int slotno = 0; slotno < slot_arr->nslots; slotno++)
+		{
+			if (slot_arr->slots[slotno].invalid)
+				continue;
+
+			os_info.libraries[totaltups].name = pg_strdup(slot_arr->slots[slotno].plugin);
+			os_info.libraries[totaltups].dbnum = dbnum;
+
+			totaltups++;
+		}
 	}
 
 	pg_free(ress);
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index aa5faca4d6..c56769fe54 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -26,6 +26,8 @@ static void get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo);
 static void free_rel_infos(RelInfoArr *rel_arr);
 static void print_db_infos(DbInfoArr *db_arr);
 static void print_rel_infos(RelInfoArr *rel_arr);
+static void print_slot_infos(LogicalSlotInfoArr *slot_arr);
+static void get_old_cluster_logical_slot_infos(DbInfo *dbinfo, bool live_check);
 
 
 /*
@@ -266,13 +268,15 @@ report_unmatched_relation(const RelInfo *rel, const DbInfo *db, bool is_new_db)
 }
 
 /*
- * get_db_and_rel_infos()
+ * get_db_rel_and_slot_infos()
  *
  * higher level routine to generate dbinfos for the database running
  * on the given "port". Assumes that server is already running.
+ *
+ * live_check is used only when the target is the old cluster.
  */
 void
-get_db_and_rel_infos(ClusterInfo *cluster)
+get_db_rel_and_slot_infos(ClusterInfo *cluster, bool live_check)
 {
 	int			dbnum;
 
@@ -283,7 +287,17 @@ get_db_and_rel_infos(ClusterInfo *cluster)
 	get_db_infos(cluster);
 
 	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
-		get_rel_infos(cluster, &cluster->dbarr.dbs[dbnum]);
+	{
+		DbInfo	   *pDbInfo = &cluster->dbarr.dbs[dbnum];
+
+		get_rel_infos(cluster, pDbInfo);
+
+		/*
+		 * Retrieve the logical replication slot infos for the old cluster.
+		 */
+		if (cluster == &old_cluster)
+			get_old_cluster_logical_slot_infos(pDbInfo, live_check);
+	}
 
 	if (cluster == &old_cluster)
 		pg_log(PG_VERBOSE, "\nsource databases:");
@@ -600,6 +614,125 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 	dbinfo->rel_arr.nrels = num_rels;
 }
 
+/*
+ * get_old_cluster_logical_slot_infos()
+ *
+ * gets the LogicalSlotInfos for all the logical replication slots of the
+ * database referred to by "dbinfo". The status of each logical slot is
+ * fetched here, but it is only used at the checking phase. See
+ * check_old_cluster_for_valid_slots().
+ *
+ * Note: This function will not do anything if the old cluster is pre-PG17.
+ * This is because before that the logical slots are not saved at shutdown, so
+ * there is no guarantee that the latest confirmed_flush_lsn is saved to disk
+ * which can lead to data loss. It is still not guaranteed for manually created
+ * slots in PG17, so subsequent checks done in
+ * check_old_cluster_for_valid_slots() would raise a FATAL error if such slots
+ * are included.
+ */
+static void
+get_old_cluster_logical_slot_infos(DbInfo *dbinfo, bool live_check)
+{
+	PGconn	   *conn;
+	PGresult   *res;
+	LogicalSlotInfo *slotinfos = NULL;
+	int			num_slots = 0;
+
+	/* Logical slots can be migrated since PG17. */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+	{
+		dbinfo->slot_arr.slots = slotinfos;
+		dbinfo->slot_arr.nslots = num_slots;
+		return;
+	}
+
+	conn = connectToServer(&old_cluster, dbinfo->db_name);
+
+	/*
+	 * Fetch the logical replication slot information. The check whether the
+	 * slot is considered caught up is done by an upgrade function. This
+	 * regards the slot is caught up if any changes are not found while
+	 * decoding. See binary_upgrade_slot_has_pending_wal().
+	 *
+	 * Note that we can't ensure whether the slot is caught up during
+	 * live_check as the new WAL records could be generated.
+	 *
+	 * We intentionally skip checking the WALs for invalidated slots as the
+	 * corresponding WALs could have been removed for such slots.
+	 *
+	 * The temporary slots are explicitly ignored while checking because such
+	 * slots cannot exist after the upgrade. During the upgrade, clusters are
+	 * started and stopped several times causing any temporary slots to be
+	 * removed.
+	 */
+	res = executeQueryOrDie(conn, "SELECT slot_name, plugin, two_phase, "
+							"%s as caught_up, conflicting as invalid "
+							"FROM pg_catalog.pg_replication_slots "
+							"WHERE slot_type = 'logical' AND "
+							"database = current_database() AND "
+							"temporary IS FALSE;",
+							live_check ? "FALSE" :
+							"(CASE WHEN conflicting THEN FALSE "
+							"ELSE (SELECT pg_catalog.binary_upgrade_slot_has_pending_wal(slot_name)) "
+							"END)");
+
+	num_slots = PQntuples(res);
+
+	if (num_slots)
+	{
+		int			i_slotname;
+		int			i_plugin;
+		int			i_twophase;
+		int			i_caught_up;
+		int			i_invalid;
+
+		slotinfos = (LogicalSlotInfo *) pg_malloc(sizeof(LogicalSlotInfo) * num_slots);
+
+		i_slotname = PQfnumber(res, "slot_name");
+		i_plugin = PQfnumber(res, "plugin");
+		i_twophase = PQfnumber(res, "two_phase");
+		i_caught_up = PQfnumber(res, "caught_up");
+		i_invalid = PQfnumber(res, "invalid");
+
+		for (int slotnum = 0; slotnum < num_slots; slotnum++)
+		{
+			LogicalSlotInfo *curr = &slotinfos[slotnum];
+
+			curr->slotname = pg_strdup(PQgetvalue(res, slotnum, i_slotname));
+			curr->plugin = pg_strdup(PQgetvalue(res, slotnum, i_plugin));
+			curr->two_phase = (strcmp(PQgetvalue(res, slotnum, i_twophase), "t") == 0);
+			curr->caught_up = (strcmp(PQgetvalue(res, slotnum, i_caught_up), "t") == 0);
+			curr->invalid = (strcmp(PQgetvalue(res, slotnum, i_invalid), "t") == 0);
+		}
+	}
+
+	PQclear(res);
+	PQfinish(conn);
+
+	dbinfo->slot_arr.slots = slotinfos;
+	dbinfo->slot_arr.nslots = num_slots;
+}
+
+
+/*
+ * count_old_cluster_logical_slots()
+ *
+ * Returns the number of logical replication slots for all databases.
+ *
+ * Note: this function always returns 0 if the old_cluster is PG16 and prior
+ * because we gather slot information only for cluster versions greater than or
+ * equal to PG17. See get_old_cluster_logical_slot_infos().
+ */
+int
+count_old_cluster_logical_slots(void)
+{
+	int			slot_count = 0;
+
+	for (int dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+		slot_count += old_cluster.dbarr.dbs[dbnum].slot_arr.nslots;
+
+	return slot_count;
+}
 
 static void
 free_db_and_rel_infos(DbInfoArr *db_arr)
@@ -642,8 +775,11 @@ print_db_infos(DbInfoArr *db_arr)
 
 	for (dbnum = 0; dbnum < db_arr->ndbs; dbnum++)
 	{
-		pg_log(PG_VERBOSE, "Database: \"%s\"", db_arr->dbs[dbnum].db_name);
-		print_rel_infos(&db_arr->dbs[dbnum].rel_arr);
+		DbInfo	   *pDbInfo = &db_arr->dbs[dbnum];
+
+		pg_log(PG_VERBOSE, "Database: \"%s\"", pDbInfo->db_name);
+		print_rel_infos(&pDbInfo->rel_arr);
+		print_slot_infos(&pDbInfo->slot_arr);
 	}
 }
 
@@ -660,3 +796,23 @@ print_rel_infos(RelInfoArr *rel_arr)
 			   rel_arr->rels[relnum].reloid,
 			   rel_arr->rels[relnum].tablespace);
 }
+
+static void
+print_slot_infos(LogicalSlotInfoArr *slot_arr)
+{
+	/* Quick return if there are no logical slots. */
+	if (slot_arr->nslots == 0)
+		return;
+
+	pg_log(PG_VERBOSE, "Logical replication slots within the database:");
+
+	for (int slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+	{
+		LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+
+		pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\", two_phase: %d",
+			   slot_info->slotname,
+			   slot_info->plugin,
+			   slot_info->two_phase);
+	}
+}
diff --git a/src/bin/pg_upgrade/meson.build b/src/bin/pg_upgrade/meson.build
index 12a97f84e2..2c4f38d865 100644
--- a/src/bin/pg_upgrade/meson.build
+++ b/src/bin/pg_upgrade/meson.build
@@ -42,6 +42,7 @@ tests += {
     'tests': [
       't/001_basic.pl',
       't/002_pg_upgrade.pl',
+      't/003_upgrade_logical_replication_slots.pl',
     ],
     'test_kwargs': {'priority': 40}, # pg_upgrade tests are slow
   },
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 96bfb67167..7acdf31d02 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -59,6 +59,7 @@ static void copy_xact_xlog_xid(void);
 static void set_frozenxids(bool minmxid_only);
 static void make_outputdirs(char *pgdata);
 static void setup(char *argv0, bool *live_check);
+static void create_logical_replication_slots(void);
 
 ClusterInfo old_cluster,
 			new_cluster;
@@ -188,6 +189,16 @@ main(int argc, char **argv)
 			  new_cluster.pgdata);
 	check_ok();
 
+	/*
+	 * If the old cluster has logical slots, migrate them to a new cluster.
+	 */
+	if (count_old_cluster_logical_slots())
+	{
+		start_postmaster(&new_cluster, true);
+		create_logical_replication_slots();
+		stop_postmaster(false);
+	}
+
 	if (user_opts.do_sync)
 	{
 		prep_status("Sync data directory to disk");
@@ -593,7 +604,7 @@ create_new_objects(void)
 		set_frozenxids(true);
 
 	/* update new_cluster info now that we have objects in the databases */
-	get_db_and_rel_infos(&new_cluster);
+	get_db_rel_and_slot_infos(&new_cluster, false);
 }
 
 /*
@@ -862,3 +873,63 @@ set_frozenxids(bool minmxid_only)
 
 	check_ok();
 }
+
+/*
+ * create_logical_replication_slots()
+ *
+ * Similar to create_new_objects() but only restores logical replication slots.
+ */
+static void
+create_logical_replication_slots(void)
+{
+	prep_status_progress("Restoring logical replication slots in the new cluster");
+
+	for (int dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		DbInfo	   *old_db = &old_cluster.dbarr.dbs[dbnum];
+		LogicalSlotInfoArr *slot_arr = &old_db->slot_arr;
+		PGconn	   *conn;
+		PQExpBuffer query;
+		char		log_file_name[MAXPGPATH];
+
+		/* Skip this database if there are no slots */
+		if (slot_arr->nslots == 0)
+			continue;
+
+		conn = connectToServer(&new_cluster, old_db->db_name);
+		query = createPQExpBuffer();
+
+		snprintf(log_file_name, sizeof(log_file_name),
+				 DB_DUMP_LOG_FILE_MASK, old_db->db_oid);
+
+		pg_log(PG_STATUS, "%s", old_db->db_name);
+
+		for (int slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		{
+			LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+
+			/* Constructs a query for creating logical replication slots */
+			appendPQExpBuffer(query,
+							  "SELECT * FROM "
+							  "pg_catalog.pg_create_logical_replication_slot(");
+			appendStringLiteralConn(query, slot_info->slotname, conn);
+			appendPQExpBuffer(query, ", ");
+			appendStringLiteralConn(query, slot_info->plugin, conn);
+			appendPQExpBuffer(query, ", false, %s);",
+							  slot_info->two_phase ? "true" : "false");
+
+			PQclear(executeQueryOrDie(conn, "%s", query->data));
+
+			resetPQExpBuffer(query);
+		}
+
+		PQfinish(conn);
+
+		destroyPQExpBuffer(query);
+	}
+
+	end_progress_output();
+	check_ok();
+
+	return;
+}
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 842f3b6cd3..ba8129d135 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -150,6 +150,24 @@ typedef struct
 	int			nrels;
 } RelInfoArr;
 
+/*
+ * Structure to store logical replication slot information.
+ */
+typedef struct
+{
+	char	   *slotname;		/* slot name */
+	char	   *plugin;			/* plugin */
+	bool		two_phase;		/* can the slot decode 2PC? */
+	bool		caught_up;		/* has the slot caught up to latest changes? */
+	bool		invalid;		/* if true, the slot is unusable */
+} LogicalSlotInfo;
+
+typedef struct
+{
+	int			nslots;			/* number of logical slot infos */
+	LogicalSlotInfo *slots;		/* array of logical slot infos */
+} LogicalSlotInfoArr;
+
 /*
  * The following structure represents a relation mapping.
  */
@@ -176,6 +194,7 @@ typedef struct
 	char		db_tablespace[MAXPGPATH];	/* database default tablespace
 											 * path */
 	RelInfoArr	rel_arr;		/* array of all user relinfos */
+	LogicalSlotInfoArr slot_arr;	/* array of all LogicalSlotInfo */
 } DbInfo;
 
 /*
@@ -400,7 +419,8 @@ void		check_loadable_libraries(void);
 FileNameMap *gen_db_file_maps(DbInfo *old_db,
 							  DbInfo *new_db, int *nmaps, const char *old_pgdata,
 							  const char *new_pgdata);
-void		get_db_and_rel_infos(ClusterInfo *cluster);
+void		get_db_rel_and_slot_infos(ClusterInfo *cluster, bool live_check);
+int			count_old_cluster_logical_slots(void);
 
 /* option.c */
 
diff --git a/src/bin/pg_upgrade/server.c b/src/bin/pg_upgrade/server.c
index 0bc3d2806b..d7f6c268ef 100644
--- a/src/bin/pg_upgrade/server.c
+++ b/src/bin/pg_upgrade/server.c
@@ -201,6 +201,7 @@ start_postmaster(ClusterInfo *cluster, bool report_and_exit_on_error)
 	PGconn	   *conn;
 	bool		pg_ctl_return = false;
 	char		socket_string[MAXPGPATH + 200];
+	PQExpBufferData pgoptions;
 
 	static bool exit_hook_registered = false;
 
@@ -227,23 +228,41 @@ start_postmaster(ClusterInfo *cluster, bool report_and_exit_on_error)
 				 cluster->sockdir);
 #endif
 
+	initPQExpBuffer(&pgoptions);
+
 	/*
-	 * Use -b to disable autovacuum.
+	 * Construct a parameter string which is passed to the server process.
 	 *
 	 * Turn off durability requirements to improve object creation speed, and
 	 * we only modify the new cluster, so only use it there.  If there is a
 	 * crash, the new cluster has to be recreated anyway.  fsync=off is a big
 	 * win on ext4.
 	 */
+	if (cluster == &new_cluster)
+		appendPQExpBufferStr(&pgoptions, " -c synchronous_commit=off -c fsync=off -c full_page_writes=off");
+
+	/*
+	 * Use max_slot_wal_keep_size as -1 to prevent the WAL removal by the
+	 * checkpointer process.  If WALs required by logical replication slots
+	 * are removed, the slots are unusable.  This setting prevents the
+	 * invalidation of slots during the upgrade. We set this option when
+	 * cluster is PG17 or later because logical replication slots can only be
+	 * migrated since then. Besides, max_slot_wal_keep_size is added in PG13.
+	 */
+	if (GET_MAJOR_VERSION(cluster->major_version) >= 1700)
+		appendPQExpBufferStr(&pgoptions, " -c max_slot_wal_keep_size=-1");
+
+	/* Use -b to disable autovacuum. */
 	snprintf(cmd, sizeof(cmd),
 			 "\"%s/pg_ctl\" -w -l \"%s/%s\" -D \"%s\" -o \"-p %d -b%s %s%s\" start",
 			 cluster->bindir,
 			 log_opts.logdir,
 			 SERVER_LOG_FILE, cluster->pgconfig, cluster->port,
-			 (cluster == &new_cluster) ?
-			 " -c synchronous_commit=off -c fsync=off -c full_page_writes=off" : "",
+			 pgoptions.data,
 			 cluster->pgopts ? cluster->pgopts : "", socket_string);
 
+	termPQExpBuffer(&pgoptions);
+
 	/*
 	 * Don't throw an error right away, let connecting throw the error because
 	 * it might supply a reason for the failure.
diff --git a/src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl
new file mode 100644
index 0000000000..fc6bf3020a
--- /dev/null
+++ b/src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl
@@ -0,0 +1,325 @@
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+# Tests for upgrading replication slots
+
+use strict;
+use warnings;
+
+use File::Find qw(find);
+use File::Path qw(rmtree);
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Can be changed to test the other modes
+my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
+
+# Initialize old cluster
+my $old_publisher = PostgreSQL::Test::Cluster->new('old_publisher');
+$old_publisher->init(allows_streaming => 'logical');
+
+# Initialize new cluster
+my $new_publisher = PostgreSQL::Test::Cluster->new('new_publisher');
+$new_publisher->init(allows_streaming => 'replica');
+
+# Initialize subscriber cluster
+my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
+$subscriber->init(allows_streaming => 'logical');
+
+my $bindir = $new_publisher->config_data('--bindir');
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when new cluster wal_level is not 'logical'
+
+# Preparations for the subsequent test:
+# 1. Create a slot on the old cluster
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT pg_create_logical_replication_slot('test_slot1', 'test_decoding', false, true);"
+);
+$old_publisher->stop;
+
+# pg_upgrade will fail because the new cluster wal_level is 'replica'
+command_checks_all(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d', $old_publisher->data_dir,
+		'-D', $new_publisher->data_dir,
+		'-b', $bindir,
+		'-B', $bindir,
+		'-s', $new_publisher->host,
+		'-p', $old_publisher->port,
+		'-P', $new_publisher->port,
+		$mode,
+	],
+	1,
+	[qr/wal_level must be \"logical\", but is set to \"replica\"/],
+	[qr//],
+	'run of pg_upgrade where the new cluster has the wrong wal_level');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when max_replication_slots on a new cluster is
+#		too low
+
+# Preparations for the subsequent test:
+# 1. Create a second slot on the old cluster
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT pg_create_logical_replication_slot('test_slot2', 'test_decoding', false, true);"
+);
+
+# 2. Consume WAL records to avoid another type of upgrade failure. It will be
+#	 tested in subsequent cases.
+$old_publisher->safe_psql('postgres',
+	"SELECT count(*) FROM pg_logical_slot_get_changes('test_slot1', NULL, NULL);"
+);
+$old_publisher->stop;
+
+# 3. max_replication_slots is set to smaller than the number of slots (2)
+#	 present on the old cluster
+$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 1");
+
+# 4. wal_level is set correctly on the new cluster
+$new_publisher->append_conf('postgresql.conf', "wal_level = 'logical'");
+
+# pg_upgrade will fail because the new cluster has insufficient max_replication_slots
+command_checks_all(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d', $old_publisher->data_dir,
+		'-D', $new_publisher->data_dir,
+		'-b', $bindir,
+		'-B', $bindir,
+		'-s', $new_publisher->host,
+		'-p', $old_publisher->port,
+		'-P', $new_publisher->port,
+		$mode,
+	],
+	1,
+	[
+		qr/max_replication_slots \(1\) must be greater than or equal to the number of logical replication slots \(2\) on the old cluster/
+	],
+	[qr//],
+	'run of pg_upgrade where the new cluster has insufficient max_replication_slots'
+);
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when the slot still has unconsumed WAL records
+
+# Preparations for the subsequent test:
+# 1. Remove the slot 'test_slot2', leaving only 1 slot on the old cluster, so
+#    the new cluster config  max_replication_slots=1 will now be enough.
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT * FROM pg_drop_replication_slot('test_slot2');");
+
+# 2. Generate extra WAL records. Because these WAL records do not get consumed
+#	 it will cause the upcoming pg_upgrade test to fail.
+$old_publisher->safe_psql('postgres',
+	"CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;");
+$old_publisher->stop;
+
+# pg_upgrade will fail because the slot still has unconsumed WAL records
+command_checks_all(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d', $old_publisher->data_dir,
+		'-D', $new_publisher->data_dir,
+		'-b', $bindir,
+		'-B', $bindir,
+		'-s', $new_publisher->host,
+		'-p', $old_publisher->port,
+		'-P', $new_publisher->port,
+		$mode,
+	],
+	1,
+	[
+		qr/Your installation contains logical replication slots that cannot be upgraded./
+	],
+	[qr//],
+	'run of pg_upgrade of old cluster with idle replication slots');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Verify the reason why the logical replication slot cannot be upgraded
+my $log_path = $new_publisher->data_dir . "/pg_upgrade_output.d";
+my $slots_filename;
+
+# Find a txt file that contains a list of logical replication slots that cannot
+# be upgraded. We cannot predict the file's path because the output directory
+# contains a milliseconds timestamp. File::Find::find must be used.
+find(
+	sub {
+		if ($File::Find::name =~ m/invalid_logical_replication_slots\.txt/)
+		{
+			$slots_filename = $File::Find::name;
+		}
+	},
+	$new_publisher->data_dir . "/pg_upgrade_output.d");
+
+# And check the content. The failure should be because there are unconsumed
+# WALs after confirmed_flush_lsn of test_slot1.
+like(
+	slurp_file($slots_filename),
+	qr/The slot \"test_slot1\" has not consumed the WAL yet/m,
+	'the previous test failed due to unconsumed WALs');
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+# Consume the remaining WAL records
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT count(*) FROM pg_logical_slot_get_changes('test_slot1', NULL, NULL);");
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when there are non-transactional changes
+
+# Preparations for the subsequent test:
+# 1. Emit a non-transactional message
+$old_publisher->safe_psql('postgres',
+	"SELECT count(*) FROM pg_logical_emit_message('false', 'prefix', 'This is a non-transactional message');");
+$old_publisher->stop;
+
+# pg_upgrade will fail because there is a non-transactional change
+command_checks_all(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d', $old_publisher->data_dir,
+		'-D', $new_publisher->data_dir,
+		'-b', $bindir,
+		'-B', $bindir,
+		'-s', $new_publisher->host,
+		'-p', $old_publisher->port,
+		'-P', $new_publisher->port,
+		$mode,
+	],
+	1,
+	[
+		qr/Your installation contains logical replication slots that cannot be upgraded./
+	],
+	[qr//],
+	'run of pg_upgrade of old cluster with idle replication slots');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Verify the reason why the logical replication slot cannot be upgraded
+$log_path = $new_publisher->data_dir . "/pg_upgrade_output.d";
+$slots_filename = undef;
+
+# Find a txt file that contains a list of logical replication slots that cannot
+# be upgraded.
+find(
+	sub {
+		if ($File::Find::name =~ m/invalid_logical_replication_slots\.txt/)
+		{
+			$slots_filename = $File::Find::name;
+		}
+	},
+	$new_publisher->data_dir . "/pg_upgrade_output.d");
+
+# And check the content. The content of the file is same as the previous test
+like(
+	slurp_file($slots_filename),
+	qr/The slot \"test_slot1\" has not consumed the WAL yet/m,
+	'the previous test failed due to unconsumed WALs');
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+# Remove the remaining slot
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT * FROM pg_drop_replication_slot('test_slot1');");
+
+# ------------------------------
+# TEST: Successful upgrade
+
+# Preparations for the subsequent test:
+# 1. Setup logical replication
+my $old_connstr = $old_publisher->connstr . ' dbname=postgres';
+$old_publisher->safe_psql('postgres',
+	"CREATE PUBLICATION regress_pub FOR ALL TABLES;");
+$subscriber->start;
+$subscriber->safe_psql(
+	'postgres', qq[
+	CREATE TABLE tbl (a int);
+	CREATE SUBSCRIPTION regress_sub CONNECTION '$old_connstr' PUBLICATION regress_pub WITH (two_phase = 'true')
+]);
+$subscriber->wait_for_subscription_sync($old_publisher, 'regress_sub');
+
+# 2. Temporarily disable the subscription
+$subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION regress_sub DISABLE");
+$old_publisher->stop;
+
+# Dry run, successful check is expected. This is not a live check, so a
+# shutdown checkpoint record would be inserted. We want to test that a
+# subsequent upgrade is successful by skipping such an expected WAL record.
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d', $old_publisher->data_dir,
+		'-D', $new_publisher->data_dir,
+		'-b', $bindir,
+		'-B', $bindir,
+		'-s', $new_publisher->host,
+		'-p', $old_publisher->port,
+		'-P', $new_publisher->port,
+		$mode, '--check'
+	],
+	'run of pg_upgrade of old cluster');
+ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ removed after pg_upgrade success");
+
+# Actual run, successful upgrade is expected
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d', $old_publisher->data_dir,
+		'-D', $new_publisher->data_dir,
+		'-b', $bindir,
+		'-B', $bindir,
+		'-s', $new_publisher->host,
+		'-p', $old_publisher->port,
+		'-P', $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade of old cluster');
+ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ removed after pg_upgrade success");
+
+# Check that the slot 'regress_sub' has migrated to the new cluster
+$new_publisher->start;
+my $result = $new_publisher->safe_psql('postgres',
+	"SELECT slot_name, two_phase FROM pg_replication_slots");
+is($result, qq(regress_sub|t), 'check the slot exists on new cluster');
+
+# Update the connection
+my $new_connstr = $new_publisher->connstr . ' dbname=postgres';
+$subscriber->safe_psql(
+	'postgres', qq[
+	ALTER SUBSCRIPTION regress_sub CONNECTION '$new_connstr';
+	ALTER SUBSCRIPTION regress_sub ENABLE;
+]);
+
+# Check whether changes on the new publisher get replicated to the subscriber
+$new_publisher->safe_psql('postgres',
+	"INSERT INTO tbl VALUES (generate_series(11, 20))");
+$new_publisher->wait_for_catchup('regress_sub');
+$result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
+is($result, qq(20), 'check changes are replicated to the subscriber');
+
+# Clean up
+$subscriber->stop();
+$new_publisher->stop();
+
+done_testing();
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 72ea4aa8b8..cdecc37dcf 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11379,6 +11379,11 @@
   proname => 'binary_upgrade_set_next_pg_tablespace_oid', provolatile => 'v',
   proparallel => 'u', prorettype => 'void', proargtypes => 'oid',
   prosrc => 'binary_upgrade_set_next_pg_tablespace_oid' },
+{ oid => '8046', descr => 'for use by pg_upgrade',
+  proname => 'binary_upgrade_slot_has_pending_wal', proisstrict => 'f',
+  provolatile => 'v', proparallel => 'u', prorettype => 'bool',
+  proargtypes => 'name',
+  prosrc => 'binary_upgrade_slot_has_pending_wal' },
 
 # conversion functions
 { oid => '4302',
diff --git a/src/include/replication/logical.h b/src/include/replication/logical.h
index 5f49554ea0..355247a58b 100644
--- a/src/include/replication/logical.h
+++ b/src/include/replication/logical.h
@@ -109,6 +109,13 @@ typedef struct LogicalDecodingContext
 	TransactionId write_xid;
 	/* Are we processing the end LSN of a transaction? */
 	bool		end_xact;
+
+	/*
+	 * Did the logical decoding context require processing WALs?
+	 *
+	 * This flag is used only when in 'fast_forward' mode.
+	 */
+	bool		processing_required;
 } LogicalDecodingContext;
 
 
@@ -145,4 +152,6 @@ extern bool filter_by_origin_cb_wrapper(LogicalDecodingContext *ctx, RepOriginId
 extern void ResetLogicalStreamingState(void);
 extern void UpdateDecodingStats(LogicalDecodingContext *ctx);
 
+extern bool LogicalReplicationSlotHasPendingWal(XLogRecPtr end_of_wal);
+
 #endif
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index e69bb671bf..de6c48d914 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1503,6 +1503,8 @@ LogicalRepTyp
 LogicalRepWorker
 LogicalRepWorkerType
 LogicalRewriteMappingData
+LogicalSlotInfo
+LogicalSlotInfoArr
 LogicalTape
 LogicalTapeSet
 LsnReadQueue
-- 
2.27.0

#320Amit Kapila
amit.kapila16@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#319)
1 attachment(s)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Sat, Oct 14, 2023 at 10:45 AM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

Here is a new patch.

Previously I wrote:

Based on the above idea, I made a new version of the patch in which some
functionality was exported from pg_resetwal. In this approach, pg_upgrade itself
removes the WALs and then creates the logical slots; pg_resetwal is then called
with a new option --no-switch, which avoids switching to a new WAL segment file.
The option is only used for the upgrade, so it is not mentioned in the docs or
in usage(). This option is not required if pg_resetwal -o does not discard WAL
records. Please see the fork thread [1].

But for now, these changes were reverted because changing the pg_resetwal -o
behavior may be a bit risky; it has been in place for more than ten years, so we
should be more careful about modifying it. Also, I cannot come up with problems
if slots are created after pg_resetwal: background processes would not generate
decodable changes (listed in [1]), and BGworkers from extensions can be ignored
[2]. Based on the discussion on the forked thread [3], we will apply this again
if it is accepted.

Yeah, I think introducing additional complexity unless it is really
required sounds a bit scary to me as well. BTW, please find attached
some cosmetic changes.

One minor additional comment:
+# Initialize subscriber cluster
+my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
+$subscriber->init(allows_streaming => 'logical');

Why do we need to set wal_level as logical for subscribers?

--
With Regards,
Amit Kapila.

Attachments:

v50_changes_amit_1.patch.txt (text/plain; charset=US-ASCII)
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 4144a43afd..cfa955a679 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -618,9 +618,9 @@ logicalmsg_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		return;
 
 	/*
-	 * We can also skip decoding when in 'fast_forward' mode. This check must
-	 * be last because we don't want to set that processing_required flag
-	 * unnecessarily.
+	 * We also skip decoding in 'fast_forward' mode. This check must be last
+	 * because we don't want to set the processing_required flag unless
+	 * we have a decodable message.
 	 */
 	if (ctx->fast_forward)
 	{
@@ -1307,8 +1307,8 @@ DecodeTXNNeedSkip(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
 		return true;
 
 	/*
-	 * We can also skip decoding when in 'fast_forward' mode. In passing set
-	 * the 'processing_required' flag to indicate, were it not for this mode,
+	 * We also skip decoding in 'fast_forward' mode. In passing set the
+	 * 'processing_required' flag to indicate, were it not for this mode,
 	 * processing *would* have been required.
 	 */
 	if (ctx->fast_forward)
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 32869a75ab..e02cd0fa44 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -1953,9 +1953,9 @@ UpdateDecodingStats(LogicalDecodingContext *ctx)
 }
 
 /*
- * Read to end of WAL starting from the decoding slot's restart_lsn. Return
- * true if any meaningful/decodable WAL records are encountered, otherwise
- * false.
+ * Read up to the end of WAL starting from the decoding slot's restart_lsn.
+ * Return true if any meaningful/decodable WAL records are encountered,
+ * otherwise false.
  *
  * Although this function is currently used only during pg_upgrade, there are
  * no reasons to restrict it, so IsBinaryUpgrade is not checked here.
diff --git a/src/backend/utils/adt/pg_upgrade_support.c b/src/backend/utils/adt/pg_upgrade_support.c
index 2a831bc397..a3a8ade405 100644
--- a/src/backend/utils/adt/pg_upgrade_support.c
+++ b/src/backend/utils/adt/pg_upgrade_support.c
@@ -274,8 +274,8 @@ binary_upgrade_set_missing_value(PG_FUNCTION_ARGS)
  * Returns true if there are no decodable WAL records after the
  * confirmed_flush_lsn. Otherwise false.
  *
- * This is a special purpose function to ensure the given slot can be upgraded
- * without data loss.
+ * This is a special purpose function to ensure that the given slot can be
+ * upgraded without data loss.
  */
 Datum
 binary_upgrade_slot_has_pending_wal(PG_FUNCTION_ARGS)
@@ -294,16 +294,10 @@ binary_upgrade_slot_has_pending_wal(PG_FUNCTION_ARGS)
 
 	slot_name = PG_GETARG_NAME(0);
 
-	/*
-	 * Acquire the given slot. There should be no error because the caller has
-	 * already checked the slot exists.
-	 */
+	/* Acquire the given slot. */
 	ReplicationSlotAcquire(NameStr(*slot_name), true);
 
-	/*
-	 * It's caller's responsibility to check the health of the slot.  Upcoming
-	 * functions assume the restart_lsn points to a valid record.
-	 */
+	/* Slots must be valid as otherwise we won't be able to scan the WAL. */
 	Assert(MyReplicationSlot->data.invalidated == RS_INVAL_NONE);
 
 	end_of_wal = GetFlushRecPtr(NULL);
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 123f47a81f..8f3f5585a4 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -1541,10 +1541,8 @@ check_new_cluster_logical_replication_slots(void)
 /*
  * check_old_cluster_for_valid_slots()
  *
- * Verify that all the logical slots are usable and have consumed all the WAL
- * before shutdown. The check has already been done in
- * get_old_cluster_logical_slot_infos(), so this function reads the result and
- * reports to the user.
+ * Verify that all the logical slots are valid and have consumed all the WAL
+ * before shutdown.
  */
 static void
 check_old_cluster_for_valid_slots(bool live_check)
@@ -1607,7 +1605,7 @@ check_old_cluster_for_valid_slots(bool live_check)
 		fclose(script);
 
 		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains logical replication slots that cannot be upgraded.\n"
+		pg_fatal("Your installation contains invalid logical replication slots.\n"
 				 "These slots can't be copied, so this cluster cannot be upgraded.\n"
 				 "Consider removing invalid slots and/or consuming the pending WAL if any,\n"
 				 "and then restart the upgrade.\n"
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index c56769fe54..5494e69227 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -651,8 +651,8 @@ get_old_cluster_logical_slot_infos(DbInfo *dbinfo, bool live_check)
 	/*
 	 * Fetch the logical replication slot information. The check whether the
 	 * slot is considered caught up is done by an upgrade function. This
-	 * regards the slot is caught up if any changes are not found while
-	 * decoding. See binary_upgrade_slot_has_pending_wal().
+	 * regards the slot as caught up if we don't find any decodable changes.
+	 * See binary_upgrade_slot_has_pending_wal().
 	 *
 	 * Note that we can't ensure whether the slot is caught up during
 	 * live_check as the new WAL records could be generated.
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 7acdf31d02..3960af4036 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -190,7 +190,12 @@ main(int argc, char **argv)
 	check_ok();
 
 	/*
-	 * If the old cluster has logical slots, migrate them to a new cluster.
+	 * Migrate the logical slots to the new cluster.  Note that we need to do
+	 * this after resetting WAL because otherwise the required WAL would be
+	 * removed and slots would become unusable.  There is a possibility that
+	 * background processes might generate some WAL before we could create the
+	 * slots in the new cluster but we can ignore that WAL as that won't be
+	 * required downstream.
 	 */
 	if (count_old_cluster_logical_slots())
 	{
@@ -890,7 +895,6 @@ create_logical_replication_slots(void)
 		LogicalSlotInfoArr *slot_arr = &old_db->slot_arr;
 		PGconn	   *conn;
 		PQExpBuffer query;
-		char		log_file_name[MAXPGPATH];
 
 		/* Skip this database if there are no slots */
 		if (slot_arr->nslots == 0)
@@ -899,9 +903,6 @@ create_logical_replication_slots(void)
 		conn = connectToServer(&new_cluster, old_db->db_name);
 		query = createPQExpBuffer();
 
-		snprintf(log_file_name, sizeof(log_file_name),
-				 DB_DUMP_LOG_FILE_MASK, old_db->db_oid);
-
 		pg_log(PG_STATUS, "%s", old_db->db_name);
 
 		for (int slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
diff --git a/src/include/replication/logical.h b/src/include/replication/logical.h
index 355247a58b..f8258d7c28 100644
--- a/src/include/replication/logical.h
+++ b/src/include/replication/logical.h
@@ -110,11 +110,7 @@ typedef struct LogicalDecodingContext
 	/* Are we processing the end LSN of a transaction? */
 	bool		end_xact;
 
-	/*
-	 * Did the logical decoding context require processing WALs?
-	 *
-	 * This flag is used only when in 'fast_forward' mode.
-	 */
+	/* Do we need to process any change in 'fast_forward' mode? */
 	bool		processing_required;
 } LogicalDecodingContext;
 
#321vignesh C
vignesh21@gmail.com
In reply to: Amit Kapila (#320)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Mon, 16 Oct 2023 at 14:44, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Sat, Oct 14, 2023 at 10:45 AM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

Here is a new patch.

Previously I wrote:

Based on the above idea, I made a new version of the patch in which some
functionality was exported from pg_resetwal. In this approach, pg_upgrade itself
removes the WALs and then creates the logical slots; pg_resetwal is then called
with a new option --no-switch, which avoids switching to a new WAL segment file.
The option is only used for the upgrade, so it is not mentioned in the docs or
in usage(). This option is not required if pg_resetwal -o does not discard WAL
records. Please see the fork thread [1].

But for now, these changes were reverted because changing the pg_resetwal -o
behavior may be a bit risky; it has been in place for more than ten years, so we
should be more careful about modifying it. Also, I cannot come up with problems
if slots are created after pg_resetwal: background processes would not generate
decodable changes (listed in [1]), and BGworkers from extensions can be ignored
[2]. Based on the discussion on the forked thread [3], we will apply this again
if it is accepted.

1) Should this:
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+# Tests for upgrading replication slots
+
be:
"Tests for upgrading logical replication slots"
2)  This statement is not entirely true:
+     <listitem>
+      <para>
+       The old cluster has replicated all the changes to subscribers.
+      </para>

If the only remaining changes are something like a shutdown checkpoint, the
upgrade passes; but if there are changes such as CREATE VIEW, which will not be
replicated anyway, the upgrade fails.

3) None of these includes is required except "logical.h":
--- a/src/backend/utils/adt/pg_upgrade_support.c
+++ b/src/backend/utils/adt/pg_upgrade_support.c
@@ -11,14 +11,20 @@

#include "postgres.h"

+#include "access/xlogutils.h"
+#include "access/xlog_internal.h"
#include "catalog/binary_upgrade.h"
#include "catalog/heap.h"
#include "catalog/namespace.h"
#include "catalog/pg_type.h"
#include "commands/extension.h"
+#include "funcapi.h"
#include "miscadmin.h"
+#include "replication/logical.h"
+#include "replication/slot.h"
#include "utils/array.h"
#include "utils/builtins.h"
+#include "utils/pg_lsn.h"

4) We could print two_phase as true/false instead of 0/1 (see the sketch after
this list):
+static void
+print_slot_infos(LogicalSlotInfoArr *slot_arr)
+{
+       /* Quick return if there are no logical slots. */
+       if (slot_arr->nslots == 0)
+               return;
+
+       pg_log(PG_VERBOSE, "Logical replication slots within the database:");
+
+       for (int slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+       {
+               LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+
+               pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\",
two_phase: %d",
+                          slot_info->slotname,
+                          slot_info->plugin,
+                          slot_info->two_phase);
+       }
+}
5) The test passes without the statement below, so maybe it is not required:
+# 2. Consume WAL records to avoid another type of upgrade failure. It will be
+#       tested in subsequent cases.
+$old_publisher->safe_psql('postgres',
+       "SELECT count(*) FROM
pg_logical_slot_get_changes('test_slot1', NULL, NULL);"
+);
6) This message "run of pg_upgrade of old cluster with idle
replication slots" seems wrong:
+# pg_upgrade will fail because the slot still has unconsumed WAL records
+command_checks_all(
+       [
+               'pg_upgrade', '--no-sync',
+               '-d', $old_publisher->data_dir,
+               '-D', $new_publisher->data_dir,
+               '-b', $bindir,
+               '-B', $bindir,
+               '-s', $new_publisher->host,
+               '-p', $old_publisher->port,
+               '-P', $new_publisher->port,
+               $mode,
+       ],
+       1,
+       [
+               qr/Your installation contains invalid logical
replication slots./
+       ],
+       [qr//],
+       'run of pg_upgrade of old cluster with idle replication slots');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+       "pg_upgrade_output.d/ not removed after pg_upgrade failure");

7) You could run pgindent and pgperltidy; they show there are a few issues
with the patch.
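
For point 4, a minimal sketch of what that could look like (illustrative only;
it just switches the format specifier and maps the bool to a string):

        pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\", two_phase: %s",
                   slot_info->slotname,
                   slot_info->plugin,
                   slot_info->two_phase ? "true" : "false");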

Regards,
Vignesh

#322Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Amit Kapila (#320)
2 attachment(s)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Amit,

Thanks for reviewing! PSA new version.

Yeah, I think introducing additional complexity unless it is really
required sounds a bit scary to me as well. BTW, please find attached
some cosmetic changes.

Basically LGTM, but the part below conflicts with Bharath's comment [1]/messages/by-id/CALj2ACXp+LXioY_=9mboEbLD--4c4nnpJCZ+j4fckBdSOQhENA@mail.gmail.com.

```
@@ -1607,7 +1605,7 @@ check_old_cluster_for_valid_slots(bool live_check)
fclose(script);

 		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains logical replication slots that cannot be upgraded.\n"
+		pg_fatal("Your installation contains invalid logical replication slots.\n"
```

How about "Your installation contains logical replication slots that can't be upgraded."?

One minor additional comment:
+# Initialize subscriber cluster
+my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
+$subscriber->init(allows_streaming => 'logical');

Why do we need to set wal_level as logical for subscribers?

It is not mandatory; the line was copied from tests in src/test/subscription.
I removed the setting from my patch. I felt that it could be removed from other
tests as well, so I will fork a new thread and post a patch for that.
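Just to illustrate the shape of that change, a minimal sketch (a plain init()
should be sufficient for a node that only acts as a subscriber):

```
# Initialize subscriber cluster
my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
$subscriber->init();
```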

Also, I made some improvements on top of v50, mainly to the tests.

1. The test file was refactored. pg_upgrade was executed many times in the test,
so the test time was increasing. The following refactorings were done:

===
a. Checks for both transactional and non-transactional changes were done at the
same time.
b. Removed the dry-run test. It did not improve the coverage.
c. Removed the wal_level test. Other tests like subscription and test_decoding
do not contain tests for GUCs, so I thought this would be acceptable. Removing
all the GUC tests (including the one for max_replication_slots) might be risky,
so that one was retained.
===

2. Supported cross-version checks. If the environment variable "oldinstall" is
set, its binaries are used as the old cluster. If the specified version is PG16
or earlier, the test verifies that logical replication slots are not migrated.
002_pg_upgrade.pl requires that $ENV{olddump} also be defined, but that is not
needed for our test. I tried to support versions back to PG9.2, which is the
oldest version for the Xupgrade test [2]https://github.com/PGBuildFarm/client-code/releases#:~:text=support%20for%20testing%20cross%20version%20upgrade%20extended%20back%20to%209.2. You can see the 0002 patch for it.
IIUC pg_create_logical_replication_slot() has been available since PG9.4, so the
test is skipped if older executables are specified, like:

```
$ oldinstall=/home/hayato/older/pg92/ make check PROVE_TESTS='t/003_upgrade_logical_replication_slots.pl'
...
# +++ tap check in src/bin/pg_upgrade +++
t/003_upgrade_logical_replication_slots.pl .. skipped: Logical replication slots can be available since PG9.4
Files=1, Tests=0, 0 wallclock secs ( 0.03 usr 0.00 sys + 0.08 cusr 0.02 csys = 0.13 CPU)
Result: NOTESTS
```

[1]: /messages/by-id/CALj2ACXp+LXioY_=9mboEbLD--4c4nnpJCZ+j4fckBdSOQhENA@mail.gmail.com
[2]: https://github.com/PGBuildFarm/client-code/releases#:~:text=support%20for%20testing%20cross%20version%20upgrade%20extended%20back%20to%209.2

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:

v51-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patchapplication/octet-stream; name=v51-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patchDownload
From 76fce116c37014bbf83fc9875d152fc63db73a9c Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Tue, 4 Apr 2023 05:49:34 +0000
Subject: [PATCH v51 1/2] pg_upgrade: Allow to replicate logical replication
 slots to new node

This commit allows nodes with logical replication slots to be upgraded. While
reading information from the old cluster, a list of logical replication slots is
fetched. At the later part of upgrading, pg_upgrade revisits the list and
restores slots by executing pg_create_logical_replication_slot() on the new
cluster. Migration of logical replication slots is only supported when the old
cluster is version 17.0 or later.

If the old node has slots with the status 'lost' or with unconsumed WAL records,
the pg_upgrade fails. These checks are needed to prevent data loss.

Note that the pg_resetwal command would remove WAL files, which are required for
restart_lsn. If WALs required by logical replication slots are removed, the
slots are unusable. Therefore, during the upgrade, slot restoration is done
after the final pg_resetwal command. The workflow ensures that required WALs are
retained.

The significant advantage of this commit is that it makes it easy to continue
logical replication even after upgrading the publisher node. Previously,
pg_upgrade allowed copying publications to a new node. With this new commit,
adjusting the connection string to the new publisher will cause the apply
worker on the subscriber to connect to the new publisher automatically. This
enables seamless continuation of logical replication, even after an upgrade.

Author: Hayato Kuroda
Co-authored-by: Hou Zhijie
Reviewed-by: Peter Smith, Julien Rouhaud, Vignesh C, Wang Wei, Masahiko Sawada,
             Dilip Kumar, Bharath Rupireddy
---
 doc/src/sgml/ref/pgupgrade.sgml               |  77 +++++-
 src/backend/replication/logical/decode.c      |  48 +++-
 src/backend/replication/logical/logical.c     |  65 +++++
 src/backend/replication/slot.c                |  12 +
 src/backend/utils/adt/pg_upgrade_support.c    |  42 ++++
 src/bin/pg_upgrade/Makefile                   |   3 +
 src/bin/pg_upgrade/check.c                    | 169 ++++++++++++-
 src/bin/pg_upgrade/function.c                 |  30 ++-
 src/bin/pg_upgrade/info.c                     | 166 ++++++++++++-
 src/bin/pg_upgrade/meson.build                |   1 +
 src/bin/pg_upgrade/pg_upgrade.c               |  74 +++++-
 src/bin/pg_upgrade/pg_upgrade.h               |  22 +-
 src/bin/pg_upgrade/server.c                   |  25 +-
 .../003_upgrade_logical_replication_slots.pl  | 227 ++++++++++++++++++
 src/include/catalog/pg_proc.dat               |   5 +
 src/include/replication/logical.h             |   5 +
 src/tools/pgindent/typedefs.list              |   2 +
 17 files changed, 947 insertions(+), 26 deletions(-)
 create mode 100644 src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 608193b307..37eb573826 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -383,6 +383,78 @@ make prefix=/usr/local/pgsql.new install
     </para>
    </step>
 
+   <step>
+    <title>Prepare for publisher upgrades</title>
+
+    <para>
+     <application>pg_upgrade</application> attempts to migrate logical
+     replication slots. This helps avoid the need for manually defining the
+     same replication slots on the new publisher. Migration of logical
+     replication slots is only supported when the old cluster is version 17.0
+     or later.
+    </para>
+
+    <para>
+     Before you start upgrading the publisher cluster, ensure that the
+     subscription is temporarily disabled, by executing
+     <link linkend="sql-altersubscription"><command>ALTER SUBSCRIPTION ... DISABLE</command></link>.
+     Re-enable the subscription after the upgrade.
+    </para>
+
+    <para>
+     There are some prerequisites for <application>pg_upgrade</application> to
+     be able to upgrade the replication slots. If these are not met an error
+     will be reported.
+    </para>
+
+    <itemizedlist>
+     <listitem>
+      <para>
+       All slots on the old cluster must be usable, i.e., there are no slots
+       whose
+       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>conflicting</structfield>
+       is <literal>true</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The old cluster has replicated all the transactions and logical decoding
+       messages to subscribers.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The output plugins referenced by the slots on the old cluster must be
+       installed in the new PostgreSQL executable directory.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must not have permanent logical replication slots, i.e.,
+       there must be no slots where
+       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>temporary</structfield>
+       is <literal>false</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must have
+       <link linkend="guc-wal-level"><varname>wal_level</varname></link> as
+       <literal>logical</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must have
+       <link linkend="guc-max-replication-slots"><varname>max_replication_slots</varname></link>
+       configured to a value greater than or equal to the number of slots
+       present in the old cluster.
+      </para>
+     </listitem>
+    </itemizedlist>
+
+   </step>
+
    <step>
     <title>Stop both servers</title>
 
@@ -650,8 +722,9 @@ rsync --archive --delete --hard-links --size-only --no-inc-recursive /vol1/pg_tb
        Configure the servers for log shipping.  (You do not need to run
        <function>pg_backup_start()</function> and <function>pg_backup_stop()</function>
        or take a file system backup as the standbys are still synchronized
-       with the primary.)  Replication slots are not copied and must
-       be recreated.
+       with the primary.)  Only logical slots on the primary are copied to the
+       new standby, but other slots on the old standby are not copied so must
+       be recreated manually.
       </para>
      </step>
 
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 730061c9da..0514d1365e 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -599,12 +599,8 @@ logicalmsg_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 
 	ReorderBufferProcessXid(ctx->reorder, XLogRecGetXid(r), buf->origptr);
 
-	/*
-	 * If we don't have snapshot or we are just fast-forwarding, there is no
-	 * point in decoding messages.
-	 */
-	if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT ||
-		ctx->fast_forward)
+	/* If we don't have snapshot, there is no point in decoding messages */
+	if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT)
 		return;
 
 	message = (xl_logical_message *) XLogRecGetData(r);
@@ -621,6 +617,26 @@ logicalmsg_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 			  SnapBuildXactNeedsSkip(builder, buf->origptr)))
 		return;
 
+	/*
+	 * We also skip decoding in 'fast_forward' mode. This check must be last
+	 * because we don't want to set the processing_required flag unless we
+	 * have a decodable message.
+	 */
+	if (ctx->fast_forward)
+	{
+		/*
+		 * We need to set processing_required flag to notify the message's
+		 * existence to the caller. Usually, the flag is set when either the
+		 * COMMIT or ABORT records are decoded, but this must be turned on
+		 * here because the non-transactional logical message is decoded
+		 * without waiting for these records.
+		 */
+		if (!message->transactional)
+			ctx->processing_required = true;
+
+		return;
+	}
+
 	/*
 	 * If this is a non-transactional change, get the snapshot we're expected
 	 * to use. We only get here when the snapshot is consistent, and the
@@ -1285,7 +1301,21 @@ static bool
 DecodeTXNNeedSkip(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
 				  Oid txn_dbid, RepOriginId origin_id)
 {
-	return (SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) ||
-			(txn_dbid != InvalidOid && txn_dbid != ctx->slot->data.database) ||
-			ctx->fast_forward || FilterByOrigin(ctx, origin_id));
+	if (SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) ||
+		(txn_dbid != InvalidOid && txn_dbid != ctx->slot->data.database) ||
+		FilterByOrigin(ctx, origin_id))
+		return true;
+
+	/*
+	 * We also skip decoding in 'fast_forward' mode. In passing set the
+	 * 'processing_required' flag to indicate, were it not for this mode,
+	 * processing *would* have been required.
+	 */
+	if (ctx->fast_forward)
+	{
+		ctx->processing_required = true;
+		return true;
+	}
+
+	return false;
 }
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 41243d0187..e02cd0fa44 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -29,6 +29,7 @@
 #include "postgres.h"
 
 #include "access/xact.h"
+#include "access/xlogutils.h"
 #include "access/xlog_internal.h"
 #include "fmgr.h"
 #include "miscadmin.h"
@@ -41,6 +42,7 @@
 #include "storage/proc.h"
 #include "storage/procarray.h"
 #include "utils/builtins.h"
+#include "utils/inval.h"
 #include "utils/memutils.h"
 
 /* data for errcontext callback */
@@ -1949,3 +1951,66 @@ UpdateDecodingStats(LogicalDecodingContext *ctx)
 	rb->totalTxns = 0;
 	rb->totalBytes = 0;
 }
+
+/*
+ * Read up to the end of WAL starting from the decoding slot's restart_lsn.
+ * Return true if any meaningful/decodable WAL records are encountered,
+ * otherwise false.
+ *
+ * Although this function is currently used only during pg_upgrade, there are
+ * no reasons to restrict it, so IsBinaryUpgrade is not checked here.
+ */
+bool
+LogicalReplicationSlotHasPendingWal(XLogRecPtr end_of_wal)
+{
+	LogicalDecodingContext *ctx;
+	bool		has_pending_wal = false;
+
+	Assert(MyReplicationSlot);
+
+	/*
+	 * Create our decoding context in fast_forward mode, passing start_lsn as
+	 * InvalidXLogRecPtr, so that we start processing from the slot's
+	 * confirmed_flush.
+	 */
+	ctx = CreateDecodingContext(InvalidXLogRecPtr,
+								NIL,
+								true,	/* fast_forward */
+								XL_ROUTINE(.page_read = read_local_xlog_page,
+										   .segment_open = wal_segment_open,
+										   .segment_close = wal_segment_close),
+								NULL, NULL, NULL);
+
+	/*
+	 * Start reading at the slot's restart_lsn, which we know points to a
+	 * valid record.
+	 */
+	XLogBeginRead(ctx->reader, MyReplicationSlot->data.restart_lsn);
+
+	/* Invalidate non-timetravel entries */
+	InvalidateSystemCaches();
+
+	/* Loop until the end of WAL or some changes are processed */
+	while (!has_pending_wal && ctx->reader->EndRecPtr < end_of_wal)
+	{
+		XLogRecord *record;
+		char	   *errm = NULL;
+
+		record = XLogReadRecord(ctx->reader, &errm);
+
+		if (errm)
+			elog(ERROR, "could not find record for logical decoding: %s", errm);
+
+		if (record != NULL)
+			LogicalDecodingProcessRecord(ctx, ctx->reader);
+
+		has_pending_wal = ctx->processing_required;
+
+		CHECK_FOR_INTERRUPTS();
+	}
+
+	/* Clean up */
+	FreeDecodingContext(ctx);
+
+	return has_pending_wal;
+}
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 7e5ec500d8..9980e2fd79 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1423,6 +1423,18 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause,
 
 		SpinLockRelease(&s->mutex);
 
+		/*
+		 * The logical replication slots shouldn't be invalidated as
+		 * max_slot_wal_keep_size GUC is set to -1 during the upgrade.
+		 *
+		 * The following is just a sanity check.
+		 */
+		if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade)
+		{
+			Assert(max_slot_wal_keep_size_mb == -1);
+			elog(ERROR, "replication slots must not be invalidated during the upgrade");
+		}
+
 		if (active_pid != 0)
 		{
 			/*
diff --git a/src/backend/utils/adt/pg_upgrade_support.c b/src/backend/utils/adt/pg_upgrade_support.c
index 0186636d9f..697e23f815 100644
--- a/src/backend/utils/adt/pg_upgrade_support.c
+++ b/src/backend/utils/adt/pg_upgrade_support.c
@@ -17,6 +17,7 @@
 #include "catalog/pg_type.h"
 #include "commands/extension.h"
 #include "miscadmin.h"
+#include "replication/logical.h"
 #include "utils/array.h"
 #include "utils/builtins.h"
 
@@ -261,3 +262,44 @@ binary_upgrade_set_missing_value(PG_FUNCTION_ARGS)
 
 	PG_RETURN_VOID();
 }
+
+/*
+ * Verify the given slot has already consumed all the WAL changes.
+ *
+ * Returns true if there are no decodable WAL records after the
+ * confirmed_flush_lsn. Otherwise false.
+ *
+ * This is a special purpose function to ensure that the given slot can be
+ * upgraded without data loss.
+ */
+Datum
+binary_upgrade_slot_has_caught_up(PG_FUNCTION_ARGS)
+{
+	Name		slot_name;
+	XLogRecPtr	end_of_wal;
+	bool		found_pending_wal;
+
+	CHECK_IS_BINARY_UPGRADE;
+
+	/* Quick exit if the input is NULL */
+	if (PG_ARGISNULL(0))
+		PG_RETURN_BOOL(false);
+
+	CheckSlotPermissions();
+
+	slot_name = PG_GETARG_NAME(0);
+
+	/* Acquire the given slot */
+	ReplicationSlotAcquire(NameStr(*slot_name), true);
+
+	/* Slots must be valid as otherwise we won't be able to scan the WAL */
+	Assert(MyReplicationSlot->data.invalidated == RS_INVAL_NONE);
+
+	end_of_wal = GetFlushRecPtr(NULL);
+	found_pending_wal = LogicalReplicationSlotHasPendingWal(end_of_wal);
+
+	/* Clean up */
+	ReplicationSlotRelease();
+
+	PG_RETURN_BOOL(!found_pending_wal);
+}
diff --git a/src/bin/pg_upgrade/Makefile b/src/bin/pg_upgrade/Makefile
index 5834513add..815d1a7ca1 100644
--- a/src/bin/pg_upgrade/Makefile
+++ b/src/bin/pg_upgrade/Makefile
@@ -3,6 +3,9 @@
 PGFILEDESC = "pg_upgrade - an in-place binary upgrade utility"
 PGAPPICON = win32
 
+# required for 003_logical_replication_slots.pl
+EXTRA_INSTALL=contrib/test_decoding
+
 subdir = src/bin/pg_upgrade
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 21a0ff9e42..8030f9d290 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -33,6 +33,8 @@ static void check_for_jsonb_9_4_usage(ClusterInfo *cluster);
 static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(void);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
+static void check_new_cluster_logical_replication_slots(void);
+static void check_old_cluster_for_valid_slots(bool live_check);
 
 
 /*
@@ -89,8 +91,11 @@ check_and_dump_old_cluster(bool live_check)
 	if (!live_check)
 		start_postmaster(&old_cluster, true);
 
-	/* Extract a list of databases and tables from the old cluster */
-	get_db_and_rel_infos(&old_cluster);
+	/*
+	 * Extract a list of databases, tables, and logical replication slots from
+	 * the old cluster.
+	 */
+	get_db_rel_and_slot_infos(&old_cluster, live_check);
 
 	init_tablespaces();
 
@@ -107,6 +112,13 @@ check_and_dump_old_cluster(bool live_check)
 	check_for_reg_data_type_usage(&old_cluster);
 	check_for_isn_and_int8_passing_mismatch(&old_cluster);
 
+	/*
+	 * Logical replication slots can be migrated since PG17. See comments atop
+	 * get_old_cluster_logical_slot_infos().
+	 */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
+		check_old_cluster_for_valid_slots(live_check);
+
 	/*
 	 * PG 16 increased the size of the 'aclitem' type, which breaks the
 	 * on-disk format for existing data.
@@ -200,7 +212,7 @@ check_and_dump_old_cluster(bool live_check)
 void
 check_new_cluster(void)
 {
-	get_db_and_rel_infos(&new_cluster);
+	get_db_rel_and_slot_infos(&new_cluster, false);
 
 	check_new_cluster_is_empty();
 
@@ -223,6 +235,8 @@ check_new_cluster(void)
 	check_for_prepared_transactions(&new_cluster);
 
 	check_for_new_tablespace_dir();
+
+	check_new_cluster_logical_replication_slots();
 }
 
 
@@ -1451,3 +1465,152 @@ check_for_user_defined_encoding_conversions(ClusterInfo *cluster)
 	else
 		check_ok();
 }
+
+/*
+ * check_new_cluster_logical_replication_slots()
+ *
+ * Verify that there are no logical replication slots on the new cluster and
+ * that the parameter settings necessary for creating slots are sufficient.
+ */
+static void
+check_new_cluster_logical_replication_slots(void)
+{
+	PGresult   *res;
+	PGconn	   *conn;
+	int			nslots_on_old;
+	int			nslots_on_new;
+	int			max_replication_slots;
+	char	   *wal_level;
+
+	/* Logical slots can be migrated since PG17. */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+		return;
+
+	nslots_on_old = count_old_cluster_logical_slots();
+
+	/* Quick return if there are no logical slots to be migrated. */
+	if (nslots_on_old == 0)
+		return;
+
+	conn = connectToServer(&new_cluster, "template1");
+
+	prep_status("Checking for new cluster logical replication slots");
+
+	res = executeQueryOrDie(conn, "SELECT count(*) "
+							"FROM pg_catalog.pg_replication_slots "
+							"WHERE slot_type = 'logical' AND "
+							"temporary IS FALSE;");
+
+	if (PQntuples(res) != 1)
+		pg_fatal("could not count the number of logical replication slots");
+
+	nslots_on_new = atoi(PQgetvalue(res, 0, 0));
+
+	if (nslots_on_new)
+		pg_fatal("Expected 0 logical replication slots but found %d.",
+				 nslots_on_new);
+
+	PQclear(res);
+
+	res = executeQueryOrDie(conn, "SELECT setting FROM pg_settings "
+							"WHERE name IN ('wal_level', 'max_replication_slots') "
+							"ORDER BY name DESC;");
+
+	if (PQntuples(res) != 2)
+		pg_fatal("could not determine parameter settings on new cluster");
+
+	wal_level = PQgetvalue(res, 0, 0);
+
+	if (strcmp(wal_level, "logical") != 0)
+		pg_fatal("wal_level must be \"logical\", but is set to \"%s\"",
+				 wal_level);
+
+	max_replication_slots = atoi(PQgetvalue(res, 1, 0));
+
+	if (nslots_on_old > max_replication_slots)
+		pg_fatal("max_replication_slots (%d) must be greater than or equal to the number of "
+				 "logical replication slots (%d) on the old cluster",
+				 max_replication_slots, nslots_on_old);
+
+	PQclear(res);
+	PQfinish(conn);
+
+	check_ok();
+}
+
+/*
+ * check_old_cluster_for_valid_slots()
+ *
+ * Verify that all the logical slots are valid and have consumed all the WAL
+ * before shutdown.
+ */
+static void
+check_old_cluster_for_valid_slots(bool live_check)
+{
+	char		output_path[MAXPGPATH];
+	FILE	   *script = NULL;
+
+	prep_status("Checking for valid logical replication slots");
+
+	snprintf(output_path, sizeof(output_path), "%s/%s",
+			 log_opts.basedir,
+			 "invalid_logical_relication_slots.txt");
+
+	for (int dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		LogicalSlotInfoArr *slot_arr = &old_cluster.dbarr.dbs[dbnum].slot_arr;
+
+		for (int slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		{
+			LogicalSlotInfo *slot = &slot_arr->slots[slotnum];
+
+			/* Is the slot usable? */
+			if (slot->invalid)
+			{
+				if (script == NULL &&
+					(script = fopen_priv(output_path, "w")) == NULL)
+					pg_fatal("could not open file \"%s\": %s",
+							 output_path, strerror(errno));
+
+				fprintf(script, "The slot \"%s\" is invalid\n",
+						slot->slotname);
+
+				/* No need to check this slot, seek to new one */
+				continue;
+			}
+
+			/*
+			 * Do additional check to ensure that all logical replication
+			 * slots have consumed all the WAL before shutdown.
+			 *
+			 * Note: This can be satisfied only when the old cluster has been
+			 * shut down, so we skip this for live checks.
+			 */
+			if (!live_check && !slot->caught_up)
+			{
+				if (script == NULL &&
+					(script = fopen_priv(output_path, "w")) == NULL)
+					pg_fatal("could not open file \"%s\": %s",
+							 output_path, strerror(errno));
+
+				fprintf(script,
+						"The slot \"%s\" has not consumed the WAL yet\n",
+						slot->slotname);
+			}
+		}
+	}
+
+	if (script)
+	{
+		fclose(script);
+
+		pg_log(PG_REPORT, "fatal");
+		pg_fatal("Your installation contains logical replication slots that can't be upgraded.\n"
+				 "You can remove invalid slots and/or consume the pending WAL for other slots,\n"
+				 "and then restart the upgrade.\n"
+				 "A list of the problem slots is in the file:\n"
+				 "    %s", output_path);
+	}
+
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/function.c b/src/bin/pg_upgrade/function.c
index dc8800c7cd..5af936bd45 100644
--- a/src/bin/pg_upgrade/function.c
+++ b/src/bin/pg_upgrade/function.c
@@ -46,7 +46,9 @@ library_name_compare(const void *p1, const void *p2)
 /*
  * get_loadable_libraries()
  *
- *	Fetch the names of all old libraries containing C-language functions.
+ *	Fetch the names of all old libraries containing either C-language functions
+ *	or are corresponding to logical replication output plugins.
+ *
  *	We will later check that they all exist in the new installation.
  */
 void
@@ -55,6 +57,7 @@ get_loadable_libraries(void)
 	PGresult  **ress;
 	int			totaltups;
 	int			dbnum;
+	int			n_libinfos;
 
 	ress = (PGresult **) pg_malloc(old_cluster.dbarr.ndbs * sizeof(PGresult *));
 	totaltups = 0;
@@ -81,7 +84,12 @@ get_loadable_libraries(void)
 		PQfinish(conn);
 	}
 
-	os_info.libraries = (LibraryInfo *) pg_malloc(totaltups * sizeof(LibraryInfo));
+	/*
+	 * Allocate memory for required libraries and logical replication output
+	 * plugins.
+	 */
+	n_libinfos = totaltups + count_old_cluster_logical_slots();
+	os_info.libraries = (LibraryInfo *) pg_malloc(sizeof(LibraryInfo) * n_libinfos);
 	totaltups = 0;
 
 	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
@@ -89,6 +97,7 @@ get_loadable_libraries(void)
 		PGresult   *res = ress[dbnum];
 		int			ntups;
 		int			rowno;
+		LogicalSlotInfoArr *slot_arr = &old_cluster.dbarr.dbs[dbnum].slot_arr;
 
 		ntups = PQntuples(res);
 		for (rowno = 0; rowno < ntups; rowno++)
@@ -101,6 +110,23 @@ get_loadable_libraries(void)
 			totaltups++;
 		}
 		PQclear(res);
+
+		/*
+		 * Store the names of output plugins as well. There is a possibility
+		 * that duplicated plugins are set, but the consumer function
+		 * check_loadable_libraries() will avoid checking the same library, so
+		 * we do not have to consider their uniqueness here.
+		 */
+		for (int slotno = 0; slotno < slot_arr->nslots; slotno++)
+		{
+			if (slot_arr->slots[slotno].invalid)
+				continue;
+
+			os_info.libraries[totaltups].name = pg_strdup(slot_arr->slots[slotno].plugin);
+			os_info.libraries[totaltups].dbnum = dbnum;
+
+			totaltups++;
+		}
 	}
 
 	pg_free(ress);
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index aa5faca4d6..481f586a2f 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -26,6 +26,8 @@ static void get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo);
 static void free_rel_infos(RelInfoArr *rel_arr);
 static void print_db_infos(DbInfoArr *db_arr);
 static void print_rel_infos(RelInfoArr *rel_arr);
+static void print_slot_infos(LogicalSlotInfoArr *slot_arr);
+static void get_old_cluster_logical_slot_infos(DbInfo *dbinfo, bool live_check);
 
 
 /*
@@ -266,13 +268,15 @@ report_unmatched_relation(const RelInfo *rel, const DbInfo *db, bool is_new_db)
 }
 
 /*
- * get_db_and_rel_infos()
+ * get_db_rel_and_slot_infos()
  *
  * higher level routine to generate dbinfos for the database running
  * on the given "port". Assumes that server is already running.
+ *
+ * live_check would be used only when the target is the old cluster.
  */
 void
-get_db_and_rel_infos(ClusterInfo *cluster)
+get_db_rel_and_slot_infos(ClusterInfo *cluster, bool live_check)
 {
 	int			dbnum;
 
@@ -283,7 +287,17 @@ get_db_and_rel_infos(ClusterInfo *cluster)
 	get_db_infos(cluster);
 
 	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
-		get_rel_infos(cluster, &cluster->dbarr.dbs[dbnum]);
+	{
+		DbInfo	   *pDbInfo = &cluster->dbarr.dbs[dbnum];
+
+		get_rel_infos(cluster, pDbInfo);
+
+		/*
+		 * Retrieve the logical replication slots infos for the old cluster.
+		 */
+		if (cluster == &old_cluster)
+			get_old_cluster_logical_slot_infos(pDbInfo, live_check);
+	}
 
 	if (cluster == &old_cluster)
 		pg_log(PG_VERBOSE, "\nsource databases:");
@@ -600,6 +614,125 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 	dbinfo->rel_arr.nrels = num_rels;
 }
 
+/*
+ * get_old_cluster_logical_slot_infos()
+ *
+ * gets the LogicalSlotInfos for all the logical replication slots of the
+ * database referred to by "dbinfo". The status of each logical slot is gotten
+ * here, but they are used at the checking phase. See
+ * check_old_cluster_for_valid_slots().
+ *
+ * Note: This function will not do anything if the old cluster is pre-PG17.
+ * This is because before that the logical slots are not saved at shutdown, so
+ * there is no guarantee that the latest confirmed_flush_lsn is saved to disk
+ * which can lead to data loss. It is still not guaranteed for manually created
+ * slots in PG17, so subsequent checks done in
+ * check_old_cluster_for_valid_slots() would raise a FATAL error if such slots
+ * are included.
+ */
+static void
+get_old_cluster_logical_slot_infos(DbInfo *dbinfo, bool live_check)
+{
+	PGconn	   *conn;
+	PGresult   *res;
+	LogicalSlotInfo *slotinfos = NULL;
+	int			num_slots = 0;
+
+	/* Logical slots can be migrated since PG17. */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+	{
+		dbinfo->slot_arr.slots = slotinfos;
+		dbinfo->slot_arr.nslots = num_slots;
+		return;
+	}
+
+	conn = connectToServer(&old_cluster, dbinfo->db_name);
+
+	/*
+	 * Fetch the logical replication slot information. The check whether the
+	 * slot is considered caught up is done by an upgrade function. This
+	 * regards the slot as caught up if we don't find any decodable changes.
+	 * See binary_upgrade_slot_has_caught_up().
+	 *
+	 * Note that we can't ensure whether the slot is caught up during
+	 * live_check as the new WAL records could be generated.
+	 *
+	 * We intentionally skip checking the WALs for invalidated slots as the
+	 * corresponding WALs could have been removed for such slots.
+	 *
+	 * The temporary slots are explicitly ignored while checking because such
+	 * slots cannot exist after the upgrade. During the upgrade, clusters are
+	 * started and stopped several times causing any temporary slots to be
+	 * removed.
+	 */
+	res = executeQueryOrDie(conn, "SELECT slot_name, plugin, two_phase, "
+							"%s as caught_up, conflicting as invalid "
+							"FROM pg_catalog.pg_replication_slots "
+							"WHERE slot_type = 'logical' AND "
+							"database = current_database() AND "
+							"temporary IS FALSE;",
+							live_check ? "FALSE" :
+							"(CASE WHEN conflicting THEN FALSE "
+							"ELSE (SELECT pg_catalog.binary_upgrade_slot_has_caught_up(slot_name)) "
+							"END)");
+
+	num_slots = PQntuples(res);
+
+	if (num_slots)
+	{
+		int			i_slotname;
+		int			i_plugin;
+		int			i_twophase;
+		int			i_caught_up;
+		int			i_invalid;
+
+		slotinfos = (LogicalSlotInfo *) pg_malloc(sizeof(LogicalSlotInfo) * num_slots);
+
+		i_slotname = PQfnumber(res, "slot_name");
+		i_plugin = PQfnumber(res, "plugin");
+		i_twophase = PQfnumber(res, "two_phase");
+		i_caught_up = PQfnumber(res, "caught_up");
+		i_invalid = PQfnumber(res, "invalid");
+
+		for (int slotnum = 0; slotnum < num_slots; slotnum++)
+		{
+			LogicalSlotInfo *curr = &slotinfos[slotnum];
+
+			curr->slotname = pg_strdup(PQgetvalue(res, slotnum, i_slotname));
+			curr->plugin = pg_strdup(PQgetvalue(res, slotnum, i_plugin));
+			curr->two_phase = (strcmp(PQgetvalue(res, slotnum, i_twophase), "t") == 0);
+			curr->caught_up = (strcmp(PQgetvalue(res, slotnum, i_caught_up), "t") == 0);
+			curr->invalid = (strcmp(PQgetvalue(res, slotnum, i_invalid), "t") == 0);
+		}
+	}
+
+	PQclear(res);
+	PQfinish(conn);
+
+	dbinfo->slot_arr.slots = slotinfos;
+	dbinfo->slot_arr.nslots = num_slots;
+}
+
+
+/*
+ * count_old_cluster_logical_slots()
+ *
+ * Returns the number of logical replication slots for all databases.
+ *
+ * Note: this function always returns 0 if the old_cluster is PG16 and prior
+ * because we gather slot information only for cluster versions greater than or
+ * equal to PG17. See get_old_cluster_logical_slot_infos().
+ */
+int
+count_old_cluster_logical_slots(void)
+{
+	int			slot_count = 0;
+
+	for (int dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+		slot_count += old_cluster.dbarr.dbs[dbnum].slot_arr.nslots;
+
+	return slot_count;
+}
 
 static void
 free_db_and_rel_infos(DbInfoArr *db_arr)
@@ -642,8 +775,11 @@ print_db_infos(DbInfoArr *db_arr)
 
 	for (dbnum = 0; dbnum < db_arr->ndbs; dbnum++)
 	{
-		pg_log(PG_VERBOSE, "Database: \"%s\"", db_arr->dbs[dbnum].db_name);
-		print_rel_infos(&db_arr->dbs[dbnum].rel_arr);
+		DbInfo	   *pDbInfo = &db_arr->dbs[dbnum];
+
+		pg_log(PG_VERBOSE, "Database: \"%s\"", pDbInfo->db_name);
+		print_rel_infos(&pDbInfo->rel_arr);
+		print_slot_infos(&pDbInfo->slot_arr);
 	}
 }
 
@@ -660,3 +796,23 @@ print_rel_infos(RelInfoArr *rel_arr)
 			   rel_arr->rels[relnum].reloid,
 			   rel_arr->rels[relnum].tablespace);
 }
+
+static void
+print_slot_infos(LogicalSlotInfoArr *slot_arr)
+{
+	/* Quick return if there are no logical slots. */
+	if (slot_arr->nslots == 0)
+		return;
+
+	pg_log(PG_VERBOSE, "Logical replication slots within the database:");
+
+	for (int slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+	{
+		LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+
+		pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\", two_phase: %s",
+			   slot_info->slotname,
+			   slot_info->plugin,
+			   slot_info->two_phase ? "true" : "false");
+	}
+}
diff --git a/src/bin/pg_upgrade/meson.build b/src/bin/pg_upgrade/meson.build
index 12a97f84e2..2c4f38d865 100644
--- a/src/bin/pg_upgrade/meson.build
+++ b/src/bin/pg_upgrade/meson.build
@@ -42,6 +42,7 @@ tests += {
     'tests': [
       't/001_basic.pl',
       't/002_pg_upgrade.pl',
+      't/003_upgrade_logical_replication_slots.pl',
     ],
     'test_kwargs': {'priority': 40}, # pg_upgrade tests are slow
   },
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 96bfb67167..3960af4036 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -59,6 +59,7 @@ static void copy_xact_xlog_xid(void);
 static void set_frozenxids(bool minmxid_only);
 static void make_outputdirs(char *pgdata);
 static void setup(char *argv0, bool *live_check);
+static void create_logical_replication_slots(void);
 
 ClusterInfo old_cluster,
 			new_cluster;
@@ -188,6 +189,21 @@ main(int argc, char **argv)
 			  new_cluster.pgdata);
 	check_ok();
 
+	/*
+	 * Migrate the logical slots to the new cluster.  Note that we need to do
+	 * this after resetting WAL because otherwise the required WAL would be
+	 * removed and slots would become unusable.  There is a possibility that
+	 * background processes might generate some WAL before we could create the
+	 * slots in the new cluster but we can ignore that WAL as that won't be
+	 * required downstream.
+	 */
+	if (count_old_cluster_logical_slots())
+	{
+		start_postmaster(&new_cluster, true);
+		create_logical_replication_slots();
+		stop_postmaster(false);
+	}
+
 	if (user_opts.do_sync)
 	{
 		prep_status("Sync data directory to disk");
@@ -593,7 +609,7 @@ create_new_objects(void)
 		set_frozenxids(true);
 
 	/* update new_cluster info now that we have objects in the databases */
-	get_db_and_rel_infos(&new_cluster);
+	get_db_rel_and_slot_infos(&new_cluster, false);
 }
 
 /*
@@ -862,3 +878,59 @@ set_frozenxids(bool minmxid_only)
 
 	check_ok();
 }
+
+/*
+ * create_logical_replication_slots()
+ *
+ * Similar to create_new_objects() but only restores logical replication slots.
+ */
+static void
+create_logical_replication_slots(void)
+{
+	prep_status_progress("Restoring logical replication slots in the new cluster");
+
+	for (int dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		DbInfo	   *old_db = &old_cluster.dbarr.dbs[dbnum];
+		LogicalSlotInfoArr *slot_arr = &old_db->slot_arr;
+		PGconn	   *conn;
+		PQExpBuffer query;
+
+		/* Skip this database if there are no slots */
+		if (slot_arr->nslots == 0)
+			continue;
+
+		conn = connectToServer(&new_cluster, old_db->db_name);
+		query = createPQExpBuffer();
+
+		pg_log(PG_STATUS, "%s", old_db->db_name);
+
+		for (int slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		{
+			LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+
+			/* Constructs a query for creating logical replication slots */
+			appendPQExpBuffer(query,
+							  "SELECT * FROM "
+							  "pg_catalog.pg_create_logical_replication_slot(");
+			appendStringLiteralConn(query, slot_info->slotname, conn);
+			appendPQExpBuffer(query, ", ");
+			appendStringLiteralConn(query, slot_info->plugin, conn);
+			appendPQExpBuffer(query, ", false, %s);",
+							  slot_info->two_phase ? "true" : "false");
+
+			PQclear(executeQueryOrDie(conn, "%s", query->data));
+
+			resetPQExpBuffer(query);
+		}
+
+		PQfinish(conn);
+
+		destroyPQExpBuffer(query);
+	}
+
+	end_progress_output();
+	check_ok();
+
+	return;
+}
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 842f3b6cd3..ba8129d135 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -150,6 +150,24 @@ typedef struct
 	int			nrels;
 } RelInfoArr;
 
+/*
+ * Structure to store logical replication slot information.
+ */
+typedef struct
+{
+	char	   *slotname;		/* slot name */
+	char	   *plugin;			/* plugin */
+	bool		two_phase;		/* can the slot decode 2PC? */
+	bool		caught_up;		/* has the slot caught up to latest changes? */
+	bool		invalid;		/* if true, the slot is unusable */
+} LogicalSlotInfo;
+
+typedef struct
+{
+	int			nslots;			/* number of logical slot infos */
+	LogicalSlotInfo *slots;		/* array of logical slot infos */
+} LogicalSlotInfoArr;
+
 /*
  * The following structure represents a relation mapping.
  */
@@ -176,6 +194,7 @@ typedef struct
 	char		db_tablespace[MAXPGPATH];	/* database default tablespace
 											 * path */
 	RelInfoArr	rel_arr;		/* array of all user relinfos */
+	LogicalSlotInfoArr slot_arr;	/* array of all LogicalSlotInfo */
 } DbInfo;
 
 /*
@@ -400,7 +419,8 @@ void		check_loadable_libraries(void);
 FileNameMap *gen_db_file_maps(DbInfo *old_db,
 							  DbInfo *new_db, int *nmaps, const char *old_pgdata,
 							  const char *new_pgdata);
-void		get_db_and_rel_infos(ClusterInfo *cluster);
+void		get_db_rel_and_slot_infos(ClusterInfo *cluster, bool live_check);
+int			count_old_cluster_logical_slots(void);
 
 /* option.c */
 
diff --git a/src/bin/pg_upgrade/server.c b/src/bin/pg_upgrade/server.c
index 0bc3d2806b..d7f6c268ef 100644
--- a/src/bin/pg_upgrade/server.c
+++ b/src/bin/pg_upgrade/server.c
@@ -201,6 +201,7 @@ start_postmaster(ClusterInfo *cluster, bool report_and_exit_on_error)
 	PGconn	   *conn;
 	bool		pg_ctl_return = false;
 	char		socket_string[MAXPGPATH + 200];
+	PQExpBufferData pgoptions;
 
 	static bool exit_hook_registered = false;
 
@@ -227,23 +228,41 @@ start_postmaster(ClusterInfo *cluster, bool report_and_exit_on_error)
 				 cluster->sockdir);
 #endif
 
+	initPQExpBuffer(&pgoptions);
+
 	/*
-	 * Use -b to disable autovacuum.
+	 * Construct a parameter string which is passed to the server process.
 	 *
 	 * Turn off durability requirements to improve object creation speed, and
 	 * we only modify the new cluster, so only use it there.  If there is a
 	 * crash, the new cluster has to be recreated anyway.  fsync=off is a big
 	 * win on ext4.
 	 */
+	if (cluster == &new_cluster)
+		appendPQExpBufferStr(&pgoptions, " -c synchronous_commit=off -c fsync=off -c full_page_writes=off");
+
+	/*
+	 * Use max_slot_wal_keep_size as -1 to prevent the WAL removal by the
+	 * checkpointer process.  If WALs required by logical replication slots
+	 * are removed, the slots are unusable.  This setting prevents the
+	 * invalidation of slots during the upgrade. We set this option when
+	 * cluster is PG17 or later because logical replication slots can only be
+	 * migrated since then. Besides, max_slot_wal_keep_size is added in PG13.
+	 */
+	if (GET_MAJOR_VERSION(cluster->major_version) >= 1700)
+		appendPQExpBufferStr(&pgoptions, " -c max_slot_wal_keep_size=-1");
+
+	/* Use -b to disable autovacuum. */
 	snprintf(cmd, sizeof(cmd),
 			 "\"%s/pg_ctl\" -w -l \"%s/%s\" -D \"%s\" -o \"-p %d -b%s %s%s\" start",
 			 cluster->bindir,
 			 log_opts.logdir,
 			 SERVER_LOG_FILE, cluster->pgconfig, cluster->port,
-			 (cluster == &new_cluster) ?
-			 " -c synchronous_commit=off -c fsync=off -c full_page_writes=off" : "",
+			 pgoptions.data,
 			 cluster->pgopts ? cluster->pgopts : "", socket_string);
 
+	termPQExpBuffer(&pgoptions);
+
 	/*
 	 * Don't throw an error right away, let connecting throw the error because
 	 * it might supply a reason for the failure.
diff --git a/src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl
new file mode 100644
index 0000000000..f14a670b78
--- /dev/null
+++ b/src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl
@@ -0,0 +1,227 @@
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+# Tests for upgrading logical replication slots
+
+use strict;
+use warnings;
+
+use File::Find qw(find);
+use File::Path qw(rmtree);
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Can be changed to test the other modes
+my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
+
+# Initialize old cluster
+my $old_publisher = PostgreSQL::Test::Cluster->new('old_publisher');
+$old_publisher->init(allows_streaming => 'logical');
+
+# Initialize new cluster
+my $new_publisher = PostgreSQL::Test::Cluster->new('new_publisher');
+$new_publisher->init(allows_streaming => 'logical');
+
+# Initialize subscriber cluster
+my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
+$subscriber->init();
+
+my $bindir = $new_publisher->config_data('--bindir');
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when wrong GUC is set on new cluster
+#
+# There are two requirements for GUCs - wal_level and max_replication_slots,
+# but only max_replication_slots will be tested here. This is because to
+# reduce the execution time of the test.
+
+# Preparations for the subsequent test:
+# 1. Create two slots on the old cluster
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"SELECT pg_create_logical_replication_slot('test_slot1', 'test_decoding', false, true);"
+);
+$old_publisher->safe_psql('postgres',
+	"SELECT pg_create_logical_replication_slot('test_slot2', 'test_decoding', false, true);"
+);
+$old_publisher->stop();
+
+# 2. max_replication_slots is set to smaller than the number of slots (2)
+#	 present on the old cluster
+$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 1");
+
+# pg_upgrade will fail because the new cluster has insufficient max_replication_slots
+command_checks_all(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d', $old_publisher->data_dir,
+		'-D', $new_publisher->data_dir,
+		'-b', $bindir,
+		'-B', $bindir,
+		'-s', $new_publisher->host,
+		'-p', $old_publisher->port,
+		'-P', $new_publisher->port,
+		$mode,
+	],
+	1,
+	[
+		qr/max_replication_slots \(1\) must be greater than or equal to the number of logical replication slots \(2\) on the old cluster/
+	],
+	[qr//],
+	'run of pg_upgrade where the new cluster has insufficient max_replication_slots'
+);
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+# Set max_replication_slots to the same value as the number of slots. Both of
+# slots will be used for subsequent tests.
+$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 1");
+
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when the slot still has unconsumed WAL records
+
+# Preparations for the subsequent test:
+# 1. Generate extra WAL records. Because these WAL records do not get consumed
+#	 it will cause the upcoming pg_upgrade test to fail.
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+	"CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;");
+
+# 2. Advance the slot test_slot2 up to the current WAL location
+$old_publisher->safe_psql('postgres',
+	"SELECT pg_replication_slot_advance('test_slot2', NULL);");
+
+# 3. Emit a non-transactional message. test_slot2 detects the message so that
+#	 this slot will be also reported by upcoming pg_upgrade.
+$old_publisher->safe_psql('postgres',
+	"SELECT count(*) FROM pg_logical_emit_message('false', 'prefix', 'This is a non-transactional message');"
+);
+
+$old_publisher->stop;
+
+# pg_upgrade will fail because the slot still has unconsumed WAL records
+command_checks_all(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d', $old_publisher->data_dir,
+		'-D', $new_publisher->data_dir,
+		'-b', $bindir,
+		'-B', $bindir,
+		'-s', $new_publisher->host,
+		'-p', $old_publisher->port,
+		'-P', $new_publisher->port,
+		$mode,
+	],
+	1,
+	[qr/Your installation contains logical replication slots that can't be upgraded./],
+	[qr//],
+	'run of pg_upgrade of old cluster with slot having unconsumed WAL records'
+);
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Verify the reason why the logical replication slot cannot be upgraded
+my $slots_filename;
+
+# Find a txt file that contains a list of logical replication slots that cannot
+# be upgraded. We cannot predict the file's path because the output directory
+# contains a milliseconds timestamp. File::Find::find must be used.
+find(
+	sub {
+		if ($File::Find::name =~ m/invalid_logical_relication_slots\.txt/)
+		{
+			$slots_filename = $File::Find::name;
+		}
+	},
+	$new_publisher->data_dir . "/pg_upgrade_output.d");
+
+# And check the content. Both of slots must be reported that they have
+# unconsumed WALs after confirmed_flush_lsn.
+like(
+	slurp_file($slots_filename),
+	qr/The slot \"test_slot1\" has not consumed the WAL yet/m,
+	'the previous test failed due to unconsumed WALs');
+like(
+	slurp_file($slots_filename),
+	qr/The slot \"test_slot2\" has not consumed the WAL yet/m,
+	'the previous test failed due to unconsumed WALs');
+
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+
+
+# ------------------------------
+# TEST: Successful upgrade
+
+# Preparations for the subsequent test:
+# 1. Setup logical replication
+my $old_connstr = $old_publisher->connstr . ' dbname=postgres';
+
+$old_publisher->start;
+
+$old_publisher->safe_psql('postgres',
+	"SELECT * FROM pg_drop_replication_slot('test_slot1');");
+$old_publisher->safe_psql('postgres',
+	"SELECT * FROM pg_drop_replication_slot('test_slot2');");
+
+$old_publisher->safe_psql('postgres',
+	"CREATE PUBLICATION regress_pub FOR ALL TABLES;");
+$subscriber->start;
+$subscriber->safe_psql(
+	'postgres', qq[
+	CREATE TABLE tbl (a int);
+	CREATE SUBSCRIPTION regress_sub CONNECTION '$old_connstr' PUBLICATION regress_pub WITH (two_phase = 'true')
+]);
+$subscriber->wait_for_subscription_sync($old_publisher, 'regress_sub');
+
+# 2. Temporarily disable the subscription
+$subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION regress_sub DISABLE");
+$old_publisher->stop;
+
+# Actual run, successful upgrade is expected
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d', $old_publisher->data_dir,
+		'-D', $new_publisher->data_dir,
+		'-b', $bindir,
+		'-B', $bindir,
+		'-s', $new_publisher->host,
+		'-p', $old_publisher->port,
+		'-P', $new_publisher->port,
+		$mode,
+	],
+	'run of pg_upgrade of old cluster');
+ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ removed after pg_upgrade success");
+
+# Check that the slot 'regress_sub' has migrated to the new cluster
+$new_publisher->start;
+my $result = $new_publisher->safe_psql('postgres',
+	"SELECT slot_name, two_phase FROM pg_replication_slots");
+is($result, qq(regress_sub|t), 'check the slot exists on new cluster');
+
+# Update the connection
+my $new_connstr = $new_publisher->connstr . ' dbname=postgres';
+$subscriber->safe_psql(
+	'postgres', qq[
+	ALTER SUBSCRIPTION regress_sub CONNECTION '$new_connstr';
+	ALTER SUBSCRIPTION regress_sub ENABLE;
+]);
+
+# Check whether changes on the new publisher get replicated to the subscriber
+$new_publisher->safe_psql('postgres',
+	"INSERT INTO tbl VALUES (generate_series(11, 20))");
+$new_publisher->wait_for_catchup('regress_sub');
+$result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
+is($result, qq(20), 'check changes are replicated to the subscriber');
+
+# Clean up
+$subscriber->stop();
+$new_publisher->stop();
+
+done_testing();
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 72ea4aa8b8..f6c4abacba 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11379,6 +11379,11 @@
   proname => 'binary_upgrade_set_next_pg_tablespace_oid', provolatile => 'v',
   proparallel => 'u', prorettype => 'void', proargtypes => 'oid',
   prosrc => 'binary_upgrade_set_next_pg_tablespace_oid' },
+{ oid => '8046', descr => 'for use by pg_upgrade',
+  proname => 'binary_upgrade_slot_has_caught_up', proisstrict => 'f',
+  provolatile => 'v', proparallel => 'u', prorettype => 'bool',
+  proargtypes => 'name',
+  prosrc => 'binary_upgrade_slot_has_caught_up' },
 
 # conversion functions
 { oid => '4302',
diff --git a/src/include/replication/logical.h b/src/include/replication/logical.h
index 5f49554ea0..f8258d7c28 100644
--- a/src/include/replication/logical.h
+++ b/src/include/replication/logical.h
@@ -109,6 +109,9 @@ typedef struct LogicalDecodingContext
 	TransactionId write_xid;
 	/* Are we processing the end LSN of a transaction? */
 	bool		end_xact;
+
+	/* Do we need to process any change in 'fast_forward' mode? */
+	bool		processing_required;
 } LogicalDecodingContext;
 
 
@@ -145,4 +148,6 @@ extern bool filter_by_origin_cb_wrapper(LogicalDecodingContext *ctx, RepOriginId
 extern void ResetLogicalStreamingState(void);
 extern void UpdateDecodingStats(LogicalDecodingContext *ctx);
 
+extern bool LogicalReplicationSlotHasPendingWal(XLogRecPtr end_of_wal);
+
 #endif
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index e69bb671bf..de6c48d914 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1503,6 +1503,8 @@ LogicalRepTyp
 LogicalRepWorker
 LogicalRepWorkerType
 LogicalRewriteMappingData
+LogicalSlotInfo
+LogicalSlotInfoArr
 LogicalTape
 LogicalTapeSet
 LsnReadQueue
-- 
2.27.0

v51-0002-support-cross-version-upgrade.patchapplication/octet-stream; name=v51-0002-support-cross-version-upgrade.patchDownload
From 9666ca6b02a66e2111a99fff1b42ce3af8c7eeb9 Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Mon, 16 Oct 2023 07:34:41 +0000
Subject: [PATCH v51 2/2] support cross-version upgrade

---
 .../003_upgrade_logical_replication_slots.pl  | 510 +++++++++++-------
 1 file changed, 307 insertions(+), 203 deletions(-)

diff --git a/src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl
index f14a670b78..c44afe6be8 100644
--- a/src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl
+++ b/src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl
@@ -12,216 +12,320 @@ use PostgreSQL::Test::Cluster;
 use PostgreSQL::Test::Utils;
 use Test::More;
 
+# Verify that logical replication slots can be migrated.  This function will
+# be executed when the old cluster is PG17 and later.
+sub test_for_17_and_later
+{
+	my ($old_publisher, $new_publisher, $mode) = @_;
+
+	my $oldbindir = $old_publisher->config_data('--bindir');
+	my $newbindir = $new_publisher->config_data('--bindir');
+
+	# ------------------------------
+	# TEST: Confirm pg_upgrade fails when wrong GUC is set on new cluster
+	#
+	# There are two requirements for GUCs - wal_level and
+	# max_replication_slots, but only max_replication_slots will be tested here
+	# because it reduces the execution time of the test.
+
+	# Preparations for the subsequent test:
+	# 1. Create two slots on the old cluster
+	$old_publisher->start;
+	$old_publisher->safe_psql('postgres',
+		"SELECT pg_create_logical_replication_slot('test_slot1', 'test_decoding', false, true);"
+	);
+	$old_publisher->safe_psql('postgres',
+		"SELECT pg_create_logical_replication_slot('test_slot2', 'test_decoding', false, true);"
+	);
+	$old_publisher->stop();
+
+	# 2. max_replication_slots is set to smaller than the number of slots (2)
+	#	 present on the old cluster
+	$new_publisher->append_conf('postgresql.conf',
+		"max_replication_slots = 1");
+
+	# pg_upgrade will fail because the new cluster has insufficient
+	# max_replication_slots
+	command_checks_all(
+		[
+			'pg_upgrade', '--no-sync',
+			'-d', $old_publisher->data_dir,
+			'-D', $new_publisher->data_dir,
+			'-b', $oldbindir,
+			'-B', $newbindir,
+			'-s', $new_publisher->host,
+			'-p', $old_publisher->port,
+			'-P', $new_publisher->port,
+			$mode,
+		],
+		1,
+		[
+			qr/max_replication_slots \(1\) must be greater than or equal to the number of logical replication slots \(2\) on the old cluster/
+		],
+		[qr//],
+		'run of pg_upgrade where the new cluster has insufficient max_replication_slots'
+	);
+	ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+		"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+	# Clean up
+	rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+	# Set max_replication_slots to the same value as the number of slots. Both
+	# slots will be used for subsequent tests.
+	$new_publisher->append_conf('postgresql.conf',
+		"max_replication_slots = 1");
+
+
+	# ------------------------------
+	# TEST: Confirm pg_upgrade fails when the slot still has unconsumed WAL records
+
+	# Preparations for the subsequent test:
+	# 1. Generate extra WAL records. Because these WAL records do not get
+	#	 consumed it will cause the upcoming pg_upgrade test to fail.
+	$old_publisher->start;
+	$old_publisher->safe_psql('postgres',
+		"CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;");
+
+	# 2. Advance the slot test_slot2 up to the current WAL location
+	$old_publisher->safe_psql('postgres',
+		"SELECT pg_replication_slot_advance('test_slot2', NULL);");
+
+	# 3. Emit a non-transactional message. test_slot2 detects the message so
+	#	 that the upcoming pg_upgrade will also report this slot.
+	$old_publisher->safe_psql('postgres',
+		"SELECT count(*) FROM pg_logical_emit_message('false', 'prefix', 'This is a non-transactional message');"
+	);
+	$old_publisher->stop;
+
+	# pg_upgrade will fail because the slot still has unconsumed WAL records
+	command_checks_all(
+		[
+			'pg_upgrade', '--no-sync',
+			'-d', $old_publisher->data_dir,
+			'-D', $new_publisher->data_dir,
+			'-b', $oldbindir,
+			'-B', $newbindir,
+			'-s', $new_publisher->host,
+			'-p', $old_publisher->port,
+			'-P', $new_publisher->port,
+			$mode,
+		],
+		1,
+		[
+			qr/Your installation contains logical replication slots that can't be upgraded./
+		],
+		[qr//],
+		'run of pg_upgrade of old cluster with slot having unconsumed WAL records'
+	);
+	ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+		"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+	# Verify the reason why the logical replication slot cannot be upgraded
+	my $slots_filename;
+
+	# Find a txt file that contains a list of logical replication slots that
+	# cannot be upgraded. We cannot predict the file's path because the output
+	# directory contains a milliseconds timestamp. File::Find::find must be
+	# used.
+	find(
+		sub {
+			if ($File::Find::name =~ m/invalid_logical_relication_slots\.txt/)
+			{
+				$slots_filename = $File::Find::name;
+			}
+		},
+		$new_publisher->data_dir . "/pg_upgrade_output.d");
+
+	# And check the content. Both of slots must be reported that they have
+	# unconsumed WALs after confirmed_flush_lsn.
+	like(
+		slurp_file($slots_filename),
+		qr/The slot \"test_slot1\" has not consumed the WAL yet/m,
+		'the previous test failed due to unconsumed WALs');
+	like(
+		slurp_file($slots_filename),
+		qr/The slot \"test_slot2\" has not consumed the WAL yet/m,
+		'the previous test failed due to unconsumed WALs');
+
+	# Clean up
+	rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+
+
+	# ------------------------------
+	# TEST: Successful upgrade
+
+	# Preparations for the subsequent test:
+	# 1. Setup logical replication
+	my $old_connstr = $old_publisher->connstr . ' dbname=postgres';
+
+	$old_publisher->start;
+
+	$old_publisher->safe_psql('postgres',
+		"SELECT * FROM pg_drop_replication_slot('test_slot1');");
+	$old_publisher->safe_psql('postgres',
+		"SELECT * FROM pg_drop_replication_slot('test_slot2');");
+
+	$old_publisher->safe_psql('postgres',
+		"CREATE PUBLICATION regress_pub FOR ALL TABLES;");
+
+	# Initialize subscriber cluster
+	my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
+	$subscriber->init();
+
+	$subscriber->start;
+	$subscriber->safe_psql(
+		'postgres', qq[
+		CREATE TABLE tbl (a int);
+		CREATE SUBSCRIPTION regress_sub CONNECTION '$old_connstr' PUBLICATION regress_pub WITH (two_phase = 'true')
+	]);
+	$subscriber->wait_for_subscription_sync($old_publisher, 'regress_sub');
+
+	# 2. Temporarily disable the subscription
+	$subscriber->safe_psql('postgres',
+		"ALTER SUBSCRIPTION regress_sub DISABLE");
+	$old_publisher->stop;
+
+	# Actual run, successful upgrade is expected
+	command_ok(
+		[
+			'pg_upgrade', '--no-sync',
+			'-d', $old_publisher->data_dir,
+			'-D', $new_publisher->data_dir,
+			'-b', $oldbindir,
+			'-B', $newbindir,
+			'-s', $new_publisher->host,
+			'-p', $old_publisher->port,
+			'-P', $new_publisher->port,
+			$mode,
+		],
+		'run of pg_upgrade of old cluster');
+	ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
+		"pg_upgrade_output.d/ removed after pg_upgrade success");
+
+	# Check that the slot 'regress_sub' has migrated to the new cluster
+	$new_publisher->start;
+	my $result = $new_publisher->safe_psql('postgres',
+		"SELECT slot_name, two_phase FROM pg_replication_slots");
+	is($result, qq(regress_sub|t), 'check the slot exists on new cluster');
+
+	# Update the connection
+	my $new_connstr = $new_publisher->connstr . ' dbname=postgres';
+	$subscriber->safe_psql(
+		'postgres', qq[
+		ALTER SUBSCRIPTION regress_sub CONNECTION '$new_connstr';
+		ALTER SUBSCRIPTION regress_sub ENABLE;
+	]);
+
+	# Check whether changes on the new publisher get replicated to the
+	# subscriber
+	$new_publisher->safe_psql('postgres',
+		"INSERT INTO tbl VALUES (generate_series(11, 20))");
+	$new_publisher->wait_for_catchup('regress_sub');
+	$result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
+	is($result, qq(20), 'check changes are replicated to the subscriber');
+
+	# Clean up
+	$subscriber->stop();
+	$new_publisher->stop();
+}
+
+# Verify that logical replication slots cannot be migrated.  This function
+# will be executed when the old cluster is PG16 and prior.
+sub test_for_16_and_prior
+{
+	my ($old_publisher, $new_publisher, $mode) = @_;
+
+	my $oldbindir = $old_publisher->config_data('--bindir');
+	my $newbindir = $new_publisher->config_data('--bindir');
+
+	# ------------------------------
+	# TEST: Confirm logical replication slots cannot be migrated
+
+	# Preparations for the subsequent test:
+	# 1. Create a slot on the old cluster
+	$old_publisher->start;
+	$old_publisher->safe_psql('postgres',
+		"SELECT pg_create_logical_replication_slot('test_slot', 'test_decoding');"
+	);
+	$old_publisher->stop;
+
+	# Actual run, successful upgrade is expected
+	command_ok(
+		[
+			'pg_upgrade', '--no-sync',
+			'-d', $old_publisher->data_dir,
+			'-D', $new_publisher->data_dir,
+			'-b', $oldbindir,
+			'-B', $newbindir,
+			'-s', $new_publisher->host,
+			'-p', $old_publisher->port,
+			'-P', $new_publisher->port,
+			$mode,
+		],
+		'run of pg_upgrade of old cluster');
+
+	ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
+		"pg_upgrade_output.d/ removed after pg_upgrade success");
+
+	# Check that the slot 'test_slot' has not migrated to the new cluster
+	$new_publisher->start;
+	my $result = $new_publisher->safe_psql('postgres',
+		"SELECT count(*) FROM pg_replication_slots");
+	is($result, qq(0), 'check the slot does not exist on new cluster');
+
+	# Clean up
+	$new_publisher->stop();
+}
+
 # Can be changed to test the other modes
 my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
 
-# Initialize old cluster
-my $old_publisher = PostgreSQL::Test::Cluster->new('old_publisher');
-$old_publisher->init(allows_streaming => 'logical');
+# Initialize old cluster. Cross-version checks are also supported.
+my $old_publisher = PostgreSQL::Test::Cluster->new('old_publisher',
+	install_path => $ENV{oldinstall});
+
+# Skip tests if the old cluster does not support logical replication slot
+if ($old_publisher->pg_version < 9.4)
+{
+	plan skip_all => 'Logical replication slots can be available since PG9.4';
+}
+
+my %node_params = ();
+$node_params{allows_streaming} = 'logical';
+
+# Set extra params if cross-version checks are required. This is needed to
+# avoid using previously initdb'd cluster
+if (defined($ENV{oldinstall}))
+{
+	my @initdb_params = ();
+	push @initdb_params, ('--encoding', 'UTF-8');
+	push @initdb_params, ('--locale', 'C');
+
+	$node_params{extra} = \@initdb_params;
+}
+$old_publisher->init(%node_params);
+
+# Set max_wal_senders to a lower value if the old cluster is prior to PG12.
+# Such clusters regard max_wal_senders as part of max_connections, but the
+# current TAP tester sets these GUCs to the same value.
+if ($old_publisher->pg_version < 12)
+{
+	$old_publisher->append_conf('postgresql.conf', "max_wal_senders = 5");
+}
 
 # Initialize new cluster
 my $new_publisher = PostgreSQL::Test::Cluster->new('new_publisher');
 $new_publisher->init(allows_streaming => 'logical');
 
-# Initialize subscriber cluster
-my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
-$subscriber->init();
-
-my $bindir = $new_publisher->config_data('--bindir');
-
-# ------------------------------
-# TEST: Confirm pg_upgrade fails when wrong GUC is set on new cluster
-#
-# There are two requirements for GUCs - wal_level and max_replication_slots,
-# but only max_replication_slots will be tested here. This is because to
-# reduce the execution time of the test.
-
-# Preparations for the subsequent test:
-# 1. Create two slots on the old cluster
-$old_publisher->start;
-$old_publisher->safe_psql('postgres',
-	"SELECT pg_create_logical_replication_slot('test_slot1', 'test_decoding', false, true);"
-);
-$old_publisher->safe_psql('postgres',
-	"SELECT pg_create_logical_replication_slot('test_slot2', 'test_decoding', false, true);"
-);
-$old_publisher->stop();
-
-# 2. max_replication_slots is set to smaller than the number of slots (2)
-#	 present on the old cluster
-$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 1");
-
-# pg_upgrade will fail because the new cluster has insufficient max_replication_slots
-command_checks_all(
-	[
-		'pg_upgrade', '--no-sync',
-		'-d', $old_publisher->data_dir,
-		'-D', $new_publisher->data_dir,
-		'-b', $bindir,
-		'-B', $bindir,
-		'-s', $new_publisher->host,
-		'-p', $old_publisher->port,
-		'-P', $new_publisher->port,
-		$mode,
-	],
-	1,
-	[
-		qr/max_replication_slots \(1\) must be greater than or equal to the number of logical replication slots \(2\) on the old cluster/
-	],
-	[qr//],
-	'run of pg_upgrade where the new cluster has insufficient max_replication_slots'
-);
-ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
-	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
-
-# Clean up
-rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
-# Set max_replication_slots to the same value as the number of slots. Both of
-# slots will be used for subsequent tests.
-$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 1");
-
-
-# ------------------------------
-# TEST: Confirm pg_upgrade fails when the slot still has unconsumed WAL records
-
-# Preparations for the subsequent test:
-# 1. Generate extra WAL records. Because these WAL records do not get consumed
-#	 it will cause the upcoming pg_upgrade test to fail.
-$old_publisher->start;
-$old_publisher->safe_psql('postgres',
-	"CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;");
-
-# 2. Advance the slot test_slot2 up to the current WAL location
-$old_publisher->safe_psql('postgres',
-	"SELECT pg_replication_slot_advance('test_slot2', NULL);");
-
-# 3. Emit a non-transactional message. test_slot2 detects the message so that
-#	 this slot will be also reported by upcoming pg_upgrade.
-$old_publisher->safe_psql('postgres',
-	"SELECT count(*) FROM pg_logical_emit_message('false', 'prefix', 'This is a non-transactional message');"
-);
-
-$old_publisher->stop;
-
-# pg_upgrade will fail because the slot still has unconsumed WAL records
-command_checks_all(
-	[
-		'pg_upgrade', '--no-sync',
-		'-d', $old_publisher->data_dir,
-		'-D', $new_publisher->data_dir,
-		'-b', $bindir,
-		'-B', $bindir,
-		'-s', $new_publisher->host,
-		'-p', $old_publisher->port,
-		'-P', $new_publisher->port,
-		$mode,
-	],
-	1,
-	[qr/Your installation contains logical replication slots that can't be upgraded./],
-	[qr//],
-	'run of pg_upgrade of old cluster with slot having unconsumed WAL records'
-);
-ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
-	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
-
-# Verify the reason why the logical replication slot cannot be upgraded
-my $slots_filename;
-
-# Find a txt file that contains a list of logical replication slots that cannot
-# be upgraded. We cannot predict the file's path because the output directory
-# contains a milliseconds timestamp. File::Find::find must be used.
-find(
-	sub {
-		if ($File::Find::name =~ m/invalid_logical_relication_slots\.txt/)
-		{
-			$slots_filename = $File::Find::name;
-		}
-	},
-	$new_publisher->data_dir . "/pg_upgrade_output.d");
-
-# And check the content. Both of slots must be reported that they have
-# unconsumed WALs after confirmed_flush_lsn.
-like(
-	slurp_file($slots_filename),
-	qr/The slot \"test_slot1\" has not consumed the WAL yet/m,
-	'the previous test failed due to unconsumed WALs');
-like(
-	slurp_file($slots_filename),
-	qr/The slot \"test_slot2\" has not consumed the WAL yet/m,
-	'the previous test failed due to unconsumed WALs');
-
-# Clean up
-rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
-
-
-# ------------------------------
-# TEST: Successful upgrade
-
-# Preparations for the subsequent test:
-# 1. Setup logical replication
-my $old_connstr = $old_publisher->connstr . ' dbname=postgres';
-
-$old_publisher->start;
-
-$old_publisher->safe_psql('postgres',
-	"SELECT * FROM pg_drop_replication_slot('test_slot1');");
-$old_publisher->safe_psql('postgres',
-	"SELECT * FROM pg_drop_replication_slot('test_slot2');");
-
-$old_publisher->safe_psql('postgres',
-	"CREATE PUBLICATION regress_pub FOR ALL TABLES;");
-$subscriber->start;
-$subscriber->safe_psql(
-	'postgres', qq[
-	CREATE TABLE tbl (a int);
-	CREATE SUBSCRIPTION regress_sub CONNECTION '$old_connstr' PUBLICATION regress_pub WITH (two_phase = 'true')
-]);
-$subscriber->wait_for_subscription_sync($old_publisher, 'regress_sub');
-
-# 2. Temporarily disable the subscription
-$subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION regress_sub DISABLE");
-$old_publisher->stop;
-
-# Actual run, successful upgrade is expected
-command_ok(
-	[
-		'pg_upgrade', '--no-sync',
-		'-d', $old_publisher->data_dir,
-		'-D', $new_publisher->data_dir,
-		'-b', $bindir,
-		'-B', $bindir,
-		'-s', $new_publisher->host,
-		'-p', $old_publisher->port,
-		'-P', $new_publisher->port,
-		$mode,
-	],
-	'run of pg_upgrade of old cluster');
-ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
-	"pg_upgrade_output.d/ removed after pg_upgrade success");
-
-# Check that the slot 'regress_sub' has migrated to the new cluster
-$new_publisher->start;
-my $result = $new_publisher->safe_psql('postgres',
-	"SELECT slot_name, two_phase FROM pg_replication_slots");
-is($result, qq(regress_sub|t), 'check the slot exists on new cluster');
-
-# Update the connection
-my $new_connstr = $new_publisher->connstr . ' dbname=postgres';
-$subscriber->safe_psql(
-	'postgres', qq[
-	ALTER SUBSCRIPTION regress_sub CONNECTION '$new_connstr';
-	ALTER SUBSCRIPTION regress_sub ENABLE;
-]);
-
-# Check whether changes on the new publisher get replicated to the subscriber
-$new_publisher->safe_psql('postgres',
-	"INSERT INTO tbl VALUES (generate_series(11, 20))");
-$new_publisher->wait_for_catchup('regress_sub');
-$result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
-is($result, qq(20), 'check changes are replicated to the subscriber');
-
-# Clean up
-$subscriber->stop();
-$new_publisher->stop();
+# Switch workloads depend on the major version of the old cluster.  Upgrading
+# logical replication slots has been supported since PG17.
+if ($old_publisher->pg_version <= 16)
+{
+	test_for_16_and_prior($old_publisher, $new_publisher, $mode);
+}
+else
+{
+	test_for_17_and_later($old_publisher, $new_publisher, $mode);
+}
 
 done_testing();
-- 
2.27.0

#323Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: vignesh C (#321)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Vignesh,

Thank you for reviewing! The new version is available in [1]/messages/by-id/TYAPR01MB5866AC8A7694113BCBE0A71EF5D6A@TYAPR01MB5866.jpnprd01.prod.outlook.com.

1) Should this:
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+# Tests for upgrading replication slots
+
be:
"Tests for upgrading logical replication slots"

Fixed.

2)  This statement is not entirely true:
+     <listitem>
+      <para>
+       The old cluster has replicated all the changes to subscribers.
+      </para>

If we have some changes like shutdown_checkpoint the upgrade passes,
if we have some changes like create view whose changes will not be
replicated the upgrade fails.

Hmm, I felt the current description was sufficient, but how about the following?
"The old cluster has replicated all the transactions and logical decoding
messages to subscribers."
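
(Not part of the patch, just an illustration of this prerequisite: assuming the usual
pg_replication_slots columns, a test or a DBA could approximate "all changes have been
replicated" on the old cluster before shutdown with something like the snippet below.
The real check added by the patch decodes WAL via binary_upgrade_slot_has_caught_up();
comparing confirmed_flush_lsn with the current insert position is only a rough heuristic.)

my $caught_up = $old_publisher->safe_psql(
	'postgres', qq[
	SELECT count(*) = 0
	FROM pg_catalog.pg_replication_slots
	WHERE slot_type = 'logical' AND NOT temporary AND
	      confirmed_flush_lsn <> pg_catalog.pg_current_wal_insert_lsn();
]);
is($caught_up, 't', 'all logical slots look caught up before shutdown');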

3) All these includes are not required except for "logical.h"
--- a/src/backend/utils/adt/pg_upgrade_support.c
+++ b/src/backend/utils/adt/pg_upgrade_support.c
@@ -11,14 +11,20 @@

#include "postgres.h"

+#include "access/xlogutils.h"
+#include "access/xlog_internal.h"
#include "catalog/binary_upgrade.h"
#include "catalog/heap.h"
#include "catalog/namespace.h"
#include "catalog/pg_type.h"
#include "commands/extension.h"
+#include "funcapi.h"
#include "miscadmin.h"
+#include "replication/logical.h"
+#include "replication/slot.h"
#include "utils/array.h"
#include "utils/builtins.h"
+#include "utils/pg_lsn.h"

I preferred to include all the needed headers in each C file, but removed them.

4) We could print two_phase as true/false instead of 0/1:
+static void
+print_slot_infos(LogicalSlotInfoArr *slot_arr)
+{
+       /* Quick return if there are no logical slots. */
+       if (slot_arr->nslots == 0)
+               return;
+
+       pg_log(PG_VERBOSE, "Logical replication slots within the database:");
+
+       for (int slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+       {
+               LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+
+               pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\",
two_phase: %d",
+                          slot_info->slotname,
+                          slot_info->plugin,
+                          slot_info->two_phase);
+       }
+}

Fixed.

5) test passes without the below, maybe this is not required:
+# 2. Consume WAL records to avoid another type of upgrade failure. It will be
+#       tested in subsequent cases.
+$old_publisher->safe_psql('postgres',
+       "SELECT count(*) FROM
pg_logical_slot_get_changes('test_slot1', NULL, NULL);"
+);

This part is removed because of the refactoring.

6) This message "run of pg_upgrade of old cluster with idle
replication slots" seems wrong:
+# pg_upgrade will fail because the slot still has unconsumed WAL records
+command_checks_all(
+       [
+               'pg_upgrade', '--no-sync',
+               '-d', $old_publisher->data_dir,
+               '-D', $new_publisher->data_dir,
+               '-b', $bindir,
+               '-B', $bindir,
+               '-s', $new_publisher->host,
+               '-p', $old_publisher->port,
+               '-P', $new_publisher->port,
+               $mode,
+       ],
+       1,
+       [
+               qr/Your installation contains invalid logical
replication slots./
+       ],
+       [qr//],
+       'run of pg_upgrade of old cluster with idle replication slots');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+       "pg_upgrade_output.d/ not removed after pg_upgrade failure");

Rephrased.

7) You could run pgindent and pgperlytidy, it shows there are few
issues present with the patch.

I ran both.

[1]: /messages/by-id/TYAPR01MB5866AC8A7694113BCBE0A71EF5D6A@TYAPR01MB5866.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#324Peter Smith
smithpb2250@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#322)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

Here are some review comments for v51-0001

======
src/bin/pg_upgrade/check.c

0.
+check_old_cluster_for_valid_slots(bool live_check)
+{
+ char output_path[MAXPGPATH];
+ FILE    *script = NULL;
+
+ prep_status("Checking for valid logical replication slots");
+
+ snprintf(output_path, sizeof(output_path), "%s/%s",
+ log_opts.basedir,
+ "invalid_logical_relication_slots.txt");

0a
typo /invalid_logical_relication_slots/invalid_logical_replication_slots/

~

0b.
Since the non-upgradable slots are not strictly "invalid", is this an
appropriate filename for the bad ones?

But I don't have very good alternatives. Maybe:
- non_upgradable_logical_replication_slots.txt
- problem_logical_replication_slots.txt

======
src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl

1.
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when wrong GUC is set on new cluster
+#
+# There are two requirements for GUCs - wal_level and max_replication_slots,
+# but only max_replication_slots will be tested here. This is because to
+# reduce the execution time of the test.

SUGGESTION
# TEST: Confirm pg_upgrade fails when the new cluster has wrong GUC values.
#
# Two GUCs are required - 'wal_level' and 'max_replication_slots' - but to
# reduce the test execution time, only 'max_replication_slots' is tested here.

~~~

2.
+# Preparations for the subsequent test:
+# 1. Create two slots on the old cluster
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+ "SELECT pg_create_logical_replication_slot('test_slot1',
'test_decoding', false, true);"
+);
+$old_publisher->safe_psql('postgres',
+ "SELECT pg_create_logical_replication_slot('test_slot2',
'test_decoding', false, true);"
+);

Can't you combine those SQL in the same $old_publisher->safe_psql.

~~~

3.
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+# Set max_replication_slots to the same value as the number of slots. Both of
+# slots will be used for subsequent tests.
+$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 1");

The code doesn't seem to match the comment - is this correct? The
old_publisher created 2 slots, so why are you setting new_publisher
"max_replication_slots = 1" again?

~~~

4.
+# Preparations for the subsequent test:
+# 1. Generate extra WAL records. Because these WAL records do not get consumed
+# it will cause the upcoming pg_upgrade test to fail.
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+ "CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;");
+
+# 2. Advance the slot test_slot2 up to the current WAL location
+$old_publisher->safe_psql('postgres',
+ "SELECT pg_replication_slot_advance('test_slot2', NULL);");
+
+# 3. Emit a non-transactional message. test_slot2 detects the message so that
+# this slot will be also reported by upcoming pg_upgrade.
+$old_publisher->safe_psql('postgres',
+ "SELECT count(*) FROM pg_logical_emit_message('false', 'prefix',
'This is a non-transactional message');"
+);

I felt this test would be clearer if you emphasised the state of the
test_slot1 also. e.g.

4a.
BEFORE
+# 1. Generate extra WAL records. Because these WAL records do not get consumed
+# it will cause the upcoming pg_upgrade test to fail.

SUGGESTION
# 1. Generate extra WAL records. At this point neither test_slot1 nor test_slot2
# has consumed them.

~

4b.
BEFORE
+# 2. Advance the slot test_slot2 up to the current WAL location

SUGGESTION
# 2. Advance the slot test_slot2 up to the current WAL location, but test_slot2
# still has unconsumed WAL records.

~~~

5.
+# pg_upgrade will fail because the slot still has unconsumed WAL records
+command_checks_all(

/because the slot still has/because there are slots still having/

~~~

6.
+ [qr//],
+ 'run of pg_upgrade of old cluster with slot having unconsumed WAL records'
+);

/slot/slots/

~~~

7.
+# And check the content. Both of slots must be reported that they have
+# unconsumed WALs after confirmed_flush_lsn.

SUGGESTION
# Check the file content. Both slots should be reporting that they have
# unconsumed WAL records.

~~~

8.
+# Preparations for the subsequent test:
+# 1. Setup logical replication
+my $old_connstr = $old_publisher->connstr . ' dbname=postgres';
+
+$old_publisher->start;
+
+$old_publisher->safe_psql('postgres',
+ "SELECT * FROM pg_drop_replication_slot('test_slot1');");
+$old_publisher->safe_psql('postgres',
+ "SELECT * FROM pg_drop_replication_slot('test_slot2');");
+
+$old_publisher->safe_psql('postgres',
+ "CREATE PUBLICATION regress_pub FOR ALL TABLES;");

8a.
/Setup logical replication/Setup logical replication (first, cleanup
slots from the previous tests)/

~

8b.
Can't you combine all those SQL in the same $old_publisher->safe_psql.

~~~

9.
+
+# Actual run, successful upgrade is expected
+command_ok(
+ [
+ 'pg_upgrade', '--no-sync',
+ '-d', $old_publisher->data_dir,
+ '-D', $new_publisher->data_dir,
+ '-b', $bindir,
+ '-B', $bindir,
+ '-s', $new_publisher->host,
+ '-p', $old_publisher->port,
+ '-P', $new_publisher->port,
+ $mode,
+ ],
+ 'run of pg_upgrade of old cluster');

Now that the "Dry run" part is removed, it seems unnecessary to say
"Actual run" for this part.

SUGGESTION
# pg_upgrade should be successful.

======
Kind Regards,
Peter Smith.
Fujitsu Australia

#325Amit Kapila
amit.kapila16@gmail.com
In reply to: Peter Smith (#324)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Wed, Oct 18, 2023 at 7:31 AM Peter Smith <smithpb2250@gmail.com> wrote:

======
src/bin/pg_upgrade/check.c

0.
+check_old_cluster_for_valid_slots(bool live_check)
+{
+ char output_path[MAXPGPATH];
+ FILE    *script = NULL;
+
+ prep_status("Checking for valid logical replication slots");
+
+ snprintf(output_path, sizeof(output_path), "%s/%s",
+ log_opts.basedir,
+ "invalid_logical_relication_slots.txt");

0a
typo /invalid_logical_relication_slots/invalid_logical_replication_slots/

~

0b.
Since the non-upgradable slots are not strictly "invalid", is this an
appropriate filename for the bad ones?

But I don't have very good alternatives. Maybe:
- non_upgradable_logical_replication_slots.txt
- problem_logical_replication_slots.txt

I prefer the current naming. I think 'invalid' here covers both kinds of
slots: those that were invalidated by the checkpointer and those that still
have pending WAL to be consumed.

======
src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl

1.
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when wrong GUC is set on new cluster
+#
+# There are two requirements for GUCs - wal_level and max_replication_slots,
+# but only max_replication_slots will be tested here. This is because to
+# reduce the execution time of the test.

SUGGESTION
# TEST: Confirm pg_upgrade fails when the new cluster has wrong GUC values.
#
# Two GUCs are required - 'wal_level' and 'max_replication_slots' - but to
# reduce the test execution time, only 'max_replication_slots' is tested here.

I think we don't need the second part of the comment: "Two GUCs ...".
Ideally, we should test each parameter's invalid value but that could
be costly, so I think it is okay to test a few of them.

--
With Regards,
Amit Kapila.

#326Peter Smith
smithpb2250@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#322)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

Here are some comments for the patch v51-0002

======
src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl

1.
+# Set max_wal_senders to a lower value if the old cluster is prior to PG12.
+# Such clusters regard max_wal_senders as part of max_connections, but the
+# current TAP tester sets these GUCs to the same value.
+if ($old_publisher->pg_version < 12)
+{
+ $old_publisher->append_conf('postgresql.conf', "max_wal_senders = 5");
+}

1a.
I was initially unsure what the above comment meant -- thanks for the
offline explanation.

SUGGESTION
The TAP Cluster.pm assigns default 'max_wal_senders' and
'max_connections' to the same value (10), but PG12 and prior considered
max_wal_senders as a subset of max_connections, so setting the same
value will fail.

~

1b.
I also felt it is better to explicitly set both values in the < PG12
configuration because otherwise, you are still assuming knowledge that
the TAP default max_connections is 10.

SUGGESTION
$old_publisher->append_conf('postgresql.conf', qq{
max_wal_senders = 5
max_connections = 10
});

~~~

2.
+# Switch workloads depend on the major version of the old cluster.  Upgrading
+# logical replication slots has been supported since PG17.
+if ($old_publisher->pg_version <= 16)
+{
+ test_for_16_and_prior($old_publisher, $new_publisher, $mode);
+}
+else
+{
+ test_for_17_and_later($old_publisher, $new_publisher, $mode);
+}

IMO it is less confusing to have fewer version numbers floating around
in comments and names and code. So instead of referring to 16 and 17,
how about just referring to 17 everywhere?

For example

SUGGESTION
# Test according to the major version of the old cluster.
# Upgrading logical replication slots has been supported only since PG17.

if ($old_publisher->pg_version >= 17)
{
test_upgrade_from_PG17_and_later($old_publisher, $new_publisher, $mode);
}
else
{
test_upgrade_from_pre_PG17($old_publisher, $new_publisher, $mode);
}

======
Kind Regards,
Peter Smith.
Fujitsu Australia

#327Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Peter Smith (#324)
1 attachment(s)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Peter,

Thank you for reviewing! PSA new version.
Note that 0001 and 0002 are combined into one patch.

Here are some review comments for v51-0001

======
src/bin/pg_upgrade/check.c

0.
+check_old_cluster_for_valid_slots(bool live_check)
+{
+ char output_path[MAXPGPATH];
+ FILE    *script = NULL;
+
+ prep_status("Checking for valid logical replication slots");
+
+ snprintf(output_path, sizeof(output_path), "%s/%s",
+ log_opts.basedir,
+ "invalid_logical_relication_slots.txt");

0a
typo /invalid_logical_relication_slots/invalid_logical_replication_slots/

Fixed.

0b.
Since the non-upgradable slots are not strictly "invalid", is this an
appropriate filename for the bad ones?

But I don't have very good alternatives. Maybe:
- non_upgradable_logical_replication_slots.txt
- problem_logical_replication_slots.txt

Per discussion [1]/messages/by-id/CAA4eK1+AHSWPs2_jn=ftJKRqz-NXU6o=rPQ3f=H-gcPsgpPFrw@mail.gmail.com, I kept the current style.

src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl

1.
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when wrong GUC is set on new cluster
+#
+# There are two requirements for GUCs - wal_level and max_replication_slots,
+# but only max_replication_slots will be tested here. This is because to
+# reduce the execution time of the test.

SUGGESTION
# TEST: Confirm pg_upgrade fails when the new cluster has wrong GUC values.
#
# Two GUCs are required - 'wal_level' and 'max_replication_slots' - but to
# reduce the test execution time, only 'max_replication_slots' is tested here.

The first part was fixed. The second part was removed per [1]/messages/by-id/CAA4eK1+AHSWPs2_jn=ftJKRqz-NXU6o=rPQ3f=H-gcPsgpPFrw@mail.gmail.com.

2.
+# Preparations for the subsequent test:
+# 1. Create two slots on the old cluster
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+ "SELECT pg_create_logical_replication_slot('test_slot1',
'test_decoding', false, true);"
+);
+$old_publisher->safe_psql('postgres',
+ "SELECT pg_create_logical_replication_slot('test_slot2',
'test_decoding', false, true);"
+);

Can't you combine those SQL in the same $old_publisher->safe_psql.

Combined.
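
For reference, a minimal sketch of the combined form, assuming the same slot options as
before (the exact text is in the attached patch):

$old_publisher->safe_psql(
	'postgres', qq[
	SELECT pg_create_logical_replication_slot('test_slot1', 'test_decoding', false, true);
	SELECT pg_create_logical_replication_slot('test_slot2', 'test_decoding', false, true);
]);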

3.
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+# Set max_replication_slots to the same value as the number of slots. Both of
+# slots will be used for subsequent tests.
+$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 1");

The code doesn't seem to match the comment - is this correct? The
old_publisher created 2 slots, so why are you setting new_publisher
"max_replication_slots = 1" again?

Fixed to "max_replication_slots = 2" Note that previous test worked well because
GUC checking on new cluster is done after checking the status of slots.

4.
+# Preparations for the subsequent test:
+# 1. Generate extra WAL records. Because these WAL records do not get
consumed
+# it will cause the upcoming pg_upgrade test to fail.
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+ "CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;");
+
+# 2. Advance the slot test_slot2 up to the current WAL location
+$old_publisher->safe_psql('postgres',
+ "SELECT pg_replication_slot_advance('test_slot2', NULL);");
+
+# 3. Emit a non-transactional message. test_slot2 detects the message so that
+# this slot will be also reported by upcoming pg_upgrade.
+$old_publisher->safe_psql('postgres',
+ "SELECT count(*) FROM pg_logical_emit_message('false', 'prefix',
'This is a non-transactional message');"
+);

I felt this test would be clearer if you emphasised the state of the
test_slot1 also. e.g.

4a.
BEFORE
+# 1. Generate extra WAL records. Because these WAL records do not get
consumed
+# it will cause the upcoming pg_upgrade test to fail.

SUGGESTION
# 1. Generate extra WAL records. At this point neither test_slot1 nor test_slot2
# has consumed them.

Fixed.

4b.
BEFORE
+# 2. Advance the slot test_slot2 up to the current WAL location

SUGGESTION
# 2. Advance the slot test_slot2 up to the current WAL location, but test_slot2
# still has unconsumed WAL records.

IIUC, test_slot2 is caught up by pg_replication_slot_advance('test_slot2'), so I think
"but test_slot1 still has unconsumed WAL records." is more appropriate. Fixed.

5.
+# pg_upgrade will fail because the slot still has unconsumed WAL records
+command_checks_all(

/because the slot still has/because there are slots still having/

Fixed.

6.
+ [qr//],
+ 'run of pg_upgrade of old cluster with slot having unconsumed WAL records'
+);

/slot/slots/

Fixed.

7.
+# And check the content. Both of slots must be reported that they have
+# unconsumed WALs after confirmed_flush_lsn.

SUGGESTION
# Check the file content. Both slots should be reporting that they have
# unconsumed WAL records.

Fixed.

8.
+# Preparations for the subsequent test:
+# 1. Setup logical replication
+my $old_connstr = $old_publisher->connstr . ' dbname=postgres';
+
+$old_publisher->start;
+
+$old_publisher->safe_psql('postgres',
+ "SELECT * FROM pg_drop_replication_slot('test_slot1');");
+$old_publisher->safe_psql('postgres',
+ "SELECT * FROM pg_drop_replication_slot('test_slot2');");
+
+$old_publisher->safe_psql('postgres',
+ "CREATE PUBLICATION regress_pub FOR ALL TABLES;");

8a.
/Setup logical replication/Setup logical replication (first, cleanup
slots from the previous tests)/

Fixed.

8b.
Can't you combine all those SQL in the same $old_publisher->safe_psql.

Combined.
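
Again, roughly (illustrative only; see the attached patch for the exact test code):

$old_publisher->safe_psql(
	'postgres', qq[
	SELECT * FROM pg_drop_replication_slot('test_slot1');
	SELECT * FROM pg_drop_replication_slot('test_slot2');
	CREATE PUBLICATION regress_pub FOR ALL TABLES;
]);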

9.
+
+# Actual run, successful upgrade is expected
+command_ok(
+ [
+ 'pg_upgrade', '--no-sync',
+ '-d', $old_publisher->data_dir,
+ '-D', $new_publisher->data_dir,
+ '-b', $bindir,
+ '-B', $bindir,
+ '-s', $new_publisher->host,
+ '-p', $old_publisher->port,
+ '-P', $new_publisher->port,
+ $mode,
+ ],
+ 'run of pg_upgrade of old cluster');

Now that the "Dry run" part is removed, it seems unnecessary to say
"Actual run" for this part.

SUGGESTION
# pg_upgrade should be successful.

Fixed.

[1]: /messages/by-id/CAA4eK1+AHSWPs2_jn=ftJKRqz-NXU6o=rPQ3f=H-gcPsgpPFrw@mail.gmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:

v52-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patchapplication/octet-stream; name=v52-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patchDownload
From 9fa9102039a71f1a05bacdd8604d9d6c5a4264a1 Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Tue, 4 Apr 2023 05:49:34 +0000
Subject: [PATCH v52] pg_upgrade: Allow to replicate logical replication slots
 to new node

This commit allows nodes with logical replication slots to be upgraded. While
reading information from the old cluster, a list of logical replication slots is
fetched. At the later part of upgrading, pg_upgrade revisits the list and
restores slots by executing pg_create_logical_replication_slot() on the new
cluster. Migration of logical replication slots is only supported when the old
cluster is version 17.0 or later.

If the old node has slots with the status 'lost' or with unconsumed WAL records,
the pg_upgrade fails. These checks are needed to prevent data loss.

Note that the pg_resetwal command would remove WAL files, which are required for
restart_lsn. If WALs required by logical replication slots are removed, the
slots are unusable. Therefore, during the upgrade, slot restoration is done
after the final pg_resetwal command. The workflow ensures that required WALs are
retained.

The significant advantage of this commit is that it makes it easy to continue
logical replication even after upgrading the publisher node. Previously,
pg_upgrade allowed copying publications to a new node. With this new commit,
adjusting the connection string to the new publisher will cause the apply
worker on the subscriber to connect to the new publisher automatically. This
enables seamless continuation of logical replication, even after an upgrade.

Author: Hayato Kuroda
Co-authored-by: Hou Zhijie
Reviewed-by: Peter Smith, Julien Rouhaud, Vignesh C, Wang Wei, Masahiko Sawada,
             Dilip Kumar, Bharath Rupireddy
---
 doc/src/sgml/ref/pgupgrade.sgml               |  77 +++-
 src/backend/replication/logical/decode.c      |  48 ++-
 src/backend/replication/logical/logical.c     |  65 ++++
 src/backend/replication/slot.c                |  12 +
 src/backend/utils/adt/pg_upgrade_support.c    |  42 +++
 src/bin/pg_upgrade/Makefile                   |   3 +
 src/bin/pg_upgrade/check.c                    | 169 ++++++++-
 src/bin/pg_upgrade/function.c                 |  30 +-
 src/bin/pg_upgrade/info.c                     | 166 ++++++++-
 src/bin/pg_upgrade/meson.build                |   1 +
 src/bin/pg_upgrade/pg_upgrade.c               |  74 +++-
 src/bin/pg_upgrade/pg_upgrade.h               |  22 +-
 src/bin/pg_upgrade/server.c                   |  25 +-
 .../003_upgrade_logical_replication_slots.pl  | 331 ++++++++++++++++++
 src/include/catalog/pg_proc.dat               |   5 +
 src/include/replication/logical.h             |   5 +
 src/tools/pgindent/typedefs.list              |   2 +
 17 files changed, 1051 insertions(+), 26 deletions(-)
 create mode 100644 src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 608193b307..37eb573826 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -383,6 +383,78 @@ make prefix=/usr/local/pgsql.new install
     </para>
    </step>
 
+   <step>
+    <title>Prepare for publisher upgrades</title>
+
+    <para>
+     <application>pg_upgrade</application> attempts to migrate logical
+     replication slots. This helps avoid the need for manually defining the
+     same replication slots on the new publisher. Migration of logical
+     replication slots is only supported when the old cluster is version 17.0
+     or later.
+    </para>
+
+    <para>
+     Before you start upgrading the publisher cluster, ensure that the
+     subscription is temporarily disabled, by executing
+     <link linkend="sql-altersubscription"><command>ALTER SUBSCRIPTION ... DISABLE</command></link>.
+     Re-enable the subscription after the upgrade.
+    </para>
+
+    <para>
+     There are some prerequisites for <application>pg_upgrade</application> to
+     be able to upgrade the replication slots. If these are not met an error
+     will be reported.
+    </para>
+
+    <itemizedlist>
+     <listitem>
+      <para>
+       All slots on the old cluster must be usable, i.e., there are no slots
+       whose
+       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>conflicting</structfield>
+       is <literal>true</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The old cluster has replicated all the transactions and logical decoding
+       messages to subscribers.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The output plugins referenced by the slots on the old cluster must be
+       installed in the new PostgreSQL executable directory.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must not have permanent logical replication slots, i.e.,
+       there must be no slots where
+       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>temporary</structfield>
+       is <literal>false</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must have
+       <link linkend="guc-wal-level"><varname>wal_level</varname></link> as
+       <literal>logical</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must have
+       <link linkend="guc-max-replication-slots"><varname>max_replication_slots</varname></link>
+       configured to a value greater than or equal to the number of slots
+       present in the old cluster.
+      </para>
+     </listitem>
+    </itemizedlist>
+
+   </step>
+
    <step>
     <title>Stop both servers</title>
 
@@ -650,8 +722,9 @@ rsync --archive --delete --hard-links --size-only --no-inc-recursive /vol1/pg_tb
        Configure the servers for log shipping.  (You do not need to run
        <function>pg_backup_start()</function> and <function>pg_backup_stop()</function>
        or take a file system backup as the standbys are still synchronized
-       with the primary.)  Replication slots are not copied and must
-       be recreated.
+       with the primary.)  Only logical slots on the primary are copied to the
+       new standby, but other slots on the old standby are not copied so must
+       be recreated manually.
       </para>
      </step>
 
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 730061c9da..0514d1365e 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -599,12 +599,8 @@ logicalmsg_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 
 	ReorderBufferProcessXid(ctx->reorder, XLogRecGetXid(r), buf->origptr);
 
-	/*
-	 * If we don't have snapshot or we are just fast-forwarding, there is no
-	 * point in decoding messages.
-	 */
-	if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT ||
-		ctx->fast_forward)
+	/* If we don't have snapshot, there is no point in decoding messages */
+	if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT)
 		return;
 
 	message = (xl_logical_message *) XLogRecGetData(r);
@@ -621,6 +617,26 @@ logicalmsg_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 			  SnapBuildXactNeedsSkip(builder, buf->origptr)))
 		return;
 
+	/*
+	 * We also skip decoding in 'fast_forward' mode. This check must be last
+	 * because we don't want to set the processing_required flag unless we
+	 * have a decodable message.
+	 */
+	if (ctx->fast_forward)
+	{
+		/*
+		 * We need to set processing_required flag to notify the message's
+		 * existence to the caller. Usually, the flag is set when either the
+		 * COMMIT or ABORT records are decoded, but this must be turned on
+		 * here because the non-transactional logical message is decoded
+		 * without waiting for these records.
+		 */
+		if (!message->transactional)
+			ctx->processing_required = true;
+
+		return;
+	}
+
 	/*
 	 * If this is a non-transactional change, get the snapshot we're expected
 	 * to use. We only get here when the snapshot is consistent, and the
@@ -1285,7 +1301,21 @@ static bool
 DecodeTXNNeedSkip(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
 				  Oid txn_dbid, RepOriginId origin_id)
 {
-	return (SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) ||
-			(txn_dbid != InvalidOid && txn_dbid != ctx->slot->data.database) ||
-			ctx->fast_forward || FilterByOrigin(ctx, origin_id));
+	if (SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) ||
+		(txn_dbid != InvalidOid && txn_dbid != ctx->slot->data.database) ||
+		FilterByOrigin(ctx, origin_id))
+		return true;
+
+	/*
+	 * We also skip decoding in 'fast_forward' mode. In passing set the
+	 * 'processing_required' flag to indicate, were it not for this mode,
+	 * processing *would* have been required.
+	 */
+	if (ctx->fast_forward)
+	{
+		ctx->processing_required = true;
+		return true;
+	}
+
+	return false;
 }
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 41243d0187..e02cd0fa44 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -29,6 +29,7 @@
 #include "postgres.h"
 
 #include "access/xact.h"
+#include "access/xlogutils.h"
 #include "access/xlog_internal.h"
 #include "fmgr.h"
 #include "miscadmin.h"
@@ -41,6 +42,7 @@
 #include "storage/proc.h"
 #include "storage/procarray.h"
 #include "utils/builtins.h"
+#include "utils/inval.h"
 #include "utils/memutils.h"
 
 /* data for errcontext callback */
@@ -1949,3 +1951,66 @@ UpdateDecodingStats(LogicalDecodingContext *ctx)
 	rb->totalTxns = 0;
 	rb->totalBytes = 0;
 }
+
+/*
+ * Read up to the end of WAL starting from the decoding slot's restart_lsn.
+ * Return true if any meaningful/decodable WAL records are encountered,
+ * otherwise false.
+ *
+ * Although this function is currently used only during pg_upgrade, there are
+ * no reasons to restrict it, so IsBinaryUpgrade is not checked here.
+ */
+bool
+LogicalReplicationSlotHasPendingWal(XLogRecPtr end_of_wal)
+{
+	LogicalDecodingContext *ctx;
+	bool		has_pending_wal = false;
+
+	Assert(MyReplicationSlot);
+
+	/*
+	 * Create our decoding context in fast_forward mode, passing start_lsn as
+	 * InvalidXLogRecPtr, so that we start processing from the slot's
+	 * confirmed_flush.
+	 */
+	ctx = CreateDecodingContext(InvalidXLogRecPtr,
+								NIL,
+								true,	/* fast_forward */
+								XL_ROUTINE(.page_read = read_local_xlog_page,
+										   .segment_open = wal_segment_open,
+										   .segment_close = wal_segment_close),
+								NULL, NULL, NULL);
+
+	/*
+	 * Start reading at the slot's restart_lsn, which we know points to a
+	 * valid record.
+	 */
+	XLogBeginRead(ctx->reader, MyReplicationSlot->data.restart_lsn);
+
+	/* Invalidate non-timetravel entries */
+	InvalidateSystemCaches();
+
+	/* Loop until the end of WAL or some changes are processed */
+	while (!has_pending_wal && ctx->reader->EndRecPtr < end_of_wal)
+	{
+		XLogRecord *record;
+		char	   *errm = NULL;
+
+		record = XLogReadRecord(ctx->reader, &errm);
+
+		if (errm)
+			elog(ERROR, "could not find record for logical decoding: %s", errm);
+
+		if (record != NULL)
+			LogicalDecodingProcessRecord(ctx, ctx->reader);
+
+		has_pending_wal = ctx->processing_required;
+
+		CHECK_FOR_INTERRUPTS();
+	}
+
+	/* Clean up */
+	FreeDecodingContext(ctx);
+
+	return has_pending_wal;
+}
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 7e5ec500d8..9980e2fd79 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1423,6 +1423,18 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause,
 
 		SpinLockRelease(&s->mutex);
 
+		/*
+		 * The logical replication slots shouldn't be invalidated as
+		 * max_slot_wal_keep_size GUC is set to -1 during the upgrade.
+		 *
+		 * The following is just a sanity check.
+		 */
+		if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade)
+		{
+			Assert(max_slot_wal_keep_size_mb == -1);
+			elog(ERROR, "replication slots must not be invalidated during the upgrade");
+		}
+
 		if (active_pid != 0)
 		{
 			/*
diff --git a/src/backend/utils/adt/pg_upgrade_support.c b/src/backend/utils/adt/pg_upgrade_support.c
index 0186636d9f..697e23f815 100644
--- a/src/backend/utils/adt/pg_upgrade_support.c
+++ b/src/backend/utils/adt/pg_upgrade_support.c
@@ -17,6 +17,7 @@
 #include "catalog/pg_type.h"
 #include "commands/extension.h"
 #include "miscadmin.h"
+#include "replication/logical.h"
 #include "utils/array.h"
 #include "utils/builtins.h"
 
@@ -261,3 +262,44 @@ binary_upgrade_set_missing_value(PG_FUNCTION_ARGS)
 
 	PG_RETURN_VOID();
 }
+
+/*
+ * Verify the given slot has already consumed all the WAL changes.
+ *
+ * Returns true if there are no decodable WAL records after the
+ * confirmed_flush_lsn. Otherwise false.
+ *
+ * This is a special purpose function to ensure that the given slot can be
+ * upgraded without data loss.
+ */
+Datum
+binary_upgrade_slot_has_caught_up(PG_FUNCTION_ARGS)
+{
+	Name		slot_name;
+	XLogRecPtr	end_of_wal;
+	bool		found_pending_wal;
+
+	CHECK_IS_BINARY_UPGRADE;
+
+	/* Quick exit if the input is NULL */
+	if (PG_ARGISNULL(0))
+		PG_RETURN_BOOL(false);
+
+	CheckSlotPermissions();
+
+	slot_name = PG_GETARG_NAME(0);
+
+	/* Acquire the given slot */
+	ReplicationSlotAcquire(NameStr(*slot_name), true);
+
+	/* Slots must be valid as otherwise we won't be able to scan the WAL */
+	Assert(MyReplicationSlot->data.invalidated == RS_INVAL_NONE);
+
+	end_of_wal = GetFlushRecPtr(NULL);
+	found_pending_wal = LogicalReplicationSlotHasPendingWal(end_of_wal);
+
+	/* Clean up */
+	ReplicationSlotRelease();
+
+	PG_RETURN_BOOL(!found_pending_wal);
+}
diff --git a/src/bin/pg_upgrade/Makefile b/src/bin/pg_upgrade/Makefile
index 5834513add..815d1a7ca1 100644
--- a/src/bin/pg_upgrade/Makefile
+++ b/src/bin/pg_upgrade/Makefile
@@ -3,6 +3,9 @@
 PGFILEDESC = "pg_upgrade - an in-place binary upgrade utility"
 PGAPPICON = win32
 
+# required for 003_logical_replication_slots.pl
+EXTRA_INSTALL=contrib/test_decoding
+
 subdir = src/bin/pg_upgrade
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 21a0ff9e42..fdf1f0a0c4 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -33,6 +33,8 @@ static void check_for_jsonb_9_4_usage(ClusterInfo *cluster);
 static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(void);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
+static void check_new_cluster_logical_replication_slots(void);
+static void check_old_cluster_for_valid_slots(bool live_check);
 
 
 /*
@@ -89,8 +91,11 @@ check_and_dump_old_cluster(bool live_check)
 	if (!live_check)
 		start_postmaster(&old_cluster, true);
 
-	/* Extract a list of databases and tables from the old cluster */
-	get_db_and_rel_infos(&old_cluster);
+	/*
+	 * Extract a list of databases, tables, and logical replication slots from
+	 * the old cluster.
+	 */
+	get_db_rel_and_slot_infos(&old_cluster, live_check);
 
 	init_tablespaces();
 
@@ -107,6 +112,13 @@ check_and_dump_old_cluster(bool live_check)
 	check_for_reg_data_type_usage(&old_cluster);
 	check_for_isn_and_int8_passing_mismatch(&old_cluster);
 
+	/*
+	 * Logical replication slots can be migrated since PG17. See comments atop
+	 * get_old_cluster_logical_slot_infos().
+	 */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
+		check_old_cluster_for_valid_slots(live_check);
+
 	/*
 	 * PG 16 increased the size of the 'aclitem' type, which breaks the
 	 * on-disk format for existing data.
@@ -200,7 +212,7 @@ check_and_dump_old_cluster(bool live_check)
 void
 check_new_cluster(void)
 {
-	get_db_and_rel_infos(&new_cluster);
+	get_db_rel_and_slot_infos(&new_cluster, false);
 
 	check_new_cluster_is_empty();
 
@@ -223,6 +235,8 @@ check_new_cluster(void)
 	check_for_prepared_transactions(&new_cluster);
 
 	check_for_new_tablespace_dir();
+
+	check_new_cluster_logical_replication_slots();
 }
 
 
@@ -1451,3 +1465,152 @@ check_for_user_defined_encoding_conversions(ClusterInfo *cluster)
 	else
 		check_ok();
 }
+
+/*
+ * check_new_cluster_logical_replication_slots()
+ *
+ * Verify that there are no logical replication slots on the new cluster and
+ * that the parameter settings necessary for creating slots are sufficient.
+ */
+static void
+check_new_cluster_logical_replication_slots(void)
+{
+	PGresult   *res;
+	PGconn	   *conn;
+	int			nslots_on_old;
+	int			nslots_on_new;
+	int			max_replication_slots;
+	char	   *wal_level;
+
+	/* Logical slots can be migrated since PG17. */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+		return;
+
+	nslots_on_old = count_old_cluster_logical_slots();
+
+	/* Quick return if there are no logical slots to be migrated. */
+	if (nslots_on_old == 0)
+		return;
+
+	conn = connectToServer(&new_cluster, "template1");
+
+	prep_status("Checking for new cluster logical replication slots");
+
+	res = executeQueryOrDie(conn, "SELECT count(*) "
+							"FROM pg_catalog.pg_replication_slots "
+							"WHERE slot_type = 'logical' AND "
+							"temporary IS FALSE;");
+
+	if (PQntuples(res) != 1)
+		pg_fatal("could not count the number of logical replication slots");
+
+	nslots_on_new = atoi(PQgetvalue(res, 0, 0));
+
+	if (nslots_on_new)
+		pg_fatal("Expected 0 logical replication slots but found %d.",
+				 nslots_on_new);
+
+	PQclear(res);
+
+	res = executeQueryOrDie(conn, "SELECT setting FROM pg_settings "
+							"WHERE name IN ('wal_level', 'max_replication_slots') "
+							"ORDER BY name DESC;");
+
+	if (PQntuples(res) != 2)
+		pg_fatal("could not determine parameter settings on new cluster");
+
+	wal_level = PQgetvalue(res, 0, 0);
+
+	if (strcmp(wal_level, "logical") != 0)
+		pg_fatal("wal_level must be \"logical\", but is set to \"%s\"",
+				 wal_level);
+
+	max_replication_slots = atoi(PQgetvalue(res, 1, 0));
+
+	if (nslots_on_old > max_replication_slots)
+		pg_fatal("max_replication_slots (%d) must be greater than or equal to the number of "
+				 "logical replication slots (%d) on the old cluster",
+				 max_replication_slots, nslots_on_old);
+
+	PQclear(res);
+	PQfinish(conn);
+
+	check_ok();
+}
+
+/*
+ * check_old_cluster_for_valid_slots()
+ *
+ * Verify that all the logical slots are valid and have consumed all the WAL
+ * before shutdown.
+ */
+static void
+check_old_cluster_for_valid_slots(bool live_check)
+{
+	char		output_path[MAXPGPATH];
+	FILE	   *script = NULL;
+
+	prep_status("Checking for valid logical replication slots");
+
+	snprintf(output_path, sizeof(output_path), "%s/%s",
+			 log_opts.basedir,
+			 "invalid_logical_replication_slots.txt");
+
+	for (int dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		LogicalSlotInfoArr *slot_arr = &old_cluster.dbarr.dbs[dbnum].slot_arr;
+
+		for (int slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		{
+			LogicalSlotInfo *slot = &slot_arr->slots[slotnum];
+
+			/* Is the slot usable? */
+			if (slot->invalid)
+			{
+				if (script == NULL &&
+					(script = fopen_priv(output_path, "w")) == NULL)
+					pg_fatal("could not open file \"%s\": %s",
+							 output_path, strerror(errno));
+
+				fprintf(script, "The slot \"%s\" is invalid\n",
+						slot->slotname);
+
+				/* No need to check this slot, seek to new one */
+				continue;
+			}
+
+			/*
+			 * Do additional check to ensure that all logical replication
+			 * slots have consumed all the WAL before shutdown.
+			 *
+			 * Note: This can be satisfied only when the old cluster has been
+			 * shut down, so we skip this for live checks.
+			 */
+			if (!live_check && !slot->caught_up)
+			{
+				if (script == NULL &&
+					(script = fopen_priv(output_path, "w")) == NULL)
+					pg_fatal("could not open file \"%s\": %s",
+							 output_path, strerror(errno));
+
+				fprintf(script,
+						"The slot \"%s\" has not consumed the WAL yet\n",
+						slot->slotname);
+			}
+		}
+	}
+
+	if (script)
+	{
+		fclose(script);
+
+		pg_log(PG_REPORT, "fatal");
+		pg_fatal("Your installation contains logical replication slots that can't be upgraded.\n"
+				 "You can remove invalid slots and/or consume the pending WAL for other slots,\n"
+				 "and then restart the upgrade.\n"
+				 "A list of the problem slots is in the file:\n"
+				 "    %s", output_path);
+	}
+
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/function.c b/src/bin/pg_upgrade/function.c
index dc8800c7cd..5af936bd45 100644
--- a/src/bin/pg_upgrade/function.c
+++ b/src/bin/pg_upgrade/function.c
@@ -46,7 +46,9 @@ library_name_compare(const void *p1, const void *p2)
 /*
  * get_loadable_libraries()
  *
- *	Fetch the names of all old libraries containing C-language functions.
+ *	Fetch the names of all old libraries containing either C-language functions
+ *	or are corresponding to logical replication output plugins.
+ *
  *	We will later check that they all exist in the new installation.
  */
 void
@@ -55,6 +57,7 @@ get_loadable_libraries(void)
 	PGresult  **ress;
 	int			totaltups;
 	int			dbnum;
+	int			n_libinfos;
 
 	ress = (PGresult **) pg_malloc(old_cluster.dbarr.ndbs * sizeof(PGresult *));
 	totaltups = 0;
@@ -81,7 +84,12 @@ get_loadable_libraries(void)
 		PQfinish(conn);
 	}
 
-	os_info.libraries = (LibraryInfo *) pg_malloc(totaltups * sizeof(LibraryInfo));
+	/*
+	 * Allocate memory for required libraries and logical replication output
+	 * plugins.
+	 */
+	n_libinfos = totaltups + count_old_cluster_logical_slots();
+	os_info.libraries = (LibraryInfo *) pg_malloc(sizeof(LibraryInfo) * n_libinfos);
 	totaltups = 0;
 
 	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
@@ -89,6 +97,7 @@ get_loadable_libraries(void)
 		PGresult   *res = ress[dbnum];
 		int			ntups;
 		int			rowno;
+		LogicalSlotInfoArr *slot_arr = &old_cluster.dbarr.dbs[dbnum].slot_arr;
 
 		ntups = PQntuples(res);
 		for (rowno = 0; rowno < ntups; rowno++)
@@ -101,6 +110,23 @@ get_loadable_libraries(void)
 			totaltups++;
 		}
 		PQclear(res);
+
+		/*
+		 * Store the names of output plugins as well. There is a possibility
+		 * that duplicated plugins are set, but the consumer function
+		 * check_loadable_libraries() will avoid checking the same library, so
+		 * we do not have to consider their uniqueness here.
+		 */
+		for (int slotno = 0; slotno < slot_arr->nslots; slotno++)
+		{
+			if (slot_arr->slots[slotno].invalid)
+				continue;
+
+			os_info.libraries[totaltups].name = pg_strdup(slot_arr->slots[slotno].plugin);
+			os_info.libraries[totaltups].dbnum = dbnum;
+
+			totaltups++;
+		}
 	}
 
 	pg_free(ress);
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index aa5faca4d6..481f586a2f 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -26,6 +26,8 @@ static void get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo);
 static void free_rel_infos(RelInfoArr *rel_arr);
 static void print_db_infos(DbInfoArr *db_arr);
 static void print_rel_infos(RelInfoArr *rel_arr);
+static void print_slot_infos(LogicalSlotInfoArr *slot_arr);
+static void get_old_cluster_logical_slot_infos(DbInfo *dbinfo, bool live_check);
 
 
 /*
@@ -266,13 +268,15 @@ report_unmatched_relation(const RelInfo *rel, const DbInfo *db, bool is_new_db)
 }
 
 /*
- * get_db_and_rel_infos()
+ * get_db_rel_and_slot_infos()
  *
  * higher level routine to generate dbinfos for the database running
  * on the given "port". Assumes that server is already running.
+ *
+ * live_check would be used only when the target is the old cluster.
  */
 void
-get_db_and_rel_infos(ClusterInfo *cluster)
+get_db_rel_and_slot_infos(ClusterInfo *cluster, bool live_check)
 {
 	int			dbnum;
 
@@ -283,7 +287,17 @@ get_db_and_rel_infos(ClusterInfo *cluster)
 	get_db_infos(cluster);
 
 	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
-		get_rel_infos(cluster, &cluster->dbarr.dbs[dbnum]);
+	{
+		DbInfo	   *pDbInfo = &cluster->dbarr.dbs[dbnum];
+
+		get_rel_infos(cluster, pDbInfo);
+
+		/*
+		 * Retrieve the logical replication slots infos for the old cluster.
+		 */
+		if (cluster == &old_cluster)
+			get_old_cluster_logical_slot_infos(pDbInfo, live_check);
+	}
 
 	if (cluster == &old_cluster)
 		pg_log(PG_VERBOSE, "\nsource databases:");
@@ -600,6 +614,125 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 	dbinfo->rel_arr.nrels = num_rels;
 }
 
+/*
+ * get_old_cluster_logical_slot_infos()
+ *
+ * gets the LogicalSlotInfos for all the logical replication slots of the
+ * database referred to by "dbinfo". The status of each logical slot is gotten
+ * here, but they are used at the checking phase. See
+ * check_old_cluster_for_valid_slots().
+ *
+ * Note: This function will not do anything if the old cluster is pre-PG17.
+ * This is because before that the logical slots are not saved at shutdown, so
+ * there is no guarantee that the latest confirmed_flush_lsn is saved to disk
+ * which can lead to data loss. It is still not guaranteed for manually created
+ * slots in PG17, so subsequent checks done in
+ * check_old_cluster_for_valid_slots() would raise a FATAL error if such slots
+ * are included.
+ */
+static void
+get_old_cluster_logical_slot_infos(DbInfo *dbinfo, bool live_check)
+{
+	PGconn	   *conn;
+	PGresult   *res;
+	LogicalSlotInfo *slotinfos = NULL;
+	int			num_slots = 0;
+
+	/* Logical slots can be migrated since PG17. */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+	{
+		dbinfo->slot_arr.slots = slotinfos;
+		dbinfo->slot_arr.nslots = num_slots;
+		return;
+	}
+
+	conn = connectToServer(&old_cluster, dbinfo->db_name);
+
+	/*
+	 * Fetch the logical replication slot information. The check whether the
+	 * slot is considered caught up is done by an upgrade function. This
+	 * regards the slot as caught up if we don't find any decodable changes.
+	 * See binary_upgrade_slot_has_caught_up().
+	 *
+	 * Note that we can't ensure whether the slot is caught up during
+	 * live_check as the new WAL records could be generated.
+	 *
+	 * We intentionally skip checking the WALs for invalidated slots as the
+	 * corresponding WALs could have been removed for such slots.
+	 *
+	 * The temporary slots are explicitly ignored while checking because such
+	 * slots cannot exist after the upgrade. During the upgrade, clusters are
+	 * started and stopped several times causing any temporary slots to be
+	 * removed.
+	 */
+	res = executeQueryOrDie(conn, "SELECT slot_name, plugin, two_phase, "
+							"%s as caught_up, conflicting as invalid "
+							"FROM pg_catalog.pg_replication_slots "
+							"WHERE slot_type = 'logical' AND "
+							"database = current_database() AND "
+							"temporary IS FALSE;",
+							live_check ? "FALSE" :
+							"(CASE WHEN conflicting THEN FALSE "
+							"ELSE (SELECT pg_catalog.binary_upgrade_slot_has_caught_up(slot_name)) "
+							"END)");
+
+	num_slots = PQntuples(res);
+
+	if (num_slots)
+	{
+		int			i_slotname;
+		int			i_plugin;
+		int			i_twophase;
+		int			i_caught_up;
+		int			i_invalid;
+
+		slotinfos = (LogicalSlotInfo *) pg_malloc(sizeof(LogicalSlotInfo) * num_slots);
+
+		i_slotname = PQfnumber(res, "slot_name");
+		i_plugin = PQfnumber(res, "plugin");
+		i_twophase = PQfnumber(res, "two_phase");
+		i_caught_up = PQfnumber(res, "caught_up");
+		i_invalid = PQfnumber(res, "invalid");
+
+		for (int slotnum = 0; slotnum < num_slots; slotnum++)
+		{
+			LogicalSlotInfo *curr = &slotinfos[slotnum];
+
+			curr->slotname = pg_strdup(PQgetvalue(res, slotnum, i_slotname));
+			curr->plugin = pg_strdup(PQgetvalue(res, slotnum, i_plugin));
+			curr->two_phase = (strcmp(PQgetvalue(res, slotnum, i_twophase), "t") == 0);
+			curr->caught_up = (strcmp(PQgetvalue(res, slotnum, i_caught_up), "t") == 0);
+			curr->invalid = (strcmp(PQgetvalue(res, slotnum, i_invalid), "t") == 0);
+		}
+	}
+
+	PQclear(res);
+	PQfinish(conn);
+
+	dbinfo->slot_arr.slots = slotinfos;
+	dbinfo->slot_arr.nslots = num_slots;
+}
+
+
+/*
+ * count_old_cluster_logical_slots()
+ *
+ * Returns the number of logical replication slots for all databases.
+ *
+ * Note: this function always returns 0 if the old_cluster is PG16 and prior
+ * because we gather slot information only for cluster versions greater than or
+ * equal to PG17. See get_old_cluster_logical_slot_infos().
+ */
+int
+count_old_cluster_logical_slots(void)
+{
+	int			slot_count = 0;
+
+	for (int dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+		slot_count += old_cluster.dbarr.dbs[dbnum].slot_arr.nslots;
+
+	return slot_count;
+}
 
 static void
 free_db_and_rel_infos(DbInfoArr *db_arr)
@@ -642,8 +775,11 @@ print_db_infos(DbInfoArr *db_arr)
 
 	for (dbnum = 0; dbnum < db_arr->ndbs; dbnum++)
 	{
-		pg_log(PG_VERBOSE, "Database: \"%s\"", db_arr->dbs[dbnum].db_name);
-		print_rel_infos(&db_arr->dbs[dbnum].rel_arr);
+		DbInfo	   *pDbInfo = &db_arr->dbs[dbnum];
+
+		pg_log(PG_VERBOSE, "Database: \"%s\"", pDbInfo->db_name);
+		print_rel_infos(&pDbInfo->rel_arr);
+		print_slot_infos(&pDbInfo->slot_arr);
 	}
 }
 
@@ -660,3 +796,23 @@ print_rel_infos(RelInfoArr *rel_arr)
 			   rel_arr->rels[relnum].reloid,
 			   rel_arr->rels[relnum].tablespace);
 }
+
+static void
+print_slot_infos(LogicalSlotInfoArr *slot_arr)
+{
+	/* Quick return if there are no logical slots. */
+	if (slot_arr->nslots == 0)
+		return;
+
+	pg_log(PG_VERBOSE, "Logical replication slots within the database:");
+
+	for (int slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+	{
+		LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+
+		pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\", two_phase: %s",
+			   slot_info->slotname,
+			   slot_info->plugin,
+			   slot_info->two_phase ? "true" : "false");
+	}
+}
diff --git a/src/bin/pg_upgrade/meson.build b/src/bin/pg_upgrade/meson.build
index 12a97f84e2..2c4f38d865 100644
--- a/src/bin/pg_upgrade/meson.build
+++ b/src/bin/pg_upgrade/meson.build
@@ -42,6 +42,7 @@ tests += {
     'tests': [
       't/001_basic.pl',
       't/002_pg_upgrade.pl',
+      't/003_upgrade_logical_replication_slots.pl',
     ],
     'test_kwargs': {'priority': 40}, # pg_upgrade tests are slow
   },
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 96bfb67167..3960af4036 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -59,6 +59,7 @@ static void copy_xact_xlog_xid(void);
 static void set_frozenxids(bool minmxid_only);
 static void make_outputdirs(char *pgdata);
 static void setup(char *argv0, bool *live_check);
+static void create_logical_replication_slots(void);
 
 ClusterInfo old_cluster,
 			new_cluster;
@@ -188,6 +189,21 @@ main(int argc, char **argv)
 			  new_cluster.pgdata);
 	check_ok();
 
+	/*
+	 * Migrate the logical slots to the new cluster.  Note that we need to do
+	 * this after resetting WAL because otherwise the required WAL would be
+	 * removed and slots would become unusable.  There is a possibility that
+	 * background processes might generate some WAL before we could create the
+	 * slots in the new cluster but we can ignore that WAL as that won't be
+	 * required downstream.
+	 */
+	if (count_old_cluster_logical_slots())
+	{
+		start_postmaster(&new_cluster, true);
+		create_logical_replication_slots();
+		stop_postmaster(false);
+	}
+
 	if (user_opts.do_sync)
 	{
 		prep_status("Sync data directory to disk");
@@ -593,7 +609,7 @@ create_new_objects(void)
 		set_frozenxids(true);
 
 	/* update new_cluster info now that we have objects in the databases */
-	get_db_and_rel_infos(&new_cluster);
+	get_db_rel_and_slot_infos(&new_cluster, false);
 }
 
 /*
@@ -862,3 +878,59 @@ set_frozenxids(bool minmxid_only)
 
 	check_ok();
 }
+
+/*
+ * create_logical_replication_slots()
+ *
+ * Similar to create_new_objects() but only restores logical replication slots.
+ */
+static void
+create_logical_replication_slots(void)
+{
+	prep_status_progress("Restoring logical replication slots in the new cluster");
+
+	for (int dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		DbInfo	   *old_db = &old_cluster.dbarr.dbs[dbnum];
+		LogicalSlotInfoArr *slot_arr = &old_db->slot_arr;
+		PGconn	   *conn;
+		PQExpBuffer query;
+
+		/* Skip this database if there are no slots */
+		if (slot_arr->nslots == 0)
+			continue;
+
+		conn = connectToServer(&new_cluster, old_db->db_name);
+		query = createPQExpBuffer();
+
+		pg_log(PG_STATUS, "%s", old_db->db_name);
+
+		for (int slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		{
+			LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+
+			/* Constructs a query for creating logical replication slots */
+			appendPQExpBuffer(query,
+							  "SELECT * FROM "
+							  "pg_catalog.pg_create_logical_replication_slot(");
+			appendStringLiteralConn(query, slot_info->slotname, conn);
+			appendPQExpBuffer(query, ", ");
+			appendStringLiteralConn(query, slot_info->plugin, conn);
+			appendPQExpBuffer(query, ", false, %s);",
+							  slot_info->two_phase ? "true" : "false");
+
+			PQclear(executeQueryOrDie(conn, "%s", query->data));
+
+			resetPQExpBuffer(query);
+		}
+
+		PQfinish(conn);
+
+		destroyPQExpBuffer(query);
+	}
+
+	end_progress_output();
+	check_ok();
+
+	return;
+}
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 842f3b6cd3..ba8129d135 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -150,6 +150,24 @@ typedef struct
 	int			nrels;
 } RelInfoArr;
 
+/*
+ * Structure to store logical replication slot information.
+ */
+typedef struct
+{
+	char	   *slotname;		/* slot name */
+	char	   *plugin;			/* plugin */
+	bool		two_phase;		/* can the slot decode 2PC? */
+	bool		caught_up;		/* has the slot caught up to latest changes? */
+	bool		invalid;		/* if true, the slot is unusable */
+} LogicalSlotInfo;
+
+typedef struct
+{
+	int			nslots;			/* number of logical slot infos */
+	LogicalSlotInfo *slots;		/* array of logical slot infos */
+} LogicalSlotInfoArr;
+
 /*
  * The following structure represents a relation mapping.
  */
@@ -176,6 +194,7 @@ typedef struct
 	char		db_tablespace[MAXPGPATH];	/* database default tablespace
 											 * path */
 	RelInfoArr	rel_arr;		/* array of all user relinfos */
+	LogicalSlotInfoArr slot_arr;	/* array of all LogicalSlotInfo */
 } DbInfo;
 
 /*
@@ -400,7 +419,8 @@ void		check_loadable_libraries(void);
 FileNameMap *gen_db_file_maps(DbInfo *old_db,
 							  DbInfo *new_db, int *nmaps, const char *old_pgdata,
 							  const char *new_pgdata);
-void		get_db_and_rel_infos(ClusterInfo *cluster);
+void		get_db_rel_and_slot_infos(ClusterInfo *cluster, bool live_check);
+int			count_old_cluster_logical_slots(void);
 
 /* option.c */
 
diff --git a/src/bin/pg_upgrade/server.c b/src/bin/pg_upgrade/server.c
index 0bc3d2806b..d7f6c268ef 100644
--- a/src/bin/pg_upgrade/server.c
+++ b/src/bin/pg_upgrade/server.c
@@ -201,6 +201,7 @@ start_postmaster(ClusterInfo *cluster, bool report_and_exit_on_error)
 	PGconn	   *conn;
 	bool		pg_ctl_return = false;
 	char		socket_string[MAXPGPATH + 200];
+	PQExpBufferData pgoptions;
 
 	static bool exit_hook_registered = false;
 
@@ -227,23 +228,41 @@ start_postmaster(ClusterInfo *cluster, bool report_and_exit_on_error)
 				 cluster->sockdir);
 #endif
 
+	initPQExpBuffer(&pgoptions);
+
 	/*
-	 * Use -b to disable autovacuum.
+	 * Construct a parameter string which is passed to the server process.
 	 *
 	 * Turn off durability requirements to improve object creation speed, and
 	 * we only modify the new cluster, so only use it there.  If there is a
 	 * crash, the new cluster has to be recreated anyway.  fsync=off is a big
 	 * win on ext4.
 	 */
+	if (cluster == &new_cluster)
+		appendPQExpBufferStr(&pgoptions, " -c synchronous_commit=off -c fsync=off -c full_page_writes=off");
+
+	/*
+	 * Use max_slot_wal_keep_size as -1 to prevent the WAL removal by the
+	 * checkpointer process.  If WALs required by logical replication slots
+	 * are removed, the slots are unusable.  This setting prevents the
+	 * invalidation of slots during the upgrade. We set this option when
+	 * cluster is PG17 or later because logical replication slots can only be
+	 * migrated since then. Besides, max_slot_wal_keep_size is added in PG13.
+	 */
+	if (GET_MAJOR_VERSION(cluster->major_version) >= 1700)
+		appendPQExpBufferStr(&pgoptions, " -c max_slot_wal_keep_size=-1");
+
+	/* Use -b to disable autovacuum. */
 	snprintf(cmd, sizeof(cmd),
 			 "\"%s/pg_ctl\" -w -l \"%s/%s\" -D \"%s\" -o \"-p %d -b%s %s%s\" start",
 			 cluster->bindir,
 			 log_opts.logdir,
 			 SERVER_LOG_FILE, cluster->pgconfig, cluster->port,
-			 (cluster == &new_cluster) ?
-			 " -c synchronous_commit=off -c fsync=off -c full_page_writes=off" : "",
+			 pgoptions.data,
 			 cluster->pgopts ? cluster->pgopts : "", socket_string);
 
+	termPQExpBuffer(&pgoptions);
+
 	/*
 	 * Don't throw an error right away, let connecting throw the error because
 	 * it might supply a reason for the failure.
diff --git a/src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl
new file mode 100644
index 0000000000..ca79bfa715
--- /dev/null
+++ b/src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl
@@ -0,0 +1,331 @@
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+# Tests for upgrading logical replication slots
+
+use strict;
+use warnings;
+
+use File::Find qw(find);
+use File::Path qw(rmtree);
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Verify that logical replication slots can be migrated.  This function will
+# be executed when the old cluster is PG17 and later.
+sub test_upgrade_from_PG17_and_later
+{
+	my ($old_publisher, $new_publisher, $mode) = @_;
+
+	my $oldbindir = $old_publisher->config_data('--bindir');
+	my $newbindir = $new_publisher->config_data('--bindir');
+
+	# ------------------------------
+	# TEST: Confirm pg_upgrade fails when the new cluster has wrong GUC values
+
+	# Preparations for the subsequent test:
+	# 1. Create two slots on the old cluster
+	$old_publisher->start;
+	$old_publisher->safe_psql(
+		'postgres', qq[
+		SELECT pg_create_logical_replication_slot('test_slot1', 'test_decoding', false, true);
+		SELECT pg_create_logical_replication_slot('test_slot2', 'test_decoding', false, true);
+		]
+	);
+	$old_publisher->stop();
+
+	# 2. max_replication_slots is set to smaller than the number of slots (2)
+	#	 present on the old cluster
+	$new_publisher->append_conf('postgresql.conf',
+		"max_replication_slots = 1");
+
+	# pg_upgrade will fail because the new cluster has insufficient
+	# max_replication_slots
+	command_checks_all(
+		[
+			'pg_upgrade', '--no-sync',
+			'-d', $old_publisher->data_dir,
+			'-D', $new_publisher->data_dir,
+			'-b', $oldbindir,
+			'-B', $newbindir,
+			'-s', $new_publisher->host,
+			'-p', $old_publisher->port,
+			'-P', $new_publisher->port,
+			$mode,
+		],
+		1,
+		[
+			qr/max_replication_slots \(1\) must be greater than or equal to the number of logical replication slots \(2\) on the old cluster/
+		],
+		[qr//],
+		'run of pg_upgrade where the new cluster has insufficient max_replication_slots'
+	);
+	ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+		"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+	# Clean up
+	rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+	# Set max_replication_slots to the same value as the number of slots. Both
+	# of slots will be used for subsequent tests.
+	$new_publisher->append_conf('postgresql.conf',
+		"max_replication_slots = 2");
+
+
+	# ------------------------------
+	# TEST: Confirm pg_upgrade fails when the slot still has unconsumed WAL
+	# records
+
+	# Preparations for the subsequent test:
+	# 1. Generate extra WAL records. At this point neither test_slot1 nor
+	#	 test_slot2 has consumed them.
+	$old_publisher->start;
+	$old_publisher->safe_psql('postgres',
+		"CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;");
+
+	# 2. Advance the slot test_slot2 up to the current WAL location, but
+	#	 test_slot1 still has unconsumed WAL records.
+	$old_publisher->safe_psql('postgres',
+		"SELECT pg_replication_slot_advance('test_slot2', NULL);");
+
+	# 3. Emit a non-transactional message. test_slot2 detects the message so
+	#	 that this slot will be also reported by upcoming pg_upgrade.
+	$old_publisher->safe_psql('postgres',
+		"SELECT count(*) FROM pg_logical_emit_message('false', 'prefix', 'This is a non-transactional message');"
+	);
+
+	$old_publisher->stop;
+
+	# pg_upgrade will fail because there are slots still having unconsumed WAL
+	# records
+	command_checks_all(
+		[
+			'pg_upgrade', '--no-sync',
+			'-d', $old_publisher->data_dir,
+			'-D', $new_publisher->data_dir,
+			'-b', $oldbindir,
+			'-B', $newbindir,
+			'-s', $new_publisher->host,
+			'-p', $old_publisher->port,
+			'-P', $new_publisher->port,
+			$mode,
+		],
+		1,
+		[
+			qr/Your installation contains logical replication slots that can't be upgraded./
+		],
+		[qr//],
+		'run of pg_upgrade of old cluster with slots having unconsumed WAL records'
+	);
+	ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+		"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+	# Verify the reason why the logical replication slot cannot be upgraded
+	my $slots_filename;
+
+	# Find a txt file that contains a list of logical replication slots that
+	# cannot be upgraded. We cannot predict the file's path because the output
+	# directory contains a milliseconds timestamp. File::Find::find must be
+	# used.
+	find(
+		sub {
+			if ($File::Find::name =~
+				m/invalid_logical_replication_slots\.txt/)
+			{
+				$slots_filename = $File::Find::name;
+			}
+		},
+		$new_publisher->data_dir . "/pg_upgrade_output.d");
+
+	# Check the file content. Both slots should be reporting that they have
+	# unconsumed WAL records.
+	like(
+		slurp_file($slots_filename),
+		qr/The slot \"test_slot1\" has not consumed the WAL yet/m,
+		'the previous test failed due to unconsumed WALs');
+	like(
+		slurp_file($slots_filename),
+		qr/The slot \"test_slot2\" has not consumed the WAL yet/m,
+		'the previous test failed due to unconsumed WALs');
+
+	# Clean up
+	rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+
+
+	# ------------------------------
+	# TEST: Successful upgrade
+
+	# Preparations for the subsequent test:
+	# 1. Setup logical replication (first, cleanup slots from the previous
+	#	 tests)
+	my $old_connstr = $old_publisher->connstr . ' dbname=postgres';
+
+	$old_publisher->start;
+	$old_publisher->safe_psql(
+		'postgres', qq[
+		SELECT * FROM pg_drop_replication_slot('test_slot1');
+		SELECT * FROM pg_drop_replication_slot('test_slot2');
+		CREATE PUBLICATION regress_pub FOR ALL TABLES;
+		]
+	);
+
+	# Initialize subscriber cluster
+	my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
+	$subscriber->init();
+
+	$subscriber->start;
+	$subscriber->safe_psql(
+		'postgres', qq[
+		CREATE TABLE tbl (a int);
+		CREATE SUBSCRIPTION regress_sub CONNECTION '$old_connstr' PUBLICATION regress_pub WITH (two_phase = 'true')
+	]);
+	$subscriber->wait_for_subscription_sync($old_publisher, 'regress_sub');
+
+	# 2. Temporarily disable the subscription
+	$subscriber->safe_psql('postgres',
+		"ALTER SUBSCRIPTION regress_sub DISABLE");
+	$old_publisher->stop;
+
+	# pg_upgrade should be successful
+	command_ok(
+		[
+			'pg_upgrade', '--no-sync',
+			'-d', $old_publisher->data_dir,
+			'-D', $new_publisher->data_dir,
+			'-b', $oldbindir,
+			'-B', $newbindir,
+			'-s', $new_publisher->host,
+			'-p', $old_publisher->port,
+			'-P', $new_publisher->port,
+			$mode,
+		],
+		'run of pg_upgrade of old cluster');
+	ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
+		"pg_upgrade_output.d/ removed after pg_upgrade success");
+
+	# Check that the slot 'regress_sub' has migrated to the new cluster
+	$new_publisher->start;
+	my $result = $new_publisher->safe_psql('postgres',
+		"SELECT slot_name, two_phase FROM pg_replication_slots");
+	is($result, qq(regress_sub|t), 'check the slot exists on new cluster');
+
+	# Update the connection
+	my $new_connstr = $new_publisher->connstr . ' dbname=postgres';
+	$subscriber->safe_psql(
+		'postgres', qq[
+		ALTER SUBSCRIPTION regress_sub CONNECTION '$new_connstr';
+		ALTER SUBSCRIPTION regress_sub ENABLE;
+	]);
+
+	# Check whether changes on the new publisher get replicated to the
+	# subscriber
+	$new_publisher->safe_psql('postgres',
+		"INSERT INTO tbl VALUES (generate_series(11, 20))");
+	$new_publisher->wait_for_catchup('regress_sub');
+	$result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
+	is($result, qq(20), 'check changes are replicated to the subscriber');
+
+	# Clean up
+	$subscriber->stop();
+	$new_publisher->stop();
+}
+
+# Verify that logical replication slots cannot be migrated.  This function
+# will be executed when the old cluster is PG16 and prior.
+sub test_upgrade_from_pre_PG17
+{
+	my ($old_publisher, $new_publisher, $mode) = @_;
+
+	my $oldbindir = $old_publisher->config_data('--bindir');
+	my $newbindir = $new_publisher->config_data('--bindir');
+
+	# ------------------------------
+	# TEST: Confirm logical replication slots cannot be migrated
+
+	# Preparations for the subsequent test:
+	# 1. Create a slot on the old cluster
+	$old_publisher->start;
+	$old_publisher->safe_psql('postgres',
+		"SELECT pg_create_logical_replication_slot('test_slot', 'test_decoding');"
+	);
+	$old_publisher->stop;
+
+	# Actual run, successful upgrade is expected
+	command_ok(
+		[
+			'pg_upgrade', '--no-sync',
+			'-d', $old_publisher->data_dir,
+			'-D', $new_publisher->data_dir,
+			'-b', $oldbindir,
+			'-B', $newbindir,
+			'-s', $new_publisher->host,
+			'-p', $old_publisher->port,
+			'-P', $new_publisher->port,
+			$mode,
+		],
+		'run of pg_upgrade of old cluster');
+
+	ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
+		"pg_upgrade_output.d/ removed after pg_upgrade success");
+
+	# Check that the slot 'test_slot' has not migrated to the new cluster
+	$new_publisher->start;
+	my $result = $new_publisher->safe_psql('postgres',
+		"SELECT count(*) FROM pg_replication_slots");
+	is($result, qq(0), 'check the slot does not exist on new cluster');
+
+	# Clean up
+	$new_publisher->stop();
+}
+
+# Can be changed to test the other modes
+my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
+
+# Initialize old cluster. Cross-version checks are also supported.
+my $old_publisher = PostgreSQL::Test::Cluster->new('old_publisher',
+	install_path => $ENV{oldinstall});
+
+my %node_params = ();
+$node_params{allows_streaming} = 'logical';
+
+# Set extra params if cross-version checks are required. This is needed to
+# avoid using previously initdb'd cluster
+if (defined($ENV{oldinstall}))
+{
+	my @initdb_params = ();
+	push @initdb_params, ('--encoding', 'UTF-8');
+	push @initdb_params, ('--locale', 'C');
+
+	$node_params{extra} = \@initdb_params;
+}
+$old_publisher->init(%node_params);
+
+# The TAP Cluster.pm assigns default 'max_wal_senders' and 'max_connections' to
+# the same value (10) but PG12 and prior considered max_walsenders as a subset
+# of max_connections, so setting the same value will fail.
+if ($old_publisher->pg_version->major < 12)
+{
+	$old_publisher->append_conf(
+		'postgresql.conf', qq[
+	max_wal_senders = 5
+	max_connections = 10
+	]);
+}
+
+# Initialize new cluster
+my $new_publisher = PostgreSQL::Test::Cluster->new('new_publisher');
+$new_publisher->init(allows_streaming => 'logical');
+
+# Test according to the major version of the old cluster.
+# Upgrading logical replication slots has been supported only since PG17.
+if ($old_publisher->pg_version->major >= 17)
+{
+	test_upgrade_from_PG17_and_later($old_publisher, $new_publisher, $mode);
+}
+else
+{
+	test_upgrade_from_pre_PG17($old_publisher, $new_publisher, $mode);
+}
+
+
+done_testing();
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index c92d0631a0..0699596888 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11379,6 +11379,11 @@
   proname => 'binary_upgrade_set_next_pg_tablespace_oid', provolatile => 'v',
   proparallel => 'u', prorettype => 'void', proargtypes => 'oid',
   prosrc => 'binary_upgrade_set_next_pg_tablespace_oid' },
+{ oid => '8046', descr => 'for use by pg_upgrade',
+  proname => 'binary_upgrade_slot_has_caught_up', proisstrict => 'f',
+  provolatile => 'v', proparallel => 'u', prorettype => 'bool',
+  proargtypes => 'name',
+  prosrc => 'binary_upgrade_slot_has_caught_up' },
 
 # conversion functions
 { oid => '4302',
diff --git a/src/include/replication/logical.h b/src/include/replication/logical.h
index 5f49554ea0..f8258d7c28 100644
--- a/src/include/replication/logical.h
+++ b/src/include/replication/logical.h
@@ -109,6 +109,9 @@ typedef struct LogicalDecodingContext
 	TransactionId write_xid;
 	/* Are we processing the end LSN of a transaction? */
 	bool		end_xact;
+
+	/* Do we need to process any change in 'fast_forward' mode? */
+	bool		processing_required;
 } LogicalDecodingContext;
 
 
@@ -145,4 +148,6 @@ extern bool filter_by_origin_cb_wrapper(LogicalDecodingContext *ctx, RepOriginId
 extern void ResetLogicalStreamingState(void);
 extern void UpdateDecodingStats(LogicalDecodingContext *ctx);
 
+extern bool LogicalReplicationSlotHasPendingWal(XLogRecPtr end_of_wal);
+
 #endif
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index e69bb671bf..de6c48d914 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1503,6 +1503,8 @@ LogicalRepTyp
 LogicalRepWorker
 LogicalRepWorkerType
 LogicalRewriteMappingData
+LogicalSlotInfo
+LogicalSlotInfoArr
 LogicalTape
 LogicalTapeSet
 LsnReadQueue
-- 
2.27.0

#328Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Peter Smith (#326)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Peter,

Thank you for reviewing! New patch is available in [1].

======
src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl

1.
+# Set max_wal_senders to a lower value if the old cluster is prior to PG12.
+# Such clusters regard max_wal_senders as part of max_connections, but the
+# current TAP tester sets these GUCs to the same value.
+if ($old_publisher->pg_version < 12)
+{
+ $old_publisher->append_conf('postgresql.conf', "max_wal_senders = 5");
+}

1a.
I was initially unsure what the above comment meant -- thanks for the
offline explanation.

SUGGESTION
The TAP Cluster.pm assigns default 'max_wal_senders' and
'max_connections' to the same value (10) but PG12 and prior considered
max_walsenders as a subset of max_connections, so setting the same
value will fail.

Fixed.

1b.
I also felt it is better to explicitly set both values in the < PG12
configuration because otherwise, you are still assuming knowledge that
the TAP default max_connections is 10.

SUGGESTION
$old_publisher->append_conf('postgresql.conf', qq{
max_wal_senders = 5
max_connections = 10
});

Fixed.

2.
+# Switch workloads depend on the major version of the old cluster.  Upgrading
+# logical replication slots has been supported since PG17.
+if ($old_publisher->pg_version <= 16)
+{
+ test_for_16_and_prior($old_publisher, $new_publisher, $mode);
+}
+else
+{
+ test_for_17_and_later($old_publisher, $new_publisher, $mode);
+}

IMO it is less confusing to have fewer version numbers floating around
in comments and names and code. So instead of referring to 16 and 17,
how about just referring to 17 everywhere?

For example

SUGGESTION
# Test according to the major version of the old cluster.
# Upgrading logical replication slots has been supported only since PG17.

if ($old_publisher->pg_version >= 17)
{
test_upgrade_from_PG17_and_later($old_publisher, $new_publisher, $mode);
}
else
{
test_upgrade_from_pre_PG17($old_publisher, $new_publisher, $mode);
}

In HEAD, pg_version reports "17devel", which Perl treats as smaller than 17
(i.e., "17devel" >= 17 evaluates to false).
To compare only the major version, pg_version->major was used instead.
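For illustration, a minimal sketch of that comparison, assuming the TAP
PostgreSQL::Version class behaves as described above (the literal version
strings are only examples):

use strict;
use warnings;
use PostgreSQL::Version;

# A development version sorts before the corresponding release, so a
# plain ">= 17" check against the full version evaluates to false here.
my $v = PostgreSQL::Version->new('17devel');
print(($v >= 17) ? "17 or later\n" : "before 17\n");          # "before 17"

# Comparing only the major component gives the intended result.
print(($v->major >= 17) ? "17 or later\n" : "before 17\n");   # "17 or later"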

Also, I removed support for PG9.4 and earlier. I cannot find it documented, but according to [2],
Cluster.pm does not support such old binaries
(cluster_name is set when the server process is started, but that GUC was only added in PG9.5).

[1]: /messages/by-id/TYCPR01MB5870EBEBC89F5224F6B3788CF5D5A@TYCPR01MB5870.jpnprd01.prod.outlook.com
[2]: /messages/by-id/YsUrUDrRhUbuU/6k@paquier.xyz

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#329Peter Smith
smithpb2250@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#327)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

Here are some review comments for v52-0001

======
src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl

1.
+ # 2. max_replication_slots is set to smaller than the number of slots (2)
+ # present on the old cluster

SUGGESTION
2. Set 'max_replication_slots' to be less than the number of slots (2)
present on the old cluster.

~~~

2.
+ # Set max_replication_slots to the same value as the number of slots. Both
+ # of slots will be used for subsequent tests.

SUGGESTION
Set 'max_replication_slots' to match the number of slots (2) present
on the old cluster.
Both slots will be used for subsequent tests.

~~~

3.
+ # 3. Emit a non-transactional message. test_slot2 detects the message so
+ # that this slot will be also reported by upcoming pg_upgrade.
+ $old_publisher->safe_psql('postgres',
+ "SELECT count(*) FROM pg_logical_emit_message('false', 'prefix',
'This is a non-transactional message');"
+ );

SUGGESTION
3. Emit a non-transactional message. This will cause test_slot2 to
detect the unconsumed WAL record.

~~~

4.
+ # Preparations for the subsequent test:
+ # 1. Generate extra WAL records. At this point neither test_slot1 nor
+ # test_slot2 has consumed them.
+ $old_publisher->start;
+ $old_publisher->safe_psql('postgres',
+ "CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;");
+
+ # 2. Advance the slot test_slot2 up to the current WAL location, but
+ # test_slot1 still has unconsumed WAL records.
+ $old_publisher->safe_psql('postgres',
+ "SELECT pg_replication_slot_advance('test_slot2', NULL);");
+
+ # 3. Emit a non-transactional message. test_slot2 detects the message so
+ # that this slot will be also reported by upcoming pg_upgrade.
+ $old_publisher->safe_psql('postgres',
+ "SELECT count(*) FROM pg_logical_emit_message('false', 'prefix',
'This is a non-transactional message');"
+ );
+
+ $old_publisher->stop;

All of the above are sequentially executed on the
old_publisher->safe_psql, so consider if it is worth combining them
all in a single call (keeping the comments 1,2,3 separate still)

For example,

$old_publisher->start;
$old_publisher->safe_psql('postgres', qq[
CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;
SELECT pg_replication_slot_advance('test_slot2', NULL);
SELECT count(*) FROM pg_logical_emit_message('false', 'prefix',
'This is a non-transactional message');
]);
$old_publisher->stop;

~~~

5.
+ # Clean up
+ $subscriber->stop();
+ $new_publisher->stop();

Should this also drop the 'test_slot1' and 'test_slot2'?

~~~

6.
+# Verify that logical replication slots cannot be migrated.  This function
+# will be executed when the old cluster is PG16 and prior.
+sub test_upgrade_from_pre_PG17
+{
+ my ($old_publisher, $new_publisher, $mode) = @_;
+
+ my $oldbindir = $old_publisher->config_data('--bindir');
+ my $newbindir = $new_publisher->config_data('--bindir');

SUGGESTION (let's not mention lots of different numbers; just refer to 17)
This function will be executed when the old cluster version is prior to PG17.

~~

7.
+ # Actual run, successful upgrade is expected
+ command_ok(
+ [
+ 'pg_upgrade', '--no-sync',
+ '-d', $old_publisher->data_dir,
+ '-D', $new_publisher->data_dir,
+ '-b', $oldbindir,
+ '-B', $newbindir,
+ '-s', $new_publisher->host,
+ '-p', $old_publisher->port,
+ '-P', $new_publisher->port,
+ $mode,
+ ],
+ 'run of pg_upgrade of old cluster');
+
+ ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
+ "pg_upgrade_output.d/ removed after pg_upgrade success");

7a.
The comment is wrong?

SUGGESTION
# pg_upgrade should NOT be successful

~

7b.
There is a blank line here before the ok() function, but in the other
tests, there was none. Better to be consistent.

~~~

8.
+ # Clean up
+ $new_publisher->stop();

Should this also drop the 'test_slot'?

~~~

9.
+# The TAP Cluster.pm assigns default 'max_wal_senders' and 'max_connections' to
+# the same value (10) but PG12 and prior considered max_walsenders as a subset
+# of max_connections, so setting the same value will fail.
+if ($old_publisher->pg_version->major < 12)
+{
+ $old_publisher->append_conf(
+ 'postgresql.conf', qq[
+ max_wal_senders = 5
+ max_connections = 10
+ ]);
+}

If the comment is correct, then PG12 *and* prior should be testing
"<= 12", not "< 12", right?

~~~

10.
+# Test according to the major version of the old cluster.
+# Upgrading logical replication slots has been supported only since PG17.
+if ($old_publisher->pg_version->major >= 17)

This comment seems wrong IMO. I think we are always running the latest
version of pg_upgrade, so slot migration is always "supported" from now
on. IIUC you intended this comment to be saying something about the
old_publisher slots.

BEFORE
Upgrading logical replication slots has been supported only since PG17.

SUGGESTION
Upgrading logical replication slots from versions older than PG17 is
not supported.

======
Kind Regards,
Peter Smith.
Fujitsu Australia

#330vignesh C
vignesh21@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#327)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Wed, 18 Oct 2023 at 14:55, Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

Dear Peter,

Thank you for reviewing! PSA new version.
Note that 0001 and 0002 are combined into one patch.

Here are some review comments for v51-0001

======
src/bin/pg_upgrade/check.c

0.
+check_old_cluster_for_valid_slots(bool live_check)
+{
+ char output_path[MAXPGPATH];
+ FILE    *script = NULL;
+
+ prep_status("Checking for valid logical replication slots");
+
+ snprintf(output_path, sizeof(output_path), "%s/%s",
+ log_opts.basedir,
+ "invalid_logical_relication_slots.txt");

0a
typo /invalid_logical_relication_slots/invalid_logical_replication_slots/

Fixed.

0b.
Since the non-upgradable slots are not strictly "invalid", is this an
appropriate filename for the bad ones?

But I don't have very good alternatives. Maybe:
- non_upgradable_logical_replication_slots.txt
- problem_logical_replication_slots.txt

Per discussion [1], I kept current style.

src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl

1.
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when wrong GUC is set on new cluster
+#
+# There are two requirements for GUCs - wal_level and max_replication_slots,
+# but only max_replication_slots will be tested here. This is because to
+# reduce the execution time of the test.

SUGGESTION
# TEST: Confirm pg_upgrade fails when the new cluster has wrong GUC values.
#
# Two GUCs are required - 'wal_level' and 'max_replication_slots' - but to
# reduce the test execution time, only 'max_replication_slots' is tested here.

First part was fixed. Second part was removed per [1].

2.
+# Preparations for the subsequent test:
+# 1. Create two slots on the old cluster
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+ "SELECT pg_create_logical_replication_slot('test_slot1',
'test_decoding', false, true);"
+);
+$old_publisher->safe_psql('postgres',
+ "SELECT pg_create_logical_replication_slot('test_slot2',
'test_decoding', false, true);"
+);

Can't you combine those SQL in the same $old_publisher->safe_psql.

Combined.

3.
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+# Set max_replication_slots to the same value as the number of slots. Both of
+# slots will be used for subsequent tests.
+$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 1");

The code doesn't seem to match the comment - is this correct? The
old_publisher created 2 slots, so why are you setting new_publisher
"max_replication_slots = 1" again?

Fixed to "max_replication_slots = 2" Note that previous test worked well because
GUC checking on new cluster is done after checking the status of slots.

4.
+# Preparations for the subsequent test:
+# 1. Generate extra WAL records. Because these WAL records do not get
consumed
+# it will cause the upcoming pg_upgrade test to fail.
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+ "CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;");
+
+# 2. Advance the slot test_slot2 up to the current WAL location
+$old_publisher->safe_psql('postgres',
+ "SELECT pg_replication_slot_advance('test_slot2', NULL);");
+
+# 3. Emit a non-transactional message. test_slot2 detects the message so that
+# this slot will be also reported by upcoming pg_upgrade.
+$old_publisher->safe_psql('postgres',
+ "SELECT count(*) FROM pg_logical_emit_message('false', 'prefix',
'This is a non-transactional message');"
+);

I felt this test would be clearer if you emphasised the state of the
test_slot1 also. e.g.

4a.
BEFORE
+# 1. Generate extra WAL records. Because these WAL records do not get
consumed
+# it will cause the upcoming pg_upgrade test to fail.

SUGGESTION
# 1. Generate extra WAL records. At this point neither test_slot1 nor test_slot2
# has consumed them.

Fixed.

4b.
BEFORE
+# 2. Advance the slot test_slot2 up to the current WAL location

SUGGESTION
# 2. Advance the slot test_slot2 up to the current WAL location, but test_slot2
# still has unconsumed WAL records.

IIUC, test_slot2 is caught up by pg_replication_slot_advance('test_slot2'). I think
"but test_slot1 still has unconsumed WAL records." is appropriate. Fixed.

5.
+# pg_upgrade will fail because the slot still has unconsumed WAL records
+command_checks_all(

/because the slot still has/because there are slots still having/

Fixed.

6.
+ [qr//],
+ 'run of pg_upgrade of old cluster with slot having unconsumed WAL records'
+);

/slot/slots/

Fixed.

7.
+# And check the content. Both of slots must be reported that they have
+# unconsumed WALs after confirmed_flush_lsn.

SUGGESTION
# Check the file content. Both slots should be reporting that they have
# unconsumed WAL records.

Fixed.

8.
+# Preparations for the subsequent test:
+# 1. Setup logical replication
+my $old_connstr = $old_publisher->connstr . ' dbname=postgres';
+
+$old_publisher->start;
+
+$old_publisher->safe_psql('postgres',
+ "SELECT * FROM pg_drop_replication_slot('test_slot1');");
+$old_publisher->safe_psql('postgres',
+ "SELECT * FROM pg_drop_replication_slot('test_slot2');");
+
+$old_publisher->safe_psql('postgres',
+ "CREATE PUBLICATION regress_pub FOR ALL TABLES;");

8a.
/Setup logical replication/Setup logical replication (first, cleanup
slots from the previous tests)/

Fixed.

8b.
Can't you combine all those SQL in the same $old_publisher->safe_psql.

Combined.

9.
+
+# Actual run, successful upgrade is expected
+command_ok(
+ [
+ 'pg_upgrade', '--no-sync',
+ '-d', $old_publisher->data_dir,
+ '-D', $new_publisher->data_dir,
+ '-b', $bindir,
+ '-B', $bindir,
+ '-s', $new_publisher->host,
+ '-p', $old_publisher->port,
+ '-P', $new_publisher->port,
+ $mode,
+ ],
+ 'run of pg_upgrade of old cluster');

Now that the "Dry run" part is removed, it seems unnecessary to say
"Actual run" for this part.

SUGGESTION
# pg_upgrade should be successful.

Fixed.

Few comments:
1) We will be able to override the value of max_slot_wal_keep_size by
using --new-options like '--new-options "-c max_slot_wal_keep_size=val"':
+       /*
+        * Use max_slot_wal_keep_size as -1 to prevent the WAL removal by the
+        * checkpointer process.  If WALs required by logical replication slots
+        * are removed, the slots are unusable.  This setting prevents the
+        * invalidation of slots during the upgrade. We set this option when
+        * cluster is PG17 or later because logical replication slots can only be
+        * migrated since then. Besides, max_slot_wal_keep_size is added in PG13.
+        */
+       if (GET_MAJOR_VERSION(cluster->major_version) >= 1700)
+               appendPQExpBufferStr(&pgoptions, " -c max_slot_wal_keep_size=-1");

Should there be a check to throw an error if this option is specified
or do we need some documentation that this option should not be
specified?

2) Because we are able to override max_slot_wal_keep_size, there is a
chance of the slot getting invalidated and the Assert being hit:
+               /*
+                * The logical replication slots shouldn't be invalidated as
+                * max_slot_wal_keep_size GUC is set to -1 during the upgrade.
+                *
+                * The following is just a sanity check.
+                */
+               if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade)
+               {
+                       Assert(max_slot_wal_keep_size_mb == -1);
+                       elog(ERROR, "replication slots must not be
invalidated during the upgrade");
+               }
3) File 003_logical_replication_slots.pl has now been changed to
003_upgrade_logical_replication_slots.pl, so it should be changed here too
accordingly:
index 5834513add..815d1a7ca1 100644
--- a/src/bin/pg_upgrade/Makefile
+++ b/src/bin/pg_upgrade/Makefile
@@ -3,6 +3,9 @@
 PGFILEDESC = "pg_upgrade - an in-place binary upgrade utility"
 PGAPPICON = win32
+# required for 003_logical_replication_slots.pl
+EXTRA_INSTALL=contrib/test_decoding
+

Regards,
Vignesh

#331Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Hayato Kuroda (Fujitsu) (#327)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

On Wednesday, October 18, 2023 5:26 PM Kuroda, Hayato/黒田 隼人 <kuroda.hayato@fujitsu.com> wrote:

Thank you for reviewing! PSA new version.
Note that 0001 and 0002 are combined into one patch.

Thanks for updating the patch, here are few comments for the test.

1.

# The TAP Cluster.pm assigns default 'max_wal_senders' and 'max_connections' to
# the same value (10) but PG12 and prior considered max_walsenders as a subset
# of max_connections, so setting the same value will fail.
if ($old_publisher->pg_version->major < 12)
{
$old_publisher->append_conf(
'postgresql.conf', qq[
max_wal_senders = 5
max_connections = 10
]);

I think we already set max_wal_senders to 5 in the init() function (in
Cluster.pm), so is this necessary? And 002_pg_upgrade.pl doesn't seem to set this.

2.

SELECT pg_create_logical_replication_slot('test_slot1', 'test_decoding', false, true);
SELECT pg_create_logical_replication_slot('test_slot2', 'test_decoding', false, true);

I think we don't need to set the last two parameters here, as we don't check
this info in the tests.
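A sketch of that simplification, leaving the optional temporary/two_phase
arguments at their defaults (slot and plugin names taken from the existing test):

# Create the slots with only the required arguments; the optional
# parameters keep their default values.
$old_publisher->safe_psql(
	'postgres', qq[
	SELECT pg_create_logical_replication_slot('test_slot1', 'test_decoding');
	SELECT pg_create_logical_replication_slot('test_slot2', 'test_decoding');
]);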

3.

# Set extra params if cross-version checks are required. This is needed to
# avoid using previously initdb'd cluster
if (defined($ENV{oldinstall}))
{
my @initdb_params = ();
push @initdb_params, ('--encoding', 'UTF-8');
push @initdb_params, ('--locale', 'C');

I am not sure I understand the comment; would it be possible to provide a bit
more explanation about the purpose of this setting? And I see 002_pg_upgrade.pl
always has these settings even if oldinstall is not defined, so shall we follow
the same?

4.

+	command_ok(
+		[
+			'pg_upgrade', '--no-sync',
+			'-d', $old_publisher->data_dir,
+			'-D', $new_publisher->data_dir,
+			'-b', $oldbindir,
+			'-B', $newbindir,
+			'-s', $new_publisher->host,
+			'-p', $old_publisher->port,
+			'-P', $new_publisher->port,
+			$mode,
+		],

I think all the pg_upgrade commands in the test are the same, so we can save the
command in a variable and pass it to command_xx(). I think this saves some effort
when checking the difference between each command and also reduces some code.
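A rough sketch of that refactoring, reusing the arguments already present in the
test (the array name is only illustrative):

# Build the common pg_upgrade invocation once and reuse it in each test.
my @pg_upgrade_cmd = (
	'pg_upgrade', '--no-sync',
	'-d', $old_publisher->data_dir,
	'-D', $new_publisher->data_dir,
	'-b', $oldbindir,
	'-B', $newbindir,
	'-s', $new_publisher->host,
	'-p', $old_publisher->port,
	'-P', $new_publisher->port,
	$mode,
);

# For example:
command_ok([@pg_upgrade_cmd], 'run of pg_upgrade of old cluster');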

Best Regards,
Hou zj

#332Shlok Kyal
shlok.kyal.oss@gmail.com
In reply to: vignesh C (#330)
Re: [PoC] pg_upgrade: allow to upgrade publisher node
Few comments:
1) We will be able to override the value of max_slot_wal_keep_size by
using --new-options like '--new-options "-c max_slot_wal_keep_size=val"':
+       /*
+        * Use max_slot_wal_keep_size as -1 to prevent the WAL removal by the
+        * checkpointer process.  If WALs required by logical replication slots
+        * are removed, the slots are unusable.  This setting prevents the
+        * invalidation of slots during the upgrade. We set this option when
+        * cluster is PG17 or later because logical replication slots can only be
+        * migrated since then. Besides, max_slot_wal_keep_size is added in PG13.
+        */
+       if (GET_MAJOR_VERSION(cluster->major_version) >= 1700)
+               appendPQExpBufferStr(&pgoptions, " -c max_slot_wal_keep_size=-1");

Should there be a check to throw an error if this option is specified
or do we need some documentation that this option should not be
specified?

I have tested the above scenario. We are able to override max_slot_wal_keep_size
by using '--new-options "-c max_slot_wal_keep_size=val"'. Also, with some insert
statements running during pg_upgrade, old WAL files were deleted and the logical
replication slots were invalidated. Since the slots were invalidated, replication
was not happening after the upgrade.
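For reference, a rough sketch of how the override can be passed (the concrete
size value is arbitrary; only the mechanism matters):

# Overriding the forced setting via --new-options re-enables WAL removal
# on the new cluster during the upgrade; run via command_ok() or
# command_checks_all() as in the other tests.
my @cmd = (
	'pg_upgrade', '--no-sync',
	'-d', $old_publisher->data_dir,
	'-D', $new_publisher->data_dir,
	'-b', $oldbindir,
	'-B', $newbindir,
	'-s', $new_publisher->host,
	'-p', $old_publisher->port,
	'-P', $new_publisher->port,
	'--new-options', '-c max_slot_wal_keep_size=1MB',
	$mode,
);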

Thanks,
Shlok Kumar Kyal

#333vignesh C
vignesh21@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#327)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

Few comments:
1) Even if we comment out the 3rd point, "Emit a non-transactional message",
test_slot2 still appears in the invalid_logical_replication_slots.txt
file. There is something wrong here.
+       # 2. Advance the slot test_slot2 up to the current WAL location, but
+       #        test_slot1 still has unconsumed WAL records.
+       $old_publisher->safe_psql('postgres',
+               "SELECT pg_replication_slot_advance('test_slot2', NULL);");
+
+       # 3. Emit a non-transactional message. test_slot2 detects the message so
+       #        that this slot will be also reported by upcoming pg_upgrade.
+       $old_publisher->safe_psql('postgres',
+               "SELECT count(*) FROM pg_logical_emit_message('false',
'prefix', 'This is a non-transactional message');"
+       );
2) If the test fails here, it is difficult to debug because the
pg_upgrade_output.d directory has been removed, so it is better to keep the
directory as it is in this case:
+       # Check the file content. Both slots should be reporting that they have
+       # unconsumed WAL records.
+       like(
+               slurp_file($slots_filename),
+               qr/The slot \"test_slot1\" has not consumed the WAL yet/m,
+               'the previous test failed due to unconsumed WALs');
+       like(
+               slurp_file($slots_filename),
+               qr/The slot \"test_slot2\" has not consumed the WAL yet/m,
+               'the previous test failed due to unconsumed WALs');
+
+       # Clean up
+       rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
3) The below could be changed:
+       # Check the file content. Both slots should be reporting that they have
+       # unconsumed WAL records.
+       like(
+               slurp_file($slots_filename),
+               qr/The slot \"test_slot1\" has not consumed the WAL yet/m,
+               'the previous test failed due to unconsumed WALs');
+       like(
+               slurp_file($slots_filename),
+               qr/The slot \"test_slot2\" has not consumed the WAL yet/m,
+               'the previous test failed due to unconsumed WALs');

to:
my $result = slurp_file($slots_filename);
is( $result, qq(The slot "test_slot1" has not consumed the WAL yet
The slot "test_slot2" has not consumed the WAL yet
),
'the previous test failed due to unconsumed WALs');

Regards,
Vignesh

#334Shlok Kyal
shlok.kyal.oss@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#327)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

I tested the following scenario:
I started a new publisher with the 'max_replication_slots' parameter set
to '1' and set up streaming replication with the new publisher as the
primary node.
Then I ran pg_upgrade from the old publisher to the new publisher. The
upgrade failed with the following error:

Restoring logical replication slots in the new cluster
SQL command failed
SELECT * FROM pg_catalog.pg_create_logical_replication_slot('test1',
'pgoutput', false, false);
ERROR: all replication slots are in use
HINT: Free one or increase max_replication_slots.

Failure, exiting

Should we document that existing replication slots need to be taken into
consideration when setting the 'max_replication_slots' value on the new
publisher?
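If we do, a small sketch of the kind of pre-check such a note implies (variable
names are only illustrative):

# Count the slots already present on the new cluster; max_replication_slots
# there must cover these plus the logical slots migrated from the old cluster.
my $existing_slots = $new_publisher->safe_psql('postgres',
	"SELECT count(*) FROM pg_replication_slots");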

Thanks
Shlok Kumar Kyal

#335Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Shlok Kyal (#334)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Shlok,

Thanks for testing the feature!

I tested a test scenario:
I started a new publisher with 'max_replication_slots' parameter set
to '1' and created a streaming replication with the new publisher as
primary node.

Just to confirm what you did - you set up physical replication and the
target of pg_upgrade was the primary node, right?

I think we can assume that the new cluster (the target of pg_upgrade) has not been used yet.
The documentation describes the usage [1]https://www.postgresql.org/docs/devel/pgupgrade.html#:~:text=Initialize%20the%20new%20PostgreSQL%20cluster and says that we must initialize
the new cluster (at step 4) and then run pg_upgrade (at step 10).

Therefore I don't think we need to document anything about it.
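
For what it's worth, that assumption can also be confirmed manually before running
pg_upgrade with a query similar to the one pg_upgrade itself issues (just a sketch,
not part of the patch):

```
-- Run against the new (target) cluster; a freshly initialized cluster
-- should report 0 permanent logical replication slots.
SELECT count(*)
  FROM pg_catalog.pg_replication_slots
 WHERE slot_type = 'logical' AND temporary IS FALSE;
```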

[1]: https://www.postgresql.org/docs/devel/pgupgrade.html#:~:text=Initialize%20the%20new%20PostgreSQL%20cluster

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#336Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Peter Smith (#329)
1 attachment(s)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Peter,

Thanks for reviewing! PSA new version.

======
src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl

1.
+ # 2. max_replication_slots is set to smaller than the number of slots (2)
+ # present on the old cluster

SUGGESTION
2. Set 'max_replication_slots' to be less than the number of slots (2)
present on the old cluster.

Fixed.

2.
+ # Set max_replication_slots to the same value as the number of slots. Both
+ # of slots will be used for subsequent tests.

SUGGESTION
Set 'max_replication_slots' to match the number of slots (2) present
on the old cluster.
Both slots will be used for subsequent tests.

Fixed.

3.
+ # 3. Emit a non-transactional message. test_slot2 detects the message so
+ # that this slot will be also reported by upcoming pg_upgrade.
+ $old_publisher->safe_psql('postgres',
+ "SELECT count(*) FROM pg_logical_emit_message('false', 'prefix',
'This is a non-transactional message');"
+ );

SUGGESTION
3. Emit a non-transactional message. This will cause test_slot2 to
detect the unconsumed WAL record.

Fixed.

4.
+ # Preparations for the subsequent test:
+ # 1. Generate extra WAL records. At this point neither test_slot1 nor
+ # test_slot2 has consumed them.
+ $old_publisher->start;
+ $old_publisher->safe_psql('postgres',
+ "CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;");
+
+ # 2. Advance the slot test_slot2 up to the current WAL location, but
+ # test_slot1 still has unconsumed WAL records.
+ $old_publisher->safe_psql('postgres',
+ "SELECT pg_replication_slot_advance('test_slot2', NULL);");
+
+ # 3. Emit a non-transactional message. test_slot2 detects the message so
+ # that this slot will be also reported by upcoming pg_upgrade.
+ $old_publisher->safe_psql('postgres',
+ "SELECT count(*) FROM pg_logical_emit_message('false', 'prefix',
'This is a non-transactional message');"
+ );
+
+ $old_publisher->stop;

All of the above are sequentially executed on the
old_publisher->safe_psql, so consider if it is worth combining them
all in a single call (keeping the comments 1,2,3 separate still)

For example,

$old_publisher->start;
$old_publisher->safe_psql('postgres', qq[
CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;
SELECT pg_replication_slot_advance('test_slot2', NULL);
SELECT count(*) FROM pg_logical_emit_message('false', 'prefix',
'This is a non-transactional message');
]);
$old_publisher->stop;

Fixed.

5.
+ # Clean up
+ $subscriber->stop();
+ $new_publisher->stop();

Should this also drop the 'test_slot1' and 'test_slot2'?

'test_slot1' and 'test_slot2' have already been removed while preparing for the
"Successful upgrade" case. Also, I don't think objects have to be dropped at the
end of the test; that is covered by other parts, and it could make the test more
difficult to debug if there are failures.

6.
+# Verify that logical replication slots cannot be migrated.  This function
+# will be executed when the old cluster is PG16 and prior.
+sub test_upgrade_from_pre_PG17
+{
+ my ($old_publisher, $new_publisher, $mode) = @_;
+
+ my $oldbindir = $old_publisher->config_data('--bindir');
+ my $newbindir = $new_publisher->config_data('--bindir');

SUGGESTION (let's not mention lots of different numbers; just refer to 17)
This function will be executed when the old cluster version is prior to PG17.

Fixed.

7.
+ # Actual run, successful upgrade is expected
+ command_ok(
+ [
+ 'pg_upgrade', '--no-sync',
+ '-d', $old_publisher->data_dir,
+ '-D', $new_publisher->data_dir,
+ '-b', $oldbindir,
+ '-B', $newbindir,
+ '-s', $new_publisher->host,
+ '-p', $old_publisher->port,
+ '-P', $new_publisher->port,
+ $mode,
+ ],
+ 'run of pg_upgrade of old cluster');
+
+ ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
+ "pg_upgrade_output.d/ removed after pg_upgrade success");

7a.
The comment is wrong?

SUGGESTION
# pg_upgrade should NOT be successful

No, pg_upgrade will succeed, but no logical replication slots are migrated.
Comments and docs were added to clarify that.

7b.
There is a blank line here before the ok() function, but in the other
tests, there was none. Better to be consistent.

Removed.

8.
+ # Clean up
+ $new_publisher->stop();

Should this also drop the 'test_slot'?

I don't think so. Please see above.

9.
+# The TAP Cluster.pm assigns default 'max_wal_senders' and 'max_connections'
to
+# the same value (10) but PG12 and prior considered max_walsenders as a
subset
+# of max_connections, so setting the same value will fail.
+if ($old_publisher->pg_version->major < 12)
+{
+ $old_publisher->append_conf(
+ 'postgresql.conf', qq[
+ max_wal_senders = 5
+ max_connections = 10
+ ]);
+}

If the comment is correct, then PG12 *and* prior should be testing
"<= 12", not "< 12", right?

I analyzed this further and I was wrong - we must set these GUCs here only for PG9.6 and prior.
For PG10 and PG11, the corresponding constructor is chosen in new() [a],
and those instances set max_wal_senders to 5 [b].
For PG9.6 and prior, no such subclass has been defined yet, so that workaround
does not apply and we must set the GUCs manually.

Actually, this part will not be needed once Cluster.pm supports PG9.6 and prior. If needed,
we can start another thread and add that support. For now the case is handled ad hoc.

10.
+# Test according to the major version of the old cluster.
+# Upgrading logical replication slots has been supported only since PG17.
+if ($old_publisher->pg_version->major >= 17)

This comment seems wrong IMO. I think we are always running the latest
version of pg_upgrade, so slot migration is always "supported" from now
on. IIUC you intended this comment to say something about the
old_publisher slots.

BEFORE
Upgrading logical replication slots has been supported only since PG17.

SUGGESTION
Upgrading logical replication slots from versions older than PG17 is
not supported.

Fixed.

[a]:
```
# Use a subclass as defined below (or elsewhere) if this version
# isn't fully compatible. Warn if the version is too old and thus we don't
# have a subclass of this class.
if (ref $ver && $ver < $min_compat)
{
my $maj = $ver->major(separator => '_');
my $subclass = $class . "::V_$maj";
if ($subclass->isa($class))
{
bless $node, $subclass;
}
```

[b]:
```
sub init
{
my ($self, %params) = @_;
$self->SUPER::init(%params);
$self->adjust_conf('postgresql.conf', 'max_wal_senders',
$params{allows_streaming} ? 5 : 0);
}
```

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:

v53-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patchapplication/octet-stream; name=v53-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patchDownload
From 8ad7f6178fc34a59a5d4e39c744f371159c83bc0 Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Tue, 4 Apr 2023 05:49:34 +0000
Subject: [PATCH v53] pg_upgrade: Allow to replicate logical replication slots
 to new node

This commit allows nodes with logical replication slots to be upgraded. While
reading information from the old cluster, a list of logical replication slots is
fetched. At the later part of upgrading, pg_upgrade revisits the list and
restores slots by executing pg_create_logical_replication_slot() on the new
cluster. Migration of logical replication slots is only supported when the old
cluster is version 17.0 or later.

If the old node has slots with the status 'lost' or with unconsumed WAL records,
the pg_upgrade fails. These checks are needed to prevent data loss.

Note that the pg_resetwal command would remove WAL files, which are required for
restart_lsn. If WALs required by logical replication slots are removed, the
slots are unusable. Therefore, during the upgrade, slot restoration is done
after the final pg_resetwal command. The workflow ensures that required WALs are
retained.

The significant advantage of this commit is that it makes it easy to continue
logical replication even after upgrading the publisher node. Previously,
pg_upgrade allowed copying publications to a new node. With this new commit,
adjusting the connection string to the new publisher will cause the apply
worker on the subscriber to connect to the new publisher automatically. This
enables seamless continuation of logical replication, even after an upgrade.

Author: Hayato Kuroda
Co-authored-by: Hou Zhijie
Reviewed-by: Peter Smith, Julien Rouhaud, Vignesh C, Wang Wei, Masahiko Sawada,
             Dilip Kumar, Bharath Rupireddy, Shlok Kyal
---
 doc/src/sgml/ref/pgupgrade.sgml               |  78 ++++-
 src/backend/replication/logical/decode.c      |  48 ++-
 src/backend/replication/logical/logical.c     |  65 ++++
 src/backend/replication/slot.c                |  14 +
 src/backend/utils/adt/pg_upgrade_support.c    |  42 +++
 src/bin/pg_upgrade/Makefile                   |   3 +
 src/bin/pg_upgrade/check.c                    | 169 ++++++++++-
 src/bin/pg_upgrade/function.c                 |  30 +-
 src/bin/pg_upgrade/info.c                     | 166 +++++++++-
 src/bin/pg_upgrade/meson.build                |   1 +
 src/bin/pg_upgrade/pg_upgrade.c               |  74 ++++-
 src/bin/pg_upgrade/pg_upgrade.h               |  22 +-
 src/bin/pg_upgrade/server.c                   |  25 +-
 .../003_upgrade_logical_replication_slots.pl  | 286 ++++++++++++++++++
 src/include/catalog/pg_proc.dat               |   5 +
 src/include/replication/logical.h             |   5 +
 src/tools/pgindent/typedefs.list              |   2 +
 17 files changed, 1009 insertions(+), 26 deletions(-)
 create mode 100644 src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 608193b307..0296c3f89d 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -383,6 +383,79 @@ make prefix=/usr/local/pgsql.new install
     </para>
    </step>
 
+   <step>
+    <title>Prepare for publisher upgrades</title>
+
+    <para>
+     <application>pg_upgrade</application> attempts to migrate logical
+     replication slots. This helps avoid the need for manually defining the
+     same replication slots on the new publisher. Migration of logical
+     replication slots is only supported when the old cluster is version 17.0
+     or later. Logical replication slots on clusters before version 17.0 will
+     silently be ignored.
+    </para>
+
+    <para>
+     Before you start upgrading the publisher cluster, ensure that the
+     subscription is temporarily disabled, by executing
+     <link linkend="sql-altersubscription"><command>ALTER SUBSCRIPTION ... DISABLE</command></link>.
+     Re-enable the subscription after the upgrade.
+    </para>
+
+    <para>
+     There are some prerequisites for <application>pg_upgrade</application> to
+     be able to upgrade the replication slots. If these are not met an error
+     will be reported.
+    </para>
+
+    <itemizedlist>
+     <listitem>
+      <para>
+       All slots on the old cluster must be usable, i.e., there are no slots
+       whose
+       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>conflicting</structfield>
+       is <literal>true</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The old cluster has replicated all the transactions and logical decoding
+       messages to subscribers.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The output plugins referenced by the slots on the old cluster must be
+       installed in the new PostgreSQL executable directory.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must not have permanent logical replication slots, i.e.,
+       there must be no slots where
+       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>temporary</structfield>
+       is <literal>false</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must have
+       <link linkend="guc-wal-level"><varname>wal_level</varname></link> as
+       <literal>logical</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must have
+       <link linkend="guc-max-replication-slots"><varname>max_replication_slots</varname></link>
+       configured to a value greater than or equal to the number of slots
+       present in the old cluster.
+      </para>
+     </listitem>
+    </itemizedlist>
+
+   </step>
+
    <step>
     <title>Stop both servers</title>
 
@@ -650,8 +723,9 @@ rsync --archive --delete --hard-links --size-only --no-inc-recursive /vol1/pg_tb
        Configure the servers for log shipping.  (You do not need to run
        <function>pg_backup_start()</function> and <function>pg_backup_stop()</function>
        or take a file system backup as the standbys are still synchronized
-       with the primary.)  Replication slots are not copied and must
-       be recreated.
+       with the primary.)  Only logical slots on the primary are copied to the
+       new standby, but other slots on the old standby are not copied so must
+       be recreated manually.
       </para>
      </step>
 
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 730061c9da..0514d1365e 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -599,12 +599,8 @@ logicalmsg_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 
 	ReorderBufferProcessXid(ctx->reorder, XLogRecGetXid(r), buf->origptr);
 
-	/*
-	 * If we don't have snapshot or we are just fast-forwarding, there is no
-	 * point in decoding messages.
-	 */
-	if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT ||
-		ctx->fast_forward)
+	/* If we don't have snapshot, there is no point in decoding messages */
+	if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT)
 		return;
 
 	message = (xl_logical_message *) XLogRecGetData(r);
@@ -621,6 +617,26 @@ logicalmsg_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 			  SnapBuildXactNeedsSkip(builder, buf->origptr)))
 		return;
 
+	/*
+	 * We also skip decoding in 'fast_forward' mode. This check must be last
+	 * because we don't want to set the processing_required flag unless we
+	 * have a decodable message.
+	 */
+	if (ctx->fast_forward)
+	{
+		/*
+		 * We need to set processing_required flag to notify the message's
+		 * existence to the caller. Usually, the flag is set when either the
+		 * COMMIT or ABORT records are decoded, but this must be turned on
+		 * here because the non-transactional logical message is decoded
+		 * without waiting for these records.
+		 */
+		if (!message->transactional)
+			ctx->processing_required = true;
+
+		return;
+	}
+
 	/*
 	 * If this is a non-transactional change, get the snapshot we're expected
 	 * to use. We only get here when the snapshot is consistent, and the
@@ -1285,7 +1301,21 @@ static bool
 DecodeTXNNeedSkip(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
 				  Oid txn_dbid, RepOriginId origin_id)
 {
-	return (SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) ||
-			(txn_dbid != InvalidOid && txn_dbid != ctx->slot->data.database) ||
-			ctx->fast_forward || FilterByOrigin(ctx, origin_id));
+	if (SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) ||
+		(txn_dbid != InvalidOid && txn_dbid != ctx->slot->data.database) ||
+		FilterByOrigin(ctx, origin_id))
+		return true;
+
+	/*
+	 * We also skip decoding in 'fast_forward' mode. In passing set the
+	 * 'processing_required' flag to indicate, were it not for this mode,
+	 * processing *would* have been required.
+	 */
+	if (ctx->fast_forward)
+	{
+		ctx->processing_required = true;
+		return true;
+	}
+
+	return false;
 }
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 41243d0187..e02cd0fa44 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -29,6 +29,7 @@
 #include "postgres.h"
 
 #include "access/xact.h"
+#include "access/xlogutils.h"
 #include "access/xlog_internal.h"
 #include "fmgr.h"
 #include "miscadmin.h"
@@ -41,6 +42,7 @@
 #include "storage/proc.h"
 #include "storage/procarray.h"
 #include "utils/builtins.h"
+#include "utils/inval.h"
 #include "utils/memutils.h"
 
 /* data for errcontext callback */
@@ -1949,3 +1951,66 @@ UpdateDecodingStats(LogicalDecodingContext *ctx)
 	rb->totalTxns = 0;
 	rb->totalBytes = 0;
 }
+
+/*
+ * Read up to the end of WAL starting from the decoding slot's restart_lsn.
+ * Return true if any meaningful/decodable WAL records are encountered,
+ * otherwise false.
+ *
+ * Although this function is currently used only during pg_upgrade, there are
+ * no reasons to restrict it, so IsBinaryUpgrade is not checked here.
+ */
+bool
+LogicalReplicationSlotHasPendingWal(XLogRecPtr end_of_wal)
+{
+	LogicalDecodingContext *ctx;
+	bool		has_pending_wal = false;
+
+	Assert(MyReplicationSlot);
+
+	/*
+	 * Create our decoding context in fast_forward mode, passing start_lsn as
+	 * InvalidXLogRecPtr, so that we start processing from the slot's
+	 * confirmed_flush.
+	 */
+	ctx = CreateDecodingContext(InvalidXLogRecPtr,
+								NIL,
+								true,	/* fast_forward */
+								XL_ROUTINE(.page_read = read_local_xlog_page,
+										   .segment_open = wal_segment_open,
+										   .segment_close = wal_segment_close),
+								NULL, NULL, NULL);
+
+	/*
+	 * Start reading at the slot's restart_lsn, which we know points to a
+	 * valid record.
+	 */
+	XLogBeginRead(ctx->reader, MyReplicationSlot->data.restart_lsn);
+
+	/* Invalidate non-timetravel entries */
+	InvalidateSystemCaches();
+
+	/* Loop until the end of WAL or some changes are processed */
+	while (!has_pending_wal && ctx->reader->EndRecPtr < end_of_wal)
+	{
+		XLogRecord *record;
+		char	   *errm = NULL;
+
+		record = XLogReadRecord(ctx->reader, &errm);
+
+		if (errm)
+			elog(ERROR, "could not find record for logical decoding: %s", errm);
+
+		if (record != NULL)
+			LogicalDecodingProcessRecord(ctx, ctx->reader);
+
+		has_pending_wal = ctx->processing_required;
+
+		CHECK_FOR_INTERRUPTS();
+	}
+
+	/* Clean up */
+	FreeDecodingContext(ctx);
+
+	return has_pending_wal;
+}
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 7e5ec500d8..1426a0bbb6 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1423,6 +1423,20 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause,
 
 		SpinLockRelease(&s->mutex);
 
+		/*
+		 * The logical replication slots shouldn't be invalidated as
+		 * max_slot_wal_keep_size GUC is set to -1 during the upgrade.
+		 *
+		 * The following is just a sanity check.
+		 */
+		if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade)
+		{
+			ereport(ERROR,
+					errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					errmsg("replication slots must not be invalidated during the upgrade"),
+					errhint("\"max_slot_wal_keep_size\" must not be set to -1 during the upgrade"));
+		}
+
 		if (active_pid != 0)
 		{
 			/*
diff --git a/src/backend/utils/adt/pg_upgrade_support.c b/src/backend/utils/adt/pg_upgrade_support.c
index 0186636d9f..697e23f815 100644
--- a/src/backend/utils/adt/pg_upgrade_support.c
+++ b/src/backend/utils/adt/pg_upgrade_support.c
@@ -17,6 +17,7 @@
 #include "catalog/pg_type.h"
 #include "commands/extension.h"
 #include "miscadmin.h"
+#include "replication/logical.h"
 #include "utils/array.h"
 #include "utils/builtins.h"
 
@@ -261,3 +262,44 @@ binary_upgrade_set_missing_value(PG_FUNCTION_ARGS)
 
 	PG_RETURN_VOID();
 }
+
+/*
+ * Verify the given slot has already consumed all the WAL changes.
+ *
+ * Returns true if there are no decodable WAL records after the
+ * confirmed_flush_lsn. Otherwise false.
+ *
+ * This is a special purpose function to ensure that the given slot can be
+ * upgraded without data loss.
+ */
+Datum
+binary_upgrade_slot_has_caught_up(PG_FUNCTION_ARGS)
+{
+	Name		slot_name;
+	XLogRecPtr	end_of_wal;
+	bool		found_pending_wal;
+
+	CHECK_IS_BINARY_UPGRADE;
+
+	/* Quick exit if the input is NULL */
+	if (PG_ARGISNULL(0))
+		PG_RETURN_BOOL(false);
+
+	CheckSlotPermissions();
+
+	slot_name = PG_GETARG_NAME(0);
+
+	/* Acquire the given slot */
+	ReplicationSlotAcquire(NameStr(*slot_name), true);
+
+	/* Slots must be valid as otherwise we won't be able to scan the WAL */
+	Assert(MyReplicationSlot->data.invalidated == RS_INVAL_NONE);
+
+	end_of_wal = GetFlushRecPtr(NULL);
+	found_pending_wal = LogicalReplicationSlotHasPendingWal(end_of_wal);
+
+	/* Clean up */
+	ReplicationSlotRelease();
+
+	PG_RETURN_BOOL(!found_pending_wal);
+}
diff --git a/src/bin/pg_upgrade/Makefile b/src/bin/pg_upgrade/Makefile
index 5834513add..05e9299654 100644
--- a/src/bin/pg_upgrade/Makefile
+++ b/src/bin/pg_upgrade/Makefile
@@ -3,6 +3,9 @@
 PGFILEDESC = "pg_upgrade - an in-place binary upgrade utility"
 PGAPPICON = win32
 
+# required for 003_upgrade_logical_replication_slots.pl
+EXTRA_INSTALL=contrib/test_decoding
+
 subdir = src/bin/pg_upgrade
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 21a0ff9e42..fdf1f0a0c4 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -33,6 +33,8 @@ static void check_for_jsonb_9_4_usage(ClusterInfo *cluster);
 static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(void);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
+static void check_new_cluster_logical_replication_slots(void);
+static void check_old_cluster_for_valid_slots(bool live_check);
 
 
 /*
@@ -89,8 +91,11 @@ check_and_dump_old_cluster(bool live_check)
 	if (!live_check)
 		start_postmaster(&old_cluster, true);
 
-	/* Extract a list of databases and tables from the old cluster */
-	get_db_and_rel_infos(&old_cluster);
+	/*
+	 * Extract a list of databases, tables, and logical replication slots from
+	 * the old cluster.
+	 */
+	get_db_rel_and_slot_infos(&old_cluster, live_check);
 
 	init_tablespaces();
 
@@ -107,6 +112,13 @@ check_and_dump_old_cluster(bool live_check)
 	check_for_reg_data_type_usage(&old_cluster);
 	check_for_isn_and_int8_passing_mismatch(&old_cluster);
 
+	/*
+	 * Logical replication slots can be migrated since PG17. See comments atop
+	 * get_old_cluster_logical_slot_infos().
+	 */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
+		check_old_cluster_for_valid_slots(live_check);
+
 	/*
 	 * PG 16 increased the size of the 'aclitem' type, which breaks the
 	 * on-disk format for existing data.
@@ -200,7 +212,7 @@ check_and_dump_old_cluster(bool live_check)
 void
 check_new_cluster(void)
 {
-	get_db_and_rel_infos(&new_cluster);
+	get_db_rel_and_slot_infos(&new_cluster, false);
 
 	check_new_cluster_is_empty();
 
@@ -223,6 +235,8 @@ check_new_cluster(void)
 	check_for_prepared_transactions(&new_cluster);
 
 	check_for_new_tablespace_dir();
+
+	check_new_cluster_logical_replication_slots();
 }
 
 
@@ -1451,3 +1465,152 @@ check_for_user_defined_encoding_conversions(ClusterInfo *cluster)
 	else
 		check_ok();
 }
+
+/*
+ * check_new_cluster_logical_replication_slots()
+ *
+ * Verify that there are no logical replication slots on the new cluster and
+ * that the parameter settings necessary for creating slots are sufficient.
+ */
+static void
+check_new_cluster_logical_replication_slots(void)
+{
+	PGresult   *res;
+	PGconn	   *conn;
+	int			nslots_on_old;
+	int			nslots_on_new;
+	int			max_replication_slots;
+	char	   *wal_level;
+
+	/* Logical slots can be migrated since PG17. */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+		return;
+
+	nslots_on_old = count_old_cluster_logical_slots();
+
+	/* Quick return if there are no logical slots to be migrated. */
+	if (nslots_on_old == 0)
+		return;
+
+	conn = connectToServer(&new_cluster, "template1");
+
+	prep_status("Checking for new cluster logical replication slots");
+
+	res = executeQueryOrDie(conn, "SELECT count(*) "
+							"FROM pg_catalog.pg_replication_slots "
+							"WHERE slot_type = 'logical' AND "
+							"temporary IS FALSE;");
+
+	if (PQntuples(res) != 1)
+		pg_fatal("could not count the number of logical replication slots");
+
+	nslots_on_new = atoi(PQgetvalue(res, 0, 0));
+
+	if (nslots_on_new)
+		pg_fatal("Expected 0 logical replication slots but found %d.",
+				 nslots_on_new);
+
+	PQclear(res);
+
+	res = executeQueryOrDie(conn, "SELECT setting FROM pg_settings "
+							"WHERE name IN ('wal_level', 'max_replication_slots') "
+							"ORDER BY name DESC;");
+
+	if (PQntuples(res) != 2)
+		pg_fatal("could not determine parameter settings on new cluster");
+
+	wal_level = PQgetvalue(res, 0, 0);
+
+	if (strcmp(wal_level, "logical") != 0)
+		pg_fatal("wal_level must be \"logical\", but is set to \"%s\"",
+				 wal_level);
+
+	max_replication_slots = atoi(PQgetvalue(res, 1, 0));
+
+	if (nslots_on_old > max_replication_slots)
+		pg_fatal("max_replication_slots (%d) must be greater than or equal to the number of "
+				 "logical replication slots (%d) on the old cluster",
+				 max_replication_slots, nslots_on_old);
+
+	PQclear(res);
+	PQfinish(conn);
+
+	check_ok();
+}
+
+/*
+ * check_old_cluster_for_valid_slots()
+ *
+ * Verify that all the logical slots are valid and have consumed all the WAL
+ * before shutdown.
+ */
+static void
+check_old_cluster_for_valid_slots(bool live_check)
+{
+	char		output_path[MAXPGPATH];
+	FILE	   *script = NULL;
+
+	prep_status("Checking for valid logical replication slots");
+
+	snprintf(output_path, sizeof(output_path), "%s/%s",
+			 log_opts.basedir,
+			 "invalid_logical_replication_slots.txt");
+
+	for (int dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		LogicalSlotInfoArr *slot_arr = &old_cluster.dbarr.dbs[dbnum].slot_arr;
+
+		for (int slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		{
+			LogicalSlotInfo *slot = &slot_arr->slots[slotnum];
+
+			/* Is the slot usable? */
+			if (slot->invalid)
+			{
+				if (script == NULL &&
+					(script = fopen_priv(output_path, "w")) == NULL)
+					pg_fatal("could not open file \"%s\": %s",
+							 output_path, strerror(errno));
+
+				fprintf(script, "The slot \"%s\" is invalid\n",
+						slot->slotname);
+
+				/* No need to check this slot, seek to new one */
+				continue;
+			}
+
+			/*
+			 * Do additional check to ensure that all logical replication
+			 * slots have consumed all the WAL before shutdown.
+			 *
+			 * Note: This can be satisfied only when the old cluster has been
+			 * shut down, so we skip this for live checks.
+			 */
+			if (!live_check && !slot->caught_up)
+			{
+				if (script == NULL &&
+					(script = fopen_priv(output_path, "w")) == NULL)
+					pg_fatal("could not open file \"%s\": %s",
+							 output_path, strerror(errno));
+
+				fprintf(script,
+						"The slot \"%s\" has not consumed the WAL yet\n",
+						slot->slotname);
+			}
+		}
+	}
+
+	if (script)
+	{
+		fclose(script);
+
+		pg_log(PG_REPORT, "fatal");
+		pg_fatal("Your installation contains logical replication slots that can't be upgraded.\n"
+				 "You can remove invalid slots and/or consume the pending WAL for other slots,\n"
+				 "and then restart the upgrade.\n"
+				 "A list of the problem slots is in the file:\n"
+				 "    %s", output_path);
+	}
+
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/function.c b/src/bin/pg_upgrade/function.c
index dc8800c7cd..5af936bd45 100644
--- a/src/bin/pg_upgrade/function.c
+++ b/src/bin/pg_upgrade/function.c
@@ -46,7 +46,9 @@ library_name_compare(const void *p1, const void *p2)
 /*
  * get_loadable_libraries()
  *
- *	Fetch the names of all old libraries containing C-language functions.
+ *	Fetch the names of all old libraries containing C-language functions, as
+ *	well as libraries corresponding to logical replication output plugins.
+ *
  *	We will later check that they all exist in the new installation.
  */
 void
@@ -55,6 +57,7 @@ get_loadable_libraries(void)
 	PGresult  **ress;
 	int			totaltups;
 	int			dbnum;
+	int			n_libinfos;
 
 	ress = (PGresult **) pg_malloc(old_cluster.dbarr.ndbs * sizeof(PGresult *));
 	totaltups = 0;
@@ -81,7 +84,12 @@ get_loadable_libraries(void)
 		PQfinish(conn);
 	}
 
-	os_info.libraries = (LibraryInfo *) pg_malloc(totaltups * sizeof(LibraryInfo));
+	/*
+	 * Allocate memory for required libraries and logical replication output
+	 * plugins.
+	 */
+	n_libinfos = totaltups + count_old_cluster_logical_slots();
+	os_info.libraries = (LibraryInfo *) pg_malloc(sizeof(LibraryInfo) * n_libinfos);
 	totaltups = 0;
 
 	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
@@ -89,6 +97,7 @@ get_loadable_libraries(void)
 		PGresult   *res = ress[dbnum];
 		int			ntups;
 		int			rowno;
+		LogicalSlotInfoArr *slot_arr = &old_cluster.dbarr.dbs[dbnum].slot_arr;
 
 		ntups = PQntuples(res);
 		for (rowno = 0; rowno < ntups; rowno++)
@@ -101,6 +110,23 @@ get_loadable_libraries(void)
 			totaltups++;
 		}
 		PQclear(res);
+
+		/*
+		 * Store the names of output plugins as well. There is a possibility
+		 * that duplicated plugins are set, but the consumer function
+		 * check_loadable_libraries() will avoid checking the same library, so
+		 * we do not have to consider their uniqueness here.
+		 */
+		for (int slotno = 0; slotno < slot_arr->nslots; slotno++)
+		{
+			if (slot_arr->slots[slotno].invalid)
+				continue;
+
+			os_info.libraries[totaltups].name = pg_strdup(slot_arr->slots[slotno].plugin);
+			os_info.libraries[totaltups].dbnum = dbnum;
+
+			totaltups++;
+		}
 	}
 
 	pg_free(ress);
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index aa5faca4d6..481f586a2f 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -26,6 +26,8 @@ static void get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo);
 static void free_rel_infos(RelInfoArr *rel_arr);
 static void print_db_infos(DbInfoArr *db_arr);
 static void print_rel_infos(RelInfoArr *rel_arr);
+static void print_slot_infos(LogicalSlotInfoArr *slot_arr);
+static void get_old_cluster_logical_slot_infos(DbInfo *dbinfo, bool live_check);
 
 
 /*
@@ -266,13 +268,15 @@ report_unmatched_relation(const RelInfo *rel, const DbInfo *db, bool is_new_db)
 }
 
 /*
- * get_db_and_rel_infos()
+ * get_db_rel_and_slot_infos()
  *
  * higher level routine to generate dbinfos for the database running
  * on the given "port". Assumes that server is already running.
+ *
+ * live_check would be used only when the target is the old cluster.
  */
 void
-get_db_and_rel_infos(ClusterInfo *cluster)
+get_db_rel_and_slot_infos(ClusterInfo *cluster, bool live_check)
 {
 	int			dbnum;
 
@@ -283,7 +287,17 @@ get_db_and_rel_infos(ClusterInfo *cluster)
 	get_db_infos(cluster);
 
 	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
-		get_rel_infos(cluster, &cluster->dbarr.dbs[dbnum]);
+	{
+		DbInfo	   *pDbInfo = &cluster->dbarr.dbs[dbnum];
+
+		get_rel_infos(cluster, pDbInfo);
+
+		/*
+		 * Retrieve the logical replication slots infos for the old cluster.
+		 */
+		if (cluster == &old_cluster)
+			get_old_cluster_logical_slot_infos(pDbInfo, live_check);
+	}
 
 	if (cluster == &old_cluster)
 		pg_log(PG_VERBOSE, "\nsource databases:");
@@ -600,6 +614,125 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 	dbinfo->rel_arr.nrels = num_rels;
 }
 
+/*
+ * get_old_cluster_logical_slot_infos()
+ *
+ * gets the LogicalSlotInfos for all the logical replication slots of the
+ * database referred to by "dbinfo". The status of each logical slot is gotten
+ * here, but they are used at the checking phase. See
+ * check_old_cluster_for_valid_slots().
+ *
+ * Note: This function will not do anything if the old cluster is pre-PG17.
+ * This is because before that the logical slots are not saved at shutdown, so
+ * there is no guarantee that the latest confirmed_flush_lsn is saved to disk
+ * which can lead to data loss. It is still not guaranteed for manually created
+ * slots in PG17, so subsequent checks done in
+ * check_old_cluster_for_valid_slots() would raise a FATAL error if such slots
+ * are included.
+ */
+static void
+get_old_cluster_logical_slot_infos(DbInfo *dbinfo, bool live_check)
+{
+	PGconn	   *conn;
+	PGresult   *res;
+	LogicalSlotInfo *slotinfos = NULL;
+	int			num_slots = 0;
+
+	/* Logical slots can be migrated since PG17. */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+	{
+		dbinfo->slot_arr.slots = slotinfos;
+		dbinfo->slot_arr.nslots = num_slots;
+		return;
+	}
+
+	conn = connectToServer(&old_cluster, dbinfo->db_name);
+
+	/*
+	 * Fetch the logical replication slot information. The check whether the
+	 * slot is considered caught up is done by an upgrade function. This
+	 * regards the slot as caught up if we don't find any decodable changes.
+	 * See binary_upgrade_slot_has_caught_up().
+	 *
+	 * Note that we can't ensure whether the slot is caught up during
+	 * live_check as the new WAL records could be generated.
+	 *
+	 * We intentionally skip checking the WALs for invalidated slots as the
+	 * corresponding WALs could have been removed for such slots.
+	 *
+	 * The temporary slots are explicitly ignored while checking because such
+	 * slots cannot exist after the upgrade. During the upgrade, clusters are
+	 * started and stopped several times causing any temporary slots to be
+	 * removed.
+	 */
+	res = executeQueryOrDie(conn, "SELECT slot_name, plugin, two_phase, "
+							"%s as caught_up, conflicting as invalid "
+							"FROM pg_catalog.pg_replication_slots "
+							"WHERE slot_type = 'logical' AND "
+							"database = current_database() AND "
+							"temporary IS FALSE;",
+							live_check ? "FALSE" :
+							"(CASE WHEN conflicting THEN FALSE "
+							"ELSE (SELECT pg_catalog.binary_upgrade_slot_has_caught_up(slot_name)) "
+							"END)");
+
+	num_slots = PQntuples(res);
+
+	if (num_slots)
+	{
+		int			i_slotname;
+		int			i_plugin;
+		int			i_twophase;
+		int			i_caught_up;
+		int			i_invalid;
+
+		slotinfos = (LogicalSlotInfo *) pg_malloc(sizeof(LogicalSlotInfo) * num_slots);
+
+		i_slotname = PQfnumber(res, "slot_name");
+		i_plugin = PQfnumber(res, "plugin");
+		i_twophase = PQfnumber(res, "two_phase");
+		i_caught_up = PQfnumber(res, "caught_up");
+		i_invalid = PQfnumber(res, "invalid");
+
+		for (int slotnum = 0; slotnum < num_slots; slotnum++)
+		{
+			LogicalSlotInfo *curr = &slotinfos[slotnum];
+
+			curr->slotname = pg_strdup(PQgetvalue(res, slotnum, i_slotname));
+			curr->plugin = pg_strdup(PQgetvalue(res, slotnum, i_plugin));
+			curr->two_phase = (strcmp(PQgetvalue(res, slotnum, i_twophase), "t") == 0);
+			curr->caught_up = (strcmp(PQgetvalue(res, slotnum, i_caught_up), "t") == 0);
+			curr->invalid = (strcmp(PQgetvalue(res, slotnum, i_invalid), "t") == 0);
+		}
+	}
+
+	PQclear(res);
+	PQfinish(conn);
+
+	dbinfo->slot_arr.slots = slotinfos;
+	dbinfo->slot_arr.nslots = num_slots;
+}
+
+
+/*
+ * count_old_cluster_logical_slots()
+ *
+ * Returns the number of logical replication slots for all databases.
+ *
+ * Note: this function always returns 0 if the old_cluster is PG16 and prior
+ * because we gather slot information only for cluster versions greater than or
+ * equal to PG17. See get_old_cluster_logical_slot_infos().
+ */
+int
+count_old_cluster_logical_slots(void)
+{
+	int			slot_count = 0;
+
+	for (int dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+		slot_count += old_cluster.dbarr.dbs[dbnum].slot_arr.nslots;
+
+	return slot_count;
+}
 
 static void
 free_db_and_rel_infos(DbInfoArr *db_arr)
@@ -642,8 +775,11 @@ print_db_infos(DbInfoArr *db_arr)
 
 	for (dbnum = 0; dbnum < db_arr->ndbs; dbnum++)
 	{
-		pg_log(PG_VERBOSE, "Database: \"%s\"", db_arr->dbs[dbnum].db_name);
-		print_rel_infos(&db_arr->dbs[dbnum].rel_arr);
+		DbInfo	   *pDbInfo = &db_arr->dbs[dbnum];
+
+		pg_log(PG_VERBOSE, "Database: \"%s\"", pDbInfo->db_name);
+		print_rel_infos(&pDbInfo->rel_arr);
+		print_slot_infos(&pDbInfo->slot_arr);
 	}
 }
 
@@ -660,3 +796,23 @@ print_rel_infos(RelInfoArr *rel_arr)
 			   rel_arr->rels[relnum].reloid,
 			   rel_arr->rels[relnum].tablespace);
 }
+
+static void
+print_slot_infos(LogicalSlotInfoArr *slot_arr)
+{
+	/* Quick return if there are no logical slots. */
+	if (slot_arr->nslots == 0)
+		return;
+
+	pg_log(PG_VERBOSE, "Logical replication slots within the database:");
+
+	for (int slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+	{
+		LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+
+		pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\", two_phase: %s",
+			   slot_info->slotname,
+			   slot_info->plugin,
+			   slot_info->two_phase ? "true" : "false");
+	}
+}
diff --git a/src/bin/pg_upgrade/meson.build b/src/bin/pg_upgrade/meson.build
index 12a97f84e2..2c4f38d865 100644
--- a/src/bin/pg_upgrade/meson.build
+++ b/src/bin/pg_upgrade/meson.build
@@ -42,6 +42,7 @@ tests += {
     'tests': [
       't/001_basic.pl',
       't/002_pg_upgrade.pl',
+      't/003_upgrade_logical_replication_slots.pl',
     ],
     'test_kwargs': {'priority': 40}, # pg_upgrade tests are slow
   },
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 96bfb67167..3960af4036 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -59,6 +59,7 @@ static void copy_xact_xlog_xid(void);
 static void set_frozenxids(bool minmxid_only);
 static void make_outputdirs(char *pgdata);
 static void setup(char *argv0, bool *live_check);
+static void create_logical_replication_slots(void);
 
 ClusterInfo old_cluster,
 			new_cluster;
@@ -188,6 +189,21 @@ main(int argc, char **argv)
 			  new_cluster.pgdata);
 	check_ok();
 
+	/*
+	 * Migrate the logical slots to the new cluster.  Note that we need to do
+	 * this after resetting WAL because otherwise the required WAL would be
+	 * removed and slots would become unusable.  There is a possibility that
+	 * background processes might generate some WAL before we could create the
+	 * slots in the new cluster but we can ignore that WAL as that won't be
+	 * required downstream.
+	 */
+	if (count_old_cluster_logical_slots())
+	{
+		start_postmaster(&new_cluster, true);
+		create_logical_replication_slots();
+		stop_postmaster(false);
+	}
+
 	if (user_opts.do_sync)
 	{
 		prep_status("Sync data directory to disk");
@@ -593,7 +609,7 @@ create_new_objects(void)
 		set_frozenxids(true);
 
 	/* update new_cluster info now that we have objects in the databases */
-	get_db_and_rel_infos(&new_cluster);
+	get_db_rel_and_slot_infos(&new_cluster, false);
 }
 
 /*
@@ -862,3 +878,59 @@ set_frozenxids(bool minmxid_only)
 
 	check_ok();
 }
+
+/*
+ * create_logical_replication_slots()
+ *
+ * Similar to create_new_objects() but only restores logical replication slots.
+ */
+static void
+create_logical_replication_slots(void)
+{
+	prep_status_progress("Restoring logical replication slots in the new cluster");
+
+	for (int dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		DbInfo	   *old_db = &old_cluster.dbarr.dbs[dbnum];
+		LogicalSlotInfoArr *slot_arr = &old_db->slot_arr;
+		PGconn	   *conn;
+		PQExpBuffer query;
+
+		/* Skip this database if there are no slots */
+		if (slot_arr->nslots == 0)
+			continue;
+
+		conn = connectToServer(&new_cluster, old_db->db_name);
+		query = createPQExpBuffer();
+
+		pg_log(PG_STATUS, "%s", old_db->db_name);
+
+		for (int slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		{
+			LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+
+			/* Constructs a query for creating logical replication slots */
+			appendPQExpBuffer(query,
+							  "SELECT * FROM "
+							  "pg_catalog.pg_create_logical_replication_slot(");
+			appendStringLiteralConn(query, slot_info->slotname, conn);
+			appendPQExpBuffer(query, ", ");
+			appendStringLiteralConn(query, slot_info->plugin, conn);
+			appendPQExpBuffer(query, ", false, %s);",
+							  slot_info->two_phase ? "true" : "false");
+
+			PQclear(executeQueryOrDie(conn, "%s", query->data));
+
+			resetPQExpBuffer(query);
+		}
+
+		PQfinish(conn);
+
+		destroyPQExpBuffer(query);
+	}
+
+	end_progress_output();
+	check_ok();
+
+	return;
+}
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 842f3b6cd3..ba8129d135 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -150,6 +150,24 @@ typedef struct
 	int			nrels;
 } RelInfoArr;
 
+/*
+ * Structure to store logical replication slot information.
+ */
+typedef struct
+{
+	char	   *slotname;		/* slot name */
+	char	   *plugin;			/* plugin */
+	bool		two_phase;		/* can the slot decode 2PC? */
+	bool		caught_up;		/* has the slot caught up to latest changes? */
+	bool		invalid;		/* if true, the slot is unusable */
+} LogicalSlotInfo;
+
+typedef struct
+{
+	int			nslots;			/* number of logical slot infos */
+	LogicalSlotInfo *slots;		/* array of logical slot infos */
+} LogicalSlotInfoArr;
+
 /*
  * The following structure represents a relation mapping.
  */
@@ -176,6 +194,7 @@ typedef struct
 	char		db_tablespace[MAXPGPATH];	/* database default tablespace
 											 * path */
 	RelInfoArr	rel_arr;		/* array of all user relinfos */
+	LogicalSlotInfoArr slot_arr;	/* array of all LogicalSlotInfo */
 } DbInfo;
 
 /*
@@ -400,7 +419,8 @@ void		check_loadable_libraries(void);
 FileNameMap *gen_db_file_maps(DbInfo *old_db,
 							  DbInfo *new_db, int *nmaps, const char *old_pgdata,
 							  const char *new_pgdata);
-void		get_db_and_rel_infos(ClusterInfo *cluster);
+void		get_db_rel_and_slot_infos(ClusterInfo *cluster, bool live_check);
+int			count_old_cluster_logical_slots(void);
 
 /* option.c */
 
diff --git a/src/bin/pg_upgrade/server.c b/src/bin/pg_upgrade/server.c
index 0bc3d2806b..d7f6c268ef 100644
--- a/src/bin/pg_upgrade/server.c
+++ b/src/bin/pg_upgrade/server.c
@@ -201,6 +201,7 @@ start_postmaster(ClusterInfo *cluster, bool report_and_exit_on_error)
 	PGconn	   *conn;
 	bool		pg_ctl_return = false;
 	char		socket_string[MAXPGPATH + 200];
+	PQExpBufferData pgoptions;
 
 	static bool exit_hook_registered = false;
 
@@ -227,23 +228,41 @@ start_postmaster(ClusterInfo *cluster, bool report_and_exit_on_error)
 				 cluster->sockdir);
 #endif
 
+	initPQExpBuffer(&pgoptions);
+
 	/*
-	 * Use -b to disable autovacuum.
+	 * Construct a parameter string which is passed to the server process.
 	 *
 	 * Turn off durability requirements to improve object creation speed, and
 	 * we only modify the new cluster, so only use it there.  If there is a
 	 * crash, the new cluster has to be recreated anyway.  fsync=off is a big
 	 * win on ext4.
 	 */
+	if (cluster == &new_cluster)
+		appendPQExpBufferStr(&pgoptions, " -c synchronous_commit=off -c fsync=off -c full_page_writes=off");
+
+	/*
+	 * Use max_slot_wal_keep_size as -1 to prevent the WAL removal by the
+	 * checkpointer process.  If WALs required by logical replication slots
+	 * are removed, the slots are unusable.  This setting prevents the
+	 * invalidation of slots during the upgrade. We set this option when
+	 * cluster is PG17 or later because logical replication slots can only be
+	 * migrated since then. Besides, max_slot_wal_keep_size is added in PG13.
+	 */
+	if (GET_MAJOR_VERSION(cluster->major_version) >= 1700)
+		appendPQExpBufferStr(&pgoptions, " -c max_slot_wal_keep_size=-1");
+
+	/* Use -b to disable autovacuum. */
 	snprintf(cmd, sizeof(cmd),
 			 "\"%s/pg_ctl\" -w -l \"%s/%s\" -D \"%s\" -o \"-p %d -b%s %s%s\" start",
 			 cluster->bindir,
 			 log_opts.logdir,
 			 SERVER_LOG_FILE, cluster->pgconfig, cluster->port,
-			 (cluster == &new_cluster) ?
-			 " -c synchronous_commit=off -c fsync=off -c full_page_writes=off" : "",
+			 pgoptions.data,
 			 cluster->pgopts ? cluster->pgopts : "", socket_string);
 
+	termPQExpBuffer(&pgoptions);
+
 	/*
 	 * Don't throw an error right away, let connecting throw the error because
 	 * it might supply a reason for the failure.
diff --git a/src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl
new file mode 100644
index 0000000000..fa70486619
--- /dev/null
+++ b/src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl
@@ -0,0 +1,286 @@
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+# Tests for upgrading logical replication slots
+
+use strict;
+use warnings;
+
+use File::Find qw(find);
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Verify that logical replication slots can be migrated.  This function will
+# be executed when the old cluster is PG17 and later.
+sub test_upgrade_from_PG17_and_later
+{
+	my ($old_publisher, $new_publisher, @pg_upgrade_cmd) = @_;
+
+	# ------------------------------
+	# TEST: Confirm pg_upgrade fails when the new cluster has wrong GUC values
+
+	# Preparations for the subsequent test:
+	# 1. Create two slots on the old cluster
+	$old_publisher->start;
+	$old_publisher->safe_psql(
+		'postgres', qq[
+		SELECT pg_create_logical_replication_slot('test_slot1', 'test_decoding');
+		SELECT pg_create_logical_replication_slot('test_slot2', 'test_decoding');
+		]
+	);
+	$old_publisher->stop();
+
+	# 2. Set 'max_replication_slots' to be less than the number of slots (2)
+	#	 present on the old cluster.
+	$new_publisher->append_conf('postgresql.conf',
+		"max_replication_slots = 1");
+
+	# pg_upgrade will fail because the new cluster has insufficient
+	# max_replication_slots
+	command_checks_all(
+		[@pg_upgrade_cmd],
+		1,
+		[
+			qr/max_replication_slots \(1\) must be greater than or equal to the number of logical replication slots \(2\) on the old cluster/
+		],
+		[qr//],
+		'run of pg_upgrade where the new cluster has insufficient max_replication_slots'
+	);
+	ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+		"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+	# Set 'max_replication_slots' to match the number of slots (2) present
+	# on the old cluster. Both slots will be used for subsequent tests.
+	$new_publisher->append_conf('postgresql.conf',
+		"max_replication_slots = 2");
+
+
+	# ------------------------------
+	# TEST: Confirm pg_upgrade fails when the slot still has unconsumed WAL
+	# records
+
+	# Preparations for the subsequent test:
+	# 1. Generate extra WAL records. At this point neither test_slot1 nor
+	#	 test_slot2 has consumed them.
+	#
+	# 2. Advance the slot test_slot2 up to the current WAL location, but
+	#	 test_slot1 still has unconsumed WAL records.
+	#
+	# 3. Emit a non-transactional message. This will cause test_slot2 to detect
+	#	 the unconsumed WAL record.
+	$old_publisher->start;
+	$old_publisher->safe_psql(
+		'postgres', qq[
+			CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;
+			SELECT pg_replication_slot_advance('test_slot2', NULL);
+			SELECT count(*) FROM pg_logical_emit_message('false', 'prefix', 'This is a non-transactional message');
+		]);
+	$old_publisher->stop;
+
+	# pg_upgrade will fail because there are slots still having unconsumed WAL
+	# records
+	command_checks_all(
+		[@pg_upgrade_cmd],
+		1,
+		[
+			qr/Your installation contains logical replication slots that can't be upgraded./
+		],
+		[qr//],
+		'run of pg_upgrade of old cluster with slots having unconsumed WAL records'
+	);
+
+	# Verify the reason why the logical replication slot cannot be upgraded
+	my $slots_filename;
+
+	# Find a txt file that contains a list of logical replication slots that
+	# cannot be upgraded. We cannot predict the file's path because the output
+	# directory contains a milliseconds timestamp. File::Find::find must be
+	# used.
+	find(
+		sub {
+			if ($File::Find::name =~
+				m/invalid_logical_replication_slots\.txt/)
+			{
+				$slots_filename = $File::Find::name;
+			}
+		},
+		$new_publisher->data_dir . "/pg_upgrade_output.d");
+
+	# Check the file content. Both slots should be reporting that they have
+	# unconsumed WAL records.
+	my $slots_file_content = slurp_file($slots_filename);
+	is( $slots_file_content,
+		qq(The slot "test_slot1" has not consumed the WAL yet
+The slot "test_slot2" has not consumed the WAL yet
+),
+		'the previous test failed due to unconsumed WALs');
+
+
+	# ------------------------------
+	# TEST: Successful upgrade
+
+	# Preparations for the subsequent test:
+	# 1. Setup logical replication (first, cleanup slots from the previous
+	#	 tests)
+	my $old_connstr = $old_publisher->connstr . ' dbname=postgres';
+
+	$old_publisher->start;
+	$old_publisher->safe_psql(
+		'postgres', qq[
+		SELECT * FROM pg_drop_replication_slot('test_slot1');
+		SELECT * FROM pg_drop_replication_slot('test_slot2');
+		CREATE PUBLICATION regress_pub FOR ALL TABLES;
+		]
+	);
+
+	# Initialize subscriber cluster
+	my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
+	$subscriber->init();
+
+	$subscriber->start;
+	$subscriber->safe_psql(
+		'postgres', qq[
+		CREATE TABLE tbl (a int);
+		CREATE SUBSCRIPTION regress_sub CONNECTION '$old_connstr' PUBLICATION regress_pub WITH (two_phase = 'true')
+	]);
+	$subscriber->wait_for_subscription_sync($old_publisher, 'regress_sub');
+
+	# 2. Temporarily disable the subscription
+	$subscriber->safe_psql('postgres',
+		"ALTER SUBSCRIPTION regress_sub DISABLE");
+	$old_publisher->stop;
+
+	# pg_upgrade should be successful
+	command_ok([@pg_upgrade_cmd], 'run of pg_upgrade of old cluster');
+
+	# Check that the slot 'regress_sub' has migrated to the new cluster
+	$new_publisher->start;
+	my $result = $new_publisher->safe_psql('postgres',
+		"SELECT slot_name, two_phase FROM pg_replication_slots");
+	is($result, qq(regress_sub|t), 'check the slot exists on new cluster');
+
+	# Update the connection
+	my $new_connstr = $new_publisher->connstr . ' dbname=postgres';
+	$subscriber->safe_psql(
+		'postgres', qq[
+		ALTER SUBSCRIPTION regress_sub CONNECTION '$new_connstr';
+		ALTER SUBSCRIPTION regress_sub ENABLE;
+	]);
+
+	# Check whether changes on the new publisher get replicated to the
+	# subscriber
+	$new_publisher->safe_psql('postgres',
+		"INSERT INTO tbl VALUES (generate_series(11, 20))");
+	$new_publisher->wait_for_catchup('regress_sub');
+	$result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
+	is($result, qq(20), 'check changes are replicated to the subscriber');
+
+	# Clean up
+	$subscriber->stop();
+	$new_publisher->stop();
+}
+
+# Verify that logical replication slots cannot be migrated.  This function will
+# be executed when the old cluster version is prior to PG17.
+sub test_upgrade_from_pre_PG17
+{
+	my ($old_publisher, $new_publisher, @pg_upgrade_cmd) = @_;
+
+	# ------------------------------
+	# TEST: Confirm logical replication slots cannot be migrated
+
+	# Preparations for the subsequent test:
+	# 1. Create a slot on the old cluster
+	$old_publisher->start;
+	$old_publisher->safe_psql('postgres',
+		"SELECT pg_create_logical_replication_slot('test_slot', 'test_decoding');"
+	);
+	$old_publisher->stop;
+
+	# pg_upgrade should be successful, but any logical replication slots will
+	# be not migrated.
+	command_ok([@pg_upgrade_cmd], 'run of pg_upgrade of old cluster');
+	ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
+		"pg_upgrade_output.d/ removed after pg_upgrade success");
+
+	# Check that the slot 'test_slot' has not migrated to the new cluster
+	$new_publisher->start;
+	my $result = $new_publisher->safe_psql('postgres',
+		"SELECT count(*) FROM pg_replication_slots");
+	is($result, qq(0), 'check the slot does not exist on new cluster');
+
+	# Clean up
+	$new_publisher->stop();
+}
+
+# Can be changed to test the other modes
+my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
+
+# Initialize old cluster. Cross-version checks are also supported.
+my $old_publisher = PostgreSQL::Test::Cluster->new('old_publisher',
+	install_path => $ENV{oldinstall});
+
+my %node_params = ();
+$node_params{allows_streaming} = 'logical';
+
+# Set up some settings for the old cluster, so that we can ensure that initdb
+# will be done.
+my @initdb_params = ();
+push @initdb_params, ('--encoding', 'UTF-8');
+push @initdb_params, ('--locale', 'C');
+$node_params{extra} = \@initdb_params;
+
+$old_publisher->init(%node_params);
+
+# XXX: For PG9.6 and prior, the TAP Cluster.pm assigns 'max_wal_senders' and
+# 'max_connections' to the same value (10). But these versions considered
+# max_wal_senders as a subset of max_connections, so setting the same value
+# will fail. This adjustment will not be needed when packages for older
+# versions are defined.
+if ($old_publisher->pg_version->major <= 9.6)
+{
+	print "set max_wal_senders\n";
+
+	$old_publisher->append_conf(
+		'postgresql.conf', qq[
+	max_wal_senders = 5
+	max_connections = 10
+	]);
+}
+
+# Initialize new cluster
+my $new_publisher = PostgreSQL::Test::Cluster->new('new_publisher');
+$new_publisher->init(allows_streaming => 'logical');
+
+my $oldbindir = $old_publisher->config_data('--bindir');
+my $newbindir = $new_publisher->config_data('--bindir');
+
+# Setup a pg_upgrade command. This will be used anywhere.
+my @pg_upgrade_cmd = (
+	'pg_upgrade', '--no-sync',
+	'-d', $old_publisher->data_dir,
+	'-D', $new_publisher->data_dir,
+	'-b', $oldbindir,
+	'-B', $newbindir,
+	'-s', $new_publisher->host,
+	'-p', $old_publisher->port,
+	'-P', $new_publisher->port,
+	$mode);
+
+# Test according to the major version of the old cluster.
+# Upgrading logical replication slots from versions older than PG17 is not
+# supported.
+if ($old_publisher->pg_version->major >= 17)
+{
+	test_upgrade_from_PG17_and_later($old_publisher, $new_publisher,
+		@pg_upgrade_cmd);
+}
+else
+{
+	test_upgrade_from_pre_PG17($old_publisher, $new_publisher,
+		@pg_upgrade_cmd);
+}
+
+
+done_testing();
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index c92d0631a0..0699596888 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11379,6 +11379,11 @@
   proname => 'binary_upgrade_set_next_pg_tablespace_oid', provolatile => 'v',
   proparallel => 'u', prorettype => 'void', proargtypes => 'oid',
   prosrc => 'binary_upgrade_set_next_pg_tablespace_oid' },
+{ oid => '8046', descr => 'for use by pg_upgrade',
+  proname => 'binary_upgrade_slot_has_caught_up', proisstrict => 'f',
+  provolatile => 'v', proparallel => 'u', prorettype => 'bool',
+  proargtypes => 'name',
+  prosrc => 'binary_upgrade_slot_has_caught_up' },
 
 # conversion functions
 { oid => '4302',
diff --git a/src/include/replication/logical.h b/src/include/replication/logical.h
index 5f49554ea0..f8258d7c28 100644
--- a/src/include/replication/logical.h
+++ b/src/include/replication/logical.h
@@ -109,6 +109,9 @@ typedef struct LogicalDecodingContext
 	TransactionId write_xid;
 	/* Are we processing the end LSN of a transaction? */
 	bool		end_xact;
+
+	/* Do we need to process any change in 'fast_forward' mode? */
+	bool		processing_required;
 } LogicalDecodingContext;
 
 
@@ -145,4 +148,6 @@ extern bool filter_by_origin_cb_wrapper(LogicalDecodingContext *ctx, RepOriginId
 extern void ResetLogicalStreamingState(void);
 extern void UpdateDecodingStats(LogicalDecodingContext *ctx);
 
+extern bool LogicalReplicationSlotHasPendingWal(XLogRecPtr end_of_wal);
+
 #endif
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index e69bb671bf..de6c48d914 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1503,6 +1503,8 @@ LogicalRepTyp
 LogicalRepWorker
 LogicalRepWorkerType
 LogicalRewriteMappingData
+LogicalSlotInfo
+LogicalSlotInfoArr
 LogicalTape
 LogicalTapeSet
 LsnReadQueue
-- 
2.27.0

#337Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: vignesh C (#330)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Vignesh,

Thanks for reviewing! A new patch is available in [1]/messages/by-id/TYCPR01MB587007EA2F9AB92F0E1F5957F5D4A@TYCPR01MB5870.jpnprd01.prod.outlook.com.

Few comments:
1) We will be able to override the value of max_slot_wal_keep_size by
using --new-options like '--new-options  "-c
max_slot_wal_keep_size=val"':
+       /*
+        * Use max_slot_wal_keep_size as -1 to prevent the WAL removal by the
+        * checkpointer process.  If WALs required by logical replication slots
+        * are removed, the slots are unusable.  This setting prevents the
+        * invalidation of slots during the upgrade. We set this option when
+        * cluster is PG17 or later because logical replication slots
can only be
+        * migrated since then. Besides, max_slot_wal_keep_size is
added in PG13.
+        */
+       if (GET_MAJOR_VERSION(cluster->major_version) >= 1700)
+               appendPQExpBufferStr(&pgoptions, " -c
max_slot_wal_keep_size=-1");

Should there be a check to throw an error if this option is specified
or do we need some documentation that this option should not be
specified?

Hmm, I don't think we have to add checks. Other settings, like synchronous_commit
and fsync, can also be overwritten, but pg_upgrade has never checked them. Therefore,
it is the user's responsibility not to set max_slot_wal_keep_size to a dangerous
value.
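
Just to make the discussed override concrete, below is a minimal sketch in the
TAP test's terms. It is only an illustration: @pg_upgrade_cmd is assumed to be
the common command array from the test, and the 1MB value is made up.

# Hypothetical illustration only, not part of the patch or its tests.
my @risky_upgrade_cmd = (
	@pg_upgrade_cmd,
	'--new-options', '-c max_slot_wal_keep_size=1MB');
# Whether such a run actually invalidates a slot depends on how much WAL is
# generated while pg_upgrade is running, which is why the patch leaves it to
# the user not to override the setting.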

2) Because we are able to override max_slot_wal_keep_size there is a
chance of slot getting invalidated and Assert being hit:
+               /*
+                * The logical replication slots shouldn't be invalidated as
+                * max_slot_wal_keep_size GUC is set to -1 during the
upgrade.
+                *
+                * The following is just a sanity check.
+                */
+               if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade)
+               {
+                       Assert(max_slot_wal_keep_size_mb == -1);
+                       elog(ERROR, "replication slots must not be
invalidated during the upgrade");
+               }

Hmm, so how about removing the assert and changing the error message to something
more appropriate? I still think this would seldom occur.

3) File 003_logical_replication_slots.pl is now changed to
003_upgrade_logical_replication_slots.pl, it should be change here too
accordingly:
index 5834513add..815d1a7ca1 100644
--- a/src/bin/pg_upgrade/Makefile
+++ b/src/bin/pg_upgrade/Makefile
@@ -3,6 +3,9 @@
PGFILEDESC = "pg_upgrade - an in-place binary upgrade utility"
PGAPPICON = win32
+# required for 003_logical_replication_slots.pl
+EXTRA_INSTALL=contrib/test_decoding
+

Fixed.

[1]: /messages/by-id/TYCPR01MB587007EA2F9AB92F0E1F5957F5D4A@TYCPR01MB5870.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#338Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Zhijie Hou (Fujitsu) (#331)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Hou,

Thanks for reviewing! A new patch is available in [1]/messages/by-id/TYCPR01MB587007EA2F9AB92F0E1F5957F5D4A@TYCPR01MB5870.jpnprd01.prod.outlook.com.

Thanks for updating the patch, here are few comments for the test.

1.

# The TAP Cluster.pm assigns default 'max_wal_senders' and 'max_connections'
to
# the same value (10) but PG12 and prior considered max_walsenders as a subset
# of max_connections, so setting the same value will fail.
if ($old_publisher->pg_version->major < 12)
{
$old_publisher->append_conf(
'postgresql.conf', qq[
max_wal_senders = 5
max_connections = 10
]);

I think we already set max_wal_senders to 5 in init() function(in Cluster.pm),
so is this necessary ? And 002_pg_upgrade.pl doesn't seems set this.

I think you were referring to Cluster::V_11::init(). I analyzed it based on that and
found a fault. Could you please check [1]/messages/by-id/TYCPR01MB587007EA2F9AB92F0E1F5957F5D4A@TYCPR01MB5870.jpnprd01.prod.outlook.com?

2.

SELECT pg_create_logical_replication_slot('test_slot1',
'test_decoding', false, true);
SELECT pg_create_logical_replication_slot('test_slot2',
'test_decoding', false, true);

I think we don't need to set the last two parameters here as we don't check
these info in the tests.

Removed.

3.

# Set extra params if cross-version checks are required. This is needed to
# avoid using previously initdb'd cluster
if (defined($ENV{oldinstall}))
{
my @initdb_params = ();
push @initdb_params, ('--encoding', 'UTF-8');
push @initdb_params, ('--locale', 'C');

I am not sure I understand the comment; would it be possible to provide a bit more
explanation about the purpose of this setting? And I see 002_pg_upgrade always
has these settings even if oldinstall is not defined, so shall we follow the
same?

Fixed.
Actually, the settings are not needed for the new cluster, but it seems better to follow 002.

4.

+	command_ok(
+		[
+			'pg_upgrade', '--no-sync',
+			'-d', $old_publisher->data_dir,
+			'-D', $new_publisher->data_dir,
+			'-b', $oldbindir,
+			'-B', $newbindir,
+			'-s', $new_publisher->host,
+			'-p', $old_publisher->port,
+			'-P', $new_publisher->port,
+			$mode,
+		],

I think all the pg_upgrade commands in the test are the same, so we can save the
command in a variable and pass it to command_xx(). That can save some effort in
checking the differences between the commands and can also reduce some code.

Fixed.

[1]: /messages/by-id/TYCPR01MB587007EA2F9AB92F0E1F5957F5D4A@TYCPR01MB5870.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#339Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Shlok Kyal (#332)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Shlok,

I have tested the above scenario. We are able to override
max_slot_wal_keep_size by using '--new-options "-c
max_slot_wal_keep_size=val"'. Also, with some insert statements
during pg_upgrade, old WAL files were deleted and logical replication
slots were invalidated. Since the slots were invalidated, replication
was not happening after the upgrade.

Yeah, theoretically it could be overridden, but I still think we do not have to
guard against it. Also, connections must not be established during the upgrade [1]/messages/by-id/ZNZ4AxUMIrnMgRbo@momjian.us.
I improved the ereport() message in the new patch [2]/messages/by-id/TYCPR01MB587007EA2F9AB92F0E1F5957F5D4A@TYCPR01MB5870.jpnprd01.prod.outlook.com. What do you think?

[1]: /messages/by-id/ZNZ4AxUMIrnMgRbo@momjian.us
[2]: /messages/by-id/TYCPR01MB587007EA2F9AB92F0E1F5957F5D4A@TYCPR01MB5870.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#340Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: vignesh C (#333)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Vignesh,

Thanks for reviewing! A new patch is available in [1]/messages/by-id/TYCPR01MB587007EA2F9AB92F0E1F5957F5D4A@TYCPR01MB5870.jpnprd01.prod.outlook.com.

Few comments:
1) Even if we comment out the 3rd point "Emit a non-transactional message",
test_slot2 still appears in the invalid_logical_replication_slots.txt
file. There is something wrong here.
+       # 2. Advance the slot test_slot2 up to the current WAL location, but
+       #        test_slot1 still has unconsumed WAL records.
+       $old_publisher->safe_psql('postgres',
+               "SELECT pg_replication_slot_advance('test_slot2', NULL);");
+
+       # 3. Emit a non-transactional message. test_slot2 detects the message
so
+       #        that this slot will be also reported by upcoming pg_upgrade.
+       $old_publisher->safe_psql('postgres',
+               "SELECT count(*) FROM pg_logical_emit_message('false',
'prefix', 'This is a non-transactional message');"
+       );

The comment was updated based on the other comments. What do you think?
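
In case it helps to see why test_slot2 is reported at all, here is a minimal
sketch (not part of the patch) that peeks at the slot right after step 3,
while the old cluster is still running and test_decoding is installed. The
pending non-transactional message shows up as a decodable change, which is
why binary_upgrade_slot_has_caught_up() treats the slot as not caught up.

# Hypothetical sketch, not part of the patch.
my $pending = $old_publisher->safe_psql('postgres',
	"SELECT count(*) FROM pg_logical_slot_peek_changes('test_slot2', NULL, NULL)"
);
isnt($pending, '0', 'test_slot2 still has a decodable change to consume');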

2) If the test fails here, it is difficult to debug as the
pg_upgrade_output.d directory was removed, so it is better to keep the
directory as it is in this case:
+       # Check the file content. Both slots should be reporting that they have
+       # unconsumed WAL records.
+       like(
+               slurp_file($slots_filename),
+               qr/The slot \"test_slot1\" has not consumed the WAL yet/m,
+               'the previous test failed due to unconsumed WALs');
+       like(
+               slurp_file($slots_filename),
+               qr/The slot \"test_slot2\" has not consumed the WAL yet/m,
+               'the previous test failed due to unconsumed WALs');
+
+       # Clean up
+       rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");

Right. The current style just follows the 002 test. I removed the rmtree() call.

3) The below could be changed:
+       # Check the file content. Both slots should be reporting that they have
+       # unconsumed WAL records.
+       like(
+               slurp_file($slots_filename),
+               qr/The slot \"test_slot1\" has not consumed the WAL yet/m,
+               'the previous test failed due to unconsumed WALs');
+       like(
+               slurp_file($slots_filename),
+               qr/The slot \"test_slot2\" has not consumed the WAL yet/m,
+               'the previous test failed due to unconsumed WALs');

to:
my $result = slurp_file($slots_filename);
is( $result, qq(The slot "test_slot1" has not consumed the WAL yet
The slot "test_slot2" has not consumed the WAL yet
),
'the previous test failed due to unconsumed WALs');

Replaced, but the formatting does not look good. I would like to hear opinions from others.

[1]: /messages/by-id/TYCPR01MB587007EA2F9AB92F0E1F5957F5D4A@TYCPR01MB5870.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#341Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Hayato Kuroda (Fujitsu) (#336)
1 attachment(s)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear hackers,

Thanks for reviewing! PSA new version.

Hmm. The cfbot got angry, whereas the test passes on my machine.
It seems that the ordering of lines in invalid_logical_replication_slots.txt is not fixed.

The change for checking the file content was reverted; the test passes on my CI.
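
If the line ordering in that file really is unstable, one way to keep a single
comparison while staying order-insensitive would be something like the sketch
below. This is just an idea, not what v54 does (v54 simply reverted to the two
like() checks):

# Sketch only: sort the report lines before comparing, so the check does not
# depend on which slot is written to the file first.
my @lines = sort grep { length } split /\n/, slurp_file($slots_filename);
is( join("\n", @lines),
	join("\n",
		'The slot "test_slot1" has not consumed the WAL yet',
		'The slot "test_slot2" has not consumed the WAL yet'),
	'the previous test failed due to unconsumed WALs');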

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:

v54-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patchapplication/octet-stream; name=v54-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patchDownload
From 4c70a863574e43fa7fc720b0fbadd2b9da8ce14c Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Tue, 4 Apr 2023 05:49:34 +0000
Subject: [PATCH v54] pg_upgrade: Allow to replicate logical replication slots
 to new node

This commit allows nodes with logical replication slots to be upgraded. While
reading information from the old cluster, a list of logical replication slots is
fetched. At the later part of upgrading, pg_upgrade revisits the list and
restores slots by executing pg_create_logical_replication_slot() on the new
cluster. Migration of logical replication slots is only supported when the old
cluster is version 17.0 or later.

If the old node has slots with the status 'lost' or with unconsumed WAL records,
the pg_upgrade fails. These checks are needed to prevent data loss.

Note that the pg_resetwal command would remove WAL files, which are required for
restart_lsn. If WALs required by logical replication slots are removed, the
slots are unusable. Therefore, during the upgrade, slot restoration is done
after the final pg_resetwal command. The workflow ensures that required WALs are
retained.

The significant advantage of this commit is that it makes it easy to continue
logical replication even after upgrading the publisher node. Previously,
pg_upgrade allowed copying publications to a new node. With this new commit,
adjusting the connection string to the new publisher will cause the apply
worker on the subscriber to connect to the new publisher automatically. This
enables seamless continuation of logical replication, even after an upgrade.

Author: Hayato Kuroda
Co-authored-by: Hou Zhijie
Reviewed-by: Peter Smith, Julien Rouhaud, Vignesh C, Wang Wei, Masahiko Sawada,
             Dilip Kumar, Bharath Rupireddy, Shlok Kyal
---
 doc/src/sgml/ref/pgupgrade.sgml               |  78 ++++-
 src/backend/replication/logical/decode.c      |  48 ++-
 src/backend/replication/logical/logical.c     |  65 ++++
 src/backend/replication/slot.c                |  14 +
 src/backend/utils/adt/pg_upgrade_support.c    |  42 +++
 src/bin/pg_upgrade/Makefile                   |   3 +
 src/bin/pg_upgrade/check.c                    | 169 ++++++++++-
 src/bin/pg_upgrade/function.c                 |  30 +-
 src/bin/pg_upgrade/info.c                     | 166 +++++++++-
 src/bin/pg_upgrade/meson.build                |   1 +
 src/bin/pg_upgrade/pg_upgrade.c               |  74 ++++-
 src/bin/pg_upgrade/pg_upgrade.h               |  22 +-
 src/bin/pg_upgrade/server.c                   |  25 +-
 .../003_upgrade_logical_replication_slots.pl  | 286 ++++++++++++++++++
 src/include/catalog/pg_proc.dat               |   5 +
 src/include/replication/logical.h             |   5 +
 src/tools/pgindent/typedefs.list              |   2 +
 17 files changed, 1009 insertions(+), 26 deletions(-)
 create mode 100644 src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 608193b307..0296c3f89d 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -383,6 +383,79 @@ make prefix=/usr/local/pgsql.new install
     </para>
    </step>
 
+   <step>
+    <title>Prepare for publisher upgrades</title>
+
+    <para>
+     <application>pg_upgrade</application> attempts to migrate logical
+     replication slots. This helps avoid the need for manually defining the
+     same replication slots on the new publisher. Migration of logical
+     replication slots is only supported when the old cluster is version 17.0
+     or later. Logical replication slots on clusters before version 17.0 will
+     silently be ignored.
+    </para>
+
+    <para>
+     Before you start upgrading the publisher cluster, ensure that the
+     subscription is temporarily disabled, by executing
+     <link linkend="sql-altersubscription"><command>ALTER SUBSCRIPTION ... DISABLE</command></link>.
+     Re-enable the subscription after the upgrade.
+    </para>
+
+    <para>
+     There are some prerequisites for <application>pg_upgrade</application> to
+     be able to upgrade the replication slots. If these are not met an error
+     will be reported.
+    </para>
+
+    <itemizedlist>
+     <listitem>
+      <para>
+       All slots on the old cluster must be usable, i.e., there are no slots
+       whose
+       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>conflicting</structfield>
+       is <literal>true</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The old cluster has replicated all the transactions and logical decoding
+       messages to subscribers.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The output plugins referenced by the slots on the old cluster must be
+       installed in the new PostgreSQL executable directory.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must not have permanent logical replication slots, i.e.,
+       there must be no slots where
+       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>temporary</structfield>
+       is <literal>false</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must have
+       <link linkend="guc-wal-level"><varname>wal_level</varname></link> as
+       <literal>logical</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must have
+       <link linkend="guc-max-replication-slots"><varname>max_replication_slots</varname></link>
+       configured to a value greater than or equal to the number of slots
+       present in the old cluster.
+      </para>
+     </listitem>
+    </itemizedlist>
+
+   </step>
+
    <step>
     <title>Stop both servers</title>
 
@@ -650,8 +723,9 @@ rsync --archive --delete --hard-links --size-only --no-inc-recursive /vol1/pg_tb
        Configure the servers for log shipping.  (You do not need to run
        <function>pg_backup_start()</function> and <function>pg_backup_stop()</function>
        or take a file system backup as the standbys are still synchronized
-       with the primary.)  Replication slots are not copied and must
-       be recreated.
+       with the primary.)  Only logical slots on the primary are copied to the
+       new standby, but other slots on the old standby are not copied so must
+       be recreated manually.
       </para>
      </step>
 
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 730061c9da..0514d1365e 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -599,12 +599,8 @@ logicalmsg_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 
 	ReorderBufferProcessXid(ctx->reorder, XLogRecGetXid(r), buf->origptr);
 
-	/*
-	 * If we don't have snapshot or we are just fast-forwarding, there is no
-	 * point in decoding messages.
-	 */
-	if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT ||
-		ctx->fast_forward)
+	/* If we don't have snapshot, there is no point in decoding messages */
+	if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT)
 		return;
 
 	message = (xl_logical_message *) XLogRecGetData(r);
@@ -621,6 +617,26 @@ logicalmsg_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 			  SnapBuildXactNeedsSkip(builder, buf->origptr)))
 		return;
 
+	/*
+	 * We also skip decoding in 'fast_forward' mode. This check must be last
+	 * because we don't want to set the processing_required flag unless we
+	 * have a decodable message.
+	 */
+	if (ctx->fast_forward)
+	{
+		/*
+		 * We need to set processing_required flag to notify the message's
+		 * existence to the caller. Usually, the flag is set when either the
+		 * COMMIT or ABORT records are decoded, but this must be turned on
+		 * here because the non-transactional logical message is decoded
+		 * without waiting for these records.
+		 */
+		if (!message->transactional)
+			ctx->processing_required = true;
+
+		return;
+	}
+
 	/*
 	 * If this is a non-transactional change, get the snapshot we're expected
 	 * to use. We only get here when the snapshot is consistent, and the
@@ -1285,7 +1301,21 @@ static bool
 DecodeTXNNeedSkip(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
 				  Oid txn_dbid, RepOriginId origin_id)
 {
-	return (SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) ||
-			(txn_dbid != InvalidOid && txn_dbid != ctx->slot->data.database) ||
-			ctx->fast_forward || FilterByOrigin(ctx, origin_id));
+	if (SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) ||
+		(txn_dbid != InvalidOid && txn_dbid != ctx->slot->data.database) ||
+		FilterByOrigin(ctx, origin_id))
+		return true;
+
+	/*
+	 * We also skip decoding in 'fast_forward' mode. In passing set the
+	 * 'processing_required' flag to indicate, were it not for this mode,
+	 * processing *would* have been required.
+	 */
+	if (ctx->fast_forward)
+	{
+		ctx->processing_required = true;
+		return true;
+	}
+
+	return false;
 }
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 41243d0187..e02cd0fa44 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -29,6 +29,7 @@
 #include "postgres.h"
 
 #include "access/xact.h"
+#include "access/xlogutils.h"
 #include "access/xlog_internal.h"
 #include "fmgr.h"
 #include "miscadmin.h"
@@ -41,6 +42,7 @@
 #include "storage/proc.h"
 #include "storage/procarray.h"
 #include "utils/builtins.h"
+#include "utils/inval.h"
 #include "utils/memutils.h"
 
 /* data for errcontext callback */
@@ -1949,3 +1951,66 @@ UpdateDecodingStats(LogicalDecodingContext *ctx)
 	rb->totalTxns = 0;
 	rb->totalBytes = 0;
 }
+
+/*
+ * Read up to the end of WAL starting from the decoding slot's restart_lsn.
+ * Return true if any meaningful/decodable WAL records are encountered,
+ * otherwise false.
+ *
+ * Although this function is currently used only during pg_upgrade, there are
+ * no reasons to restrict it, so IsBinaryUpgrade is not checked here.
+ */
+bool
+LogicalReplicationSlotHasPendingWal(XLogRecPtr end_of_wal)
+{
+	LogicalDecodingContext *ctx;
+	bool		has_pending_wal = false;
+
+	Assert(MyReplicationSlot);
+
+	/*
+	 * Create our decoding context in fast_forward mode, passing start_lsn as
+	 * InvalidXLogRecPtr, so that we start processing from the slot's
+	 * confirmed_flush.
+	 */
+	ctx = CreateDecodingContext(InvalidXLogRecPtr,
+								NIL,
+								true,	/* fast_forward */
+								XL_ROUTINE(.page_read = read_local_xlog_page,
+										   .segment_open = wal_segment_open,
+										   .segment_close = wal_segment_close),
+								NULL, NULL, NULL);
+
+	/*
+	 * Start reading at the slot's restart_lsn, which we know points to a
+	 * valid record.
+	 */
+	XLogBeginRead(ctx->reader, MyReplicationSlot->data.restart_lsn);
+
+	/* Invalidate non-timetravel entries */
+	InvalidateSystemCaches();
+
+	/* Loop until the end of WAL or some changes are processed */
+	while (!has_pending_wal && ctx->reader->EndRecPtr < end_of_wal)
+	{
+		XLogRecord *record;
+		char	   *errm = NULL;
+
+		record = XLogReadRecord(ctx->reader, &errm);
+
+		if (errm)
+			elog(ERROR, "could not find record for logical decoding: %s", errm);
+
+		if (record != NULL)
+			LogicalDecodingProcessRecord(ctx, ctx->reader);
+
+		has_pending_wal = ctx->processing_required;
+
+		CHECK_FOR_INTERRUPTS();
+	}
+
+	/* Clean up */
+	FreeDecodingContext(ctx);
+
+	return has_pending_wal;
+}
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 7e5ec500d8..1426a0bbb6 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1423,6 +1423,20 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause,
 
 		SpinLockRelease(&s->mutex);
 
+		/*
+		 * The logical replication slots shouldn't be invalidated as
+		 * max_slot_wal_keep_size GUC is set to -1 during the upgrade.
+		 *
+		 * The following is just a sanity check.
+		 */
+		if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade)
+		{
+			ereport(ERROR,
+					errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					errmsg("replication slots must not be invalidated during the upgrade"),
+					errhint("\"max_slot_wal_keep_size\" must not be set to -1 during the upgrade"));
+		}
+
 		if (active_pid != 0)
 		{
 			/*
diff --git a/src/backend/utils/adt/pg_upgrade_support.c b/src/backend/utils/adt/pg_upgrade_support.c
index 0186636d9f..697e23f815 100644
--- a/src/backend/utils/adt/pg_upgrade_support.c
+++ b/src/backend/utils/adt/pg_upgrade_support.c
@@ -17,6 +17,7 @@
 #include "catalog/pg_type.h"
 #include "commands/extension.h"
 #include "miscadmin.h"
+#include "replication/logical.h"
 #include "utils/array.h"
 #include "utils/builtins.h"
 
@@ -261,3 +262,44 @@ binary_upgrade_set_missing_value(PG_FUNCTION_ARGS)
 
 	PG_RETURN_VOID();
 }
+
+/*
+ * Verify the given slot has already consumed all the WAL changes.
+ *
+ * Returns true if there are no decodable WAL records after the
+ * confirmed_flush_lsn. Otherwise false.
+ *
+ * This is a special purpose function to ensure that the given slot can be
+ * upgraded without data loss.
+ */
+Datum
+binary_upgrade_slot_has_caught_up(PG_FUNCTION_ARGS)
+{
+	Name		slot_name;
+	XLogRecPtr	end_of_wal;
+	bool		found_pending_wal;
+
+	CHECK_IS_BINARY_UPGRADE;
+
+	/* Quick exit if the input is NULL */
+	if (PG_ARGISNULL(0))
+		PG_RETURN_BOOL(false);
+
+	CheckSlotPermissions();
+
+	slot_name = PG_GETARG_NAME(0);
+
+	/* Acquire the given slot */
+	ReplicationSlotAcquire(NameStr(*slot_name), true);
+
+	/* Slots must be valid as otherwise we won't be able to scan the WAL */
+	Assert(MyReplicationSlot->data.invalidated == RS_INVAL_NONE);
+
+	end_of_wal = GetFlushRecPtr(NULL);
+	found_pending_wal = LogicalReplicationSlotHasPendingWal(end_of_wal);
+
+	/* Clean up */
+	ReplicationSlotRelease();
+
+	PG_RETURN_BOOL(!found_pending_wal);
+}
diff --git a/src/bin/pg_upgrade/Makefile b/src/bin/pg_upgrade/Makefile
index 5834513add..05e9299654 100644
--- a/src/bin/pg_upgrade/Makefile
+++ b/src/bin/pg_upgrade/Makefile
@@ -3,6 +3,9 @@
 PGFILEDESC = "pg_upgrade - an in-place binary upgrade utility"
 PGAPPICON = win32
 
+# required for 003_upgrade_logical_replication_slots.pl
+EXTRA_INSTALL=contrib/test_decoding
+
 subdir = src/bin/pg_upgrade
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 21a0ff9e42..fdf1f0a0c4 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -33,6 +33,8 @@ static void check_for_jsonb_9_4_usage(ClusterInfo *cluster);
 static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(void);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
+static void check_new_cluster_logical_replication_slots(void);
+static void check_old_cluster_for_valid_slots(bool live_check);
 
 
 /*
@@ -89,8 +91,11 @@ check_and_dump_old_cluster(bool live_check)
 	if (!live_check)
 		start_postmaster(&old_cluster, true);
 
-	/* Extract a list of databases and tables from the old cluster */
-	get_db_and_rel_infos(&old_cluster);
+	/*
+	 * Extract a list of databases, tables, and logical replication slots from
+	 * the old cluster.
+	 */
+	get_db_rel_and_slot_infos(&old_cluster, live_check);
 
 	init_tablespaces();
 
@@ -107,6 +112,13 @@ check_and_dump_old_cluster(bool live_check)
 	check_for_reg_data_type_usage(&old_cluster);
 	check_for_isn_and_int8_passing_mismatch(&old_cluster);
 
+	/*
+	 * Logical replication slots can be migrated since PG17. See comments atop
+	 * get_old_cluster_logical_slot_infos().
+	 */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
+		check_old_cluster_for_valid_slots(live_check);
+
 	/*
 	 * PG 16 increased the size of the 'aclitem' type, which breaks the
 	 * on-disk format for existing data.
@@ -200,7 +212,7 @@ check_and_dump_old_cluster(bool live_check)
 void
 check_new_cluster(void)
 {
-	get_db_and_rel_infos(&new_cluster);
+	get_db_rel_and_slot_infos(&new_cluster, false);
 
 	check_new_cluster_is_empty();
 
@@ -223,6 +235,8 @@ check_new_cluster(void)
 	check_for_prepared_transactions(&new_cluster);
 
 	check_for_new_tablespace_dir();
+
+	check_new_cluster_logical_replication_slots();
 }
 
 
@@ -1451,3 +1465,152 @@ check_for_user_defined_encoding_conversions(ClusterInfo *cluster)
 	else
 		check_ok();
 }
+
+/*
+ * check_new_cluster_logical_replication_slots()
+ *
+ * Verify that there are no logical replication slots on the new cluster and
+ * that the parameter settings necessary for creating slots are sufficient.
+ */
+static void
+check_new_cluster_logical_replication_slots(void)
+{
+	PGresult   *res;
+	PGconn	   *conn;
+	int			nslots_on_old;
+	int			nslots_on_new;
+	int			max_replication_slots;
+	char	   *wal_level;
+
+	/* Logical slots can be migrated since PG17. */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+		return;
+
+	nslots_on_old = count_old_cluster_logical_slots();
+
+	/* Quick return if there are no logical slots to be migrated. */
+	if (nslots_on_old == 0)
+		return;
+
+	conn = connectToServer(&new_cluster, "template1");
+
+	prep_status("Checking for new cluster logical replication slots");
+
+	res = executeQueryOrDie(conn, "SELECT count(*) "
+							"FROM pg_catalog.pg_replication_slots "
+							"WHERE slot_type = 'logical' AND "
+							"temporary IS FALSE;");
+
+	if (PQntuples(res) != 1)
+		pg_fatal("could not count the number of logical replication slots");
+
+	nslots_on_new = atoi(PQgetvalue(res, 0, 0));
+
+	if (nslots_on_new)
+		pg_fatal("Expected 0 logical replication slots but found %d.",
+				 nslots_on_new);
+
+	PQclear(res);
+
+	res = executeQueryOrDie(conn, "SELECT setting FROM pg_settings "
+							"WHERE name IN ('wal_level', 'max_replication_slots') "
+							"ORDER BY name DESC;");
+
+	if (PQntuples(res) != 2)
+		pg_fatal("could not determine parameter settings on new cluster");
+
+	wal_level = PQgetvalue(res, 0, 0);
+
+	if (strcmp(wal_level, "logical") != 0)
+		pg_fatal("wal_level must be \"logical\", but is set to \"%s\"",
+				 wal_level);
+
+	max_replication_slots = atoi(PQgetvalue(res, 1, 0));
+
+	if (nslots_on_old > max_replication_slots)
+		pg_fatal("max_replication_slots (%d) must be greater than or equal to the number of "
+				 "logical replication slots (%d) on the old cluster",
+				 max_replication_slots, nslots_on_old);
+
+	PQclear(res);
+	PQfinish(conn);
+
+	check_ok();
+}
+
+/*
+ * check_old_cluster_for_valid_slots()
+ *
+ * Verify that all the logical slots are valid and have consumed all the WAL
+ * before shutdown.
+ */
+static void
+check_old_cluster_for_valid_slots(bool live_check)
+{
+	char		output_path[MAXPGPATH];
+	FILE	   *script = NULL;
+
+	prep_status("Checking for valid logical replication slots");
+
+	snprintf(output_path, sizeof(output_path), "%s/%s",
+			 log_opts.basedir,
+			 "invalid_logical_replication_slots.txt");
+
+	for (int dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		LogicalSlotInfoArr *slot_arr = &old_cluster.dbarr.dbs[dbnum].slot_arr;
+
+		for (int slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		{
+			LogicalSlotInfo *slot = &slot_arr->slots[slotnum];
+
+			/* Is the slot usable? */
+			if (slot->invalid)
+			{
+				if (script == NULL &&
+					(script = fopen_priv(output_path, "w")) == NULL)
+					pg_fatal("could not open file \"%s\": %s",
+							 output_path, strerror(errno));
+
+				fprintf(script, "The slot \"%s\" is invalid\n",
+						slot->slotname);
+
+				/* No need to check this slot, seek to new one */
+				continue;
+			}
+
+			/*
+			 * Do additional check to ensure that all logical replication
+			 * slots have consumed all the WAL before shutdown.
+			 *
+			 * Note: This can be satisfied only when the old cluster has been
+			 * shut down, so we skip this for live checks.
+			 */
+			if (!live_check && !slot->caught_up)
+			{
+				if (script == NULL &&
+					(script = fopen_priv(output_path, "w")) == NULL)
+					pg_fatal("could not open file \"%s\": %s",
+							 output_path, strerror(errno));
+
+				fprintf(script,
+						"The slot \"%s\" has not consumed the WAL yet\n",
+						slot->slotname);
+			}
+		}
+	}
+
+	if (script)
+	{
+		fclose(script);
+
+		pg_log(PG_REPORT, "fatal");
+		pg_fatal("Your installation contains logical replication slots that can't be upgraded.\n"
+				 "You can remove invalid slots and/or consume the pending WAL for other slots,\n"
+				 "and then restart the upgrade.\n"
+				 "A list of the problem slots is in the file:\n"
+				 "    %s", output_path);
+	}
+
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/function.c b/src/bin/pg_upgrade/function.c
index dc8800c7cd..5af936bd45 100644
--- a/src/bin/pg_upgrade/function.c
+++ b/src/bin/pg_upgrade/function.c
@@ -46,7 +46,9 @@ library_name_compare(const void *p1, const void *p2)
 /*
  * get_loadable_libraries()
  *
- *	Fetch the names of all old libraries containing C-language functions.
+ *	Fetch the names of all old libraries containing C-language functions, as
+ *	well as libraries corresponding to logical replication output plugins.
+ *
  *	We will later check that they all exist in the new installation.
  */
 void
@@ -55,6 +57,7 @@ get_loadable_libraries(void)
 	PGresult  **ress;
 	int			totaltups;
 	int			dbnum;
+	int			n_libinfos;
 
 	ress = (PGresult **) pg_malloc(old_cluster.dbarr.ndbs * sizeof(PGresult *));
 	totaltups = 0;
@@ -81,7 +84,12 @@ get_loadable_libraries(void)
 		PQfinish(conn);
 	}
 
-	os_info.libraries = (LibraryInfo *) pg_malloc(totaltups * sizeof(LibraryInfo));
+	/*
+	 * Allocate memory for required libraries and logical replication output
+	 * plugins.
+	 */
+	n_libinfos = totaltups + count_old_cluster_logical_slots();
+	os_info.libraries = (LibraryInfo *) pg_malloc(sizeof(LibraryInfo) * n_libinfos);
 	totaltups = 0;
 
 	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
@@ -89,6 +97,7 @@ get_loadable_libraries(void)
 		PGresult   *res = ress[dbnum];
 		int			ntups;
 		int			rowno;
+		LogicalSlotInfoArr *slot_arr = &old_cluster.dbarr.dbs[dbnum].slot_arr;
 
 		ntups = PQntuples(res);
 		for (rowno = 0; rowno < ntups; rowno++)
@@ -101,6 +110,23 @@ get_loadable_libraries(void)
 			totaltups++;
 		}
 		PQclear(res);
+
+		/*
+		 * Store the names of output plugins as well. There is a possibility
+		 * that duplicated plugins are set, but the consumer function
+		 * check_loadable_libraries() will avoid checking the same library, so
+		 * we do not have to consider their uniqueness here.
+		 */
+		for (int slotno = 0; slotno < slot_arr->nslots; slotno++)
+		{
+			if (slot_arr->slots[slotno].invalid)
+				continue;
+
+			os_info.libraries[totaltups].name = pg_strdup(slot_arr->slots[slotno].plugin);
+			os_info.libraries[totaltups].dbnum = dbnum;
+
+			totaltups++;
+		}
 	}
 
 	pg_free(ress);
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index aa5faca4d6..481f586a2f 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -26,6 +26,8 @@ static void get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo);
 static void free_rel_infos(RelInfoArr *rel_arr);
 static void print_db_infos(DbInfoArr *db_arr);
 static void print_rel_infos(RelInfoArr *rel_arr);
+static void print_slot_infos(LogicalSlotInfoArr *slot_arr);
+static void get_old_cluster_logical_slot_infos(DbInfo *dbinfo, bool live_check);
 
 
 /*
@@ -266,13 +268,15 @@ report_unmatched_relation(const RelInfo *rel, const DbInfo *db, bool is_new_db)
 }
 
 /*
- * get_db_and_rel_infos()
+ * get_db_rel_and_slot_infos()
  *
  * higher level routine to generate dbinfos for the database running
  * on the given "port". Assumes that server is already running.
+ *
+ * live_check would be used only when the target is the old cluster.
  */
 void
-get_db_and_rel_infos(ClusterInfo *cluster)
+get_db_rel_and_slot_infos(ClusterInfo *cluster, bool live_check)
 {
 	int			dbnum;
 
@@ -283,7 +287,17 @@ get_db_and_rel_infos(ClusterInfo *cluster)
 	get_db_infos(cluster);
 
 	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
-		get_rel_infos(cluster, &cluster->dbarr.dbs[dbnum]);
+	{
+		DbInfo	   *pDbInfo = &cluster->dbarr.dbs[dbnum];
+
+		get_rel_infos(cluster, pDbInfo);
+
+		/*
+		 * Retrieve the logical replication slots infos for the old cluster.
+		 */
+		if (cluster == &old_cluster)
+			get_old_cluster_logical_slot_infos(pDbInfo, live_check);
+	}
 
 	if (cluster == &old_cluster)
 		pg_log(PG_VERBOSE, "\nsource databases:");
@@ -600,6 +614,125 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 	dbinfo->rel_arr.nrels = num_rels;
 }
 
+/*
+ * get_old_cluster_logical_slot_infos()
+ *
+ * gets the LogicalSlotInfos for all the logical replication slots of the
+ * database referred to by "dbinfo". The status of each logical slot is gotten
+ * here, but they are used at the checking phase. See
+ * check_old_cluster_for_valid_slots().
+ *
+ * Note: This function will not do anything if the old cluster is pre-PG17.
+ * This is because before that the logical slots are not saved at shutdown, so
+ * there is no guarantee that the latest confirmed_flush_lsn is saved to disk
+ * which can lead to data loss. It is still not guaranteed for manually created
+ * slots in PG17, so subsequent checks done in
+ * check_old_cluster_for_valid_slots() would raise a FATAL error if such slots
+ * are included.
+ */
+static void
+get_old_cluster_logical_slot_infos(DbInfo *dbinfo, bool live_check)
+{
+	PGconn	   *conn;
+	PGresult   *res;
+	LogicalSlotInfo *slotinfos = NULL;
+	int			num_slots = 0;
+
+	/* Logical slots can be migrated since PG17. */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+	{
+		dbinfo->slot_arr.slots = slotinfos;
+		dbinfo->slot_arr.nslots = num_slots;
+		return;
+	}
+
+	conn = connectToServer(&old_cluster, dbinfo->db_name);
+
+	/*
+	 * Fetch the logical replication slot information. The check whether the
+	 * slot is considered caught up is done by an upgrade function. This
+	 * regards the slot as caught up if we don't find any decodable changes.
+	 * See binary_upgrade_slot_has_caught_up().
+	 *
+	 * Note that we can't ensure whether the slot is caught up during
+	 * live_check as the new WAL records could be generated.
+	 *
+	 * We intentionally skip checking the WALs for invalidated slots as the
+	 * corresponding WALs could have been removed for such slots.
+	 *
+	 * The temporary slots are explicitly ignored while checking because such
+	 * slots cannot exist after the upgrade. During the upgrade, clusters are
+	 * started and stopped several times causing any temporary slots to be
+	 * removed.
+	 */
+	res = executeQueryOrDie(conn, "SELECT slot_name, plugin, two_phase, "
+							"%s as caught_up, conflicting as invalid "
+							"FROM pg_catalog.pg_replication_slots "
+							"WHERE slot_type = 'logical' AND "
+							"database = current_database() AND "
+							"temporary IS FALSE;",
+							live_check ? "FALSE" :
+							"(CASE WHEN conflicting THEN FALSE "
+							"ELSE (SELECT pg_catalog.binary_upgrade_slot_has_caught_up(slot_name)) "
+							"END)");
+
+	num_slots = PQntuples(res);
+
+	if (num_slots)
+	{
+		int			i_slotname;
+		int			i_plugin;
+		int			i_twophase;
+		int			i_caught_up;
+		int			i_invalid;
+
+		slotinfos = (LogicalSlotInfo *) pg_malloc(sizeof(LogicalSlotInfo) * num_slots);
+
+		i_slotname = PQfnumber(res, "slot_name");
+		i_plugin = PQfnumber(res, "plugin");
+		i_twophase = PQfnumber(res, "two_phase");
+		i_caught_up = PQfnumber(res, "caught_up");
+		i_invalid = PQfnumber(res, "invalid");
+
+		for (int slotnum = 0; slotnum < num_slots; slotnum++)
+		{
+			LogicalSlotInfo *curr = &slotinfos[slotnum];
+
+			curr->slotname = pg_strdup(PQgetvalue(res, slotnum, i_slotname));
+			curr->plugin = pg_strdup(PQgetvalue(res, slotnum, i_plugin));
+			curr->two_phase = (strcmp(PQgetvalue(res, slotnum, i_twophase), "t") == 0);
+			curr->caught_up = (strcmp(PQgetvalue(res, slotnum, i_caught_up), "t") == 0);
+			curr->invalid = (strcmp(PQgetvalue(res, slotnum, i_invalid), "t") == 0);
+		}
+	}
+
+	PQclear(res);
+	PQfinish(conn);
+
+	dbinfo->slot_arr.slots = slotinfos;
+	dbinfo->slot_arr.nslots = num_slots;
+}
+
+
+/*
+ * count_old_cluster_logical_slots()
+ *
+ * Returns the number of logical replication slots for all databases.
+ *
+ * Note: this function always returns 0 if the old_cluster is PG16 and prior
+ * because we gather slot information only for cluster versions greater than or
+ * equal to PG17. See get_old_cluster_logical_slot_infos().
+ */
+int
+count_old_cluster_logical_slots(void)
+{
+	int			slot_count = 0;
+
+	for (int dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+		slot_count += old_cluster.dbarr.dbs[dbnum].slot_arr.nslots;
+
+	return slot_count;
+}
 
 static void
 free_db_and_rel_infos(DbInfoArr *db_arr)
@@ -642,8 +775,11 @@ print_db_infos(DbInfoArr *db_arr)
 
 	for (dbnum = 0; dbnum < db_arr->ndbs; dbnum++)
 	{
-		pg_log(PG_VERBOSE, "Database: \"%s\"", db_arr->dbs[dbnum].db_name);
-		print_rel_infos(&db_arr->dbs[dbnum].rel_arr);
+		DbInfo	   *pDbInfo = &db_arr->dbs[dbnum];
+
+		pg_log(PG_VERBOSE, "Database: \"%s\"", pDbInfo->db_name);
+		print_rel_infos(&pDbInfo->rel_arr);
+		print_slot_infos(&pDbInfo->slot_arr);
 	}
 }
 
@@ -660,3 +796,23 @@ print_rel_infos(RelInfoArr *rel_arr)
 			   rel_arr->rels[relnum].reloid,
 			   rel_arr->rels[relnum].tablespace);
 }
+
+static void
+print_slot_infos(LogicalSlotInfoArr *slot_arr)
+{
+	/* Quick return if there are no logical slots. */
+	if (slot_arr->nslots == 0)
+		return;
+
+	pg_log(PG_VERBOSE, "Logical replication slots within the database:");
+
+	for (int slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+	{
+		LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+
+		pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\", two_phase: %s",
+			   slot_info->slotname,
+			   slot_info->plugin,
+			   slot_info->two_phase ? "true" : "false");
+	}
+}
diff --git a/src/bin/pg_upgrade/meson.build b/src/bin/pg_upgrade/meson.build
index 12a97f84e2..2c4f38d865 100644
--- a/src/bin/pg_upgrade/meson.build
+++ b/src/bin/pg_upgrade/meson.build
@@ -42,6 +42,7 @@ tests += {
     'tests': [
       't/001_basic.pl',
       't/002_pg_upgrade.pl',
+      't/003_upgrade_logical_replication_slots.pl',
     ],
     'test_kwargs': {'priority': 40}, # pg_upgrade tests are slow
   },
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 96bfb67167..3960af4036 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -59,6 +59,7 @@ static void copy_xact_xlog_xid(void);
 static void set_frozenxids(bool minmxid_only);
 static void make_outputdirs(char *pgdata);
 static void setup(char *argv0, bool *live_check);
+static void create_logical_replication_slots(void);
 
 ClusterInfo old_cluster,
 			new_cluster;
@@ -188,6 +189,21 @@ main(int argc, char **argv)
 			  new_cluster.pgdata);
 	check_ok();
 
+	/*
+	 * Migrate the logical slots to the new cluster.  Note that we need to do
+	 * this after resetting WAL because otherwise the required WAL would be
+	 * removed and slots would become unusable.  There is a possibility that
+	 * background processes might generate some WAL before we could create the
+	 * slots in the new cluster but we can ignore that WAL as that won't be
+	 * required downstream.
+	 */
+	if (count_old_cluster_logical_slots())
+	{
+		start_postmaster(&new_cluster, true);
+		create_logical_replication_slots();
+		stop_postmaster(false);
+	}
+
 	if (user_opts.do_sync)
 	{
 		prep_status("Sync data directory to disk");
@@ -593,7 +609,7 @@ create_new_objects(void)
 		set_frozenxids(true);
 
 	/* update new_cluster info now that we have objects in the databases */
-	get_db_and_rel_infos(&new_cluster);
+	get_db_rel_and_slot_infos(&new_cluster, false);
 }
 
 /*
@@ -862,3 +878,59 @@ set_frozenxids(bool minmxid_only)
 
 	check_ok();
 }
+
+/*
+ * create_logical_replication_slots()
+ *
+ * Similar to create_new_objects() but only restores logical replication slots.
+ */
+static void
+create_logical_replication_slots(void)
+{
+	prep_status_progress("Restoring logical replication slots in the new cluster");
+
+	for (int dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		DbInfo	   *old_db = &old_cluster.dbarr.dbs[dbnum];
+		LogicalSlotInfoArr *slot_arr = &old_db->slot_arr;
+		PGconn	   *conn;
+		PQExpBuffer query;
+
+		/* Skip this database if there are no slots */
+		if (slot_arr->nslots == 0)
+			continue;
+
+		conn = connectToServer(&new_cluster, old_db->db_name);
+		query = createPQExpBuffer();
+
+		pg_log(PG_STATUS, "%s", old_db->db_name);
+
+		for (int slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		{
+			LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+
+			/* Constructs a query for creating logical replication slots */
+			appendPQExpBuffer(query,
+							  "SELECT * FROM "
+							  "pg_catalog.pg_create_logical_replication_slot(");
+			appendStringLiteralConn(query, slot_info->slotname, conn);
+			appendPQExpBuffer(query, ", ");
+			appendStringLiteralConn(query, slot_info->plugin, conn);
+			appendPQExpBuffer(query, ", false, %s);",
+							  slot_info->two_phase ? "true" : "false");
+
+			PQclear(executeQueryOrDie(conn, "%s", query->data));
+
+			resetPQExpBuffer(query);
+		}
+
+		PQfinish(conn);
+
+		destroyPQExpBuffer(query);
+	}
+
+	end_progress_output();
+	check_ok();
+
+	return;
+}
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 842f3b6cd3..ba8129d135 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -150,6 +150,24 @@ typedef struct
 	int			nrels;
 } RelInfoArr;
 
+/*
+ * Structure to store logical replication slot information.
+ */
+typedef struct
+{
+	char	   *slotname;		/* slot name */
+	char	   *plugin;			/* plugin */
+	bool		two_phase;		/* can the slot decode 2PC? */
+	bool		caught_up;		/* has the slot caught up to latest changes? */
+	bool		invalid;		/* if true, the slot is unusable */
+} LogicalSlotInfo;
+
+typedef struct
+{
+	int			nslots;			/* number of logical slot infos */
+	LogicalSlotInfo *slots;		/* array of logical slot infos */
+} LogicalSlotInfoArr;
+
 /*
  * The following structure represents a relation mapping.
  */
@@ -176,6 +194,7 @@ typedef struct
 	char		db_tablespace[MAXPGPATH];	/* database default tablespace
 											 * path */
 	RelInfoArr	rel_arr;		/* array of all user relinfos */
+	LogicalSlotInfoArr slot_arr;	/* array of all LogicalSlotInfo */
 } DbInfo;
 
 /*
@@ -400,7 +419,8 @@ void		check_loadable_libraries(void);
 FileNameMap *gen_db_file_maps(DbInfo *old_db,
 							  DbInfo *new_db, int *nmaps, const char *old_pgdata,
 							  const char *new_pgdata);
-void		get_db_and_rel_infos(ClusterInfo *cluster);
+void		get_db_rel_and_slot_infos(ClusterInfo *cluster, bool live_check);
+int			count_old_cluster_logical_slots(void);
 
 /* option.c */
 
diff --git a/src/bin/pg_upgrade/server.c b/src/bin/pg_upgrade/server.c
index 0bc3d2806b..d7f6c268ef 100644
--- a/src/bin/pg_upgrade/server.c
+++ b/src/bin/pg_upgrade/server.c
@@ -201,6 +201,7 @@ start_postmaster(ClusterInfo *cluster, bool report_and_exit_on_error)
 	PGconn	   *conn;
 	bool		pg_ctl_return = false;
 	char		socket_string[MAXPGPATH + 200];
+	PQExpBufferData pgoptions;
 
 	static bool exit_hook_registered = false;
 
@@ -227,23 +228,41 @@ start_postmaster(ClusterInfo *cluster, bool report_and_exit_on_error)
 				 cluster->sockdir);
 #endif
 
+	initPQExpBuffer(&pgoptions);
+
 	/*
-	 * Use -b to disable autovacuum.
+	 * Construct a parameter string which is passed to the server process.
 	 *
 	 * Turn off durability requirements to improve object creation speed, and
 	 * we only modify the new cluster, so only use it there.  If there is a
 	 * crash, the new cluster has to be recreated anyway.  fsync=off is a big
 	 * win on ext4.
 	 */
+	if (cluster == &new_cluster)
+		appendPQExpBufferStr(&pgoptions, " -c synchronous_commit=off -c fsync=off -c full_page_writes=off");
+
+	/*
+	 * Use max_slot_wal_keep_size as -1 to prevent the WAL removal by the
+	 * checkpointer process.  If WALs required by logical replication slots
+	 * are removed, the slots are unusable.  This setting prevents the
+	 * invalidation of slots during the upgrade. We set this option when
+	 * cluster is PG17 or later because logical replication slots can only be
+	 * migrated since then. Besides, max_slot_wal_keep_size is added in PG13.
+	 */
+	if (GET_MAJOR_VERSION(cluster->major_version) >= 1700)
+		appendPQExpBufferStr(&pgoptions, " -c max_slot_wal_keep_size=-1");
+
+	/* Use -b to disable autovacuum. */
 	snprintf(cmd, sizeof(cmd),
 			 "\"%s/pg_ctl\" -w -l \"%s/%s\" -D \"%s\" -o \"-p %d -b%s %s%s\" start",
 			 cluster->bindir,
 			 log_opts.logdir,
 			 SERVER_LOG_FILE, cluster->pgconfig, cluster->port,
-			 (cluster == &new_cluster) ?
-			 " -c synchronous_commit=off -c fsync=off -c full_page_writes=off" : "",
+			 pgoptions.data,
 			 cluster->pgopts ? cluster->pgopts : "", socket_string);
 
+	termPQExpBuffer(&pgoptions);
+
 	/*
 	 * Don't throw an error right away, let connecting throw the error because
 	 * it might supply a reason for the failure.
diff --git a/src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl
new file mode 100644
index 0000000000..c6be7d5d9e
--- /dev/null
+++ b/src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl
@@ -0,0 +1,286 @@
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+# Tests for upgrading logical replication slots
+
+use strict;
+use warnings;
+
+use File::Find qw(find);
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Verify that logical replication slots can be migrated.  This function will
+# be executed when the old cluster is PG17 and later.
+sub test_upgrade_from_PG17_and_later
+{
+	my ($old_publisher, $new_publisher, @pg_upgrade_cmd) = @_;
+
+	# ------------------------------
+	# TEST: Confirm pg_upgrade fails when the new cluster has wrong GUC values
+
+	# Preparations for the subsequent test:
+	# 1. Create two slots on the old cluster
+	$old_publisher->start;
+	$old_publisher->safe_psql(
+		'postgres', qq[
+		SELECT pg_create_logical_replication_slot('test_slot1', 'test_decoding');
+		SELECT pg_create_logical_replication_slot('test_slot2', 'test_decoding');
+		]
+	);
+	$old_publisher->stop();
+
+	# 2. Set 'max_replication_slots' to be less than the number of slots (2)
+	#	 present on the old cluster.
+	$new_publisher->append_conf('postgresql.conf',
+		"max_replication_slots = 1");
+
+	# pg_upgrade will fail because the new cluster has insufficient
+	# max_replication_slots
+	command_checks_all(
+		[@pg_upgrade_cmd],
+		1,
+		[
+			qr/max_replication_slots \(1\) must be greater than or equal to the number of logical replication slots \(2\) on the old cluster/
+		],
+		[qr//],
+		'run of pg_upgrade where the new cluster has insufficient max_replication_slots'
+	);
+	ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+		"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+	# Set 'max_replication_slots' to match the number of slots (2) present
+	# on the old cluster. Both slots will be used for subsequent tests.
+	$new_publisher->append_conf('postgresql.conf',
+		"max_replication_slots = 2");
+
+
+	# ------------------------------
+	# TEST: Confirm pg_upgrade fails when the slot still has unconsumed WAL
+	# records
+
+	# Preparations for the subsequent test:
+	# 1. Generate extra WAL records. At this point neither test_slot1 nor
+	#	 test_slot2 has consumed them.
+	#
+	# 2. Advance the slot test_slot2 up to the current WAL location, but
+	#	 test_slot1 still has unconsumed WAL records.
+	#
+	# 3. Emit a non-transactional message. This will cause test_slot2 to detect
+	#	 the unconsumed WAL record.
+	$old_publisher->start;
+	$old_publisher->safe_psql(
+		'postgres', qq[
+			CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;
+			SELECT pg_replication_slot_advance('test_slot2', NULL);
+			SELECT count(*) FROM pg_logical_emit_message('false', 'prefix', 'This is a non-transactional message');
+		]);
+	$old_publisher->stop;
+
+	# pg_upgrade will fail because there are slots still having unconsumed WAL
+	# records
+	command_checks_all(
+		[@pg_upgrade_cmd],
+		1,
+		[
+			qr/Your installation contains logical replication slots that can't be upgraded./
+		],
+		[qr//],
+		'run of pg_upgrade of old cluster with slots having unconsumed WAL records'
+	);
+
+	# Verify the reason why the logical replication slot cannot be upgraded
+	my $slots_filename;
+
+	# Find a txt file that contains a list of logical replication slots that
+	# cannot be upgraded. We cannot predict the file's path because the output
+	# directory contains a milliseconds timestamp. File::Find::find must be
+	# used.
+	find(
+		sub {
+			if ($File::Find::name =~
+				m/invalid_logical_replication_slots\.txt/)
+			{
+				$slots_filename = $File::Find::name;
+			}
+		},
+		$new_publisher->data_dir . "/pg_upgrade_output.d");
+
+	# Check the file content. Both slots should be reporting that they have
+	# unconsumed WAL records.
+	like(
+		slurp_file($slots_filename),
+		qr/The slot \"test_slot1\" has not consumed the WAL yet/m,
+		'the previous test failed due to unconsumed WALs');
+	like(
+		slurp_file($slots_filename),
+		qr/The slot \"test_slot2\" has not consumed the WAL yet/m,
+		'the previous test failed due to unconsumed WALs');
+
+
+	# ------------------------------
+	# TEST: Successful upgrade
+
+	# Preparations for the subsequent test:
+	# 1. Setup logical replication (first, cleanup slots from the previous
+	#	 tests)
+	my $old_connstr = $old_publisher->connstr . ' dbname=postgres';
+
+	$old_publisher->start;
+	$old_publisher->safe_psql(
+		'postgres', qq[
+		SELECT * FROM pg_drop_replication_slot('test_slot1');
+		SELECT * FROM pg_drop_replication_slot('test_slot2');
+		CREATE PUBLICATION regress_pub FOR ALL TABLES;
+		]
+	);
+
+	# Initialize subscriber cluster
+	my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
+	$subscriber->init();
+
+	$subscriber->start;
+	$subscriber->safe_psql(
+		'postgres', qq[
+		CREATE TABLE tbl (a int);
+		CREATE SUBSCRIPTION regress_sub CONNECTION '$old_connstr' PUBLICATION regress_pub WITH (two_phase = 'true')
+	]);
+	$subscriber->wait_for_subscription_sync($old_publisher, 'regress_sub');
+
+	# 2. Temporarily disable the subscription
+	$subscriber->safe_psql('postgres',
+		"ALTER SUBSCRIPTION regress_sub DISABLE");
+	$old_publisher->stop;
+
+	# pg_upgrade should be successful
+	command_ok([@pg_upgrade_cmd], 'run of pg_upgrade of old cluster');
+
+	# Check that the slot 'regress_sub' has migrated to the new cluster
+	$new_publisher->start;
+	my $result = $new_publisher->safe_psql('postgres',
+		"SELECT slot_name, two_phase FROM pg_replication_slots");
+	is($result, qq(regress_sub|t), 'check the slot exists on new cluster');
+
+	# Update the connection
+	my $new_connstr = $new_publisher->connstr . ' dbname=postgres';
+	$subscriber->safe_psql(
+		'postgres', qq[
+		ALTER SUBSCRIPTION regress_sub CONNECTION '$new_connstr';
+		ALTER SUBSCRIPTION regress_sub ENABLE;
+	]);
+
+	# Check whether changes on the new publisher get replicated to the
+	# subscriber
+	$new_publisher->safe_psql('postgres',
+		"INSERT INTO tbl VALUES (generate_series(11, 20))");
+	$new_publisher->wait_for_catchup('regress_sub');
+	$result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
+	is($result, qq(20), 'check changes are replicated to the subscriber');
+
+	# Clean up
+	$subscriber->stop();
+	$new_publisher->stop();
+}
+
+# Verify that logical replication slots cannot be migrated.  This function will
+# be executed when the old cluster version is prior to PG17.
+sub test_upgrade_from_pre_PG17
+{
+	my ($old_publisher, $new_publisher, @pg_upgrade_cmd) = @_;
+
+	# ------------------------------
+	# TEST: Confirm logical replication slots cannot be migrated
+
+	# Preparations for the subsequent test:
+	# 1. Create a slot on the old cluster
+	$old_publisher->start;
+	$old_publisher->safe_psql('postgres',
+		"SELECT pg_create_logical_replication_slot('test_slot', 'test_decoding');"
+	);
+	$old_publisher->stop;
+
+	# pg_upgrade should be successful, but any logical replication slots will
+	# not be migrated.
+	command_ok([@pg_upgrade_cmd], 'run of pg_upgrade of old cluster');
+	ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
+		"pg_upgrade_output.d/ removed after pg_upgrade success");
+
+	# Check that the slot 'test_slot' has not migrated to the new cluster
+	$new_publisher->start;
+	my $result = $new_publisher->safe_psql('postgres',
+		"SELECT count(*) FROM pg_replication_slots");
+	is($result, qq(0), 'check the slot does not exist on new cluster');
+
+	# Clean up
+	$new_publisher->stop();
+}
+
+# Can be changed to test the other modes
+my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
+
+# Initialize old cluster. Cross-version checks are also supported.
+my $old_publisher = PostgreSQL::Test::Cluster->new('old_publisher',
+	install_path => $ENV{oldinstall});
+
+my %node_params = ();
+$node_params{allows_streaming} = 'logical';
+
+# Set up some settings for the old cluster, so that we can ensure that initdb
+# will be done.
+my @initdb_params = ();
+push @initdb_params, ('--encoding', 'UTF-8');
+push @initdb_params, ('--locale', 'C');
+$node_params{extra} = \@initdb_params;
+
+$old_publisher->init(%node_params);
+
+# XXX: For PG9.6 and prior, the TAP Cluster.pm assigns 'max_wal_senders' and
+# 'max_connections' to the same value (10). But these versions considered
+# max_wal_senders as a subset of max_connections, so setting the same value
+# will fail. This adjustment will not be needed when packages for older
+# versions are defined.
+if ($old_publisher->pg_version->major <= 9.6)
+{
+	$old_publisher->append_conf(
+		'postgresql.conf', qq[
+	max_wal_senders = 5
+	max_connections = 10
+	]);
+}
+
+# Initialize new cluster
+my $new_publisher = PostgreSQL::Test::Cluster->new('new_publisher');
+$new_publisher->init(allows_streaming => 'logical');
+
+my $oldbindir = $old_publisher->config_data('--bindir');
+my $newbindir = $new_publisher->config_data('--bindir');
+
+# Set up a pg_upgrade command. This will be used by all the tests.
+my @pg_upgrade_cmd = (
+	'pg_upgrade', '--no-sync',
+	'-d', $old_publisher->data_dir,
+	'-D', $new_publisher->data_dir,
+	'-b', $oldbindir,
+	'-B', $newbindir,
+	'-s', $new_publisher->host,
+	'-p', $old_publisher->port,
+	'-P', $new_publisher->port,
+	$mode);
+
+# Test according to the major version of the old cluster.
+# Upgrading logical replication slots from versions older than PG17 is not
+# supported.
+if ($old_publisher->pg_version->major >= 17)
+{
+	test_upgrade_from_PG17_and_later($old_publisher, $new_publisher,
+		@pg_upgrade_cmd);
+}
+else
+{
+	test_upgrade_from_pre_PG17($old_publisher, $new_publisher,
+		@pg_upgrade_cmd);
+}
+
+
+done_testing();
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index c92d0631a0..0699596888 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11379,6 +11379,11 @@
   proname => 'binary_upgrade_set_next_pg_tablespace_oid', provolatile => 'v',
   proparallel => 'u', prorettype => 'void', proargtypes => 'oid',
   prosrc => 'binary_upgrade_set_next_pg_tablespace_oid' },
+{ oid => '8046', descr => 'for use by pg_upgrade',
+  proname => 'binary_upgrade_slot_has_caught_up', proisstrict => 'f',
+  provolatile => 'v', proparallel => 'u', prorettype => 'bool',
+  proargtypes => 'name',
+  prosrc => 'binary_upgrade_slot_has_caught_up' },
 
 # conversion functions
 { oid => '4302',
diff --git a/src/include/replication/logical.h b/src/include/replication/logical.h
index 5f49554ea0..f8258d7c28 100644
--- a/src/include/replication/logical.h
+++ b/src/include/replication/logical.h
@@ -109,6 +109,9 @@ typedef struct LogicalDecodingContext
 	TransactionId write_xid;
 	/* Are we processing the end LSN of a transaction? */
 	bool		end_xact;
+
+	/* Do we need to process any change in 'fast_forward' mode? */
+	bool		processing_required;
 } LogicalDecodingContext;
 
 
@@ -145,4 +148,6 @@ extern bool filter_by_origin_cb_wrapper(LogicalDecodingContext *ctx, RepOriginId
 extern void ResetLogicalStreamingState(void);
 extern void UpdateDecodingStats(LogicalDecodingContext *ctx);
 
+extern bool LogicalReplicationSlotHasPendingWal(XLogRecPtr end_of_wal);
+
 #endif
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index e69bb671bf..de6c48d914 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1503,6 +1503,8 @@ LogicalRepTyp
 LogicalRepWorker
 LogicalRepWorkerType
 LogicalRewriteMappingData
+LogicalSlotInfo
+LogicalSlotInfoArr
 LogicalTape
 LogicalTapeSet
 LsnReadQueue
-- 
2.27.0

#342Peter Smith
smithpb2250@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#341)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

Here are some review comments for v54-0001

======
src/backend/replication/slot.c

1.
+ if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade)
+ {
+ ereport(ERROR,
+ errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("replication slots must not be invalidated during the upgrade"),
+ errhint("\"max_slot_wal_keep_size\" must not be set to -1 during the upgrade"));
+ }

This new error is replacing the old code:
+ Assert(max_slot_wal_keep_size_mb == -1);

Is that errhint correct? Shouldn't it say "must" instead of "must not"?

======
src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl

2. General formatting

Some of the "]);" formatting and indenting for the multiple SQL
commands is inconsistent.

For example,

+ $old_publisher->safe_psql(
+ 'postgres', qq[
+ SELECT pg_create_logical_replication_slot('test_slot1', 'test_decoding');
+ SELECT pg_create_logical_replication_slot('test_slot2', 'test_decoding');
+ ]
+ );

versus

+ $old_publisher->safe_psql(
+ 'postgres', qq[
+ CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;
+ SELECT pg_replication_slot_advance('test_slot2', NULL);
+ SELECT count(*) FROM pg_logical_emit_message('false', 'prefix',
'This is a non-transactional message');
+ ]);

~~~

3.
+# Set up some settings for the old cluster, so that we can ensures that initdb
+# will be done.
+my @initdb_params = ();
+push @initdb_params, ('--encoding', 'UTF-8');
+push @initdb_params, ('--locale', 'C');
+$node_params{extra} = \@initdb_params;
+
+$old_publisher->init(%node_params);

Why would initdb not be done if these were not set? I didn't
understand the comment.

/so that we can ensures/to ensure/

~~~

4.
+# XXX: For PG9.6 and prior, the TAP Cluster.pm assigns 'max_wal_senders' and
+# 'max_connections' to the same value (10). But these versions considered
+# max_wal_senders as a subset of max_connections, so setting the same value
+# will fail. This adjustment will not be needed when packages for older
+#versions are defined.
+if ($old_publisher->pg_version->major <= 9.6)
+{
+ $old_publisher->append_conf(
+ 'postgresql.conf', qq[
+ max_wal_senders = 5
+ max_connections = 10
+ ]);
+}

4a.
IMO remove the complicated comment trying to explain the problem and
just unconditionally set the values you want.

SUGGESTION#1
# Older PG version had different rules for the inter-dependency of
# 'max_wal_senders' and 'max_connections', so assign values which will
# work for all PG versions.
$old_publisher->append_conf(
'postgresql.conf', qq[
max_wal_senders = 5
max_connections = 10
]);

~~

4b.
If you really want to put special code here then I think the comment
needs to be more descriptive like below. IMO this suggestion is
overkill, #4a above is much simpler.

SUGGESTION#2
# Versions prior to PG12 considered max_wal_senders as a subset of
# max_connections, so setting the same value will fail.
#
# The TAP Cluster.pm assigns default 'max_wal_senders' and
# 'max_connections' as follows:
# PG_11: 'max_wal_senders=5' and 'max_connections=10'
# PG_10: 'max_wal_senders=5' and 'max_connections=10'
# Everything else: 'max_wal_senders=10' and 'max_connections=10'
#
# The following code is needed to make adjustments for versions not
# already being handled by Cluster.pm.

~

4c.
Alternatively, make necessary adjustments in the Cluster.pm to set
appropriate defaults for all older versions. Then probably you can
remove all this code entirely.
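
If that route were taken, the adjustment inside Cluster.pm's init() might look
something like the sketch below (purely hypothetical; method names follow what
this test already uses, not the actual Cluster.pm internals):

# Hypothetical sketch: pick defaults that satisfy the pre-PG12 rule that
# max_wal_senders counts against max_connections.
if ($self->pg_version->major < 12)
{
	$self->append_conf('postgresql.conf', qq[
max_wal_senders = 5
max_connections = 10
]);
}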

======
Kind Regards,
Peter Smith.
Fujitsu Australia

#343vignesh C
vignesh21@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#337)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Thu, 19 Oct 2023 at 16:14, Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

Dear Vignesh,

Thanks for reviewing! A new patch is available in [1].

Few comments:
1) We will be able to override the value of max_slot_wal_keep_size by
using --new-options like '--new-options "-c max_slot_wal_keep_size=val"':
+       /*
+        * Use max_slot_wal_keep_size as -1 to prevent the WAL removal by the
+        * checkpointer process.  If WALs required by logical replication slots
+        * are removed, the slots are unusable.  This setting prevents the
+        * invalidation of slots during the upgrade. We set this option when
+        * cluster is PG17 or later because logical replication slots can only be
+        * migrated since then. Besides, max_slot_wal_keep_size is added in PG13.
+        */
+       if (GET_MAJOR_VERSION(cluster->major_version) >= 1700)
+               appendPQExpBufferStr(&pgoptions, " -c max_slot_wal_keep_size=-1");

Should there be a check to throw an error if this option is specified
or do we need some documentation that this option should not be
specified?

Hmm, I don't think we have to add checks. Other settings, like synchronous_commit
and fsync, can also be overwritten, but pg_upgrade has never checked for that.
Therefore, it's the user's responsibility not to set max_slot_wal_keep_size to a
dangerous value.
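
For illustration, such an override could look like this in the TAP test's
command array (a hypothetical sketch; --new-options is pg_upgrade's existing
switch for passing settings to the new server, and the value shown is made up):

# Hypothetical user-supplied override appended to the pg_upgrade command.
# As noted above, it would override the internally forced
# max_slot_wal_keep_size=-1, so WAL needed by the slots could be removed
# during the upgrade.
push @pg_upgrade_cmd, ('--new-options', '-c max_slot_wal_keep_size=1GB');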

2) Because we are able to override max_slot_wal_keep_size there is a
chance of slot getting invalidated and Assert being hit:
+               /*
+                * The logical replication slots shouldn't be invalidated as
+                * max_slot_wal_keep_size GUC is set to -1 during the upgrade.
+                *
+                * The following is just a sanity check.
+                */
+               if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade)
+               {
+                       Assert(max_slot_wal_keep_size_mb == -1);
+                       elog(ERROR, "replication slots must not be invalidated during the upgrade");
+               }

Hmm, so how about removing the assert and changing the error message to something
more appropriate? I still think it seldom occurs.

As this scenario can occur when max_slot_wal_keep_size is overridden, it is
better to remove the Assert.

Regards,
Vignesh

#344vignesh C
vignesh21@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#340)
1 attachment(s)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Thu, 19 Oct 2023 at 16:16, Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

Dear Vignesh,

Thanks for reviewing! A new patch is available in [1].

Few comments:
1) Even if we comment 3rd point "Emit a non-transactional message",
test_slot2 still appears in the invalid_logical_replication_slots.txt
file. There is something wrong here.
+       # 2. Advance the slot test_slot2 up to the current WAL location, but
+       #        test_slot1 still has unconsumed WAL records.
+       $old_publisher->safe_psql('postgres',
+               "SELECT pg_replication_slot_advance('test_slot2', NULL);");
+
+       # 3. Emit a non-transactional message. test_slot2 detects the message so
+       #        that this slot will be also reported by upcoming pg_upgrade.
+       $old_publisher->safe_psql('postgres',
+               "SELECT count(*) FROM pg_logical_emit_message('false', 'prefix', 'This is a non-transactional message');"
+       );

The comment was updated based on other suggestions. What do you think?

I mean that if we comment out or remove this statement, as in the attached
patch, the test still passes with 'The slot "test_slot2" has not consumed
the WAL yet'. In this case, should test_slot2 still be considered invalid,
given that we have called pg_replication_slot_advance for it?
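
(As an aside, one way to make the advance unambiguous would be to pass the
current WAL position explicitly instead of NULL; a sketch of how the test
could do that:)

$old_publisher->safe_psql('postgres',
	"SELECT pg_replication_slot_advance('test_slot2', pg_current_wal_lsn());");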

Regards,
Vignesh

Attachments:

test_issue.patchtext/x-patch; charset=US-ASCII; name=test_issue.patchDownload
diff --git a/src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl
index c6be7d5d9e..49720cc66d 100644
--- a/src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl
+++ b/src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl
@@ -74,7 +74,6 @@ sub test_upgrade_from_PG17_and_later
 		'postgres', qq[
 			CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;
 			SELECT pg_replication_slot_advance('test_slot2', NULL);
-			SELECT count(*) FROM pg_logical_emit_message('false', 'prefix', 'This is a non-transactional message');
 		]);
 	$old_publisher->stop;
 
#345Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Peter Smith (#342)
1 attachment(s)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

On Friday, October 20, 2023 9:50 AM Peter Smith <smithpb2250@gmail.com> wrote:

Here are some review comments for v54-0001

Thanks for the review.

======
src/backend/replication/slot.c

1.
+ if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade)
+ {
+ ereport(ERROR,
+ errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("replication slots must not be invalidated during the upgrade"),
+ errhint("\"max_slot_wal_keep_size\" must not be set to -1 during the upgrade"));
+ }

This new error is replacing the old code:
+ Assert(max_slot_wal_keep_size_mb == -1);

Is that errhint correct? Shouldn't it say "must" instead of "must not"?

Fixed.

======
src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl

2. General formatting

Some of the "]);" formatting and indenting for the multiple SQL commands is
inconsistent.

For example,

+ $old_publisher->safe_psql(
+ 'postgres', qq[
+ SELECT pg_create_logical_replication_slot('test_slot1', 'test_decoding');
+ SELECT pg_create_logical_replication_slot('test_slot2', 'test_decoding');
+ ]
+ );

versus

+ $old_publisher->safe_psql(
+ 'postgres', qq[
+ CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;
+ SELECT pg_replication_slot_advance('test_slot2', NULL);
+ SELECT count(*) FROM pg_logical_emit_message('false', 'prefix', 'This is a non-transactional message');
+ ]);

Fixed.

~~~

3.
+# Set up some settings for the old cluster, so that we can ensures that initdb
+# will be done.
+my @initdb_params = ();
+push @initdb_params, ('--encoding', 'UTF-8');
+push @initdb_params, ('--locale', 'C');
+$node_params{extra} = \@initdb_params;
+
+$old_publisher->init(%node_params);

Why would initdb not be done if these were not set? I didn't understand the
comment.

/so that we can ensures/to ensure/

The node->init() will reuse a previously initialized cluster if no extra
parameters are specified, but that cluster could be of the wrong version when
doing a cross-version test, so we pass some parameters to force initdb to run.

I added some explanation in the comment.

~~~

4.
+# XXX: For PG9.6 and prior, the TAP Cluster.pm assigns 'max_wal_senders' and
+# 'max_connections' to the same value (10). But these versions considered
+# max_wal_senders as a subset of max_connections, so setting the same value
+# will fail. This adjustment will not be needed when packages for older
+#versions are defined.
+if ($old_publisher->pg_version->major <= 9.6)
+{
+ $old_publisher->append_conf(
+ 'postgresql.conf', qq[
+ max_wal_senders = 5
+ max_connections = 10
+ ]);
+}

4a.
IMO remove the complicated comment trying to explain the problem and just
unconditionally set the values you want.

SUGGESTION#1
# Older PG version had different rules for the inter-dependency of
# 'max_wal_senders' and 'max_connections', so assign values which will
# work for all PG versions.
$old_publisher->append_conf(
'postgresql.conf', qq[
max_wal_senders = 5
max_connections = 10
]);

~~

As Kuroda-san mentioned, we may fix Cluster.pm later, so I kept the XXX comment
but simplified it based on your suggestion.

Attached is the new version of the patch.

Best Regards,
Hou zj

Attachments:

v55-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patchapplication/octet-stream; name=v55-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patchDownload
From 41b0ac3fcf66805f7bf88d2114222c74f0595eec Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Tue, 4 Apr 2023 05:49:34 +0000
Subject: [PATCH v55] pg_upgrade: Allow to replicate logical replication slots
 to new node

This commit allows nodes with logical replication slots to be upgraded. While
reading information from the old cluster, a list of logical replication slots is
fetched. At the later part of upgrading, pg_upgrade revisits the list and
restores slots by executing pg_create_logical_replication_slot() on the new
cluster. Migration of logical replication slots is only supported when the old
cluster is version 17.0 or later.

If the old node has slots with the status 'lost' or with unconsumed WAL records,
the pg_upgrade fails. These checks are needed to prevent data loss.

Note that the pg_resetwal command would remove WAL files, which are required for
restart_lsn. If WALs required by logical replication slots are removed, the
slots are unusable. Therefore, during the upgrade, slot restoration is done
after the final pg_resetwal command. The workflow ensures that required WALs are
retained.

The significant advantage of this commit is that it makes it easy to continue
logical replication even after upgrading the publisher node. Previously,
pg_upgrade allowed copying publications to a new node. With this new commit,
adjusting the connection string to the new publisher will cause the apply
worker on the subscriber to connect to the new publisher automatically. This
enables seamless continuation of logical replication, even after an upgrade.

Author: Hayato Kuroda
Co-authored-by: Hou Zhijie
Reviewed-by: Peter Smith, Julien Rouhaud, Vignesh C, Wang Wei, Masahiko Sawada,
             Dilip Kumar, Bharath Rupireddy, Shlok Kyal
---
 doc/src/sgml/ref/pgupgrade.sgml               |  78 ++++-
 src/backend/replication/logical/decode.c      |  48 ++-
 src/backend/replication/logical/logical.c     |  65 ++++
 src/backend/replication/slot.c                |  14 +
 src/backend/utils/adt/pg_upgrade_support.c    |  42 +++
 src/bin/pg_upgrade/Makefile                   |   3 +
 src/bin/pg_upgrade/check.c                    | 169 ++++++++++-
 src/bin/pg_upgrade/function.c                 |  30 +-
 src/bin/pg_upgrade/info.c                     | 166 ++++++++++-
 src/bin/pg_upgrade/meson.build                |   1 +
 src/bin/pg_upgrade/pg_upgrade.c               |  74 ++++-
 src/bin/pg_upgrade/pg_upgrade.h               |  22 +-
 src/bin/pg_upgrade/server.c                   |  25 +-
 .../003_upgrade_logical_replication_slots.pl  | 281 ++++++++++++++++++
 src/include/catalog/pg_proc.dat               |   5 +
 src/include/replication/logical.h             |   5 +
 src/tools/pgindent/typedefs.list              |   2 +
 17 files changed, 1004 insertions(+), 26 deletions(-)
 create mode 100644 src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 608193b307..0296c3f89d 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -383,6 +383,79 @@ make prefix=/usr/local/pgsql.new install
     </para>
    </step>
 
+   <step>
+    <title>Prepare for publisher upgrades</title>
+
+    <para>
+     <application>pg_upgrade</application> attempts to migrate logical
+     replication slots. This helps avoid the need for manually defining the
+     same replication slots on the new publisher. Migration of logical
+     replication slots is only supported when the old cluster is version 17.0
+     or later. Logical replication slots on clusters before version 17.0 will
+     silently be ignored.
+    </para>
+
+    <para>
+     Before you start upgrading the publisher cluster, ensure that the
+     subscription is temporarily disabled, by executing
+     <link linkend="sql-altersubscription"><command>ALTER SUBSCRIPTION ... DISABLE</command></link>.
+     Re-enable the subscription after the upgrade.
+    </para>
+
+    <para>
+     There are some prerequisites for <application>pg_upgrade</application> to
+     be able to upgrade the replication slots. If these are not met an error
+     will be reported.
+    </para>
+
+    <itemizedlist>
+     <listitem>
+      <para>
+       All slots on the old cluster must be usable, i.e., there are no slots
+       whose
+       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>conflicting</structfield>
+       is <literal>true</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The old cluster has replicated all the transactions and logical decoding
+       messages to subscribers.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The output plugins referenced by the slots on the old cluster must be
+       installed in the new PostgreSQL executable directory.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must not have permanent logical replication slots, i.e.,
+       there must be no slots where
+       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>temporary</structfield>
+       is <literal>false</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must have
+       <link linkend="guc-wal-level"><varname>wal_level</varname></link> as
+       <literal>logical</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must have
+       <link linkend="guc-max-replication-slots"><varname>max_replication_slots</varname></link>
+       configured to a value greater than or equal to the number of slots
+       present in the old cluster.
+      </para>
+     </listitem>
+    </itemizedlist>
+
+   </step>
+
    <step>
     <title>Stop both servers</title>
 
@@ -650,8 +723,9 @@ rsync --archive --delete --hard-links --size-only --no-inc-recursive /vol1/pg_tb
        Configure the servers for log shipping.  (You do not need to run
        <function>pg_backup_start()</function> and <function>pg_backup_stop()</function>
        or take a file system backup as the standbys are still synchronized
-       with the primary.)  Replication slots are not copied and must
-       be recreated.
+       with the primary.)  Only logical slots on the primary are copied to the
+       new standby, but other slots on the old standby are not copied so must
+       be recreated manually.
       </para>
      </step>
 
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 24b712aa66..65591fac9a 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -600,12 +600,8 @@ logicalmsg_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 
 	ReorderBufferProcessXid(ctx->reorder, XLogRecGetXid(r), buf->origptr);
 
-	/*
-	 * If we don't have snapshot or we are just fast-forwarding, there is no
-	 * point in decoding messages.
-	 */
-	if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT ||
-		ctx->fast_forward)
+	/* If we don't have snapshot, there is no point in decoding messages */
+	if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT)
 		return;
 
 	message = (xl_logical_message *) XLogRecGetData(r);
@@ -622,6 +618,26 @@ logicalmsg_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 			  SnapBuildXactNeedsSkip(builder, buf->origptr)))
 		return;
 
+	/*
+	 * We also skip decoding in 'fast_forward' mode. This check must be last
+	 * because we don't want to set the processing_required flag unless we
+	 * have a decodable message.
+	 */
+	if (ctx->fast_forward)
+	{
+		/*
+		 * We need to set processing_required flag to notify the message's
+		 * existence to the caller. Usually, the flag is set when either the
+		 * COMMIT or ABORT records are decoded, but this must be turned on
+		 * here because the non-transactional logical message is decoded
+		 * without waiting for these records.
+		 */
+		if (!message->transactional)
+			ctx->processing_required = true;
+
+		return;
+	}
+
 	/*
 	 * If this is a non-transactional change, get the snapshot we're expected
 	 * to use. We only get here when the snapshot is consistent, and the
@@ -1286,7 +1302,21 @@ static bool
 DecodeTXNNeedSkip(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
 				  Oid txn_dbid, RepOriginId origin_id)
 {
-	return (SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) ||
-			(txn_dbid != InvalidOid && txn_dbid != ctx->slot->data.database) ||
-			ctx->fast_forward || FilterByOrigin(ctx, origin_id));
+	if (SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) ||
+		(txn_dbid != InvalidOid && txn_dbid != ctx->slot->data.database) ||
+		FilterByOrigin(ctx, origin_id))
+		return true;
+
+	/*
+	 * We also skip decoding in 'fast_forward' mode. In passing set the
+	 * 'processing_required' flag to indicate, were it not for this mode,
+	 * processing *would* have been required.
+	 */
+	if (ctx->fast_forward)
+	{
+		ctx->processing_required = true;
+		return true;
+	}
+
+	return false;
 }
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 41243d0187..e02cd0fa44 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -29,6 +29,7 @@
 #include "postgres.h"
 
 #include "access/xact.h"
+#include "access/xlogutils.h"
 #include "access/xlog_internal.h"
 #include "fmgr.h"
 #include "miscadmin.h"
@@ -41,6 +42,7 @@
 #include "storage/proc.h"
 #include "storage/procarray.h"
 #include "utils/builtins.h"
+#include "utils/inval.h"
 #include "utils/memutils.h"
 
 /* data for errcontext callback */
@@ -1949,3 +1951,66 @@ UpdateDecodingStats(LogicalDecodingContext *ctx)
 	rb->totalTxns = 0;
 	rb->totalBytes = 0;
 }
+
+/*
+ * Read up to the end of WAL starting from the decoding slot's restart_lsn.
+ * Return true if any meaningful/decodable WAL records are encountered,
+ * otherwise false.
+ *
+ * Although this function is currently used only during pg_upgrade, there are
+ * no reasons to restrict it, so IsBinaryUpgrade is not checked here.
+ */
+bool
+LogicalReplicationSlotHasPendingWal(XLogRecPtr end_of_wal)
+{
+	LogicalDecodingContext *ctx;
+	bool		has_pending_wal = false;
+
+	Assert(MyReplicationSlot);
+
+	/*
+	 * Create our decoding context in fast_forward mode, passing start_lsn as
+	 * InvalidXLogRecPtr, so that we start processing from the slot's
+	 * confirmed_flush.
+	 */
+	ctx = CreateDecodingContext(InvalidXLogRecPtr,
+								NIL,
+								true,	/* fast_forward */
+								XL_ROUTINE(.page_read = read_local_xlog_page,
+										   .segment_open = wal_segment_open,
+										   .segment_close = wal_segment_close),
+								NULL, NULL, NULL);
+
+	/*
+	 * Start reading at the slot's restart_lsn, which we know points to a
+	 * valid record.
+	 */
+	XLogBeginRead(ctx->reader, MyReplicationSlot->data.restart_lsn);
+
+	/* Invalidate non-timetravel entries */
+	InvalidateSystemCaches();
+
+	/* Loop until the end of WAL or some changes are processed */
+	while (!has_pending_wal && ctx->reader->EndRecPtr < end_of_wal)
+	{
+		XLogRecord *record;
+		char	   *errm = NULL;
+
+		record = XLogReadRecord(ctx->reader, &errm);
+
+		if (errm)
+			elog(ERROR, "could not find record for logical decoding: %s", errm);
+
+		if (record != NULL)
+			LogicalDecodingProcessRecord(ctx, ctx->reader);
+
+		has_pending_wal = ctx->processing_required;
+
+		CHECK_FOR_INTERRUPTS();
+	}
+
+	/* Clean up */
+	FreeDecodingContext(ctx);
+
+	return has_pending_wal;
+}
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 7e5ec500d8..99823df3c7 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1423,6 +1423,20 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause,
 
 		SpinLockRelease(&s->mutex);
 
+		/*
+		 * The logical replication slots shouldn't be invalidated as
+		 * max_slot_wal_keep_size GUC is set to -1 during the upgrade.
+		 *
+		 * The following is just a sanity check.
+		 */
+		if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade)
+		{
+			ereport(ERROR,
+					errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					errmsg("replication slots must not be invalidated during the upgrade"),
+					errhint("\"max_slot_wal_keep_size\" must be set to -1 during the upgrade"));
+		}
+
 		if (active_pid != 0)
 		{
 			/*
diff --git a/src/backend/utils/adt/pg_upgrade_support.c b/src/backend/utils/adt/pg_upgrade_support.c
index 0186636d9f..697e23f815 100644
--- a/src/backend/utils/adt/pg_upgrade_support.c
+++ b/src/backend/utils/adt/pg_upgrade_support.c
@@ -17,6 +17,7 @@
 #include "catalog/pg_type.h"
 #include "commands/extension.h"
 #include "miscadmin.h"
+#include "replication/logical.h"
 #include "utils/array.h"
 #include "utils/builtins.h"
 
@@ -261,3 +262,44 @@ binary_upgrade_set_missing_value(PG_FUNCTION_ARGS)
 
 	PG_RETURN_VOID();
 }
+
+/*
+ * Verify the given slot has already consumed all the WAL changes.
+ *
+ * Returns true if there are no decodable WAL records after the
+ * confirmed_flush_lsn. Otherwise false.
+ *
+ * This is a special purpose function to ensure that the given slot can be
+ * upgraded without data loss.
+ */
+Datum
+binary_upgrade_slot_has_caught_up(PG_FUNCTION_ARGS)
+{
+	Name		slot_name;
+	XLogRecPtr	end_of_wal;
+	bool		found_pending_wal;
+
+	CHECK_IS_BINARY_UPGRADE;
+
+	/* Quick exit if the input is NULL */
+	if (PG_ARGISNULL(0))
+		PG_RETURN_BOOL(false);
+
+	CheckSlotPermissions();
+
+	slot_name = PG_GETARG_NAME(0);
+
+	/* Acquire the given slot */
+	ReplicationSlotAcquire(NameStr(*slot_name), true);
+
+	/* Slots must be valid as otherwise we won't be able to scan the WAL */
+	Assert(MyReplicationSlot->data.invalidated == RS_INVAL_NONE);
+
+	end_of_wal = GetFlushRecPtr(NULL);
+	found_pending_wal = LogicalReplicationSlotHasPendingWal(end_of_wal);
+
+	/* Clean up */
+	ReplicationSlotRelease();
+
+	PG_RETURN_BOOL(!found_pending_wal);
+}
diff --git a/src/bin/pg_upgrade/Makefile b/src/bin/pg_upgrade/Makefile
index 5834513add..05e9299654 100644
--- a/src/bin/pg_upgrade/Makefile
+++ b/src/bin/pg_upgrade/Makefile
@@ -3,6 +3,9 @@
 PGFILEDESC = "pg_upgrade - an in-place binary upgrade utility"
 PGAPPICON = win32
 
+# required for 003_upgrade_logical_replication_slots.pl
+EXTRA_INSTALL=contrib/test_decoding
+
 subdir = src/bin/pg_upgrade
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 21a0ff9e42..fdf1f0a0c4 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -33,6 +33,8 @@ static void check_for_jsonb_9_4_usage(ClusterInfo *cluster);
 static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(void);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
+static void check_new_cluster_logical_replication_slots(void);
+static void check_old_cluster_for_valid_slots(bool live_check);
 
 
 /*
@@ -89,8 +91,11 @@ check_and_dump_old_cluster(bool live_check)
 	if (!live_check)
 		start_postmaster(&old_cluster, true);
 
-	/* Extract a list of databases and tables from the old cluster */
-	get_db_and_rel_infos(&old_cluster);
+	/*
+	 * Extract a list of databases, tables, and logical replication slots from
+	 * the old cluster.
+	 */
+	get_db_rel_and_slot_infos(&old_cluster, live_check);
 
 	init_tablespaces();
 
@@ -107,6 +112,13 @@ check_and_dump_old_cluster(bool live_check)
 	check_for_reg_data_type_usage(&old_cluster);
 	check_for_isn_and_int8_passing_mismatch(&old_cluster);
 
+	/*
+	 * Logical replication slots can be migrated since PG17. See comments atop
+	 * get_old_cluster_logical_slot_infos().
+	 */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
+		check_old_cluster_for_valid_slots(live_check);
+
 	/*
 	 * PG 16 increased the size of the 'aclitem' type, which breaks the
 	 * on-disk format for existing data.
@@ -200,7 +212,7 @@ check_and_dump_old_cluster(bool live_check)
 void
 check_new_cluster(void)
 {
-	get_db_and_rel_infos(&new_cluster);
+	get_db_rel_and_slot_infos(&new_cluster, false);
 
 	check_new_cluster_is_empty();
 
@@ -223,6 +235,8 @@ check_new_cluster(void)
 	check_for_prepared_transactions(&new_cluster);
 
 	check_for_new_tablespace_dir();
+
+	check_new_cluster_logical_replication_slots();
 }
 
 
@@ -1451,3 +1465,152 @@ check_for_user_defined_encoding_conversions(ClusterInfo *cluster)
 	else
 		check_ok();
 }
+
+/*
+ * check_new_cluster_logical_replication_slots()
+ *
+ * Verify that there are no logical replication slots on the new cluster and
+ * that the parameter settings necessary for creating slots are sufficient.
+ */
+static void
+check_new_cluster_logical_replication_slots(void)
+{
+	PGresult   *res;
+	PGconn	   *conn;
+	int			nslots_on_old;
+	int			nslots_on_new;
+	int			max_replication_slots;
+	char	   *wal_level;
+
+	/* Logical slots can be migrated since PG17. */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+		return;
+
+	nslots_on_old = count_old_cluster_logical_slots();
+
+	/* Quick return if there are no logical slots to be migrated. */
+	if (nslots_on_old == 0)
+		return;
+
+	conn = connectToServer(&new_cluster, "template1");
+
+	prep_status("Checking for new cluster logical replication slots");
+
+	res = executeQueryOrDie(conn, "SELECT count(*) "
+							"FROM pg_catalog.pg_replication_slots "
+							"WHERE slot_type = 'logical' AND "
+							"temporary IS FALSE;");
+
+	if (PQntuples(res) != 1)
+		pg_fatal("could not count the number of logical replication slots");
+
+	nslots_on_new = atoi(PQgetvalue(res, 0, 0));
+
+	if (nslots_on_new)
+		pg_fatal("Expected 0 logical replication slots but found %d.",
+				 nslots_on_new);
+
+	PQclear(res);
+
+	res = executeQueryOrDie(conn, "SELECT setting FROM pg_settings "
+							"WHERE name IN ('wal_level', 'max_replication_slots') "
+							"ORDER BY name DESC;");
+
+	if (PQntuples(res) != 2)
+		pg_fatal("could not determine parameter settings on new cluster");
+
+	wal_level = PQgetvalue(res, 0, 0);
+
+	if (strcmp(wal_level, "logical") != 0)
+		pg_fatal("wal_level must be \"logical\", but is set to \"%s\"",
+				 wal_level);
+
+	max_replication_slots = atoi(PQgetvalue(res, 1, 0));
+
+	if (nslots_on_old > max_replication_slots)
+		pg_fatal("max_replication_slots (%d) must be greater than or equal to the number of "
+				 "logical replication slots (%d) on the old cluster",
+				 max_replication_slots, nslots_on_old);
+
+	PQclear(res);
+	PQfinish(conn);
+
+	check_ok();
+}
+
+/*
+ * check_old_cluster_for_valid_slots()
+ *
+ * Verify that all the logical slots are valid and have consumed all the WAL
+ * before shutdown.
+ */
+static void
+check_old_cluster_for_valid_slots(bool live_check)
+{
+	char		output_path[MAXPGPATH];
+	FILE	   *script = NULL;
+
+	prep_status("Checking for valid logical replication slots");
+
+	snprintf(output_path, sizeof(output_path), "%s/%s",
+			 log_opts.basedir,
+			 "invalid_logical_replication_slots.txt");
+
+	for (int dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		LogicalSlotInfoArr *slot_arr = &old_cluster.dbarr.dbs[dbnum].slot_arr;
+
+		for (int slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		{
+			LogicalSlotInfo *slot = &slot_arr->slots[slotnum];
+
+			/* Is the slot usable? */
+			if (slot->invalid)
+			{
+				if (script == NULL &&
+					(script = fopen_priv(output_path, "w")) == NULL)
+					pg_fatal("could not open file \"%s\": %s",
+							 output_path, strerror(errno));
+
+				fprintf(script, "The slot \"%s\" is invalid\n",
+						slot->slotname);
+
+				/* No need to check this slot, seek to new one */
+				continue;
+			}
+
+			/*
+			 * Do additional check to ensure that all logical replication
+			 * slots have consumed all the WAL before shutdown.
+			 *
+			 * Note: This can be satisfied only when the old cluster has been
+			 * shut down, so we skip this for live checks.
+			 */
+			if (!live_check && !slot->caught_up)
+			{
+				if (script == NULL &&
+					(script = fopen_priv(output_path, "w")) == NULL)
+					pg_fatal("could not open file \"%s\": %s",
+							 output_path, strerror(errno));
+
+				fprintf(script,
+						"The slot \"%s\" has not consumed the WAL yet\n",
+						slot->slotname);
+			}
+		}
+	}
+
+	if (script)
+	{
+		fclose(script);
+
+		pg_log(PG_REPORT, "fatal");
+		pg_fatal("Your installation contains logical replication slots that can't be upgraded.\n"
+				 "You can remove invalid slots and/or consume the pending WAL for other slots,\n"
+				 "and then restart the upgrade.\n"
+				 "A list of the problem slots is in the file:\n"
+				 "    %s", output_path);
+	}
+
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/function.c b/src/bin/pg_upgrade/function.c
index dc8800c7cd..5af936bd45 100644
--- a/src/bin/pg_upgrade/function.c
+++ b/src/bin/pg_upgrade/function.c
@@ -46,7 +46,9 @@ library_name_compare(const void *p1, const void *p2)
 /*
  * get_loadable_libraries()
  *
- *	Fetch the names of all old libraries containing C-language functions.
+ *	Fetch the names of all old libraries containing either C-language functions
+ *	or are corresponding to logical replication output plugins.
+ *
  *	We will later check that they all exist in the new installation.
  */
 void
@@ -55,6 +57,7 @@ get_loadable_libraries(void)
 	PGresult  **ress;
 	int			totaltups;
 	int			dbnum;
+	int			n_libinfos;
 
 	ress = (PGresult **) pg_malloc(old_cluster.dbarr.ndbs * sizeof(PGresult *));
 	totaltups = 0;
@@ -81,7 +84,12 @@ get_loadable_libraries(void)
 		PQfinish(conn);
 	}
 
-	os_info.libraries = (LibraryInfo *) pg_malloc(totaltups * sizeof(LibraryInfo));
+	/*
+	 * Allocate memory for required libraries and logical replication output
+	 * plugins.
+	 */
+	n_libinfos = totaltups + count_old_cluster_logical_slots();
+	os_info.libraries = (LibraryInfo *) pg_malloc(sizeof(LibraryInfo) * n_libinfos);
 	totaltups = 0;
 
 	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
@@ -89,6 +97,7 @@ get_loadable_libraries(void)
 		PGresult   *res = ress[dbnum];
 		int			ntups;
 		int			rowno;
+		LogicalSlotInfoArr *slot_arr = &old_cluster.dbarr.dbs[dbnum].slot_arr;
 
 		ntups = PQntuples(res);
 		for (rowno = 0; rowno < ntups; rowno++)
@@ -101,6 +110,23 @@ get_loadable_libraries(void)
 			totaltups++;
 		}
 		PQclear(res);
+
+		/*
+		 * Store the names of output plugins as well. There is a possibility
+		 * that duplicated plugins are set, but the consumer function
+		 * check_loadable_libraries() will avoid checking the same library, so
+		 * we do not have to consider their uniqueness here.
+		 */
+		for (int slotno = 0; slotno < slot_arr->nslots; slotno++)
+		{
+			if (slot_arr->slots[slotno].invalid)
+				continue;
+
+			os_info.libraries[totaltups].name = pg_strdup(slot_arr->slots[slotno].plugin);
+			os_info.libraries[totaltups].dbnum = dbnum;
+
+			totaltups++;
+		}
 	}
 
 	pg_free(ress);
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index aa5faca4d6..481f586a2f 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -26,6 +26,8 @@ static void get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo);
 static void free_rel_infos(RelInfoArr *rel_arr);
 static void print_db_infos(DbInfoArr *db_arr);
 static void print_rel_infos(RelInfoArr *rel_arr);
+static void print_slot_infos(LogicalSlotInfoArr *slot_arr);
+static void get_old_cluster_logical_slot_infos(DbInfo *dbinfo, bool live_check);
 
 
 /*
@@ -266,13 +268,15 @@ report_unmatched_relation(const RelInfo *rel, const DbInfo *db, bool is_new_db)
 }
 
 /*
- * get_db_and_rel_infos()
+ * get_db_rel_and_slot_infos()
  *
  * higher level routine to generate dbinfos for the database running
  * on the given "port". Assumes that server is already running.
+ *
+ * live_check would be used only when the target is the old cluster.
  */
 void
-get_db_and_rel_infos(ClusterInfo *cluster)
+get_db_rel_and_slot_infos(ClusterInfo *cluster, bool live_check)
 {
 	int			dbnum;
 
@@ -283,7 +287,17 @@ get_db_and_rel_infos(ClusterInfo *cluster)
 	get_db_infos(cluster);
 
 	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
-		get_rel_infos(cluster, &cluster->dbarr.dbs[dbnum]);
+	{
+		DbInfo	   *pDbInfo = &cluster->dbarr.dbs[dbnum];
+
+		get_rel_infos(cluster, pDbInfo);
+
+		/*
+		 * Retrieve the logical replication slots infos for the old cluster.
+		 */
+		if (cluster == &old_cluster)
+			get_old_cluster_logical_slot_infos(pDbInfo, live_check);
+	}
 
 	if (cluster == &old_cluster)
 		pg_log(PG_VERBOSE, "\nsource databases:");
@@ -600,6 +614,125 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 	dbinfo->rel_arr.nrels = num_rels;
 }
 
+/*
+ * get_old_cluster_logical_slot_infos()
+ *
+ * gets the LogicalSlotInfos for all the logical replication slots of the
+ * database referred to by "dbinfo". The status of each logical slot is gotten
+ * here, but they are used at the checking phase. See
+ * check_old_cluster_for_valid_slots().
+ *
+ * Note: This function will not do anything if the old cluster is pre-PG17.
+ * This is because before that the logical slots are not saved at shutdown, so
+ * there is no guarantee that the latest confirmed_flush_lsn is saved to disk
+ * which can lead to data loss. It is still not guaranteed for manually created
+ * slots in PG17, so subsequent checks done in
+ * check_old_cluster_for_valid_slots() would raise a FATAL error if such slots
+ * are included.
+ */
+static void
+get_old_cluster_logical_slot_infos(DbInfo *dbinfo, bool live_check)
+{
+	PGconn	   *conn;
+	PGresult   *res;
+	LogicalSlotInfo *slotinfos = NULL;
+	int			num_slots = 0;
+
+	/* Logical slots can be migrated since PG17. */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+	{
+		dbinfo->slot_arr.slots = slotinfos;
+		dbinfo->slot_arr.nslots = num_slots;
+		return;
+	}
+
+	conn = connectToServer(&old_cluster, dbinfo->db_name);
+
+	/*
+	 * Fetch the logical replication slot information. The check whether the
+	 * slot is considered caught up is done by an upgrade function. This
+	 * regards the slot as caught up if we don't find any decodable changes.
+	 * See binary_upgrade_slot_has_caught_up().
+	 *
+	 * Note that we can't ensure whether the slot is caught up during
+	 * live_check as the new WAL records could be generated.
+	 *
+	 * We intentionally skip checking the WALs for invalidated slots as the
+	 * corresponding WALs could have been removed for such slots.
+	 *
+	 * The temporary slots are explicitly ignored while checking because such
+	 * slots cannot exist after the upgrade. During the upgrade, clusters are
+	 * started and stopped several times causing any temporary slots to be
+	 * removed.
+	 */
+	res = executeQueryOrDie(conn, "SELECT slot_name, plugin, two_phase, "
+							"%s as caught_up, conflicting as invalid "
+							"FROM pg_catalog.pg_replication_slots "
+							"WHERE slot_type = 'logical' AND "
+							"database = current_database() AND "
+							"temporary IS FALSE;",
+							live_check ? "FALSE" :
+							"(CASE WHEN conflicting THEN FALSE "
+							"ELSE (SELECT pg_catalog.binary_upgrade_slot_has_caught_up(slot_name)) "
+							"END)");
+
+	num_slots = PQntuples(res);
+
+	if (num_slots)
+	{
+		int			i_slotname;
+		int			i_plugin;
+		int			i_twophase;
+		int			i_caught_up;
+		int			i_invalid;
+
+		slotinfos = (LogicalSlotInfo *) pg_malloc(sizeof(LogicalSlotInfo) * num_slots);
+
+		i_slotname = PQfnumber(res, "slot_name");
+		i_plugin = PQfnumber(res, "plugin");
+		i_twophase = PQfnumber(res, "two_phase");
+		i_caught_up = PQfnumber(res, "caught_up");
+		i_invalid = PQfnumber(res, "invalid");
+
+		for (int slotnum = 0; slotnum < num_slots; slotnum++)
+		{
+			LogicalSlotInfo *curr = &slotinfos[slotnum];
+
+			curr->slotname = pg_strdup(PQgetvalue(res, slotnum, i_slotname));
+			curr->plugin = pg_strdup(PQgetvalue(res, slotnum, i_plugin));
+			curr->two_phase = (strcmp(PQgetvalue(res, slotnum, i_twophase), "t") == 0);
+			curr->caught_up = (strcmp(PQgetvalue(res, slotnum, i_caught_up), "t") == 0);
+			curr->invalid = (strcmp(PQgetvalue(res, slotnum, i_invalid), "t") == 0);
+		}
+	}
+
+	PQclear(res);
+	PQfinish(conn);
+
+	dbinfo->slot_arr.slots = slotinfos;
+	dbinfo->slot_arr.nslots = num_slots;
+}
+
+
+/*
+ * count_old_cluster_logical_slots()
+ *
+ * Returns the number of logical replication slots for all databases.
+ *
+ * Note: this function always returns 0 if the old_cluster is PG16 and prior
+ * because we gather slot information only for cluster versions greater than or
+ * equal to PG17. See get_old_cluster_logical_slot_infos().
+ */
+int
+count_old_cluster_logical_slots(void)
+{
+	int			slot_count = 0;
+
+	for (int dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+		slot_count += old_cluster.dbarr.dbs[dbnum].slot_arr.nslots;
+
+	return slot_count;
+}
 
 static void
 free_db_and_rel_infos(DbInfoArr *db_arr)
@@ -642,8 +775,11 @@ print_db_infos(DbInfoArr *db_arr)
 
 	for (dbnum = 0; dbnum < db_arr->ndbs; dbnum++)
 	{
-		pg_log(PG_VERBOSE, "Database: \"%s\"", db_arr->dbs[dbnum].db_name);
-		print_rel_infos(&db_arr->dbs[dbnum].rel_arr);
+		DbInfo	   *pDbInfo = &db_arr->dbs[dbnum];
+
+		pg_log(PG_VERBOSE, "Database: \"%s\"", pDbInfo->db_name);
+		print_rel_infos(&pDbInfo->rel_arr);
+		print_slot_infos(&pDbInfo->slot_arr);
 	}
 }
 
@@ -660,3 +796,23 @@ print_rel_infos(RelInfoArr *rel_arr)
 			   rel_arr->rels[relnum].reloid,
 			   rel_arr->rels[relnum].tablespace);
 }
+
+static void
+print_slot_infos(LogicalSlotInfoArr *slot_arr)
+{
+	/* Quick return if there are no logical slots. */
+	if (slot_arr->nslots == 0)
+		return;
+
+	pg_log(PG_VERBOSE, "Logical replication slots within the database:");
+
+	for (int slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+	{
+		LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+
+		pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\", two_phase: %s",
+			   slot_info->slotname,
+			   slot_info->plugin,
+			   slot_info->two_phase ? "true" : "false");
+	}
+}
diff --git a/src/bin/pg_upgrade/meson.build b/src/bin/pg_upgrade/meson.build
index 12a97f84e2..2c4f38d865 100644
--- a/src/bin/pg_upgrade/meson.build
+++ b/src/bin/pg_upgrade/meson.build
@@ -42,6 +42,7 @@ tests += {
     'tests': [
       't/001_basic.pl',
       't/002_pg_upgrade.pl',
+      't/003_upgrade_logical_replication_slots.pl',
     ],
     'test_kwargs': {'priority': 40}, # pg_upgrade tests are slow
   },
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 96bfb67167..3960af4036 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -59,6 +59,7 @@ static void copy_xact_xlog_xid(void);
 static void set_frozenxids(bool minmxid_only);
 static void make_outputdirs(char *pgdata);
 static void setup(char *argv0, bool *live_check);
+static void create_logical_replication_slots(void);
 
 ClusterInfo old_cluster,
 			new_cluster;
@@ -188,6 +189,21 @@ main(int argc, char **argv)
 			  new_cluster.pgdata);
 	check_ok();
 
+	/*
+	 * Migrate the logical slots to the new cluster.  Note that we need to do
+	 * this after resetting WAL because otherwise the required WAL would be
+	 * removed and slots would become unusable.  There is a possibility that
+	 * background processes might generate some WAL before we could create the
+	 * slots in the new cluster but we can ignore that WAL as that won't be
+	 * required downstream.
+	 */
+	if (count_old_cluster_logical_slots())
+	{
+		start_postmaster(&new_cluster, true);
+		create_logical_replication_slots();
+		stop_postmaster(false);
+	}
+
 	if (user_opts.do_sync)
 	{
 		prep_status("Sync data directory to disk");
@@ -593,7 +609,7 @@ create_new_objects(void)
 		set_frozenxids(true);
 
 	/* update new_cluster info now that we have objects in the databases */
-	get_db_and_rel_infos(&new_cluster);
+	get_db_rel_and_slot_infos(&new_cluster, false);
 }
 
 /*
@@ -862,3 +878,59 @@ set_frozenxids(bool minmxid_only)
 
 	check_ok();
 }
+
+/*
+ * create_logical_replication_slots()
+ *
+ * Similar to create_new_objects() but only restores logical replication slots.
+ */
+static void
+create_logical_replication_slots(void)
+{
+	prep_status_progress("Restoring logical replication slots in the new cluster");
+
+	for (int dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		DbInfo	   *old_db = &old_cluster.dbarr.dbs[dbnum];
+		LogicalSlotInfoArr *slot_arr = &old_db->slot_arr;
+		PGconn	   *conn;
+		PQExpBuffer query;
+
+		/* Skip this database if there are no slots */
+		if (slot_arr->nslots == 0)
+			continue;
+
+		conn = connectToServer(&new_cluster, old_db->db_name);
+		query = createPQExpBuffer();
+
+		pg_log(PG_STATUS, "%s", old_db->db_name);
+
+		for (int slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		{
+			LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+
+			/* Constructs a query for creating logical replication slots */
+			appendPQExpBuffer(query,
+							  "SELECT * FROM "
+							  "pg_catalog.pg_create_logical_replication_slot(");
+			appendStringLiteralConn(query, slot_info->slotname, conn);
+			appendPQExpBuffer(query, ", ");
+			appendStringLiteralConn(query, slot_info->plugin, conn);
+			appendPQExpBuffer(query, ", false, %s);",
+							  slot_info->two_phase ? "true" : "false");
+
+			PQclear(executeQueryOrDie(conn, "%s", query->data));
+
+			resetPQExpBuffer(query);
+		}
+
+		PQfinish(conn);
+
+		destroyPQExpBuffer(query);
+	}
+
+	end_progress_output();
+	check_ok();
+
+	return;
+}
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 842f3b6cd3..ba8129d135 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -150,6 +150,24 @@ typedef struct
 	int			nrels;
 } RelInfoArr;
 
+/*
+ * Structure to store logical replication slot information.
+ */
+typedef struct
+{
+	char	   *slotname;		/* slot name */
+	char	   *plugin;			/* plugin */
+	bool		two_phase;		/* can the slot decode 2PC? */
+	bool		caught_up;		/* has the slot caught up to latest changes? */
+	bool		invalid;		/* if true, the slot is unusable */
+} LogicalSlotInfo;
+
+typedef struct
+{
+	int			nslots;			/* number of logical slot infos */
+	LogicalSlotInfo *slots;		/* array of logical slot infos */
+} LogicalSlotInfoArr;
+
 /*
  * The following structure represents a relation mapping.
  */
@@ -176,6 +194,7 @@ typedef struct
 	char		db_tablespace[MAXPGPATH];	/* database default tablespace
 											 * path */
 	RelInfoArr	rel_arr;		/* array of all user relinfos */
+	LogicalSlotInfoArr slot_arr;	/* array of all LogicalSlotInfo */
 } DbInfo;
 
 /*
@@ -400,7 +419,8 @@ void		check_loadable_libraries(void);
 FileNameMap *gen_db_file_maps(DbInfo *old_db,
 							  DbInfo *new_db, int *nmaps, const char *old_pgdata,
 							  const char *new_pgdata);
-void		get_db_and_rel_infos(ClusterInfo *cluster);
+void		get_db_rel_and_slot_infos(ClusterInfo *cluster, bool live_check);
+int			count_old_cluster_logical_slots(void);
 
 /* option.c */
 
diff --git a/src/bin/pg_upgrade/server.c b/src/bin/pg_upgrade/server.c
index 0bc3d2806b..d7f6c268ef 100644
--- a/src/bin/pg_upgrade/server.c
+++ b/src/bin/pg_upgrade/server.c
@@ -201,6 +201,7 @@ start_postmaster(ClusterInfo *cluster, bool report_and_exit_on_error)
 	PGconn	   *conn;
 	bool		pg_ctl_return = false;
 	char		socket_string[MAXPGPATH + 200];
+	PQExpBufferData pgoptions;
 
 	static bool exit_hook_registered = false;
 
@@ -227,23 +228,41 @@ start_postmaster(ClusterInfo *cluster, bool report_and_exit_on_error)
 				 cluster->sockdir);
 #endif
 
+	initPQExpBuffer(&pgoptions);
+
 	/*
-	 * Use -b to disable autovacuum.
+	 * Construct a parameter string which is passed to the server process.
 	 *
 	 * Turn off durability requirements to improve object creation speed, and
 	 * we only modify the new cluster, so only use it there.  If there is a
 	 * crash, the new cluster has to be recreated anyway.  fsync=off is a big
 	 * win on ext4.
 	 */
+	if (cluster == &new_cluster)
+		appendPQExpBufferStr(&pgoptions, " -c synchronous_commit=off -c fsync=off -c full_page_writes=off");
+
+	/*
+	 * Use max_slot_wal_keep_size as -1 to prevent the WAL removal by the
+	 * checkpointer process.  If WALs required by logical replication slots
+	 * are removed, the slots are unusable.  This setting prevents the
+	 * invalidation of slots during the upgrade. We set this option when
+	 * cluster is PG17 or later because logical replication slots can only be
+	 * migrated since then. Besides, max_slot_wal_keep_size is added in PG13.
+	 */
+	if (GET_MAJOR_VERSION(cluster->major_version) >= 1700)
+		appendPQExpBufferStr(&pgoptions, " -c max_slot_wal_keep_size=-1");
+
+	/* Use -b to disable autovacuum. */
 	snprintf(cmd, sizeof(cmd),
 			 "\"%s/pg_ctl\" -w -l \"%s/%s\" -D \"%s\" -o \"-p %d -b%s %s%s\" start",
 			 cluster->bindir,
 			 log_opts.logdir,
 			 SERVER_LOG_FILE, cluster->pgconfig, cluster->port,
-			 (cluster == &new_cluster) ?
-			 " -c synchronous_commit=off -c fsync=off -c full_page_writes=off" : "",
+			 pgoptions.data,
 			 cluster->pgopts ? cluster->pgopts : "", socket_string);
 
+	termPQExpBuffer(&pgoptions);
+
 	/*
 	 * Don't throw an error right away, let connecting throw the error because
 	 * it might supply a reason for the failure.
diff --git a/src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl
new file mode 100644
index 0000000000..5d7f11fb09
--- /dev/null
+++ b/src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl
@@ -0,0 +1,281 @@
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+# Tests for upgrading logical replication slots
+
+use strict;
+use warnings;
+
+use File::Find qw(find);
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Verify that logical replication slots can be migrated.  This function will
+# be executed when the old cluster is PG17 and later.
+sub test_upgrade_from_PG17_and_later
+{
+	my ($old_publisher, $new_publisher, @pg_upgrade_cmd) = @_;
+
+	# ------------------------------
+	# TEST: Confirm pg_upgrade fails when the new cluster has wrong GUC values
+
+	# Preparations for the subsequent test:
+	# 1. Create two slots on the old cluster
+	$old_publisher->start;
+	$old_publisher->safe_psql(
+		'postgres', qq[
+		SELECT pg_create_logical_replication_slot('test_slot1', 'test_decoding');
+		SELECT pg_create_logical_replication_slot('test_slot2', 'test_decoding');
+	]);
+	$old_publisher->stop();
+
+	# 2. Set 'max_replication_slots' to be less than the number of slots (2)
+	#	 present on the old cluster.
+	$new_publisher->append_conf('postgresql.conf',
+		"max_replication_slots = 1");
+
+	# pg_upgrade will fail because the new cluster has insufficient
+	# max_replication_slots
+	command_checks_all(
+		[@pg_upgrade_cmd],
+		1,
+		[
+			qr/max_replication_slots \(1\) must be greater than or equal to the number of logical replication slots \(2\) on the old cluster/
+		],
+		[qr//],
+		'run of pg_upgrade where the new cluster has insufficient max_replication_slots'
+	);
+	ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+		"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+	# Set 'max_replication_slots' to match the number of slots (2) present
+	# on the old cluster. Both slots will be used for subsequent tests.
+	$new_publisher->append_conf('postgresql.conf',
+		"max_replication_slots = 2");
+
+
+	# ------------------------------
+	# TEST: Confirm pg_upgrade fails when the slot still has unconsumed WAL
+	# records
+
+	# Preparations for the subsequent test:
+	# 1. Generate extra WAL records. At this point neither test_slot1 nor
+	#	 test_slot2 has consumed them.
+	#
+	# 2. Advance the slot test_slot2 up to the current WAL location, but
+	#	 test_slot1 still has unconsumed WAL records.
+	#
+	# 3. Emit a non-transactional message. This will cause test_slot2 to detect
+	#	 the unconsumed WAL record.
+	$old_publisher->start;
+	$old_publisher->safe_psql(
+		'postgres', qq[
+			CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;
+			SELECT pg_replication_slot_advance('test_slot2', pg_current_wal_lsn());
+			SELECT count(*) FROM pg_logical_emit_message('false', 'prefix', 'This is a non-transactional message');
+	]);
+	$old_publisher->stop;
+
+	# pg_upgrade will fail because there are slots still having unconsumed WAL
+	# records
+	command_checks_all(
+		[@pg_upgrade_cmd],
+		1,
+		[
+			qr/Your installation contains logical replication slots that can't be upgraded./
+		],
+		[qr//],
+		'run of pg_upgrade of old cluster with slots having unconsumed WAL records'
+	);
+
+	# Verify the reason why the logical replication slot cannot be upgraded
+	my $slots_filename;
+
+	# Find a txt file that contains a list of logical replication slots that
+	# cannot be upgraded. We cannot predict the file's path because the output
+	# directory contains a milliseconds timestamp. File::Find::find must be
+	# used.
+	find(
+		sub {
+			if ($File::Find::name =~
+				m/invalid_logical_replication_slots\.txt/)
+			{
+				$slots_filename = $File::Find::name;
+			}
+		},
+		$new_publisher->data_dir . "/pg_upgrade_output.d");
+
+	# Check the file content. Both slots should be reporting that they have
+	# unconsumed WAL records.
+	like(
+		slurp_file($slots_filename),
+		qr/The slot \"test_slot1\" has not consumed the WAL yet/m,
+		'the previous test failed due to unconsumed WALs');
+	like(
+		slurp_file($slots_filename),
+		qr/The slot \"test_slot2\" has not consumed the WAL yet/m,
+		'the previous test failed due to unconsumed WALs');
+
+
+	# ------------------------------
+	# TEST: Successful upgrade
+
+	# Preparations for the subsequent test:
+	# 1. Setup logical replication (first, cleanup slots from the previous
+	#	 tests)
+	my $old_connstr = $old_publisher->connstr . ' dbname=postgres';
+
+	$old_publisher->start;
+	$old_publisher->safe_psql(
+		'postgres', qq[
+		SELECT * FROM pg_drop_replication_slot('test_slot1');
+		SELECT * FROM pg_drop_replication_slot('test_slot2');
+		CREATE PUBLICATION regress_pub FOR ALL TABLES;
+	]);
+
+	# Initialize subscriber cluster
+	my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
+	$subscriber->init();
+
+	$subscriber->start;
+	$subscriber->safe_psql(
+		'postgres', qq[
+		CREATE TABLE tbl (a int);
+		CREATE SUBSCRIPTION regress_sub CONNECTION '$old_connstr' PUBLICATION regress_pub WITH (two_phase = 'true')
+	]);
+	$subscriber->wait_for_subscription_sync($old_publisher, 'regress_sub');
+
+	# 2. Temporarily disable the subscription
+	$subscriber->safe_psql('postgres',
+		"ALTER SUBSCRIPTION regress_sub DISABLE");
+	$old_publisher->stop;
+
+	# pg_upgrade should be successful
+	command_ok([@pg_upgrade_cmd], 'run of pg_upgrade of old cluster');
+
+	# Check that the slot 'regress_sub' has migrated to the new cluster
+	$new_publisher->start;
+	my $result = $new_publisher->safe_psql('postgres',
+		"SELECT slot_name, two_phase FROM pg_replication_slots");
+	is($result, qq(regress_sub|t), 'check the slot exists on new cluster');
+
+	# Update the connection
+	my $new_connstr = $new_publisher->connstr . ' dbname=postgres';
+	$subscriber->safe_psql(
+		'postgres', qq[
+		ALTER SUBSCRIPTION regress_sub CONNECTION '$new_connstr';
+		ALTER SUBSCRIPTION regress_sub ENABLE;
+	]);
+
+	# Check whether changes on the new publisher get replicated to the
+	# subscriber
+	$new_publisher->safe_psql('postgres',
+		"INSERT INTO tbl VALUES (generate_series(11, 20))");
+	$new_publisher->wait_for_catchup('regress_sub');
+	$result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
+	is($result, qq(20), 'check changes are replicated to the subscriber');
+
+	# Clean up
+	$subscriber->stop();
+	$new_publisher->stop();
+}
+
+# Verify that logical replication slots cannot be migrated.  This function will
+# be executed when the old cluster version is prior to PG17.
+sub test_upgrade_from_pre_PG17
+{
+	my ($old_publisher, $new_publisher, @pg_upgrade_cmd) = @_;
+
+	# ------------------------------
+	# TEST: Confirm logical replication slots cannot be migrated
+
+	# Preparations for the subsequent test:
+	# 1. Create a slot on the old cluster
+	$old_publisher->start;
+	$old_publisher->safe_psql('postgres',
+		"SELECT pg_create_logical_replication_slot('test_slot', 'test_decoding');"
+	);
+	$old_publisher->stop;
+
+	# pg_upgrade should be successful, but any logical replication slots will
+	# not be migrated.
+	command_ok([@pg_upgrade_cmd], 'run of pg_upgrade of old cluster');
+	ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
+		"pg_upgrade_output.d/ removed after pg_upgrade success");
+
+	# Check that the slot 'test_slot' has not migrated to the new cluster
+	$new_publisher->start;
+	my $result = $new_publisher->safe_psql('postgres',
+		"SELECT count(*) FROM pg_replication_slots");
+	is($result, qq(0), 'check the slot does not exist on new cluster');
+
+	# Clean up
+	$new_publisher->stop();
+}
+
+# Can be changed to test the other modes
+my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
+
+# Initialize old cluster. Cross-version checks are also supported.
+my $old_publisher =
+  PostgreSQL::Test::Cluster->new('old_publisher',
+	install_path => $ENV{oldinstall});
+
+my %node_params = ();
+$node_params{allows_streaming} = 'logical';
+
+# To prevent node->init() from reusing a previously initialized cluster that
+# could be of a different version, it is essential to configure specific
+# settings for the old cluster. This ensures that initdb is actually run.
+my @initdb_params = ();
+push @initdb_params, ('--encoding', 'UTF-8');
+push @initdb_params, ('--locale',   'C');
+$node_params{extra} = \@initdb_params;
+
+$old_publisher->init(%node_params);
+
+# XXX: Older PG versions had different rules for the inter-dependency of
+# 'max_wal_senders' and 'max_connections', so assign values which will work
+# for all PG versions. If Cluster.pm is fixed, this code will not be needed.
+$old_publisher->append_conf(
+	'postgresql.conf', qq[
+max_wal_senders = 5
+max_connections = 10
+]);
+
+# Initialize new cluster
+my $new_publisher = PostgreSQL::Test::Cluster->new('new_publisher');
+$new_publisher->init(allows_streaming => 'logical');
+
+my $oldbindir = $old_publisher->config_data('--bindir');
+my $newbindir = $new_publisher->config_data('--bindir');
+
+# Set up a pg_upgrade command. This will be used for all tests.
+my @pg_upgrade_cmd = (
+	'pg_upgrade', '--no-sync',
+	'-d',         $old_publisher->data_dir,
+	'-D',         $new_publisher->data_dir,
+	'-b',         $oldbindir,
+	'-B',         $newbindir,
+	'-s',         $new_publisher->host,
+	'-p',         $old_publisher->port,
+	'-P',         $new_publisher->port,
+	$mode);
+
+# Test according to the major version of the old cluster.
+# Upgrading logical replication slots from versions older than PG17 is not
+# supported.
+if ($old_publisher->pg_version->major >= 17)
+{
+	test_upgrade_from_PG17_and_later($old_publisher, $new_publisher,
+		@pg_upgrade_cmd);
+}
+else
+{
+	test_upgrade_from_pre_PG17($old_publisher, $new_publisher,
+		@pg_upgrade_cmd);
+}
+
+
+done_testing();
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index c92d0631a0..0699596888 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11379,6 +11379,11 @@
   proname => 'binary_upgrade_set_next_pg_tablespace_oid', provolatile => 'v',
   proparallel => 'u', prorettype => 'void', proargtypes => 'oid',
   prosrc => 'binary_upgrade_set_next_pg_tablespace_oid' },
+{ oid => '8046', descr => 'for use by pg_upgrade',
+  proname => 'binary_upgrade_slot_has_caught_up', proisstrict => 'f',
+  provolatile => 'v', proparallel => 'u', prorettype => 'bool',
+  proargtypes => 'name',
+  prosrc => 'binary_upgrade_slot_has_caught_up' },
 
 # conversion functions
 { oid => '4302',
diff --git a/src/include/replication/logical.h b/src/include/replication/logical.h
index 5f49554ea0..f8258d7c28 100644
--- a/src/include/replication/logical.h
+++ b/src/include/replication/logical.h
@@ -109,6 +109,9 @@ typedef struct LogicalDecodingContext
 	TransactionId write_xid;
 	/* Are we processing the end LSN of a transaction? */
 	bool		end_xact;
+
+	/* Do we need to process any change in 'fast_forward' mode? */
+	bool		processing_required;
 } LogicalDecodingContext;
 
 
@@ -145,4 +148,6 @@ extern bool filter_by_origin_cb_wrapper(LogicalDecodingContext *ctx, RepOriginId
 extern void ResetLogicalStreamingState(void);
 extern void UpdateDecodingStats(LogicalDecodingContext *ctx);
 
+extern bool LogicalReplicationSlotHasPendingWal(XLogRecPtr end_of_wal);
+
 #endif
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 06b25617bc..8c3f20dcae 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1503,6 +1503,8 @@ LogicalRepTyp
 LogicalRepWorker
 LogicalRepWorkerType
 LogicalRewriteMappingData
+LogicalSlotInfo
+LogicalSlotInfoArr
 LogicalTape
 LogicalTapeSet
 LsnReadQueue
-- 
2.30.0.windows.2

#346Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: vignesh C (#344)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

On Friday, October 20, 2023 11:24 AM vignesh C <vignesh21@gmail.com> wrote:

On Thu, 19 Oct 2023 at 16:16, Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

Dear Vignesh,

Thanks for reviewing! A new patch is available in [1].

Few comments:
1) Even if we comment out the 3rd point "Emit a non-transactional message",
test_slot2 still appears in the invalid_logical_replication_slots.txt file.
There is something wrong here.
+       # 2. Advance the slot test_slot2 up to the current WAL location, but
+       #        test_slot1 still has unconsumed WAL records.
+       $old_publisher->safe_psql('postgres',
+               "SELECT pg_replication_slot_advance('test_slot2', NULL);");
+
+       # 3. Emit a non-transactional message. test_slot2 detects the message so
+       #        that this slot will be also reported by upcoming pg_upgrade.
+       $old_publisher->safe_psql('postgres',
+               "SELECT count(*) FROM pg_logical_emit_message('false', 'prefix', 'This is a non-transactional message');"
+       );

The comment was updated based on other comments. What do you think?

I mean that if we comment out or remove this statement, as in the attached patch,
the test still passes with 'The slot "test_slot2" has not consumed the WAL yet'.
In that case, should test_slot2 still be treated as invalid, given that we have
called pg_replication_slot_advance for test_slot2?

That is because we pass NULL to pg_replication_slot_advance(). We should pass
pg_current_wal_lsn() instead. I have fixed this in the v55 version.
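
For reference, the corrected statement in the test now reads:

    SELECT pg_replication_slot_advance('test_slot2', pg_current_wal_lsn());

Advancing to pg_current_wal_lsn() moves the slot's confirmed_flush up to the
current end of WAL, so only WAL generated afterwards (such as the
non-transactional message emitted later) is reported as unconsumed.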

Best Regards,
Hou zj

#347Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Zhijie Hou (Fujitsu) (#345)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Fri, Oct 20, 2023 at 8:51 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

Attach the new version patch.

Thanks. Here are some comments on v55 patch:

1. A nit:
+
+    /*
+     * We also skip decoding in 'fast_forward' mode. In passing set the
+     * 'processing_required' flag to indicate, were it not for this mode,
+     * processing *would* have been required.
+     */
How about "We also skip decoding in fast_forward mode. In passing set
the processing_required flag to indicate that if it were not for
fast_forward mode, processing would have been required."?

2. Don't we need InvalidateSystemCaches() after FreeDecodingContext()?

+ /* Clean up */
+ FreeDecodingContext(ctx);

3. Don't we need to put CreateDecodingContext in PG_TRY-PG_CATCH with
InvalidateSystemCaches() in PG_CATCH block? I think we need to clear
all timetravel entries with InvalidateSystemCaches(), no?

4. Should the following assertion be an error instead? Or should we ensure
that binary_upgrade_slot_has_caught_up isn't called for an invalidated slot
at all?
+
+    /* Slots must be valid as otherwise we won't be able to scan the WAL */
+    Assert(MyReplicationSlot->data.invalidated == RS_INVAL_NONE);
5. Should this be an error instead of returning false? IMO, a null value
for the slot name is an error.
+    /* Quick exit if the input is NULL */
+    if (PG_ARGISNULL(0))
+        PG_RETURN_BOOL(false);
6. A nit: how about is_decodable_txn or is_decodable_change or some
other instead of just a plain name processing_required?
+    /* Do we need to process any change in 'fast_forward' mode? */
+    bool        processing_required;
7. Can the following pg_fatal message be made consistent and start with a
lowercase letter, something like "expected 0 logical replication slots
...."?
+        pg_fatal("Expected 0 logical replication slots but found %d.",
+                 nslots_on_new);

8. s/problem/problematic - "A list of problematic slots is in the file:\n"
+ "A list of the problem slots is in the file:\n"

9. IMO, binary_upgrade_logical_replication_slot_has_caught_up seems
better, meaningful and consistent despite a bit long than just
binary_upgrade_slot_has_caught_up.

10. How about an assert that the passed-in replication slot is logical
in binary_upgrade_slot_has_caught_up?

11. How about adding CheckLogicalDecodingRequirements too in
binary_upgrade_slot_has_caught_up after CheckSlotPermissions just in
case?

12. Not necessary but adding ReplicationSlotValidateName(slot_name,
ERROR); for the passed-in slotname in
binary_upgrade_slot_has_caught_up may be a good idea, at least in
assert builds to help with input validations.

13. Can the functionality of LogicalReplicationSlotHasPendingWal be
moved to binary_upgrade_slot_has_caught_up and get rid of a separate
function LogicalReplicationSlotHasPendingWal? Or is it that the
function exists in logical.c to avoid extra dependencies between
logical.c and pg_upgrade_support.c?

14. I think it's better to check if the old cluster contains the
necessary function binary_upgrade_slot_has_caught_up instead of just
relying on major version.
+    /* Logical slots can be migrated since PG17. */
+    if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+        return;

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#348Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Bharath Rupireddy (#347)
1 attachment(s)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Bharath,

Thank you for reviewing! PSA new version.

1. A nit:
+
+    /*
+     * We also skip decoding in 'fast_forward' mode. In passing set the
+     * 'processing_required' flag to indicate, were it not for this mode,
+     * processing *would* have been required.
+     */
How about "We also skip decoding in fast_forward mode. In passing set
the processing_required flag to indicate that if it were not for
fast_forward mode, processing would have been required."?

Fixed.

2. Don't we need InvalidateSystemCaches() after FreeDecodingContext()?

+ /* Clean up */
+ FreeDecodingContext(ctx);

Right. Older system caches should be thrown away here for upcoming pg_dump.

3. Don't we need to put CreateDecodingContext in PG_TRY-PG_CATCH with
InvalidateSystemCaches() in PG_CATCH block? I think we need to clear
all timetravel entries with InvalidateSystemCaches(), no?

Added.

4. The following assertion better be an error? Or we ensure that
binary_upgrade_slot_has_caught_up isn't called for an invalidated slot
at all?
+
+    /* Slots must be valid as otherwise we won't be able to scan the WAL */
+    Assert(MyReplicationSlot->data.invalidated == RS_INVAL_NONE);

I kept the Assert() because pg_upgrade won't call this function for invalidated
slots.

5. This better be an error instead of returning false? IMO, null value
for slot name is an error.
+    /* Quick exit if the input is NULL */
+    if (PG_ARGISNULL(0))
+        PG_RETURN_BOOL(false);

Hmm, OK, changed to elog(ERROR).
If the current style were kept and a NULL were passed in, an empty string might
be reported as the slot name in invalid_logical_replication_slots.txt, which
would be quite strange. Note again that this case is not expected to happen.

6. A nit: how about is_decodable_txn or is_decodable_change or some
other instead of just a plain name processing_required?
+    /* Do we need to process any change in 'fast_forward' mode? */
+    bool        processing_required;

I prefer the current one, because not only decodable transactions but also
non-transactional changes and empty transactions are processed.

7. Can the following pg_fatal message be consistent and start with
lowercase letter something like "expected 0 logical replication slots
...."?
+        pg_fatal("Expected 0 logical replication slots but found %d.",
+                 nslots_on_new);

Note that the upper/lower-case rule has already been broken in this file. Lower
case was used here because I regarded this sentence as a hint message. Please see
the previous posts [1]/messages/by-id/TYAPR01MB586642D33208D190F67CDD7BF5F2A@TYAPR01MB5866.jpnprd01.prod.outlook.com [2]/messages/by-id/TYAPR01MB58666936A0DB0EEDCC929CEEF5FEA@TYAPR01MB5866.jpnprd01.prod.outlook.com.

8. s/problem/problematic - "A list of problematic slots is in the file:\n"
+ "A list of the problem slots is in the file:\n"

Fixed.

9. IMO, binary_upgrade_logical_replication_slot_has_caught_up seems
better, meaningful and consistent despite a bit long than just
binary_upgrade_slot_has_caught_up.

Fixed.

10. How about an assert that the passed-in replication slot is logical
in binary_upgrade_slot_has_caught_up?

Fixed.

11. How about adding CheckLogicalDecodingRequirements too in
binary_upgrade_slot_has_caught_up after CheckSlotPermissions just in
case?

Not added. CheckLogicalDecodingRequirements() ensures that WAL can be decoded
and that the changes can be applied, but neither is needed in fast_forward
mode. Also, the pre-existing function pg_logical_replication_slot_advance() does
not call it.

12. Not necessary but adding ReplicationSlotValidateName(slot_name,
ERROR); for the passed-in slotname in
binary_upgrade_slot_has_caught_up may be a good idea, at least in
assert builds to help with input validations.

Not added, because ReplicationSlotAcquire() will report an error even if an
invalid name is passed. Also, the pre-existing function
pg_logical_replication_slot_advance() does not call it.

13. Can the functionality of LogicalReplicationSlotHasPendingWal be
moved to binary_upgrade_slot_has_caught_up and get rid of a separate
function LogicalReplicationSlotHasPendingWal? Or is it that the
function exists in logical.c to avoid extra dependencies between
logical.c and pg_upgrade_support.c?

I kept the current style. I think upgrade functions should be short, with the
actual work done elsewhere. SetAttrMissing() is called only from an upgrade
function, so we do not have a policy against dividing out such functions.
Also, LogicalDecodingProcessRecord() is called only from files in
src/backend/replication, so we can keep it that way.

14. I think it's better to check if the old cluster contains the
necessary function binary_upgrade_slot_has_caught_up instead of just
relying on major version.
+    /* Logical slots can be migrated since PG17. */
+    if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+        return;

I kept the current style because I could not see a merit in that approach. If the
patch is committed, PG17.X will surely have binary_upgrade_logical_replication_slot_has_caught_up().
Also, other upgrade functions are not checked via the pg_proc catalog. If you
have something else in mind, please reply here.
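
For reference, a catalog-based probe along the lines suggested would be roughly
the following query against the old cluster (just a sketch, not part of the
patch):

    SELECT 1 FROM pg_catalog.pg_proc
    WHERE proname = 'binary_upgrade_logical_replication_slot_has_caught_up';

As written, the patch instead relies on the major-version check, consistent with
how the other upgrade support functions are handled.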

[1]: /messages/by-id/TYAPR01MB586642D33208D190F67CDD7BF5F2A@TYAPR01MB5866.jpnprd01.prod.outlook.com
[2]: /messages/by-id/TYAPR01MB58666936A0DB0EEDCC929CEEF5FEA@TYAPR01MB5866.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:

v56-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patchapplication/octet-stream; name=v56-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patchDownload
From a4ca3824cf0426ce112db3c3c1623764be94acc8 Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Tue, 4 Apr 2023 05:49:34 +0000
Subject: [PATCH v56] pg_upgrade: Allow to replicate logical replication slots
 to new node

This commit allows nodes with logical replication slots to be upgraded. While
reading information from the old cluster, a list of logical replication slots is
fetched. At the later part of upgrading, pg_upgrade revisits the list and
restores slots by executing pg_create_logical_replication_slot() on the new
cluster. Migration of logical replication slots is only supported when the old
cluster is version 17.0 or later.

If the old node has slots with the status 'lost' or with unconsumed WAL records,
pg_upgrade fails. These checks are needed to prevent data loss.

Note that the pg_resetwal command would remove WAL files, which are required for
restart_lsn. If WALs required by logical replication slots are removed, the
slots are unusable. Therefore, during the upgrade, slot restoration is done
after the final pg_resetwal command. The workflow ensures that required WALs are
retained.

The significant advantage of this commit is that it makes it easy to continue
logical replication even after upgrading the publisher node. Previously,
pg_upgrade allowed copying publications to a new node. With this new commit,
adjusting the connection string to the new publisher will cause the apply
worker on the subscriber to connect to the new publisher automatically. This
enables seamless continuation of logical replication, even after an upgrade.

Author: Hayato Kuroda
Co-authored-by: Hou Zhijie
Reviewed-by: Peter Smith, Julien Rouhaud, Vignesh C, Wang Wei, Masahiko Sawada,
             Dilip Kumar, Bharath Rupireddy, Shlok Kyal
---
 doc/src/sgml/ref/pgupgrade.sgml               |  78 ++++-
 src/backend/replication/logical/decode.c      |  48 ++-
 src/backend/replication/logical/logical.c     |  77 +++++
 src/backend/replication/slot.c                |  14 +
 src/backend/utils/adt/pg_upgrade_support.c    |  44 +++
 src/bin/pg_upgrade/Makefile                   |   3 +
 src/bin/pg_upgrade/check.c                    | 169 ++++++++++-
 src/bin/pg_upgrade/function.c                 |  30 +-
 src/bin/pg_upgrade/info.c                     | 166 ++++++++++-
 src/bin/pg_upgrade/meson.build                |   1 +
 src/bin/pg_upgrade/pg_upgrade.c               |  74 ++++-
 src/bin/pg_upgrade/pg_upgrade.h               |  22 +-
 src/bin/pg_upgrade/server.c                   |  25 +-
 .../003_upgrade_logical_replication_slots.pl  | 281 ++++++++++++++++++
 src/include/catalog/pg_proc.dat               |   5 +
 src/include/replication/logical.h             |   5 +
 src/tools/pgindent/typedefs.list              |   2 +
 17 files changed, 1018 insertions(+), 26 deletions(-)
 create mode 100644 src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 608193b307..0296c3f89d 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -383,6 +383,79 @@ make prefix=/usr/local/pgsql.new install
     </para>
    </step>
 
+   <step>
+    <title>Prepare for publisher upgrades</title>
+
+    <para>
+     <application>pg_upgrade</application> attempts to migrate logical
+     replication slots. This helps avoid the need for manually defining the
+     same replication slots on the new publisher. Migration of logical
+     replication slots is only supported when the old cluster is version 17.0
+     or later. Logical replication slots on clusters before version 17.0 will
+     silently be ignored.
+    </para>
+
+    <para>
+     Before you start upgrading the publisher cluster, ensure that the
+     subscription is temporarily disabled, by executing
+     <link linkend="sql-altersubscription"><command>ALTER SUBSCRIPTION ... DISABLE</command></link>.
+     Re-enable the subscription after the upgrade.
+    </para>
+
+    <para>
+     There are some prerequisites for <application>pg_upgrade</application> to
+     be able to upgrade the replication slots. If these are not met an error
+     will be reported.
+    </para>
+
+    <itemizedlist>
+     <listitem>
+      <para>
+       All slots on the old cluster must be usable, i.e., there are no slots
+       whose
+       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>conflicting</structfield>
+       is <literal>true</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The old cluster has replicated all the transactions and logical decoding
+       messages to subscribers.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The output plugins referenced by the slots on the old cluster must be
+       installed in the new PostgreSQL executable directory.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must not have permanent logical replication slots, i.e.,
+       there must be no slots where
+       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>temporary</structfield>
+       is <literal>false</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must have
+       <link linkend="guc-wal-level"><varname>wal_level</varname></link> as
+       <literal>logical</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must have
+       <link linkend="guc-max-replication-slots"><varname>max_replication_slots</varname></link>
+       configured to a value greater than or equal to the number of slots
+       present in the old cluster.
+      </para>
+     </listitem>
+    </itemizedlist>
+
+   </step>
+
    <step>
     <title>Stop both servers</title>
 
@@ -650,8 +723,9 @@ rsync --archive --delete --hard-links --size-only --no-inc-recursive /vol1/pg_tb
        Configure the servers for log shipping.  (You do not need to run
        <function>pg_backup_start()</function> and <function>pg_backup_stop()</function>
        or take a file system backup as the standbys are still synchronized
-       with the primary.)  Replication slots are not copied and must
-       be recreated.
+       with the primary.)  Only logical slots on the primary are copied to the
+       new standby, but other slots on the old standby are not copied so must
+       be recreated manually.
       </para>
      </step>
 
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 24b712aa66..c2b4976dfa 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -600,12 +600,8 @@ logicalmsg_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 
 	ReorderBufferProcessXid(ctx->reorder, XLogRecGetXid(r), buf->origptr);
 
-	/*
-	 * If we don't have snapshot or we are just fast-forwarding, there is no
-	 * point in decoding messages.
-	 */
-	if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT ||
-		ctx->fast_forward)
+	/* If we don't have snapshot, there is no point in decoding messages */
+	if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT)
 		return;
 
 	message = (xl_logical_message *) XLogRecGetData(r);
@@ -622,6 +618,26 @@ logicalmsg_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 			  SnapBuildXactNeedsSkip(builder, buf->origptr)))
 		return;
 
+	/*
+	 * We also skip decoding in 'fast_forward' mode. This check must be last
+	 * because we don't want to set the processing_required flag unless we
+	 * have a decodable message.
+	 */
+	if (ctx->fast_forward)
+	{
+		/*
+		 * We need to set the processing_required flag to notify the caller
+		 * of the message's existence. Usually, the flag is set when either
+		 * the COMMIT or ABORT record is decoded, but it must be set here
+		 * because a non-transactional logical message is decoded without
+		 * waiting for those records.
+		 */
+		if (!message->transactional)
+			ctx->processing_required = true;
+
+		return;
+	}
+
 	/*
 	 * If this is a non-transactional change, get the snapshot we're expected
 	 * to use. We only get here when the snapshot is consistent, and the
@@ -1286,7 +1302,21 @@ static bool
 DecodeTXNNeedSkip(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
 				  Oid txn_dbid, RepOriginId origin_id)
 {
-	return (SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) ||
-			(txn_dbid != InvalidOid && txn_dbid != ctx->slot->data.database) ||
-			ctx->fast_forward || FilterByOrigin(ctx, origin_id));
+	if (SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) ||
+		(txn_dbid != InvalidOid && txn_dbid != ctx->slot->data.database) ||
+		FilterByOrigin(ctx, origin_id))
+		return true;
+
+	/*
+	 * We also skip decoding in fast_forward mode. In passing set the
+	 * processing_required flag to indicate that if it were not for
+	 * fast_forward mode, processing would have been required.
+	 */
+	if (ctx->fast_forward)
+	{
+		ctx->processing_required = true;
+		return true;
+	}
+
+	return false;
 }
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 41243d0187..7ac0d7ba85 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -29,6 +29,7 @@
 #include "postgres.h"
 
 #include "access/xact.h"
+#include "access/xlogutils.h"
 #include "access/xlog_internal.h"
 #include "fmgr.h"
 #include "miscadmin.h"
@@ -41,6 +42,7 @@
 #include "storage/proc.h"
 #include "storage/procarray.h"
 #include "utils/builtins.h"
+#include "utils/inval.h"
 #include "utils/memutils.h"
 
 /* data for errcontext callback */
@@ -1949,3 +1951,78 @@ UpdateDecodingStats(LogicalDecodingContext *ctx)
 	rb->totalTxns = 0;
 	rb->totalBytes = 0;
 }
+
+/*
+ * Read up to the end of WAL starting from the decoding slot's restart_lsn.
+ * Return true if any meaningful/decodable WAL records are encountered,
+ * otherwise false.
+ *
+ * Although this function is currently used only during pg_upgrade, there is
+ * no reason to restrict it, so IsBinaryUpgrade is not checked here.
+ */
+bool
+LogicalReplicationSlotHasPendingWal(XLogRecPtr end_of_wal)
+{
+	LogicalDecodingContext *ctx;
+	bool		has_pending_wal = false;
+
+	Assert(MyReplicationSlot);
+
+	PG_TRY();
+	{
+		/*
+		 * Create our decoding context in fast_forward mode, passing start_lsn
+		 * as InvalidXLogRecPtr, so that we start processing from the slot's
+		 * confirmed_flush.
+		 */
+		ctx = CreateDecodingContext(InvalidXLogRecPtr,
+									NIL,
+									true,	/* fast_forward */
+									XL_ROUTINE(.page_read = read_local_xlog_page,
+											   .segment_open = wal_segment_open,
+											   .segment_close = wal_segment_close),
+									NULL, NULL, NULL);
+
+		/*
+		 * Start reading at the slot's restart_lsn, which we know points to a
+		 * valid record.
+		 */
+		XLogBeginRead(ctx->reader, MyReplicationSlot->data.restart_lsn);
+
+		/* Invalidate non-timetravel entries */
+		InvalidateSystemCaches();
+
+		/* Loop until the end of WAL or some changes are processed */
+		while (!has_pending_wal && ctx->reader->EndRecPtr < end_of_wal)
+		{
+			XLogRecord *record;
+			char	   *errm = NULL;
+
+			record = XLogReadRecord(ctx->reader, &errm);
+
+			if (errm)
+				elog(ERROR, "could not find record for logical decoding: %s", errm);
+
+			if (record != NULL)
+				LogicalDecodingProcessRecord(ctx, ctx->reader);
+
+			has_pending_wal = ctx->processing_required;
+
+			CHECK_FOR_INTERRUPTS();
+		}
+
+		/* Clean up */
+		FreeDecodingContext(ctx);
+		InvalidateSystemCaches();
+	}
+	PG_CATCH();
+	{
+		/* clear all timetravel entries */
+		InvalidateSystemCaches();
+
+		PG_RE_THROW();
+	}
+	PG_END_TRY();
+
+	return has_pending_wal;
+}
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 7e5ec500d8..99823df3c7 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1423,6 +1423,20 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause,
 
 		SpinLockRelease(&s->mutex);
 
+		/*
+		 * Logical replication slots shouldn't be invalidated because the
+		 * max_slot_wal_keep_size GUC is set to -1 during the upgrade.
+		 *
+		 * The following is just a sanity check.
+		 */
+		if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade)
+		{
+			ereport(ERROR,
+					errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					errmsg("replication slots must not be invalidated during the upgrade"),
+					errhint("\"max_slot_wal_keep_size\" must be set to -1 during the upgrade"));
+		}
+
 		if (active_pid != 0)
 		{
 			/*
diff --git a/src/backend/utils/adt/pg_upgrade_support.c b/src/backend/utils/adt/pg_upgrade_support.c
index 0186636d9f..403edd9251 100644
--- a/src/backend/utils/adt/pg_upgrade_support.c
+++ b/src/backend/utils/adt/pg_upgrade_support.c
@@ -17,6 +17,7 @@
 #include "catalog/pg_type.h"
 #include "commands/extension.h"
 #include "miscadmin.h"
+#include "replication/logical.h"
 #include "utils/array.h"
 #include "utils/builtins.h"
 
@@ -261,3 +262,46 @@ binary_upgrade_set_missing_value(PG_FUNCTION_ARGS)
 
 	PG_RETURN_VOID();
 }
+
+/*
+ * Verify the given slot has already consumed all the WAL changes.
+ *
+ * Returns true if there are no decodable WAL records after the
+ * confirmed_flush_lsn. Otherwise false.
+ *
+ * This is a special purpose function to ensure that the given slot can be
+ * upgraded without data loss.
+ */
+Datum
+binary_upgrade_logical_replication_slot_has_caught_up(PG_FUNCTION_ARGS)
+{
+	Name		slot_name;
+	XLogRecPtr	end_of_wal;
+	bool		found_pending_wal;
+
+	CHECK_IS_BINARY_UPGRADE;
+
+	/* We must check before dereferencing the argument */
+	if (PG_ARGISNULL(0))
+		elog(ERROR, "null argument to binary_upgrade_logical_replication_slot_has_caught_up is not allowed");
+
+	CheckSlotPermissions();
+
+	slot_name = PG_GETARG_NAME(0);
+
+	/* Acquire the given slot */
+	ReplicationSlotAcquire(NameStr(*slot_name), true);
+
+	Assert(SlotIsLogical(MyReplicationSlot));
+
+	/* Slots must be valid as otherwise we won't be able to scan the WAL */
+	Assert(MyReplicationSlot->data.invalidated == RS_INVAL_NONE);
+
+	end_of_wal = GetFlushRecPtr(NULL);
+	found_pending_wal = LogicalReplicationSlotHasPendingWal(end_of_wal);
+
+	/* Clean up */
+	ReplicationSlotRelease();
+
+	PG_RETURN_BOOL(!found_pending_wal);
+}
diff --git a/src/bin/pg_upgrade/Makefile b/src/bin/pg_upgrade/Makefile
index 5834513add..05e9299654 100644
--- a/src/bin/pg_upgrade/Makefile
+++ b/src/bin/pg_upgrade/Makefile
@@ -3,6 +3,9 @@
 PGFILEDESC = "pg_upgrade - an in-place binary upgrade utility"
 PGAPPICON = win32
 
+# required for 003_upgrade_logical_replication_slots.pl
+EXTRA_INSTALL=contrib/test_decoding
+
 subdir = src/bin/pg_upgrade
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 21a0ff9e42..9b9f493ae5 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -33,6 +33,8 @@ static void check_for_jsonb_9_4_usage(ClusterInfo *cluster);
 static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(void);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
+static void check_new_cluster_logical_replication_slots(void);
+static void check_old_cluster_for_valid_slots(bool live_check);
 
 
 /*
@@ -89,8 +91,11 @@ check_and_dump_old_cluster(bool live_check)
 	if (!live_check)
 		start_postmaster(&old_cluster, true);
 
-	/* Extract a list of databases and tables from the old cluster */
-	get_db_and_rel_infos(&old_cluster);
+	/*
+	 * Extract a list of databases, tables, and logical replication slots from
+	 * the old cluster.
+	 */
+	get_db_rel_and_slot_infos(&old_cluster, live_check);
 
 	init_tablespaces();
 
@@ -107,6 +112,13 @@ check_and_dump_old_cluster(bool live_check)
 	check_for_reg_data_type_usage(&old_cluster);
 	check_for_isn_and_int8_passing_mismatch(&old_cluster);
 
+	/*
+	 * Logical replication slots can be migrated since PG17. See comments atop
+	 * get_old_cluster_logical_slot_infos().
+	 */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
+		check_old_cluster_for_valid_slots(live_check);
+
 	/*
 	 * PG 16 increased the size of the 'aclitem' type, which breaks the
 	 * on-disk format for existing data.
@@ -200,7 +212,7 @@ check_and_dump_old_cluster(bool live_check)
 void
 check_new_cluster(void)
 {
-	get_db_and_rel_infos(&new_cluster);
+	get_db_rel_and_slot_infos(&new_cluster, false);
 
 	check_new_cluster_is_empty();
 
@@ -223,6 +235,8 @@ check_new_cluster(void)
 	check_for_prepared_transactions(&new_cluster);
 
 	check_for_new_tablespace_dir();
+
+	check_new_cluster_logical_replication_slots();
 }
 
 
@@ -1451,3 +1465,152 @@ check_for_user_defined_encoding_conversions(ClusterInfo *cluster)
 	else
 		check_ok();
 }
+
+/*
+ * check_new_cluster_logical_replication_slots()
+ *
+ * Verify that there are no logical replication slots on the new cluster and
+ * that the parameter settings necessary for creating slots are sufficient.
+ */
+static void
+check_new_cluster_logical_replication_slots(void)
+{
+	PGresult   *res;
+	PGconn	   *conn;
+	int			nslots_on_old;
+	int			nslots_on_new;
+	int			max_replication_slots;
+	char	   *wal_level;
+
+	/* Logical slots can be migrated since PG17. */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+		return;
+
+	nslots_on_old = count_old_cluster_logical_slots();
+
+	/* Quick return if there are no logical slots to be migrated. */
+	if (nslots_on_old == 0)
+		return;
+
+	conn = connectToServer(&new_cluster, "template1");
+
+	prep_status("Checking for new cluster logical replication slots");
+
+	res = executeQueryOrDie(conn, "SELECT count(*) "
+							"FROM pg_catalog.pg_replication_slots "
+							"WHERE slot_type = 'logical' AND "
+							"temporary IS FALSE;");
+
+	if (PQntuples(res) != 1)
+		pg_fatal("could not count the number of logical replication slots");
+
+	nslots_on_new = atoi(PQgetvalue(res, 0, 0));
+
+	if (nslots_on_new)
+		pg_fatal("Expected 0 logical replication slots but found %d.",
+				 nslots_on_new);
+
+	PQclear(res);
+
+	res = executeQueryOrDie(conn, "SELECT setting FROM pg_settings "
+							"WHERE name IN ('wal_level', 'max_replication_slots') "
+							"ORDER BY name DESC;");
+
+	if (PQntuples(res) != 2)
+		pg_fatal("could not determine parameter settings on new cluster");
+
+	wal_level = PQgetvalue(res, 0, 0);
+
+	if (strcmp(wal_level, "logical") != 0)
+		pg_fatal("wal_level must be \"logical\", but is set to \"%s\"",
+				 wal_level);
+
+	max_replication_slots = atoi(PQgetvalue(res, 1, 0));
+
+	if (nslots_on_old > max_replication_slots)
+		pg_fatal("max_replication_slots (%d) must be greater than or equal to the number of "
+				 "logical replication slots (%d) on the old cluster",
+				 max_replication_slots, nslots_on_old);
+
+	PQclear(res);
+	PQfinish(conn);
+
+	check_ok();
+}
+
+/*
+ * check_old_cluster_for_valid_slots()
+ *
+ * Verify that all the logical slots are valid and have consumed all the WAL
+ * before shutdown.
+ */
+static void
+check_old_cluster_for_valid_slots(bool live_check)
+{
+	char		output_path[MAXPGPATH];
+	FILE	   *script = NULL;
+
+	prep_status("Checking for valid logical replication slots");
+
+	snprintf(output_path, sizeof(output_path), "%s/%s",
+			 log_opts.basedir,
+			 "invalid_logical_replication_slots.txt");
+
+	for (int dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		LogicalSlotInfoArr *slot_arr = &old_cluster.dbarr.dbs[dbnum].slot_arr;
+
+		for (int slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		{
+			LogicalSlotInfo *slot = &slot_arr->slots[slotnum];
+
+			/* Is the slot usable? */
+			if (slot->invalid)
+			{
+				if (script == NULL &&
+					(script = fopen_priv(output_path, "w")) == NULL)
+					pg_fatal("could not open file \"%s\": %s",
+							 output_path, strerror(errno));
+
+				fprintf(script, "The slot \"%s\" is invalid\n",
+						slot->slotname);
+
+				/* No need to check this slot, move on to the next one */
+				continue;
+			}
+
+			/*
+			 * Do an additional check to ensure that all logical replication
+			 * slots have consumed all the WAL before shutdown.
+			 *
+			 * Note: This can be satisfied only when the old cluster has been
+			 * shut down, so we skip this for live checks.
+			 */
+			if (!live_check && !slot->caught_up)
+			{
+				if (script == NULL &&
+					(script = fopen_priv(output_path, "w")) == NULL)
+					pg_fatal("could not open file \"%s\": %s",
+							 output_path, strerror(errno));
+
+				fprintf(script,
+						"The slot \"%s\" has not consumed the WAL yet\n",
+						slot->slotname);
+			}
+		}
+	}
+
+	if (script)
+	{
+		fclose(script);
+
+		pg_log(PG_REPORT, "fatal");
+		pg_fatal("Your installation contains logical replication slots that can't be upgraded.\n"
+				 "You can remove invalid slots and/or consume the pending WAL for other slots,\n"
+				 "and then restart the upgrade.\n"
+				 "A list of the problematic slots is in the file:\n"
+				 "    %s", output_path);
+	}
+
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/function.c b/src/bin/pg_upgrade/function.c
index dc8800c7cd..5af936bd45 100644
--- a/src/bin/pg_upgrade/function.c
+++ b/src/bin/pg_upgrade/function.c
@@ -46,7 +46,9 @@ library_name_compare(const void *p1, const void *p2)
 /*
  * get_loadable_libraries()
  *
- *	Fetch the names of all old libraries containing C-language functions.
+ *	Fetch the names of all old libraries that either contain C-language
+ *	functions or correspond to logical replication output plugins.
+ *
  *	We will later check that they all exist in the new installation.
  */
 void
@@ -55,6 +57,7 @@ get_loadable_libraries(void)
 	PGresult  **ress;
 	int			totaltups;
 	int			dbnum;
+	int			n_libinfos;
 
 	ress = (PGresult **) pg_malloc(old_cluster.dbarr.ndbs * sizeof(PGresult *));
 	totaltups = 0;
@@ -81,7 +84,12 @@ get_loadable_libraries(void)
 		PQfinish(conn);
 	}
 
-	os_info.libraries = (LibraryInfo *) pg_malloc(totaltups * sizeof(LibraryInfo));
+	/*
+	 * Allocate memory for required libraries and logical replication output
+	 * plugins.
+	 */
+	n_libinfos = totaltups + count_old_cluster_logical_slots();
+	os_info.libraries = (LibraryInfo *) pg_malloc(sizeof(LibraryInfo) * n_libinfos);
 	totaltups = 0;
 
 	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
@@ -89,6 +97,7 @@ get_loadable_libraries(void)
 		PGresult   *res = ress[dbnum];
 		int			ntups;
 		int			rowno;
+		LogicalSlotInfoArr *slot_arr = &old_cluster.dbarr.dbs[dbnum].slot_arr;
 
 		ntups = PQntuples(res);
 		for (rowno = 0; rowno < ntups; rowno++)
@@ -101,6 +110,23 @@ get_loadable_libraries(void)
 			totaltups++;
 		}
 		PQclear(res);
+
+		/*
+		 * Store the names of output plugins as well. The same plugin may
+		 * appear more than once, but the consumer function
+		 * check_loadable_libraries() will avoid checking the same library
+		 * twice, so we do not have to consider uniqueness here.
+		 */
+		for (int slotno = 0; slotno < slot_arr->nslots; slotno++)
+		{
+			if (slot_arr->slots[slotno].invalid)
+				continue;
+
+			os_info.libraries[totaltups].name = pg_strdup(slot_arr->slots[slotno].plugin);
+			os_info.libraries[totaltups].dbnum = dbnum;
+
+			totaltups++;
+		}
 	}
 
 	pg_free(ress);
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index aa5faca4d6..5df7bc6f26 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -26,6 +26,8 @@ static void get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo);
 static void free_rel_infos(RelInfoArr *rel_arr);
 static void print_db_infos(DbInfoArr *db_arr);
 static void print_rel_infos(RelInfoArr *rel_arr);
+static void print_slot_infos(LogicalSlotInfoArr *slot_arr);
+static void get_old_cluster_logical_slot_infos(DbInfo *dbinfo, bool live_check);
 
 
 /*
@@ -266,13 +268,15 @@ report_unmatched_relation(const RelInfo *rel, const DbInfo *db, bool is_new_db)
 }
 
 /*
- * get_db_and_rel_infos()
+ * get_db_rel_and_slot_infos()
  *
  * higher level routine to generate dbinfos for the database running
  * on the given "port". Assumes that server is already running.
+ *
+ * live_check would be used only when the target is the old cluster.
  */
 void
-get_db_and_rel_infos(ClusterInfo *cluster)
+get_db_rel_and_slot_infos(ClusterInfo *cluster, bool live_check)
 {
 	int			dbnum;
 
@@ -283,7 +287,17 @@ get_db_and_rel_infos(ClusterInfo *cluster)
 	get_db_infos(cluster);
 
 	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
-		get_rel_infos(cluster, &cluster->dbarr.dbs[dbnum]);
+	{
+		DbInfo	   *pDbInfo = &cluster->dbarr.dbs[dbnum];
+
+		get_rel_infos(cluster, pDbInfo);
+
+		/*
+		 * Retrieve the logical replication slots infos for the old cluster.
+		 */
+		if (cluster == &old_cluster)
+			get_old_cluster_logical_slot_infos(pDbInfo, live_check);
+	}
 
 	if (cluster == &old_cluster)
 		pg_log(PG_VERBOSE, "\nsource databases:");
@@ -600,6 +614,125 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 	dbinfo->rel_arr.nrels = num_rels;
 }
 
+/*
+ * get_old_cluster_logical_slot_infos()
+ *
+ * gets the LogicalSlotInfos for all the logical replication slots of the
+ * database referred to by "dbinfo". The status of each logical slot is gotten
+ * here, but they are used at the checking phase. See
+ * check_old_cluster_for_valid_slots().
+ *
+ * Note: This function will not do anything if the old cluster is pre-PG17.
+ * This is because, before that version, logical slots were not saved at
+ * shutdown, so there is no guarantee that the latest confirmed_flush_lsn was
+ * saved to disk, which could lead to data loss. It is still not guaranteed
+ * for manually created slots in PG17, so the subsequent checks done in
+ * check_old_cluster_for_valid_slots() would raise a FATAL error if such
+ * slots are included.
+ */
+static void
+get_old_cluster_logical_slot_infos(DbInfo *dbinfo, bool live_check)
+{
+	PGconn	   *conn;
+	PGresult   *res;
+	LogicalSlotInfo *slotinfos = NULL;
+	int			num_slots = 0;
+
+	/* Logical slots can be migrated since PG17. */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+	{
+		dbinfo->slot_arr.slots = slotinfos;
+		dbinfo->slot_arr.nslots = num_slots;
+		return;
+	}
+
+	conn = connectToServer(&old_cluster, dbinfo->db_name);
+
+	/*
+	 * Fetch the logical replication slot information. The check whether the
+	 * slot is considered caught up is done by an upgrade function. This
+	 * regards the slot as caught up if we don't find any decodable changes.
+	 * See binary_upgrade_logical_replication_slot_has_caught_up().
+	 *
+	 * Note that we can't determine whether the slot is caught up during
+	 * live_check, as new WAL records could still be generated.
+	 *
+	 * We intentionally skip checking the WALs for invalidated slots as the
+	 * corresponding WALs could have been removed for such slots.
+	 *
+	 * The temporary slots are explicitly ignored while checking because such
+	 * slots cannot exist after the upgrade. During the upgrade, clusters are
+	 * started and stopped several times causing any temporary slots to be
+	 * removed.
+	 */
+	res = executeQueryOrDie(conn, "SELECT slot_name, plugin, two_phase, "
+							"%s as caught_up, conflicting as invalid "
+							"FROM pg_catalog.pg_replication_slots "
+							"WHERE slot_type = 'logical' AND "
+							"database = current_database() AND "
+							"temporary IS FALSE;",
+							live_check ? "FALSE" :
+							"(CASE WHEN conflicting THEN FALSE "
+							"ELSE (SELECT pg_catalog.binary_upgrade_logical_replication_slot_has_caught_up(slot_name)) "
+							"END)");
+
+	num_slots = PQntuples(res);
+
+	if (num_slots)
+	{
+		int			i_slotname;
+		int			i_plugin;
+		int			i_twophase;
+		int			i_caught_up;
+		int			i_invalid;
+
+		slotinfos = (LogicalSlotInfo *) pg_malloc(sizeof(LogicalSlotInfo) * num_slots);
+
+		i_slotname = PQfnumber(res, "slot_name");
+		i_plugin = PQfnumber(res, "plugin");
+		i_twophase = PQfnumber(res, "two_phase");
+		i_caught_up = PQfnumber(res, "caught_up");
+		i_invalid = PQfnumber(res, "invalid");
+
+		for (int slotnum = 0; slotnum < num_slots; slotnum++)
+		{
+			LogicalSlotInfo *curr = &slotinfos[slotnum];
+
+			curr->slotname = pg_strdup(PQgetvalue(res, slotnum, i_slotname));
+			curr->plugin = pg_strdup(PQgetvalue(res, slotnum, i_plugin));
+			curr->two_phase = (strcmp(PQgetvalue(res, slotnum, i_twophase), "t") == 0);
+			curr->caught_up = (strcmp(PQgetvalue(res, slotnum, i_caught_up), "t") == 0);
+			curr->invalid = (strcmp(PQgetvalue(res, slotnum, i_invalid), "t") == 0);
+		}
+	}
+
+	PQclear(res);
+	PQfinish(conn);
+
+	dbinfo->slot_arr.slots = slotinfos;
+	dbinfo->slot_arr.nslots = num_slots;
+}
+
+
+/*
+ * count_old_cluster_logical_slots()
+ *
+ * Returns the number of logical replication slots for all databases.
+ *
+ * Note: this function always returns 0 if the old_cluster is PG16 and prior
+ * because we gather slot information only for cluster versions greater than or
+ * equal to PG17. See get_old_cluster_logical_slot_infos().
+ */
+int
+count_old_cluster_logical_slots(void)
+{
+	int			slot_count = 0;
+
+	for (int dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+		slot_count += old_cluster.dbarr.dbs[dbnum].slot_arr.nslots;
+
+	return slot_count;
+}
 
 static void
 free_db_and_rel_infos(DbInfoArr *db_arr)
@@ -642,8 +775,11 @@ print_db_infos(DbInfoArr *db_arr)
 
 	for (dbnum = 0; dbnum < db_arr->ndbs; dbnum++)
 	{
-		pg_log(PG_VERBOSE, "Database: \"%s\"", db_arr->dbs[dbnum].db_name);
-		print_rel_infos(&db_arr->dbs[dbnum].rel_arr);
+		DbInfo	   *pDbInfo = &db_arr->dbs[dbnum];
+
+		pg_log(PG_VERBOSE, "Database: \"%s\"", pDbInfo->db_name);
+		print_rel_infos(&pDbInfo->rel_arr);
+		print_slot_infos(&pDbInfo->slot_arr);
 	}
 }
 
@@ -660,3 +796,23 @@ print_rel_infos(RelInfoArr *rel_arr)
 			   rel_arr->rels[relnum].reloid,
 			   rel_arr->rels[relnum].tablespace);
 }
+
+static void
+print_slot_infos(LogicalSlotInfoArr *slot_arr)
+{
+	/* Quick return if there are no logical slots. */
+	if (slot_arr->nslots == 0)
+		return;
+
+	pg_log(PG_VERBOSE, "Logical replication slots within the database:");
+
+	for (int slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+	{
+		LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+
+		pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\", two_phase: %s",
+			   slot_info->slotname,
+			   slot_info->plugin,
+			   slot_info->two_phase ? "true" : "false");
+	}
+}
diff --git a/src/bin/pg_upgrade/meson.build b/src/bin/pg_upgrade/meson.build
index 12a97f84e2..2c4f38d865 100644
--- a/src/bin/pg_upgrade/meson.build
+++ b/src/bin/pg_upgrade/meson.build
@@ -42,6 +42,7 @@ tests += {
     'tests': [
       't/001_basic.pl',
       't/002_pg_upgrade.pl',
+      't/003_upgrade_logical_replication_slots.pl',
     ],
     'test_kwargs': {'priority': 40}, # pg_upgrade tests are slow
   },
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 96bfb67167..3960af4036 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -59,6 +59,7 @@ static void copy_xact_xlog_xid(void);
 static void set_frozenxids(bool minmxid_only);
 static void make_outputdirs(char *pgdata);
 static void setup(char *argv0, bool *live_check);
+static void create_logical_replication_slots(void);
 
 ClusterInfo old_cluster,
 			new_cluster;
@@ -188,6 +189,21 @@ main(int argc, char **argv)
 			  new_cluster.pgdata);
 	check_ok();
 
+	/*
+	 * Migrate the logical slots to the new cluster.  Note that we need to do
+	 * this after resetting WAL because otherwise the required WAL would be
+	 * removed and slots would become unusable.  There is a possibility that
+	 * background processes might generate some WAL before we could create the
+	 * slots in the new cluster but we can ignore that WAL as that won't be
+	 * required downstream.
+	 */
+	if (count_old_cluster_logical_slots())
+	{
+		start_postmaster(&new_cluster, true);
+		create_logical_replication_slots();
+		stop_postmaster(false);
+	}
+
 	if (user_opts.do_sync)
 	{
 		prep_status("Sync data directory to disk");
@@ -593,7 +609,7 @@ create_new_objects(void)
 		set_frozenxids(true);
 
 	/* update new_cluster info now that we have objects in the databases */
-	get_db_and_rel_infos(&new_cluster);
+	get_db_rel_and_slot_infos(&new_cluster, false);
 }
 
 /*
@@ -862,3 +878,59 @@ set_frozenxids(bool minmxid_only)
 
 	check_ok();
 }
+
+/*
+ * create_logical_replication_slots()
+ *
+ * Similar to create_new_objects() but only restores logical replication slots.
+ */
+static void
+create_logical_replication_slots(void)
+{
+	prep_status_progress("Restoring logical replication slots in the new cluster");
+
+	for (int dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		DbInfo	   *old_db = &old_cluster.dbarr.dbs[dbnum];
+		LogicalSlotInfoArr *slot_arr = &old_db->slot_arr;
+		PGconn	   *conn;
+		PQExpBuffer query;
+
+		/* Skip this database if there are no slots */
+		if (slot_arr->nslots == 0)
+			continue;
+
+		conn = connectToServer(&new_cluster, old_db->db_name);
+		query = createPQExpBuffer();
+
+		pg_log(PG_STATUS, "%s", old_db->db_name);
+
+		for (int slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		{
+			LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+
+			/* Construct a query to create a logical replication slot */
+			appendPQExpBuffer(query,
+							  "SELECT * FROM "
+							  "pg_catalog.pg_create_logical_replication_slot(");
+			appendStringLiteralConn(query, slot_info->slotname, conn);
+			appendPQExpBuffer(query, ", ");
+			appendStringLiteralConn(query, slot_info->plugin, conn);
+			appendPQExpBuffer(query, ", false, %s);",
+							  slot_info->two_phase ? "true" : "false");
+
+			PQclear(executeQueryOrDie(conn, "%s", query->data));
+
+			resetPQExpBuffer(query);
+		}
+
+		PQfinish(conn);
+
+		destroyPQExpBuffer(query);
+	}
+
+	end_progress_output();
+	check_ok();
+
+	return;
+}
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 842f3b6cd3..ba8129d135 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -150,6 +150,24 @@ typedef struct
 	int			nrels;
 } RelInfoArr;
 
+/*
+ * Structure to store logical replication slot information.
+ */
+typedef struct
+{
+	char	   *slotname;		/* slot name */
+	char	   *plugin;			/* plugin */
+	bool		two_phase;		/* can the slot decode 2PC? */
+	bool		caught_up;		/* has the slot caught up to latest changes? */
+	bool		invalid;		/* if true, the slot is unusable */
+} LogicalSlotInfo;
+
+typedef struct
+{
+	int			nslots;			/* number of logical slot infos */
+	LogicalSlotInfo *slots;		/* array of logical slot infos */
+} LogicalSlotInfoArr;
+
 /*
  * The following structure represents a relation mapping.
  */
@@ -176,6 +194,7 @@ typedef struct
 	char		db_tablespace[MAXPGPATH];	/* database default tablespace
 											 * path */
 	RelInfoArr	rel_arr;		/* array of all user relinfos */
+	LogicalSlotInfoArr slot_arr;	/* array of all LogicalSlotInfo */
 } DbInfo;
 
 /*
@@ -400,7 +419,8 @@ void		check_loadable_libraries(void);
 FileNameMap *gen_db_file_maps(DbInfo *old_db,
 							  DbInfo *new_db, int *nmaps, const char *old_pgdata,
 							  const char *new_pgdata);
-void		get_db_and_rel_infos(ClusterInfo *cluster);
+void		get_db_rel_and_slot_infos(ClusterInfo *cluster, bool live_check);
+int			count_old_cluster_logical_slots(void);
 
 /* option.c */
 
diff --git a/src/bin/pg_upgrade/server.c b/src/bin/pg_upgrade/server.c
index 0bc3d2806b..d7f6c268ef 100644
--- a/src/bin/pg_upgrade/server.c
+++ b/src/bin/pg_upgrade/server.c
@@ -201,6 +201,7 @@ start_postmaster(ClusterInfo *cluster, bool report_and_exit_on_error)
 	PGconn	   *conn;
 	bool		pg_ctl_return = false;
 	char		socket_string[MAXPGPATH + 200];
+	PQExpBufferData pgoptions;
 
 	static bool exit_hook_registered = false;
 
@@ -227,23 +228,41 @@ start_postmaster(ClusterInfo *cluster, bool report_and_exit_on_error)
 				 cluster->sockdir);
 #endif
 
+	initPQExpBuffer(&pgoptions);
+
 	/*
-	 * Use -b to disable autovacuum.
+	 * Construct a parameter string which is passed to the server process.
 	 *
 	 * Turn off durability requirements to improve object creation speed, and
 	 * we only modify the new cluster, so only use it there.  If there is a
 	 * crash, the new cluster has to be recreated anyway.  fsync=off is a big
 	 * win on ext4.
 	 */
+	if (cluster == &new_cluster)
+		appendPQExpBufferStr(&pgoptions, " -c synchronous_commit=off -c fsync=off -c full_page_writes=off");
+
+	/*
+	 * Set max_slot_wal_keep_size to -1 to prevent WAL removal by the
+	 * checkpointer process.  If the WAL required by logical replication
+	 * slots is removed, the slots become unusable.  This setting prevents
+	 * the invalidation of slots during the upgrade.  We set this option
+	 * when the cluster is PG17 or later because logical slots can only be
+	 * migrated since then.  Besides, max_slot_wal_keep_size was added in PG13.
+	 */
+	if (GET_MAJOR_VERSION(cluster->major_version) >= 1700)
+		appendPQExpBufferStr(&pgoptions, " -c max_slot_wal_keep_size=-1");
+
+	/* Use -b to disable autovacuum. */
 	snprintf(cmd, sizeof(cmd),
 			 "\"%s/pg_ctl\" -w -l \"%s/%s\" -D \"%s\" -o \"-p %d -b%s %s%s\" start",
 			 cluster->bindir,
 			 log_opts.logdir,
 			 SERVER_LOG_FILE, cluster->pgconfig, cluster->port,
-			 (cluster == &new_cluster) ?
-			 " -c synchronous_commit=off -c fsync=off -c full_page_writes=off" : "",
+			 pgoptions.data,
 			 cluster->pgopts ? cluster->pgopts : "", socket_string);
 
+	termPQExpBuffer(&pgoptions);
+
 	/*
 	 * Don't throw an error right away, let connecting throw the error because
 	 * it might supply a reason for the failure.
diff --git a/src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl
new file mode 100644
index 0000000000..5d7f11fb09
--- /dev/null
+++ b/src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl
@@ -0,0 +1,281 @@
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+# Tests for upgrading logical replication slots
+
+use strict;
+use warnings;
+
+use File::Find qw(find);
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Verify that logical replication slots can be migrated.  This function will
+# be executed when the old cluster is PG17 and later.
+sub test_upgrade_from_PG17_and_later
+{
+	my ($old_publisher, $new_publisher, @pg_upgrade_cmd) = @_;
+
+	# ------------------------------
+	# TEST: Confirm pg_upgrade fails when the new cluster has wrong GUC values
+
+	# Preparations for the subsequent test:
+	# 1. Create two slots on the old cluster
+	$old_publisher->start;
+	$old_publisher->safe_psql(
+		'postgres', qq[
+		SELECT pg_create_logical_replication_slot('test_slot1', 'test_decoding');
+		SELECT pg_create_logical_replication_slot('test_slot2', 'test_decoding');
+	]);
+	$old_publisher->stop();
+
+	# 2. Set 'max_replication_slots' to be less than the number of slots (2)
+	#	 present on the old cluster.
+	$new_publisher->append_conf('postgresql.conf',
+		"max_replication_slots = 1");
+
+	# pg_upgrade will fail because the new cluster has insufficient
+	# max_replication_slots
+	command_checks_all(
+		[@pg_upgrade_cmd],
+		1,
+		[
+			qr/max_replication_slots \(1\) must be greater than or equal to the number of logical replication slots \(2\) on the old cluster/
+		],
+		[qr//],
+		'run of pg_upgrade where the new cluster has insufficient max_replication_slots'
+	);
+	ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+		"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+	# Set 'max_replication_slots' to match the number of slots (2) present
+	# on the old cluster. Both slots will be used for subsequent tests.
+	$new_publisher->append_conf('postgresql.conf',
+		"max_replication_slots = 2");
+
+
+	# ------------------------------
+	# TEST: Confirm pg_upgrade fails when the slot still has unconsumed WAL
+	# records
+
+	# Preparations for the subsequent test:
+	# 1. Generate extra WAL records. At this point neither test_slot1 nor
+	#	 test_slot2 has consumed them.
+	#
+	# 2. Advance the slot test_slot2 up to the current WAL location, but
+	#	 test_slot1 still has unconsumed WAL records.
+	#
+	# 3. Emit a non-transactional message. This will cause test_slot2 to detect
+	#	 the unconsumed WAL record.
+	$old_publisher->start;
+	$old_publisher->safe_psql(
+		'postgres', qq[
+			CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;
+			SELECT pg_replication_slot_advance('test_slot2', pg_current_wal_lsn());
+			SELECT count(*) FROM pg_logical_emit_message('false', 'prefix', 'This is a non-transactional message');
+	]);
+	$old_publisher->stop;
+
+	# pg_upgrade will fail because there are slots still having unconsumed WAL
+	# records
+	command_checks_all(
+		[@pg_upgrade_cmd],
+		1,
+		[
+			qr/Your installation contains logical replication slots that can't be upgraded./
+		],
+		[qr//],
+		'run of pg_upgrade of old cluster with slots having unconsumed WAL records'
+	);
+
+	# Verify the reason why the logical replication slot cannot be upgraded
+	my $slots_filename;
+
+	# Find a txt file that contains a list of logical replication slots that
+	# cannot be upgraded. We cannot predict the file's path because the output
+	# directory contains a milliseconds timestamp. File::Find::find must be
+	# used.
+	find(
+		sub {
+			if ($File::Find::name =~
+				m/invalid_logical_replication_slots\.txt/)
+			{
+				$slots_filename = $File::Find::name;
+			}
+		},
+		$new_publisher->data_dir . "/pg_upgrade_output.d");
+
+	# Check the file content. Both slots should be reporting that they have
+	# unconsumed WAL records.
+	like(
+		slurp_file($slots_filename),
+		qr/The slot \"test_slot1\" has not consumed the WAL yet/m,
+		'the previous test failed due to unconsumed WALs');
+	like(
+		slurp_file($slots_filename),
+		qr/The slot \"test_slot2\" has not consumed the WAL yet/m,
+		'the previous test failed due to unconsumed WALs');
+
+
+	# ------------------------------
+	# TEST: Successful upgrade
+
+	# Preparations for the subsequent test:
+	# 1. Setup logical replication (first, cleanup slots from the previous
+	#	 tests)
+	my $old_connstr = $old_publisher->connstr . ' dbname=postgres';
+
+	$old_publisher->start;
+	$old_publisher->safe_psql(
+		'postgres', qq[
+		SELECT * FROM pg_drop_replication_slot('test_slot1');
+		SELECT * FROM pg_drop_replication_slot('test_slot2');
+		CREATE PUBLICATION regress_pub FOR ALL TABLES;
+	]);
+
+	# Initialize subscriber cluster
+	my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
+	$subscriber->init();
+
+	$subscriber->start;
+	$subscriber->safe_psql(
+		'postgres', qq[
+		CREATE TABLE tbl (a int);
+		CREATE SUBSCRIPTION regress_sub CONNECTION '$old_connstr' PUBLICATION regress_pub WITH (two_phase = 'true')
+	]);
+	$subscriber->wait_for_subscription_sync($old_publisher, 'regress_sub');
+
+	# 2. Temporarily disable the subscription
+	$subscriber->safe_psql('postgres',
+		"ALTER SUBSCRIPTION regress_sub DISABLE");
+	$old_publisher->stop;
+
+	# pg_upgrade should be successful
+	command_ok([@pg_upgrade_cmd], 'run of pg_upgrade of old cluster');
+
+	# Check that the slot 'regress_sub' has migrated to the new cluster
+	$new_publisher->start;
+	my $result = $new_publisher->safe_psql('postgres',
+		"SELECT slot_name, two_phase FROM pg_replication_slots");
+	is($result, qq(regress_sub|t), 'check the slot exists on new cluster');
+
+	# Update the connection
+	my $new_connstr = $new_publisher->connstr . ' dbname=postgres';
+	$subscriber->safe_psql(
+		'postgres', qq[
+		ALTER SUBSCRIPTION regress_sub CONNECTION '$new_connstr';
+		ALTER SUBSCRIPTION regress_sub ENABLE;
+	]);
+
+	# Check whether changes on the new publisher get replicated to the
+	# subscriber
+	$new_publisher->safe_psql('postgres',
+		"INSERT INTO tbl VALUES (generate_series(11, 20))");
+	$new_publisher->wait_for_catchup('regress_sub');
+	$result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
+	is($result, qq(20), 'check changes are replicated to the subscriber');
+
+	# Clean up
+	$subscriber->stop();
+	$new_publisher->stop();
+}
+
+# Verify that logical replication slots cannot be migrated.  This function will
+# be executed when the old cluster version is prior to PG17.
+sub test_upgrade_from_pre_PG17
+{
+	my ($old_publisher, $new_publisher, @pg_upgrade_cmd) = @_;
+
+	# ------------------------------
+	# TEST: Confirm logical replication slots cannot be migrated
+
+	# Preparations for the subsequent test:
+	# 1. Create a slot on the old cluster
+	$old_publisher->start;
+	$old_publisher->safe_psql('postgres',
+		"SELECT pg_create_logical_replication_slot('test_slot', 'test_decoding');"
+	);
+	$old_publisher->stop;
+
+	# pg_upgrade should be successful, but any logical replication slots will
+	# not be migrated.
+	command_ok([@pg_upgrade_cmd], 'run of pg_upgrade of old cluster');
+	ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
+		"pg_upgrade_output.d/ removed after pg_upgrade success");
+
+	# Check that the slot 'test_slot' has not migrated to the new cluster
+	$new_publisher->start;
+	my $result = $new_publisher->safe_psql('postgres',
+		"SELECT count(*) FROM pg_replication_slots");
+	is($result, qq(0), 'check the slot does not exist on new cluster');
+
+	# Clean up
+	$new_publisher->stop();
+}
+
+# Can be changed to test the other modes
+my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
+
+# Initialize old cluster. Cross-version checks are also supported.
+my $old_publisher =
+  PostgreSQL::Test::Cluster->new('old_publisher',
+	install_path => $ENV{oldinstall});
+
+my %node_params = ();
+$node_params{allows_streaming} = 'logical';
+
+# To prevent node->init() from using a previously initialized cluster that
+# could be of a different version, it is essential to configure specific
+# settings for the old cluster. This ensures that initdb will be run.
+my @initdb_params = ();
+push @initdb_params, ('--encoding', 'UTF-8');
+push @initdb_params, ('--locale',   'C');
+$node_params{extra} = \@initdb_params;
+
+$old_publisher->init(%node_params);
+
+# XXX: Older PG version had different rules for the inter-dependency of
+# 'max_wal_senders' and 'max_connections', so assign values which will work for
+# all PG versions. If Cluster.pm is fixed this code is not needed.
+$old_publisher->append_conf(
+	'postgresql.conf', qq[
+max_wal_senders = 5
+max_connections = 10
+]);
+
+# Initialize new cluster
+my $new_publisher = PostgreSQL::Test::Cluster->new('new_publisher');
+$new_publisher->init(allows_streaming => 'logical');
+
+my $oldbindir = $old_publisher->config_data('--bindir');
+my $newbindir = $new_publisher->config_data('--bindir');
+
+# Set up the pg_upgrade command. This will be used by all tests below.
+my @pg_upgrade_cmd = (
+	'pg_upgrade', '--no-sync',
+	'-d',         $old_publisher->data_dir,
+	'-D',         $new_publisher->data_dir,
+	'-b',         $oldbindir,
+	'-B',         $newbindir,
+	'-s',         $new_publisher->host,
+	'-p',         $old_publisher->port,
+	'-P',         $new_publisher->port,
+	$mode);
+
+# Test according to the major version of the old cluster.
+# Upgrading logical replication slots from versions older than PG17 is not
+# supported.
+if ($old_publisher->pg_version->major >= 17)
+{
+	test_upgrade_from_PG17_and_later($old_publisher, $new_publisher,
+		@pg_upgrade_cmd);
+}
+else
+{
+	test_upgrade_from_pre_PG17($old_publisher, $new_publisher,
+		@pg_upgrade_cmd);
+}
+
+
+done_testing();
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index c92d0631a0..d87dabebb9 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11379,6 +11379,11 @@
   proname => 'binary_upgrade_set_next_pg_tablespace_oid', provolatile => 'v',
   proparallel => 'u', prorettype => 'void', proargtypes => 'oid',
   prosrc => 'binary_upgrade_set_next_pg_tablespace_oid' },
+{ oid => '8046', descr => 'for use by pg_upgrade',
+  proname => 'binary_upgrade_logical_replication_slot_has_caught_up', proisstrict => 'f',
+  provolatile => 'v', proparallel => 'u', prorettype => 'bool',
+  proargtypes => 'name',
+  prosrc => 'binary_upgrade_logical_replication_slot_has_caught_up' },
 
 # conversion functions
 { oid => '4302',
diff --git a/src/include/replication/logical.h b/src/include/replication/logical.h
index 5f49554ea0..f8258d7c28 100644
--- a/src/include/replication/logical.h
+++ b/src/include/replication/logical.h
@@ -109,6 +109,9 @@ typedef struct LogicalDecodingContext
 	TransactionId write_xid;
 	/* Are we processing the end LSN of a transaction? */
 	bool		end_xact;
+
+	/* Do we need to process any change in 'fast_forward' mode? */
+	bool		processing_required;
 } LogicalDecodingContext;
 
 
@@ -145,4 +148,6 @@ extern bool filter_by_origin_cb_wrapper(LogicalDecodingContext *ctx, RepOriginId
 extern void ResetLogicalStreamingState(void);
 extern void UpdateDecodingStats(LogicalDecodingContext *ctx);
 
+extern bool LogicalReplicationSlotHasPendingWal(XLogRecPtr end_of_wal);
+
 #endif
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 06b25617bc..8c3f20dcae 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1503,6 +1503,8 @@ LogicalRepTyp
 LogicalRepWorker
 LogicalRepWorkerType
 LogicalRewriteMappingData
+LogicalSlotInfo
+LogicalSlotInfoArr
 LogicalTape
 LogicalTapeSet
 LsnReadQueue
-- 
2.27.0

#349Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#348)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Mon, Oct 23, 2023 at 11:10 AM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

Thank you for reviewing! PSA new version.

6. A nit: how about is_decodable_txn or is_decodable_change or some
other instead of just a plain name processing_required?
+    /* Do we need to process any change in 'fast_forward' mode? */
+    bool        processing_required;

I preferred the current one, because not only decodable transactions but also
non-transactional changes and empty transactions are processed.

Right. It's not the txn, but the change. processing_required seems too
generic IMV. A nit: is_change_decodable or something?

Thanks for the patch. Here are a few comments on the v56 patch:

1.
+ *
+ * Although this function is currently used only during pg_upgrade, there are
+ * no reasons to restrict it, so IsBinaryUpgrade is not checked here.

This comment isn't required IMV, because anyone looking at the code
and callsites can understand it.

2. A nit: IMV "This is a special purpose ..." statement seems redundant.
+ *
+ * This is a special purpose function to ensure that the given slot can be
+ * upgraded without data loss.

How about

Verify that the given replication slot has consumed all the WAL changes.
If there's any decodable WAL record after the slot's
confirmed_flush_lsn, the slot's consumer will lose that data after the
slot is upgraded.
Returns true if there are no decodable WAL records after the
confirmed_flush_lsn. Otherwise false.

3.
+    if (PG_ARGISNULL(0))
+        elog(ERROR, "null argument to
binary_upgrade_validate_wal_records is not allowed");

I can see the above style is referenced from
binary_upgrade_create_empty_extension, but IMV the following looks
better and more current (ereport is a newer style than elog)

ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("replication slot name cannot be null")));

4. The following comment seems frivolous, the code tells it all.
Please remove the comment.
+
+                /* No need to check this slot, seek to new one */
+                continue;

5. A typo - s/gets/Gets
+ * gets the LogicalSlotInfos for all the logical replication slots of the

6. An optimization in count_old_cluster_logical_slots(void): Turn
slot_count into a function-static variable so that the for loop isn't
required every time, because the slot count is prepared in
get_old_cluster_logical_slot_infos only once and won't change later
on. Do you see any problem with the following? This saves a few CPU
cycles when there is a large number of replication slots.
{
static int slot_count = 0;
static bool first_time = true;

if (first_time)
{
for (int dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
slot_count += old_cluster.dbarr.dbs[dbnum].slot_arr.nslots;

first_time = false;
}

return slot_count;
}

7. A typo: s/slotname/slot name. "slot name" looks better in user
visible messages.
+ pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\", two_phase: %s",

8.
+else
+{
+    test_upgrade_from_pre_PG17($old_publisher, $new_publisher,
+        @pg_upgrade_cmd);
+}
Will this ever be tested in the current TAP test framework? I mean, will
the TAP test framework allow testing upgrades from one PG version to
another PG version?
9. A nit: Can single quotes around variable names in the comments be
removed just to be consistent?
+     * We also skip decoding in 'fast_forward' mode. This check must be last
+    /* Do we need to process any change in 'fast_forward' mode? */

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#350Amit Kapila
amit.kapila16@gmail.com
In reply to: Bharath Rupireddy (#349)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Mon, Oct 23, 2023 at 2:00 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Mon, Oct 23, 2023 at 11:10 AM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

Thank you for reviewing! PSA new version.

6. A nit: how about is_decodable_txn or is_decodable_change or some
other instead of just a plain name processing_required?
+    /* Do we need to process any change in 'fast_forward' mode? */
+    bool        processing_required;

I preferred the current one, because not only decodable transactions but also
non-transactional changes and empty transactions are processed.

Right. It's not the txn, but the change. processing_required seems too
generic IMV. A nit: is_change_decodable or something?

If we don't want to keep it generic then we should use something like
'contains_decodable_change'. 'is_change_decodable' could have suited
here if we were checking a particular change.

Thanks for the patch. Here are a few comments on the v56 patch:

1.
+ *
+ * Although this function is currently used only during pg_upgrade, there are
+ * no reasons to restrict it, so IsBinaryUpgrade is not checked here.

This comment isn't required IMV, because anyone looking at the code
and callsites can understand it.

2. A nit: IMV "This is a special purpose ..." statement seems redundant.
+ *
+ * This is a special purpose function to ensure that the given slot can be
+ * upgraded without data loss.

How about

Verify that the given replication slot has consumed all the WAL changes.
If there's any decodable WAL record after the slot's
confirmed_flush_lsn, the slot's consumer will lose that data after the
slot is upgraded.
Returns true if there are no decodable WAL records after the
confirmed_flush_lsn. Otherwise false.

Personally, I find the current comment succinct and clear.

3.
+    if (PG_ARGISNULL(0))
+        elog(ERROR, "null argument to
binary_upgrade_validate_wal_records is not allowed");

I can see the above style is referenced from
binary_upgrade_create_empty_extension, but IMV the following looks
better and more current (ereport is a newer style than elog)

ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("replication slot name cannot be null")));

Do you have any reason for changing elog to ereport? I am not completely
sure, but as this and the related function are used internally, using
elog seems reasonable. Also, keeping it consistent with the
existing error message seems reasonable. We can change both later
together if we get broader agreement.

4. The following comment seems frivolous, the code tells it all.
Please remove the comment.
+
+                /* No need to check this slot, seek to new one */
+                continue;

5. A typo - s/gets/Gets
+ * gets the LogicalSlotInfos for all the logical replication slots of the

6. An optimization in count_old_cluster_logical_slots(void): Turn
slot_count into a function-static variable so that the for loop isn't
required every time, because the slot count is prepared in
get_old_cluster_logical_slot_infos only once and won't change later
on. Do you see any problem with the following? This saves a few CPU
cycles when there is a large number of replication slots.
{
static int slot_count = 0;
static bool first_time = true;

if (first_time)
{
for (int dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
slot_count += old_cluster.dbarr.dbs[dbnum].slot_arr.nslots;

first_time = false;
}

return slot_count;
}

This may not be a problem but this is also not a function that will be
used frequently. I am not sure if adding such code optimizations is
worth it.

7. A typo: s/slotname/slot name. "slot name" looks better in user
visible messages.
+ pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\", two_phase: %s",

If we want to follow other parameters then we can even use slot_name.

--
With Regards,
Amit Kapila.

#351Amit Kapila
amit.kapila16@gmail.com
In reply to: Bharath Rupireddy (#347)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Sat, Oct 21, 2023 at 5:41 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Fri, Oct 20, 2023 at 8:51 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

9. IMO, binary_upgrade_logical_replication_slot_has_caught_up seems
better, more meaningful, and consistent despite being a bit longer than just
binary_upgrade_slot_has_caught_up.

I think logical_replication is specific to our pub-sub model but we
can have manually created slots as well. So, it would be better to
name it as binary_upgrade_logical_slot_has_caught_up().

--
With Regards,
Amit Kapila.

#352Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Amit Kapila (#350)
1 attachment(s)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Bharath, Amit,

Thanks for reviewing! PSA new version.
I addressed the comments that were not disputed.

On Mon, Oct 23, 2023 at 2:00 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Mon, Oct 23, 2023 at 11:10 AM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

Thank you for reviewing! PSA new version.

6. A nit: how about is_decodable_txn or is_decodable_change or some
other instead of just a plain name processing_required?
+    /* Do we need to process any change in 'fast_forward' mode? */
+    bool        processing_required;

I preferred the current one, because not only decodable transactions but also
non-transactional changes and empty transactions are processed.

Right. It's not the txn, but the change. processing_required seems too
generic IMV. A nit: is_change_decodable or something?

If we don't want to keep it generic then we should use something like
'contains_decodable_change'. 'is_change_decodable' could have suited
here if we were checking a particular change.

I kept the name for now. What does Bharath think?

Thanks for the patch. Here are a few comments on the v56 patch:

1.
+ *
+ * Although this function is currently used only during pg_upgrade, there are
+ * no reasons to restrict it, so IsBinaryUpgrade is not checked here.

This comment isn't required IMV, because anyone looking at the code
and callsites can understand it.

Removed.

2. A nit: IMV "This is a special purpose ..." statement seems redundant.
+ *
+ * This is a special purpose function to ensure that the given slot can be
+ * upgraded without data loss.

How about

Verify that the given replication slot has consumed all the WAL changes.
If there's any decodable WAL record after the slot's
confirmed_flush_lsn, the slot's consumer will lose that data after the
slot is upgraded.
Returns true if there are no decodable WAL records after the
confirmed_flush_lsn. Otherwise false.

Personally, I find the current comment succinct and clear.

I kept the current one.

3.
+    if (PG_ARGISNULL(0))
+        elog(ERROR, "null argument to
binary_upgrade_validate_wal_records is not allowed");

I can see the above style is referenced from
binary_upgrade_create_empty_extension, but IMV the following looks
better and more current (ereport is a newer style than elog)

ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("replication slot name cannot be null")));

Do you have any reason for changing elog to ereport? I am not completely
sure, but as this and the related function are used internally, using
elog seems reasonable. Also, keeping it consistent with the
existing error message seems reasonable. We can change both later
together if we get broader agreement.

I kept the current style. elog() was used here because I regarded this as a
"cannot happen" error. According to the docs [1], elog() is still used
for that purpose.
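
To make the two styles being compared concrete, here is a minimal sketch (not
taken from the attached patch; the function name used in the message text is
only an example). elog() is reserved for internal "cannot happen" conditions,
while ereport() attaches a user-facing SQLSTATE:

```
/* style kept in the patch: internal "cannot happen" error */
if (PG_ARGISNULL(0))
	elog(ERROR, "null argument to binary_upgrade_logical_slot_has_caught_up is not allowed");

/* alternative suggested in the review: user-facing error with an explicit SQLSTATE */
if (PG_ARGISNULL(0))
	ereport(ERROR,
			(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
			 errmsg("replication slot name cannot be null")));
```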

4. The following comment seems frivolous, the code tells it all.
Please remove the comment.
+
+                /* No need to check this slot, seek to new one */
+                continue;

Removed.

5. A typo - s/gets/Gets
+ * gets the LogicalSlotInfos for all the logical replication slots of the

Replaced.

6. An optimization in count_old_cluster_logical_slots(void): Turn
slot_count into a function-static variable so that the for loop isn't
required every time, because the slot count is prepared in
get_old_cluster_logical_slot_infos only once and won't change later
on. Do you see any problem with the following? This saves a few CPU
cycles when there is a large number of replication slots.
{
static int slot_count = 0;
static bool first_time = true;

if (first_time)
{
for (int dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
slot_count += old_cluster.dbarr.dbs[dbnum].slot_arr.nslots;

first_time = false;
}

return slot_count;
}

This may not be a problem but this is also not a function that will be
used frequently. I am not sure if adding such code optimizations is
worth it.

Not addressed.

7. A typo: s/slotname/slot name. "slot name" looks better in user
visible messages.
+ pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\",

two_phase: %s",

If we want to follow other parameters then we can even use slot_name.

Changed to slot_name.

The part below contains replies to the remaining comments:

8.
+else
+{
+    test_upgrade_from_pre_PG17($old_publisher, $new_publisher,
+        @pg_upgrade_cmd);
+}
Will this ever be tested in the current TAP test framework? I mean, will
the TAP test framework allow testing upgrades from one PG version to
another PG version?

Yes, the TAP test framework allows cross-version upgrades. According to the
src/bin/pg_upgrade/TESTING file:

```
Testing an upgrade from a different PG version is also possible, and
provides a more thorough test that pg_upgrade does what it's meant for.
```

The commands below are an example of such a test.

```
# test PG9.5 -> patched HEAD
$ oldinstall=/home/hayato/older/pg95 make check PROVE_TESTS='t/003_upgrade_logical_replication_slots.pl'
...
# +++ tap check in src/bin/pg_upgrade +++
t/003_upgrade_logical_replication_slots.pl .. ok
All tests successful.
Files=1, Tests=3, 11 wallclock secs ( 0.03 usr 0.01 sys + 2.78 cusr 1.08 csys = 3.90 CPU)
Result: PASS

# grep the output to find evidence that the cross-version check was done
$ cat tmp_check/log/regress_log_003_upgrade_logical_replication_slots | grep 'check the slot does not exist on new cluster'
[05:14:22.322](0.139s) ok 3 - check the slot does not exist on new cluster

```

9. A nit: Can single quotes around variable names in the comments be
removed just to be consistent?
+     * We also skip decoding in 'fast_forward' mode. This check must be last
+    /* Do we need to process any change in 'fast_forward' mode? */

Removed.

Also, based on a comment [2], the upgrade function was renamed to
'binary_upgrade_logical_slot_has_caught_up'.

[1]: https://www.postgresql.org/docs/devel/error-message-reporting.html
[2]: /messages/by-id/CAA4eK1+YZP3j1H4ChhzSR23k6MPryW-cgGstyvqbek2CMJoHRA@mail.gmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:

v57-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patchapplication/octet-stream; name=v57-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patchDownload
From 658be0e6f67a2bb43e55b59c6aeae132da328c2f Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Tue, 4 Apr 2023 05:49:34 +0000
Subject: [PATCH v57] pg_upgrade: Allow to replicate logical replication slots
 to new node

This commit allows nodes with logical replication slots to be upgraded. While
reading information from the old cluster, a list of logical replication slots is
fetched. In a later part of the upgrade, pg_upgrade revisits the list and
restores slots by executing pg_create_logical_replication_slot() on the new
cluster. Migration of logical replication slots is only supported when the old
cluster is version 17.0 or later.

If the old node has slots with the status 'lost' or with unconsumed WAL records,
pg_upgrade fails. These checks are needed to prevent data loss.

Note that the pg_resetwal command would remove WAL files, which are required for
restart_lsn. If WALs required by logical replication slots are removed, the
slots are unusable. Therefore, during the upgrade, slot restoration is done
after the final pg_resetwal command. The workflow ensures that required WALs are
retained.

The significant advantage of this commit is that it makes it easy to continue
logical replication even after upgrading the publisher node. Previously,
pg_upgrade allowed copying publications to a new node. With this new commit,
adjusting the connection string to the new publisher will cause the apply
worker on the subscriber to connect to the new publisher automatically. This
enables seamless continuation of logical replication, even after an upgrade.

Author: Hayato Kuroda
Co-authored-by: Hou Zhijie
Reviewed-by: Peter Smith, Julien Rouhaud, Vignesh C, Wang Wei, Masahiko Sawada,
             Dilip Kumar, Bharath Rupireddy, Shlok Kyal
---
 doc/src/sgml/ref/pgupgrade.sgml               |  78 ++++-
 src/backend/replication/logical/decode.c      |  48 ++-
 src/backend/replication/logical/logical.c     |  75 +++++
 src/backend/replication/slot.c                |  14 +
 src/backend/utils/adt/pg_upgrade_support.c    |  44 +++
 src/bin/pg_upgrade/Makefile                   |   3 +
 src/bin/pg_upgrade/check.c                    | 168 ++++++++++-
 src/bin/pg_upgrade/function.c                 |  30 +-
 src/bin/pg_upgrade/info.c                     | 166 ++++++++++-
 src/bin/pg_upgrade/meson.build                |   1 +
 src/bin/pg_upgrade/pg_upgrade.c               |  74 ++++-
 src/bin/pg_upgrade/pg_upgrade.h               |  22 +-
 src/bin/pg_upgrade/server.c                   |  25 +-
 .../003_upgrade_logical_replication_slots.pl  | 281 ++++++++++++++++++
 src/include/catalog/pg_proc.dat               |   5 +
 src/include/replication/logical.h             |   5 +
 src/tools/pgindent/typedefs.list              |   2 +
 17 files changed, 1015 insertions(+), 26 deletions(-)
 create mode 100644 src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 608193b307..0296c3f89d 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -383,6 +383,79 @@ make prefix=/usr/local/pgsql.new install
     </para>
    </step>
 
+   <step>
+    <title>Prepare for publisher upgrades</title>
+
+    <para>
+     <application>pg_upgrade</application> attempts to migrate logical
+     replication slots. This helps avoid the need for manually defining the
+     same replication slots on the new publisher. Migration of logical
+     replication slots is only supported when the old cluster is version 17.0
+     or later. Logical replication slots on clusters before version 17.0 will
+     silently be ignored.
+    </para>
+
+    <para>
+     Before you start upgrading the publisher cluster, ensure that the
+     subscription is temporarily disabled, by executing
+     <link linkend="sql-altersubscription"><command>ALTER SUBSCRIPTION ... DISABLE</command></link>.
+     Re-enable the subscription after the upgrade.
+    </para>
+
+    <para>
+     There are some prerequisites for <application>pg_upgrade</application> to
+     be able to upgrade the replication slots. If these are not met an error
+     will be reported.
+    </para>
+
+    <itemizedlist>
+     <listitem>
+      <para>
+       All slots on the old cluster must be usable, i.e., there are no slots
+       whose
+       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>conflicting</structfield>
+       is <literal>true</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The old cluster has replicated all the transactions and logical decoding
+       messages to subscribers.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The output plugins referenced by the slots on the old cluster must be
+       installed in the new PostgreSQL executable directory.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must not have permanent logical replication slots, i.e.,
+       there must be no slots where
+       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>temporary</structfield>
+       is <literal>false</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must have
+       <link linkend="guc-wal-level"><varname>wal_level</varname></link> as
+       <literal>logical</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must have
+       <link linkend="guc-max-replication-slots"><varname>max_replication_slots</varname></link>
+       configured to a value greater than or equal to the number of slots
+       present in the old cluster.
+      </para>
+     </listitem>
+    </itemizedlist>
+
+   </step>
+
    <step>
     <title>Stop both servers</title>
 
@@ -650,8 +723,9 @@ rsync --archive --delete --hard-links --size-only --no-inc-recursive /vol1/pg_tb
        Configure the servers for log shipping.  (You do not need to run
        <function>pg_backup_start()</function> and <function>pg_backup_stop()</function>
        or take a file system backup as the standbys are still synchronized
-       with the primary.)  Replication slots are not copied and must
-       be recreated.
+       with the primary.)  Only logical slots on the primary are copied to the
+       new standby, but other slots on the old standby are not copied so must
+       be recreated manually.
       </para>
      </step>
 
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 24b712aa66..1237118e84 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -600,12 +600,8 @@ logicalmsg_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 
 	ReorderBufferProcessXid(ctx->reorder, XLogRecGetXid(r), buf->origptr);
 
-	/*
-	 * If we don't have snapshot or we are just fast-forwarding, there is no
-	 * point in decoding messages.
-	 */
-	if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT ||
-		ctx->fast_forward)
+	/* If we don't have snapshot, there is no point in decoding messages */
+	if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT)
 		return;
 
 	message = (xl_logical_message *) XLogRecGetData(r);
@@ -622,6 +618,26 @@ logicalmsg_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 			  SnapBuildXactNeedsSkip(builder, buf->origptr)))
 		return;
 
+	/*
+	 * We also skip decoding in fast_forward mode. This check must be last
+	 * because we don't want to set the processing_required flag unless we
+	 * have a decodable message.
+	 */
+	if (ctx->fast_forward)
+	{
+		/*
+		 * We need to set the processing_required flag to notify the caller of
+		 * the message's existence. Usually, the flag is set when either the
+		 * COMMIT or ABORT record is decoded, but it must be turned on
+		 * here because the non-transactional logical message is decoded
+		 * without waiting for these records.
+		 */
+		if (!message->transactional)
+			ctx->processing_required = true;
+
+		return;
+	}
+
 	/*
 	 * If this is a non-transactional change, get the snapshot we're expected
 	 * to use. We only get here when the snapshot is consistent, and the
@@ -1286,7 +1302,21 @@ static bool
 DecodeTXNNeedSkip(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
 				  Oid txn_dbid, RepOriginId origin_id)
 {
-	return (SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) ||
-			(txn_dbid != InvalidOid && txn_dbid != ctx->slot->data.database) ||
-			ctx->fast_forward || FilterByOrigin(ctx, origin_id));
+	if (SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) ||
+		(txn_dbid != InvalidOid && txn_dbid != ctx->slot->data.database) ||
+		FilterByOrigin(ctx, origin_id))
+		return true;
+
+	/*
+	 * We also skip decoding in fast_forward mode. In passing set the
+	 * processing_required flag to indicate that if it were not for
+	 * fast_forward mode, processing would have been required.
+	 */
+	if (ctx->fast_forward)
+	{
+		ctx->processing_required = true;
+		return true;
+	}
+
+	return false;
 }
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 41243d0187..8288da5277 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -29,6 +29,7 @@
 #include "postgres.h"
 
 #include "access/xact.h"
+#include "access/xlogutils.h"
 #include "access/xlog_internal.h"
 #include "fmgr.h"
 #include "miscadmin.h"
@@ -41,6 +42,7 @@
 #include "storage/proc.h"
 #include "storage/procarray.h"
 #include "utils/builtins.h"
+#include "utils/inval.h"
 #include "utils/memutils.h"
 
 /* data for errcontext callback */
@@ -1949,3 +1951,76 @@ UpdateDecodingStats(LogicalDecodingContext *ctx)
 	rb->totalTxns = 0;
 	rb->totalBytes = 0;
 }
+
+/*
+ * Read up to the end of WAL starting from the decoding slot's restart_lsn.
+ * Return true if any meaningful/decodable WAL records are encountered,
+ * otherwise false.
+ */
+bool
+LogicalReplicationSlotHasPendingWal(XLogRecPtr end_of_wal)
+{
+	bool		has_pending_wal = false;
+
+	Assert(MyReplicationSlot);
+
+	PG_TRY();
+	{
+		LogicalDecodingContext *ctx;
+
+		/*
+		 * Create our decoding context in fast_forward mode, passing start_lsn
+		 * as InvalidXLogRecPtr, so that we start processing from the slot's
+		 * confirmed_flush.
+		 */
+		ctx = CreateDecodingContext(InvalidXLogRecPtr,
+									NIL,
+									true,	/* fast_forward */
+									XL_ROUTINE(.page_read = read_local_xlog_page,
+											   .segment_open = wal_segment_open,
+											   .segment_close = wal_segment_close),
+									NULL, NULL, NULL);
+
+		/*
+		 * Start reading at the slot's restart_lsn, which we know points to a
+		 * valid record.
+		 */
+		XLogBeginRead(ctx->reader, MyReplicationSlot->data.restart_lsn);
+
+		/* Invalidate non-timetravel entries */
+		InvalidateSystemCaches();
+
+		/* Loop until the end of WAL or some changes are processed */
+		while (!has_pending_wal && ctx->reader->EndRecPtr < end_of_wal)
+		{
+			XLogRecord *record;
+			char	   *errm = NULL;
+
+			record = XLogReadRecord(ctx->reader, &errm);
+
+			if (errm)
+				elog(ERROR, "could not find record for logical decoding: %s", errm);
+
+			if (record != NULL)
+				LogicalDecodingProcessRecord(ctx, ctx->reader);
+
+			has_pending_wal = ctx->processing_required;
+
+			CHECK_FOR_INTERRUPTS();
+		}
+
+		/* Clean up */
+		FreeDecodingContext(ctx);
+		InvalidateSystemCaches();
+	}
+	PG_CATCH();
+	{
+		/* clear all timetravel entries */
+		InvalidateSystemCaches();
+
+		PG_RE_THROW();
+	}
+	PG_END_TRY();
+
+	return has_pending_wal;
+}
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 7e5ec500d8..99823df3c7 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1423,6 +1423,20 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause,
 
 		SpinLockRelease(&s->mutex);
 
+		/*
+		 * The logical replication slots shouldn't be invalidated as
+		 * max_slot_wal_keep_size GUC is set to -1 during the upgrade.
+		 *
+		 * The following is just a sanity check.
+		 */
+		if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade)
+		{
+			ereport(ERROR,
+					errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					errmsg("replication slots must not be invalidated during the upgrade"),
+					errhint("\"max_slot_wal_keep_size\" must be set to -1 during the upgrade"));
+		}
+
 		if (active_pid != 0)
 		{
 			/*
diff --git a/src/backend/utils/adt/pg_upgrade_support.c b/src/backend/utils/adt/pg_upgrade_support.c
index 0186636d9f..2f6fc86c3d 100644
--- a/src/backend/utils/adt/pg_upgrade_support.c
+++ b/src/backend/utils/adt/pg_upgrade_support.c
@@ -17,6 +17,7 @@
 #include "catalog/pg_type.h"
 #include "commands/extension.h"
 #include "miscadmin.h"
+#include "replication/logical.h"
 #include "utils/array.h"
 #include "utils/builtins.h"
 
@@ -261,3 +262,46 @@ binary_upgrade_set_missing_value(PG_FUNCTION_ARGS)
 
 	PG_RETURN_VOID();
 }
+
+/*
+ * Verify the given slot has already consumed all the WAL changes.
+ *
+ * Returns true if there are no decodable WAL records after the
+ * confirmed_flush_lsn. Otherwise false.
+ *
+ * This is a special purpose function to ensure that the given slot can be
+ * upgraded without data loss.
+ */
+Datum
+binary_upgrade_logical_slot_has_caught_up(PG_FUNCTION_ARGS)
+{
+	Name		slot_name;
+	XLogRecPtr	end_of_wal;
+	bool		found_pending_wal;
+
+	CHECK_IS_BINARY_UPGRADE;
+
+	/* We must check before dereferencing the argument */
+	if (PG_ARGISNULL(0))
+		elog(ERROR, "null argument to binary_upgrade_validate_wal_records is not allowed");
+
+	CheckSlotPermissions();
+
+	slot_name = PG_GETARG_NAME(0);
+
+	/* Acquire the given slot */
+	ReplicationSlotAcquire(NameStr(*slot_name), true);
+
+	Assert(SlotIsLogical(MyReplicationSlot));
+
+	/* Slots must be valid as otherwise we won't be able to scan the WAL */
+	Assert(MyReplicationSlot->data.invalidated == RS_INVAL_NONE);
+
+	end_of_wal = GetFlushRecPtr(NULL);
+	found_pending_wal = LogicalReplicationSlotHasPendingWal(end_of_wal);
+
+	/* Clean up */
+	ReplicationSlotRelease();
+
+	PG_RETURN_BOOL(!found_pending_wal);
+}
diff --git a/src/bin/pg_upgrade/Makefile b/src/bin/pg_upgrade/Makefile
index 5834513add..05e9299654 100644
--- a/src/bin/pg_upgrade/Makefile
+++ b/src/bin/pg_upgrade/Makefile
@@ -3,6 +3,9 @@
 PGFILEDESC = "pg_upgrade - an in-place binary upgrade utility"
 PGAPPICON = win32
 
+# required for 003_upgrade_logical_replication_slots.pl
+EXTRA_INSTALL=contrib/test_decoding
+
 subdir = src/bin/pg_upgrade
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 21a0ff9e42..179f85ae8a 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -33,6 +33,8 @@ static void check_for_jsonb_9_4_usage(ClusterInfo *cluster);
 static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(void);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
+static void check_new_cluster_logical_replication_slots(void);
+static void check_old_cluster_for_valid_slots(bool live_check);
 
 
 /*
@@ -89,8 +91,11 @@ check_and_dump_old_cluster(bool live_check)
 	if (!live_check)
 		start_postmaster(&old_cluster, true);
 
-	/* Extract a list of databases and tables from the old cluster */
-	get_db_and_rel_infos(&old_cluster);
+	/*
+	 * Extract a list of databases, tables, and logical replication slots from
+	 * the old cluster.
+	 */
+	get_db_rel_and_slot_infos(&old_cluster, live_check);
 
 	init_tablespaces();
 
@@ -107,6 +112,13 @@ check_and_dump_old_cluster(bool live_check)
 	check_for_reg_data_type_usage(&old_cluster);
 	check_for_isn_and_int8_passing_mismatch(&old_cluster);
 
+	/*
+	 * Logical replication slots can be migrated since PG17. See comments atop
+	 * get_old_cluster_logical_slot_infos().
+	 */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
+		check_old_cluster_for_valid_slots(live_check);
+
 	/*
 	 * PG 16 increased the size of the 'aclitem' type, which breaks the
 	 * on-disk format for existing data.
@@ -200,7 +212,7 @@ check_and_dump_old_cluster(bool live_check)
 void
 check_new_cluster(void)
 {
-	get_db_and_rel_infos(&new_cluster);
+	get_db_rel_and_slot_infos(&new_cluster, false);
 
 	check_new_cluster_is_empty();
 
@@ -223,6 +235,8 @@ check_new_cluster(void)
 	check_for_prepared_transactions(&new_cluster);
 
 	check_for_new_tablespace_dir();
+
+	check_new_cluster_logical_replication_slots();
 }
 
 
@@ -1451,3 +1465,151 @@ check_for_user_defined_encoding_conversions(ClusterInfo *cluster)
 	else
 		check_ok();
 }
+
+/*
+ * check_new_cluster_logical_replication_slots()
+ *
+ * Verify that there are no logical replication slots on the new cluster and
+ * that the parameter settings necessary for creating slots are sufficient.
+ */
+static void
+check_new_cluster_logical_replication_slots(void)
+{
+	PGresult   *res;
+	PGconn	   *conn;
+	int			nslots_on_old;
+	int			nslots_on_new;
+	int			max_replication_slots;
+	char	   *wal_level;
+
+	/* Logical slots can be migrated since PG17. */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+		return;
+
+	nslots_on_old = count_old_cluster_logical_slots();
+
+	/* Quick return if there are no logical slots to be migrated. */
+	if (nslots_on_old == 0)
+		return;
+
+	conn = connectToServer(&new_cluster, "template1");
+
+	prep_status("Checking for new cluster logical replication slots");
+
+	res = executeQueryOrDie(conn, "SELECT count(*) "
+							"FROM pg_catalog.pg_replication_slots "
+							"WHERE slot_type = 'logical' AND "
+							"temporary IS FALSE;");
+
+	if (PQntuples(res) != 1)
+		pg_fatal("could not count the number of logical replication slots");
+
+	nslots_on_new = atoi(PQgetvalue(res, 0, 0));
+
+	if (nslots_on_new)
+		pg_fatal("Expected 0 logical replication slots but found %d.",
+				 nslots_on_new);
+
+	PQclear(res);
+
+	res = executeQueryOrDie(conn, "SELECT setting FROM pg_settings "
+							"WHERE name IN ('wal_level', 'max_replication_slots') "
+							"ORDER BY name DESC;");
+
+	if (PQntuples(res) != 2)
+		pg_fatal("could not determine parameter settings on new cluster");
+
+	wal_level = PQgetvalue(res, 0, 0);
+
+	if (strcmp(wal_level, "logical") != 0)
+		pg_fatal("wal_level must be \"logical\", but is set to \"%s\"",
+				 wal_level);
+
+	max_replication_slots = atoi(PQgetvalue(res, 1, 0));
+
+	if (nslots_on_old > max_replication_slots)
+		pg_fatal("max_replication_slots (%d) must be greater than or equal to the number of "
+				 "logical replication slots (%d) on the old cluster",
+				 max_replication_slots, nslots_on_old);
+
+	PQclear(res);
+	PQfinish(conn);
+
+	check_ok();
+}
+
+/*
+ * check_old_cluster_for_valid_slots()
+ *
+ * Verify that all the logical slots are valid and have consumed all the WAL
+ * before shutdown.
+ */
+static void
+check_old_cluster_for_valid_slots(bool live_check)
+{
+	char		output_path[MAXPGPATH];
+	FILE	   *script = NULL;
+
+	prep_status("Checking for valid logical replication slots");
+
+	snprintf(output_path, sizeof(output_path), "%s/%s",
+			 log_opts.basedir,
+			 "invalid_logical_replication_slots.txt");
+
+	for (int dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		LogicalSlotInfoArr *slot_arr = &old_cluster.dbarr.dbs[dbnum].slot_arr;
+
+		for (int slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		{
+			LogicalSlotInfo *slot = &slot_arr->slots[slotnum];
+
+			/* Is the slot usable? */
+			if (slot->invalid)
+			{
+				if (script == NULL &&
+					(script = fopen_priv(output_path, "w")) == NULL)
+					pg_fatal("could not open file \"%s\": %s",
+							 output_path, strerror(errno));
+
+				fprintf(script, "The slot \"%s\" is invalid\n",
+						slot->slotname);
+
+				continue;
+			}
+
+			/*
+			 * Do additional check to ensure that all logical replication
+			 * slots have consumed all the WAL before shutdown.
+			 *
+			 * Note: This can be satisfied only when the old cluster has been
+			 * shut down, so we skip this for live checks.
+			 */
+			if (!live_check && !slot->caught_up)
+			{
+				if (script == NULL &&
+					(script = fopen_priv(output_path, "w")) == NULL)
+					pg_fatal("could not open file \"%s\": %s",
+							 output_path, strerror(errno));
+
+				fprintf(script,
+						"The slot \"%s\" has not consumed the WAL yet\n",
+						slot->slotname);
+			}
+		}
+	}
+
+	if (script)
+	{
+		fclose(script);
+
+		pg_log(PG_REPORT, "fatal");
+		pg_fatal("Your installation contains logical replication slots that can't be upgraded.\n"
+				 "You can remove invalid slots and/or consume the pending WAL for other slots,\n"
+				 "and then restart the upgrade.\n"
+				 "A list of the problematic slots is in the file:\n"
+				 "    %s", output_path);
+	}
+
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/function.c b/src/bin/pg_upgrade/function.c
index dc8800c7cd..5af936bd45 100644
--- a/src/bin/pg_upgrade/function.c
+++ b/src/bin/pg_upgrade/function.c
@@ -46,7 +46,9 @@ library_name_compare(const void *p1, const void *p2)
 /*
  * get_loadable_libraries()
  *
- *	Fetch the names of all old libraries containing C-language functions.
+ *	Fetch the names of all old libraries containing C-language functions, as
+ *	well as the libraries corresponding to logical replication output plugins.
+ *
  *	We will later check that they all exist in the new installation.
  */
 void
@@ -55,6 +57,7 @@ get_loadable_libraries(void)
 	PGresult  **ress;
 	int			totaltups;
 	int			dbnum;
+	int			n_libinfos;
 
 	ress = (PGresult **) pg_malloc(old_cluster.dbarr.ndbs * sizeof(PGresult *));
 	totaltups = 0;
@@ -81,7 +84,12 @@ get_loadable_libraries(void)
 		PQfinish(conn);
 	}
 
-	os_info.libraries = (LibraryInfo *) pg_malloc(totaltups * sizeof(LibraryInfo));
+	/*
+	 * Allocate memory for required libraries and logical replication output
+	 * plugins.
+	 */
+	n_libinfos = totaltups + count_old_cluster_logical_slots();
+	os_info.libraries = (LibraryInfo *) pg_malloc(sizeof(LibraryInfo) * n_libinfos);
 	totaltups = 0;
 
 	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
@@ -89,6 +97,7 @@ get_loadable_libraries(void)
 		PGresult   *res = ress[dbnum];
 		int			ntups;
 		int			rowno;
+		LogicalSlotInfoArr *slot_arr = &old_cluster.dbarr.dbs[dbnum].slot_arr;
 
 		ntups = PQntuples(res);
 		for (rowno = 0; rowno < ntups; rowno++)
@@ -101,6 +110,23 @@ get_loadable_libraries(void)
 			totaltups++;
 		}
 		PQclear(res);
+
+		/*
+		 * Store the names of output plugins as well. There is a possibility
+		 * that duplicated plugins are set, but the consumer function
+		 * check_loadable_libraries() will avoid checking the same library, so
+		 * we do not have to consider their uniqueness here.
+		 */
+		for (int slotno = 0; slotno < slot_arr->nslots; slotno++)
+		{
+			if (slot_arr->slots[slotno].invalid)
+				continue;
+
+			os_info.libraries[totaltups].name = pg_strdup(slot_arr->slots[slotno].plugin);
+			os_info.libraries[totaltups].dbnum = dbnum;
+
+			totaltups++;
+		}
 	}
 
 	pg_free(ress);
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index aa5faca4d6..7f21d26fd2 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -26,6 +26,8 @@ static void get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo);
 static void free_rel_infos(RelInfoArr *rel_arr);
 static void print_db_infos(DbInfoArr *db_arr);
 static void print_rel_infos(RelInfoArr *rel_arr);
+static void print_slot_infos(LogicalSlotInfoArr *slot_arr);
+static void get_old_cluster_logical_slot_infos(DbInfo *dbinfo, bool live_check);
 
 
 /*
@@ -266,13 +268,15 @@ report_unmatched_relation(const RelInfo *rel, const DbInfo *db, bool is_new_db)
 }
 
 /*
- * get_db_and_rel_infos()
+ * get_db_rel_and_slot_infos()
  *
  * higher level routine to generate dbinfos for the database running
  * on the given "port". Assumes that server is already running.
+ *
+ * live_check would be used only when the target is the old cluster.
  */
 void
-get_db_and_rel_infos(ClusterInfo *cluster)
+get_db_rel_and_slot_infos(ClusterInfo *cluster, bool live_check)
 {
 	int			dbnum;
 
@@ -283,7 +287,17 @@ get_db_and_rel_infos(ClusterInfo *cluster)
 	get_db_infos(cluster);
 
 	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
-		get_rel_infos(cluster, &cluster->dbarr.dbs[dbnum]);
+	{
+		DbInfo	   *pDbInfo = &cluster->dbarr.dbs[dbnum];
+
+		get_rel_infos(cluster, pDbInfo);
+
+		/*
+		 * Retrieve the logical replication slots infos for the old cluster.
+		 */
+		if (cluster == &old_cluster)
+			get_old_cluster_logical_slot_infos(pDbInfo, live_check);
+	}
 
 	if (cluster == &old_cluster)
 		pg_log(PG_VERBOSE, "\nsource databases:");
@@ -600,6 +614,125 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 	dbinfo->rel_arr.nrels = num_rels;
 }
 
+/*
+ * get_old_cluster_logical_slot_infos()
+ *
+ * Gets the LogicalSlotInfos for all the logical replication slots of the
+ * database referred to by "dbinfo". The status of each logical slot is fetched
+ * here, but it is only used during the checking phase. See
+ * check_old_cluster_for_valid_slots().
+ *
+ * Note: This function will not do anything if the old cluster is pre-PG17.
+ * This is because before that the logical slots are not saved at shutdown, so
+ * there is no guarantee that the latest confirmed_flush_lsn is saved to disk
+ * which can lead to data loss. It is still not guaranteed for manually created
+ * slots in PG17, so subsequent checks done in
+ * check_old_cluster_for_valid_slots() would raise a FATAL error if such slots
+ * are included.
+ */
+static void
+get_old_cluster_logical_slot_infos(DbInfo *dbinfo, bool live_check)
+{
+	PGconn	   *conn;
+	PGresult   *res;
+	LogicalSlotInfo *slotinfos = NULL;
+	int			num_slots = 0;
+
+	/* Logical slots can be migrated since PG17. */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+	{
+		dbinfo->slot_arr.slots = slotinfos;
+		dbinfo->slot_arr.nslots = num_slots;
+		return;
+	}
+
+	conn = connectToServer(&old_cluster, dbinfo->db_name);
+
+	/*
+	 * Fetch the logical replication slot information. The check whether the
+	 * slot is considered caught up is done by an upgrade function. This
+	 * regards the slot as caught up if we don't find any decodable changes.
+	 * See binary_upgrade_logical_slot_has_caught_up().
+	 *
+	 * Note that we can't ensure whether the slot is caught up during
+	 * live_check as the new WAL records could be generated.
+	 *
+	 * We intentionally skip checking the WALs for invalidated slots as the
+	 * corresponding WALs could have been removed for such slots.
+	 *
+	 * The temporary slots are explicitly ignored while checking because such
+	 * slots cannot exist after the upgrade. During the upgrade, clusters are
+	 * started and stopped several times causing any temporary slots to be
+	 * removed.
+	 */
+	res = executeQueryOrDie(conn, "SELECT slot_name, plugin, two_phase, "
+							"%s as caught_up, conflicting as invalid "
+							"FROM pg_catalog.pg_replication_slots "
+							"WHERE slot_type = 'logical' AND "
+							"database = current_database() AND "
+							"temporary IS FALSE;",
+							live_check ? "FALSE" :
+							"(CASE WHEN conflicting THEN FALSE "
+							"ELSE (SELECT pg_catalog.binary_upgrade_logical_slot_has_caught_up(slot_name)) "
+							"END)");
+
+	num_slots = PQntuples(res);
+
+	if (num_slots)
+	{
+		int			i_slotname;
+		int			i_plugin;
+		int			i_twophase;
+		int			i_caught_up;
+		int			i_invalid;
+
+		slotinfos = (LogicalSlotInfo *) pg_malloc(sizeof(LogicalSlotInfo) * num_slots);
+
+		i_slotname = PQfnumber(res, "slot_name");
+		i_plugin = PQfnumber(res, "plugin");
+		i_twophase = PQfnumber(res, "two_phase");
+		i_caught_up = PQfnumber(res, "caught_up");
+		i_invalid = PQfnumber(res, "invalid");
+
+		for (int slotnum = 0; slotnum < num_slots; slotnum++)
+		{
+			LogicalSlotInfo *curr = &slotinfos[slotnum];
+
+			curr->slotname = pg_strdup(PQgetvalue(res, slotnum, i_slotname));
+			curr->plugin = pg_strdup(PQgetvalue(res, slotnum, i_plugin));
+			curr->two_phase = (strcmp(PQgetvalue(res, slotnum, i_twophase), "t") == 0);
+			curr->caught_up = (strcmp(PQgetvalue(res, slotnum, i_caught_up), "t") == 0);
+			curr->invalid = (strcmp(PQgetvalue(res, slotnum, i_invalid), "t") == 0);
+		}
+	}
+
+	PQclear(res);
+	PQfinish(conn);
+
+	dbinfo->slot_arr.slots = slotinfos;
+	dbinfo->slot_arr.nslots = num_slots;
+}
+
+
+/*
+ * count_old_cluster_logical_slots()
+ *
+ * Returns the number of logical replication slots for all databases.
+ *
+ * Note: this function always returns 0 if the old_cluster is PG16 and prior
+ * because we gather slot information only for cluster versions greater than or
+ * equal to PG17. See get_old_cluster_logical_slot_infos().
+ */
+int
+count_old_cluster_logical_slots(void)
+{
+	int			slot_count = 0;
+
+	for (int dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+		slot_count += old_cluster.dbarr.dbs[dbnum].slot_arr.nslots;
+
+	return slot_count;
+}
 
 static void
 free_db_and_rel_infos(DbInfoArr *db_arr)
@@ -642,8 +775,11 @@ print_db_infos(DbInfoArr *db_arr)
 
 	for (dbnum = 0; dbnum < db_arr->ndbs; dbnum++)
 	{
-		pg_log(PG_VERBOSE, "Database: \"%s\"", db_arr->dbs[dbnum].db_name);
-		print_rel_infos(&db_arr->dbs[dbnum].rel_arr);
+		DbInfo	   *pDbInfo = &db_arr->dbs[dbnum];
+
+		pg_log(PG_VERBOSE, "Database: \"%s\"", pDbInfo->db_name);
+		print_rel_infos(&pDbInfo->rel_arr);
+		print_slot_infos(&pDbInfo->slot_arr);
 	}
 }
 
@@ -660,3 +796,23 @@ print_rel_infos(RelInfoArr *rel_arr)
 			   rel_arr->rels[relnum].reloid,
 			   rel_arr->rels[relnum].tablespace);
 }
+
+static void
+print_slot_infos(LogicalSlotInfoArr *slot_arr)
+{
+	/* Quick return if there are no logical slots. */
+	if (slot_arr->nslots == 0)
+		return;
+
+	pg_log(PG_VERBOSE, "Logical replication slots within the database:");
+
+	for (int slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+	{
+		LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+
+		pg_log(PG_VERBOSE, "slot_name: \"%s\", plugin: \"%s\", two_phase: %s",
+			   slot_info->slotname,
+			   slot_info->plugin,
+			   slot_info->two_phase ? "true" : "false");
+	}
+}
diff --git a/src/bin/pg_upgrade/meson.build b/src/bin/pg_upgrade/meson.build
index 12a97f84e2..2c4f38d865 100644
--- a/src/bin/pg_upgrade/meson.build
+++ b/src/bin/pg_upgrade/meson.build
@@ -42,6 +42,7 @@ tests += {
     'tests': [
       't/001_basic.pl',
       't/002_pg_upgrade.pl',
+      't/003_upgrade_logical_replication_slots.pl',
     ],
     'test_kwargs': {'priority': 40}, # pg_upgrade tests are slow
   },
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 96bfb67167..3960af4036 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -59,6 +59,7 @@ static void copy_xact_xlog_xid(void);
 static void set_frozenxids(bool minmxid_only);
 static void make_outputdirs(char *pgdata);
 static void setup(char *argv0, bool *live_check);
+static void create_logical_replication_slots(void);
 
 ClusterInfo old_cluster,
 			new_cluster;
@@ -188,6 +189,21 @@ main(int argc, char **argv)
 			  new_cluster.pgdata);
 	check_ok();
 
+	/*
+	 * Migrate the logical slots to the new cluster.  Note that we need to do
+	 * this after resetting WAL because otherwise the required WAL would be
+	 * removed and slots would become unusable.  There is a possibility that
+	 * background processes might generate some WAL before we could create the
+	 * slots in the new cluster but we can ignore that WAL as that won't be
+	 * required downstream.
+	 */
+	if (count_old_cluster_logical_slots())
+	{
+		start_postmaster(&new_cluster, true);
+		create_logical_replication_slots();
+		stop_postmaster(false);
+	}
+
 	if (user_opts.do_sync)
 	{
 		prep_status("Sync data directory to disk");
@@ -593,7 +609,7 @@ create_new_objects(void)
 		set_frozenxids(true);
 
 	/* update new_cluster info now that we have objects in the databases */
-	get_db_and_rel_infos(&new_cluster);
+	get_db_rel_and_slot_infos(&new_cluster, false);
 }
 
 /*
@@ -862,3 +878,59 @@ set_frozenxids(bool minmxid_only)
 
 	check_ok();
 }
+
+/*
+ * create_logical_replication_slots()
+ *
+ * Similar to create_new_objects() but only restores logical replication slots.
+ */
+static void
+create_logical_replication_slots(void)
+{
+	prep_status_progress("Restoring logical replication slots in the new cluster");
+
+	for (int dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		DbInfo	   *old_db = &old_cluster.dbarr.dbs[dbnum];
+		LogicalSlotInfoArr *slot_arr = &old_db->slot_arr;
+		PGconn	   *conn;
+		PQExpBuffer query;
+
+		/* Skip this database if there are no slots */
+		if (slot_arr->nslots == 0)
+			continue;
+
+		conn = connectToServer(&new_cluster, old_db->db_name);
+		query = createPQExpBuffer();
+
+		pg_log(PG_STATUS, "%s", old_db->db_name);
+
+		for (int slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		{
+			LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+
+			/* Constructs a query for creating logical replication slots */
+			appendPQExpBuffer(query,
+							  "SELECT * FROM "
+							  "pg_catalog.pg_create_logical_replication_slot(");
+			appendStringLiteralConn(query, slot_info->slotname, conn);
+			appendPQExpBuffer(query, ", ");
+			appendStringLiteralConn(query, slot_info->plugin, conn);
+			appendPQExpBuffer(query, ", false, %s);",
+							  slot_info->two_phase ? "true" : "false");
+
+			PQclear(executeQueryOrDie(conn, "%s", query->data));
+
+			resetPQExpBuffer(query);
+		}
+
+		PQfinish(conn);
+
+		destroyPQExpBuffer(query);
+	}
+
+	end_progress_output();
+	check_ok();
+
+	return;
+}
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 842f3b6cd3..ba8129d135 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -150,6 +150,24 @@ typedef struct
 	int			nrels;
 } RelInfoArr;
 
+/*
+ * Structure to store logical replication slot information.
+ */
+typedef struct
+{
+	char	   *slotname;		/* slot name */
+	char	   *plugin;			/* plugin */
+	bool		two_phase;		/* can the slot decode 2PC? */
+	bool		caught_up;		/* has the slot caught up to latest changes? */
+	bool		invalid;		/* if true, the slot is unusable */
+} LogicalSlotInfo;
+
+typedef struct
+{
+	int			nslots;			/* number of logical slot infos */
+	LogicalSlotInfo *slots;		/* array of logical slot infos */
+} LogicalSlotInfoArr;
+
 /*
  * The following structure represents a relation mapping.
  */
@@ -176,6 +194,7 @@ typedef struct
 	char		db_tablespace[MAXPGPATH];	/* database default tablespace
 											 * path */
 	RelInfoArr	rel_arr;		/* array of all user relinfos */
+	LogicalSlotInfoArr slot_arr;	/* array of all LogicalSlotInfo */
 } DbInfo;
 
 /*
@@ -400,7 +419,8 @@ void		check_loadable_libraries(void);
 FileNameMap *gen_db_file_maps(DbInfo *old_db,
 							  DbInfo *new_db, int *nmaps, const char *old_pgdata,
 							  const char *new_pgdata);
-void		get_db_and_rel_infos(ClusterInfo *cluster);
+void		get_db_rel_and_slot_infos(ClusterInfo *cluster, bool live_check);
+int			count_old_cluster_logical_slots(void);
 
 /* option.c */
 
diff --git a/src/bin/pg_upgrade/server.c b/src/bin/pg_upgrade/server.c
index 0bc3d2806b..d7f6c268ef 100644
--- a/src/bin/pg_upgrade/server.c
+++ b/src/bin/pg_upgrade/server.c
@@ -201,6 +201,7 @@ start_postmaster(ClusterInfo *cluster, bool report_and_exit_on_error)
 	PGconn	   *conn;
 	bool		pg_ctl_return = false;
 	char		socket_string[MAXPGPATH + 200];
+	PQExpBufferData pgoptions;
 
 	static bool exit_hook_registered = false;
 
@@ -227,23 +228,41 @@ start_postmaster(ClusterInfo *cluster, bool report_and_exit_on_error)
 				 cluster->sockdir);
 #endif
 
+	initPQExpBuffer(&pgoptions);
+
 	/*
-	 * Use -b to disable autovacuum.
+	 * Construct a parameter string which is passed to the server process.
 	 *
 	 * Turn off durability requirements to improve object creation speed, and
 	 * we only modify the new cluster, so only use it there.  If there is a
 	 * crash, the new cluster has to be recreated anyway.  fsync=off is a big
 	 * win on ext4.
 	 */
+	if (cluster == &new_cluster)
+		appendPQExpBufferStr(&pgoptions, " -c synchronous_commit=off -c fsync=off -c full_page_writes=off");
+
+	/*
+	 * Use max_slot_wal_keep_size as -1 to prevent the WAL removal by the
+	 * checkpointer process.  If WALs required by logical replication slots
+	 * are removed, the slots are unusable.  This setting prevents the
+	 * invalidation of slots during the upgrade. We set this option when
+	 * cluster is PG17 or later because logical replication slots can only be
+	 * migrated since then. Besides, max_slot_wal_keep_size is added in PG13.
+	 */
+	if (GET_MAJOR_VERSION(cluster->major_version) >= 1700)
+		appendPQExpBufferStr(&pgoptions, " -c max_slot_wal_keep_size=-1");
+
+	/* Use -b to disable autovacuum. */
 	snprintf(cmd, sizeof(cmd),
 			 "\"%s/pg_ctl\" -w -l \"%s/%s\" -D \"%s\" -o \"-p %d -b%s %s%s\" start",
 			 cluster->bindir,
 			 log_opts.logdir,
 			 SERVER_LOG_FILE, cluster->pgconfig, cluster->port,
-			 (cluster == &new_cluster) ?
-			 " -c synchronous_commit=off -c fsync=off -c full_page_writes=off" : "",
+			 pgoptions.data,
 			 cluster->pgopts ? cluster->pgopts : "", socket_string);
 
+	termPQExpBuffer(&pgoptions);
+
 	/*
 	 * Don't throw an error right away, let connecting throw the error because
 	 * it might supply a reason for the failure.
diff --git a/src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl
new file mode 100644
index 0000000000..5d7f11fb09
--- /dev/null
+++ b/src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl
@@ -0,0 +1,281 @@
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+# Tests for upgrading logical replication slots
+
+use strict;
+use warnings;
+
+use File::Find qw(find);
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Verify that logical replication slots can be migrated.  This function will
+# be executed when the old cluster is PG17 and later.
+sub test_upgrade_from_PG17_and_later
+{
+	my ($old_publisher, $new_publisher, @pg_upgrade_cmd) = @_;
+
+	# ------------------------------
+	# TEST: Confirm pg_upgrade fails when the new cluster has wrong GUC values
+
+	# Preparations for the subsequent test:
+	# 1. Create two slots on the old cluster
+	$old_publisher->start;
+	$old_publisher->safe_psql(
+		'postgres', qq[
+		SELECT pg_create_logical_replication_slot('test_slot1', 'test_decoding');
+		SELECT pg_create_logical_replication_slot('test_slot2', 'test_decoding');
+	]);
+	$old_publisher->stop();
+
+	# 2. Set 'max_replication_slots' to be less than the number of slots (2)
+	#	 present on the old cluster.
+	$new_publisher->append_conf('postgresql.conf',
+		"max_replication_slots = 1");
+
+	# pg_upgrade will fail because the new cluster has insufficient
+	# max_replication_slots
+	command_checks_all(
+		[@pg_upgrade_cmd],
+		1,
+		[
+			qr/max_replication_slots \(1\) must be greater than or equal to the number of logical replication slots \(2\) on the old cluster/
+		],
+		[qr//],
+		'run of pg_upgrade where the new cluster has insufficient max_replication_slots'
+	);
+	ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+		"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+	# Set 'max_replication_slots' to match the number of slots (2) present
+	# on the old cluster. Both slots will be used for subsequent tests.
+	$new_publisher->append_conf('postgresql.conf',
+		"max_replication_slots = 2");
+
+
+	# ------------------------------
+	# TEST: Confirm pg_upgrade fails when the slot still has unconsumed WAL
+	# records
+
+	# Preparations for the subsequent test:
+	# 1. Generate extra WAL records. At this point neither test_slot1 nor
+	#	 test_slot2 has consumed them.
+	#
+	# 2. Advance the slot test_slot2 up to the current WAL location, but
+	#	 test_slot1 still has unconsumed WAL records.
+	#
+	# 3. Emit a non-transactional message. This will cause test_slot2 to detect
+	#	 the unconsumed WAL record.
+	$old_publisher->start;
+	$old_publisher->safe_psql(
+		'postgres', qq[
+			CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;
+			SELECT pg_replication_slot_advance('test_slot2', pg_current_wal_lsn());
+			SELECT count(*) FROM pg_logical_emit_message('false', 'prefix', 'This is a non-transactional message');
+	]);
+	$old_publisher->stop;
+
+	# pg_upgrade will fail because there are slots still having unconsumed WAL
+	# records
+	command_checks_all(
+		[@pg_upgrade_cmd],
+		1,
+		[
+			qr/Your installation contains logical replication slots that can't be upgraded./
+		],
+		[qr//],
+		'run of pg_upgrade of old cluster with slots having unconsumed WAL records'
+	);
+
+	# Verify the reason why the logical replication slot cannot be upgraded
+	my $slots_filename;
+
+	# Find a txt file that contains a list of logical replication slots that
+	# cannot be upgraded. We cannot predict the file's path because the output
+	# directory contains a milliseconds timestamp. File::Find::find must be
+	# used.
+	find(
+		sub {
+			if ($File::Find::name =~
+				m/invalid_logical_replication_slots\.txt/)
+			{
+				$slots_filename = $File::Find::name;
+			}
+		},
+		$new_publisher->data_dir . "/pg_upgrade_output.d");
+
+	# Check the file content. Both slots should be reporting that they have
+	# unconsumed WAL records.
+	like(
+		slurp_file($slots_filename),
+		qr/The slot \"test_slot1\" has not consumed the WAL yet/m,
+		'the previous test failed due to unconsumed WALs');
+	like(
+		slurp_file($slots_filename),
+		qr/The slot \"test_slot2\" has not consumed the WAL yet/m,
+		'the previous test failed due to unconsumed WALs');
+
+
+	# ------------------------------
+	# TEST: Successful upgrade
+
+	# Preparations for the subsequent test:
+	# 1. Setup logical replication (first, cleanup slots from the previous
+	#	 tests)
+	my $old_connstr = $old_publisher->connstr . ' dbname=postgres';
+
+	$old_publisher->start;
+	$old_publisher->safe_psql(
+		'postgres', qq[
+		SELECT * FROM pg_drop_replication_slot('test_slot1');
+		SELECT * FROM pg_drop_replication_slot('test_slot2');
+		CREATE PUBLICATION regress_pub FOR ALL TABLES;
+	]);
+
+	# Initialize subscriber cluster
+	my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
+	$subscriber->init();
+
+	$subscriber->start;
+	$subscriber->safe_psql(
+		'postgres', qq[
+		CREATE TABLE tbl (a int);
+		CREATE SUBSCRIPTION regress_sub CONNECTION '$old_connstr' PUBLICATION regress_pub WITH (two_phase = 'true')
+	]);
+	$subscriber->wait_for_subscription_sync($old_publisher, 'regress_sub');
+
+	# 2. Temporarily disable the subscription
+	$subscriber->safe_psql('postgres',
+		"ALTER SUBSCRIPTION regress_sub DISABLE");
+	$old_publisher->stop;
+
+	# pg_upgrade should be successful
+	command_ok([@pg_upgrade_cmd], 'run of pg_upgrade of old cluster');
+
+	# Check that the slot 'regress_sub' has migrated to the new cluster
+	$new_publisher->start;
+	my $result = $new_publisher->safe_psql('postgres',
+		"SELECT slot_name, two_phase FROM pg_replication_slots");
+	is($result, qq(regress_sub|t), 'check the slot exists on new cluster');
+
+	# Update the connection
+	my $new_connstr = $new_publisher->connstr . ' dbname=postgres';
+	$subscriber->safe_psql(
+		'postgres', qq[
+		ALTER SUBSCRIPTION regress_sub CONNECTION '$new_connstr';
+		ALTER SUBSCRIPTION regress_sub ENABLE;
+	]);
+
+	# Check whether changes on the new publisher get replicated to the
+	# subscriber
+	$new_publisher->safe_psql('postgres',
+		"INSERT INTO tbl VALUES (generate_series(11, 20))");
+	$new_publisher->wait_for_catchup('regress_sub');
+	$result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
+	is($result, qq(20), 'check changes are replicated to the subscriber');
+
+	# Clean up
+	$subscriber->stop();
+	$new_publisher->stop();
+}
+
+# Verify that logical replication slots cannot be migrated.  This function will
+# be executed when the old cluster version is prior to PG17.
+sub test_upgrade_from_pre_PG17
+{
+	my ($old_publisher, $new_publisher, @pg_upgrade_cmd) = @_;
+
+	# ------------------------------
+	# TEST: Confirm logical replication slots cannot be migrated
+
+	# Preparations for the subsequent test:
+	# 1. Create a slot on the old cluster
+	$old_publisher->start;
+	$old_publisher->safe_psql('postgres',
+		"SELECT pg_create_logical_replication_slot('test_slot', 'test_decoding');"
+	);
+	$old_publisher->stop;
+
+	# pg_upgrade should be successful, but any logical replication slots will
+	# be not migrated.
+	command_ok([@pg_upgrade_cmd], 'run of pg_upgrade of old cluster');
+	ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
+		"pg_upgrade_output.d/ removed after pg_upgrade success");
+
+	# Check that the slot 'test_slot' has not migrated to the new cluster
+	$new_publisher->start;
+	my $result = $new_publisher->safe_psql('postgres',
+		"SELECT count(*) FROM pg_replication_slots");
+	is($result, qq(0), 'check the slot does not exist on new cluster');
+
+	# Clean up
+	$new_publisher->stop();
+}
+
+# Can be changed to test the other modes
+my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
+
+# Initialize old cluster. Cross-version checks are also supported.
+my $old_publisher =
+  PostgreSQL::Test::Cluster->new('old_publisher',
+	install_path => $ENV{oldinstall});
+
+my %node_params = ();
+$node_params{allows_streaming} = 'logical';
+
+# To prevent node->init() from using a previously initialized cluster that
+# could be of a different version, it is essential to configure specific
+# settings for the old cluster. This can ensure that initdb will be done.
+my @initdb_params = ();
+push @initdb_params, ('--encoding', 'UTF-8');
+push @initdb_params, ('--locale',   'C');
+$node_params{extra} = \@initdb_params;
+
+$old_publisher->init(%node_params);
+
+# XXX: Older PG version had different rules for the inter-dependency of
+# 'max_wal_senders' and 'max_connections', so assign values which will work for
+# all PG versions. If Cluster.pm is fixed this code is not needed.
+$old_publisher->append_conf(
+	'postgresql.conf', qq[
+max_wal_senders = 5
+max_connections = 10
+]);
+
+# Initialize new cluster
+my $new_publisher = PostgreSQL::Test::Cluster->new('new_publisher');
+$new_publisher->init(allows_streaming => 'logical');
+
+my $oldbindir = $old_publisher->config_data('--bindir');
+my $newbindir = $new_publisher->config_data('--bindir');
+
+# Setup a pg_upgrade command. This will be used anywhere.
+my @pg_upgrade_cmd = (
+	'pg_upgrade', '--no-sync',
+	'-d',         $old_publisher->data_dir,
+	'-D',         $new_publisher->data_dir,
+	'-b',         $oldbindir,
+	'-B',         $newbindir,
+	'-s',         $new_publisher->host,
+	'-p',         $old_publisher->port,
+	'-P',         $new_publisher->port,
+	$mode);
+
+# Test according to the major version of the old cluster.
+# Upgrading logical replication slots from versions older than PG17 is not
+# supported.
+if ($old_publisher->pg_version->major >= 17)
+{
+	test_upgrade_from_PG17_and_later($old_publisher, $new_publisher,
+		@pg_upgrade_cmd);
+}
+else
+{
+	test_upgrade_from_pre_PG17($old_publisher, $new_publisher,
+		@pg_upgrade_cmd);
+}
+
+
+done_testing();
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index c92d0631a0..06435e8b92 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11379,6 +11379,11 @@
   proname => 'binary_upgrade_set_next_pg_tablespace_oid', provolatile => 'v',
   proparallel => 'u', prorettype => 'void', proargtypes => 'oid',
   prosrc => 'binary_upgrade_set_next_pg_tablespace_oid' },
+{ oid => '8046', descr => 'for use by pg_upgrade',
+  proname => 'binary_upgrade_logical_slot_has_caught_up', proisstrict => 'f',
+  provolatile => 'v', proparallel => 'u', prorettype => 'bool',
+  proargtypes => 'name',
+  prosrc => 'binary_upgrade_logical_slot_has_caught_up' },
 
 # conversion functions
 { oid => '4302',
diff --git a/src/include/replication/logical.h b/src/include/replication/logical.h
index 5f49554ea0..dffc0d1564 100644
--- a/src/include/replication/logical.h
+++ b/src/include/replication/logical.h
@@ -109,6 +109,9 @@ typedef struct LogicalDecodingContext
 	TransactionId write_xid;
 	/* Are we processing the end LSN of a transaction? */
 	bool		end_xact;
+
+	/* Do we need to process any change in fast_forward mode? */
+	bool		processing_required;
 } LogicalDecodingContext;
 
 
@@ -145,4 +148,6 @@ extern bool filter_by_origin_cb_wrapper(LogicalDecodingContext *ctx, RepOriginId
 extern void ResetLogicalStreamingState(void);
 extern void UpdateDecodingStats(LogicalDecodingContext *ctx);
 
+extern bool LogicalReplicationSlotHasPendingWal(XLogRecPtr end_of_wal);
+
 #endif
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 06b25617bc..8c3f20dcae 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1503,6 +1503,8 @@ LogicalRepTyp
 LogicalRepWorker
 LogicalRepWorkerType
 LogicalRewriteMappingData
+LogicalSlotInfo
+LogicalSlotInfoArr
 LogicalTape
 LogicalTapeSet
 LsnReadQueue
-- 
2.27.0

#353Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#352)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Tue, Oct 24, 2023 at 11:32 AM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

If we don't want to keep it generic then we should use something like
'contains_decodable_change'. 'is_change_decodable' could have suited
here if we were checking a particular change.

I kept the name for now. What does Bharath think?

No more bikeshedding from my side. +1 for processing_required as-is.

6. An optimization in count_old_cluster_logical_slots(void): Turn
slot_count into a function-static variable so that the for loop isn't
required every time, because the slot count is prepared in
get_old_cluster_logical_slot_infos only once and won't change later
on. Do you see any problem with the following? This saves a few CPU
cycles when there are a large number of replication slots.
{
	static int	slot_count = 0;
	static bool first_time = true;

	if (first_time)
	{
		for (int dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
			slot_count += old_cluster.dbarr.dbs[dbnum].slot_arr.nslots;

		first_time = false;
	}

	return slot_count;
}

This may not be a problem but this is also not a function that will be
used frequently. I am not sure if adding such code optimizations is
worth it.

Not addressed.

count_old_cluster_logical_slots() is called 3 times during pg_upgrade,
and counting the slots of all the databases on every call seems
redundant IMV, especially given that the slot count is computed once at
the beginning and never changes. When the number of replication slots
on the cluster is on the higher side, recounting every time *may* prove
costly. And the use of static variables isn't a huge change requiring a
different set of infra or anything of the sort; it's a simple pattern.

Having said that, if others don't see merit in it, I'm okay with
withdrawing my comment.
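As an aside, here is a minimal, self-contained sketch of the
function-static caching pattern being discussed. It is only a toy
illustration, not the pg_upgrade code itself: the per-database counts
array and the function name are made up, and the pattern is safe only
because the slot information is gathered once and never changes
afterwards.

```
#include <stdbool.h>
#include <stdio.h>

/* Made-up stand-in for the per-database slot counts in old_cluster.dbarr */
static const int db_slot_counts[] = {2, 0, 3};
#define NDBS (sizeof(db_slot_counts) / sizeof(db_slot_counts[0]))

/*
 * Function-static caching: the loop runs only on the first call; later
 * calls return the cached value.
 */
static int
count_slots_cached(void)
{
	static int	slot_count = 0;
	static bool first_time = true;

	if (first_time)
	{
		for (size_t dbnum = 0; dbnum < NDBS; dbnum++)
			slot_count += db_slot_counts[dbnum];

		first_time = false;
	}

	return slot_count;
}

int
main(void)
{
	printf("%d\n", count_slots_cached());	/* loops once, prints 5 */
	printf("%d\n", count_slots_cached());	/* cached, prints 5 */
	return 0;
}
```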

The commands below are an example of the test.

```
# test PG9.5 -> patched HEAD
$ oldinstall=/home/hayato/older/pg95 make check PROVE_TESTS='t/003_upgrade_logical_replication_slots.pl'
```

Oh, I get it. Thanks.

Also, based on a comment [2], the upgrade function was renamed to
'binary_upgrade_logical_slot_has_caught_up'.

+1.

I spent some time on the v57 patch and it looks good to me - tests are
passing, no complaints from pgindent and pgperltidy. I turned the CF
entry https://commitfest.postgresql.org/45/4273/ to RfC.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#354Amit Kapila
amit.kapila16@gmail.com
In reply to: Bharath Rupireddy (#353)
1 attachment(s)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Tue, Oct 24, 2023 at 1:20 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

I spent some time on the v57 patch and it looks good to me - tests are
passing, no complaints from pgindent and pgperltidy. I turned the CF
entry https://commitfest.postgresql.org/45/4273/ to RfC.

Thanks, the patch looks mostly good to me, but I am not convinced about
keeping the cross-version tests in this form. I don't think they are
exercised in the BF; one can only test them by manually creating a
setup. Shall we remove them for now and consider them separately?

Apart from that, I have made minor modifications in the docs to adjust
the order of various prerequisites.

--
With Regards,
Amit Kapila.

Attachments:

v58-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patch (application/octet-stream)
From b9ca30ca8f400e27a0d49a06a7ea31cac64b6fc7 Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Tue, 4 Apr 2023 05:49:34 +0000
Subject: [PATCH v58] pg_upgrade: Allow to replicate logical replication slots
 to new node

This commit allows nodes with logical replication slots to be upgraded. While
reading information from the old cluster, a list of logical replication slots is
fetched. At the later part of upgrading, pg_upgrade revisits the list and
restores slots by executing pg_create_logical_replication_slot() on the new
cluster. Migration of logical replication slots is only supported when the old
cluster is version 17.0 or later.

If the old node has slots with the status 'lost' or with unconsumed WAL records,
the pg_upgrade fails. These checks are needed to prevent data loss.

Note that the pg_resetwal command would remove WAL files, which are required for
restart_lsn. If WALs required by logical replication slots are removed, the
slots are unusable. Therefore, during the upgrade, slot restoration is done
after the final pg_resetwal command. The workflow ensures that required WALs are
retained.

The significant advantage of this commit is that it makes it easy to continue
logical replication even after upgrading the publisher node. Previously,
pg_upgrade allowed copying publications to a new node. With this new commit,
adjusting the connection string to the new publisher will cause the apply
worker on the subscriber to connect to the new publisher automatically. This
enables seamless continuation of logical replication, even after an upgrade.

Author: Hayato Kuroda
Co-authored-by: Hou Zhijie
Reviewed-by: Peter Smith, Julien Rouhaud, Vignesh C, Wang Wei, Masahiko Sawada,
             Dilip Kumar, Bharath Rupireddy, Shlok Kyal
---
 doc/src/sgml/ref/pgupgrade.sgml               |  78 ++++-
 src/backend/replication/logical/decode.c      |  48 ++-
 src/backend/replication/logical/logical.c     |  75 +++++
 src/backend/replication/slot.c                |  14 +
 src/backend/utils/adt/pg_upgrade_support.c    |  44 +++
 src/bin/pg_upgrade/Makefile                   |   3 +
 src/bin/pg_upgrade/check.c                    | 168 ++++++++++-
 src/bin/pg_upgrade/function.c                 |  30 +-
 src/bin/pg_upgrade/info.c                     | 166 ++++++++++-
 src/bin/pg_upgrade/meson.build                |   1 +
 src/bin/pg_upgrade/pg_upgrade.c               |  74 ++++-
 src/bin/pg_upgrade/pg_upgrade.h               |  22 +-
 src/bin/pg_upgrade/server.c                   |  25 +-
 .../003_upgrade_logical_replication_slots.pl  | 281 ++++++++++++++++++
 src/include/catalog/pg_proc.dat               |   5 +
 src/include/replication/logical.h             |   5 +
 src/tools/pgindent/typedefs.list              |   2 +
 17 files changed, 1015 insertions(+), 26 deletions(-)
 create mode 100644 src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 608193b307..b7fe740fde 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -383,6 +383,79 @@ make prefix=/usr/local/pgsql.new install
     </para>
    </step>
 
+   <step>
+    <title>Prepare for publisher upgrades</title>
+
+    <para>
+     <application>pg_upgrade</application> attempts to migrate logical
+     replication slots. This helps avoid the need for manually defining the
+     same replication slots on the new publisher. Migration of logical
+     replication slots is only supported when the old cluster is version 17.0
+     or later. Logical replication slots on clusters before version 17.0 will
+     silently be ignored.
+    </para>
+
+    <para>
+     Before you start upgrading the publisher cluster, ensure that the
+     subscription is temporarily disabled, by executing
+     <link linkend="sql-altersubscription"><command>ALTER SUBSCRIPTION ... DISABLE</command></link>.
+     Re-enable the subscription after the upgrade.
+    </para>
+
+    <para>
+     There are some prerequisites for <application>pg_upgrade</application> to
+     be able to upgrade the replication slots. If these are not met an error
+     will be reported.
+    </para>
+
+    <itemizedlist>
+     <listitem>
+      <para>
+       The new cluster must have
+       <link linkend="guc-wal-level"><varname>wal_level</varname></link> as
+       <literal>logical</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must have
+       <link linkend="guc-max-replication-slots"><varname>max_replication_slots</varname></link>
+       configured to a value greater than or equal to the number of slots
+       present in the old cluster.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The output plugins referenced by the slots on the old cluster must be
+       installed in the new PostgreSQL executable directory.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The old cluster has replicated all the transactions and logical decoding
+       messages to subscribers.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       All slots on the old cluster must be usable, i.e., there are no slots
+       whose
+       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>conflicting</structfield>
+       is <literal>true</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must not have permanent logical replication slots, i.e.,
+       there must be no slots where
+       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>temporary</structfield>
+       is <literal>false</literal>.
+      </para>
+     </listitem>
+    </itemizedlist>
+
+   </step>
+
    <step>
     <title>Stop both servers</title>
 
@@ -650,8 +723,9 @@ rsync --archive --delete --hard-links --size-only --no-inc-recursive /vol1/pg_tb
        Configure the servers for log shipping.  (You do not need to run
        <function>pg_backup_start()</function> and <function>pg_backup_stop()</function>
        or take a file system backup as the standbys are still synchronized
-       with the primary.)  Replication slots are not copied and must
-       be recreated.
+       with the primary.)  Only logical slots on the primary are copied to the
+       new standby, but other slots on the old standby are not copied so must
+       be recreated manually.
       </para>
      </step>
 
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 24b712aa66..1237118e84 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -600,12 +600,8 @@ logicalmsg_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 
 	ReorderBufferProcessXid(ctx->reorder, XLogRecGetXid(r), buf->origptr);
 
-	/*
-	 * If we don't have snapshot or we are just fast-forwarding, there is no
-	 * point in decoding messages.
-	 */
-	if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT ||
-		ctx->fast_forward)
+	/* If we don't have snapshot, there is no point in decoding messages */
+	if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT)
 		return;
 
 	message = (xl_logical_message *) XLogRecGetData(r);
@@ -622,6 +618,26 @@ logicalmsg_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 			  SnapBuildXactNeedsSkip(builder, buf->origptr)))
 		return;
 
+	/*
+	 * We also skip decoding in fast_forward mode. This check must be last
+	 * because we don't want to set the processing_required flag unless we
+	 * have a decodable message.
+	 */
+	if (ctx->fast_forward)
+	{
+		/*
+		 * We need to set processing_required flag to notify the message's
+		 * existence to the caller. Usually, the flag is set when either the
+		 * COMMIT or ABORT records are decoded, but this must be turned on
+		 * here because the non-transactional logical message is decoded
+		 * without waiting for these records.
+		 */
+		if (!message->transactional)
+			ctx->processing_required = true;
+
+		return;
+	}
+
 	/*
 	 * If this is a non-transactional change, get the snapshot we're expected
 	 * to use. We only get here when the snapshot is consistent, and the
@@ -1286,7 +1302,21 @@ static bool
 DecodeTXNNeedSkip(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
 				  Oid txn_dbid, RepOriginId origin_id)
 {
-	return (SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) ||
-			(txn_dbid != InvalidOid && txn_dbid != ctx->slot->data.database) ||
-			ctx->fast_forward || FilterByOrigin(ctx, origin_id));
+	if (SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) ||
+		(txn_dbid != InvalidOid && txn_dbid != ctx->slot->data.database) ||
+		FilterByOrigin(ctx, origin_id))
+		return true;
+
+	/*
+	 * We also skip decoding in fast_forward mode. In passing set the
+	 * processing_required flag to indicate that if it were not for
+	 * fast_forward mode, processing would have been required.
+	 */
+	if (ctx->fast_forward)
+	{
+		ctx->processing_required = true;
+		return true;
+	}
+
+	return false;
 }
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 41243d0187..8288da5277 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -29,6 +29,7 @@
 #include "postgres.h"
 
 #include "access/xact.h"
+#include "access/xlogutils.h"
 #include "access/xlog_internal.h"
 #include "fmgr.h"
 #include "miscadmin.h"
@@ -41,6 +42,7 @@
 #include "storage/proc.h"
 #include "storage/procarray.h"
 #include "utils/builtins.h"
+#include "utils/inval.h"
 #include "utils/memutils.h"
 
 /* data for errcontext callback */
@@ -1949,3 +1951,76 @@ UpdateDecodingStats(LogicalDecodingContext *ctx)
 	rb->totalTxns = 0;
 	rb->totalBytes = 0;
 }
+
+/*
+ * Read up to the end of WAL starting from the decoding slot's restart_lsn.
+ * Return true if any meaningful/decodable WAL records are encountered,
+ * otherwise false.
+ */
+bool
+LogicalReplicationSlotHasPendingWal(XLogRecPtr end_of_wal)
+{
+	bool		has_pending_wal = false;
+
+	Assert(MyReplicationSlot);
+
+	PG_TRY();
+	{
+		LogicalDecodingContext *ctx;
+
+		/*
+		 * Create our decoding context in fast_forward mode, passing start_lsn
+		 * as InvalidXLogRecPtr, so that we start processing from the slot's
+		 * confirmed_flush.
+		 */
+		ctx = CreateDecodingContext(InvalidXLogRecPtr,
+									NIL,
+									true,	/* fast_forward */
+									XL_ROUTINE(.page_read = read_local_xlog_page,
+											   .segment_open = wal_segment_open,
+											   .segment_close = wal_segment_close),
+									NULL, NULL, NULL);
+
+		/*
+		 * Start reading at the slot's restart_lsn, which we know points to a
+		 * valid record.
+		 */
+		XLogBeginRead(ctx->reader, MyReplicationSlot->data.restart_lsn);
+
+		/* Invalidate non-timetravel entries */
+		InvalidateSystemCaches();
+
+		/* Loop until the end of WAL or some changes are processed */
+		while (!has_pending_wal && ctx->reader->EndRecPtr < end_of_wal)
+		{
+			XLogRecord *record;
+			char	   *errm = NULL;
+
+			record = XLogReadRecord(ctx->reader, &errm);
+
+			if (errm)
+				elog(ERROR, "could not find record for logical decoding: %s", errm);
+
+			if (record != NULL)
+				LogicalDecodingProcessRecord(ctx, ctx->reader);
+
+			has_pending_wal = ctx->processing_required;
+
+			CHECK_FOR_INTERRUPTS();
+		}
+
+		/* Clean up */
+		FreeDecodingContext(ctx);
+		InvalidateSystemCaches();
+	}
+	PG_CATCH();
+	{
+		/* clear all timetravel entries */
+		InvalidateSystemCaches();
+
+		PG_RE_THROW();
+	}
+	PG_END_TRY();
+
+	return has_pending_wal;
+}
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 7e5ec500d8..99823df3c7 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1423,6 +1423,20 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause,
 
 		SpinLockRelease(&s->mutex);
 
+		/*
+		 * The logical replication slots shouldn't be invalidated as
+		 * max_slot_wal_keep_size GUC is set to -1 during the upgrade.
+		 *
+		 * The following is just a sanity check.
+		 */
+		if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade)
+		{
+			ereport(ERROR,
+					errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					errmsg("replication slots must not be invalidated during the upgrade"),
+					errhint("\"max_slot_wal_keep_size\" must be set to -1 during the upgrade"));
+		}
+
 		if (active_pid != 0)
 		{
 			/*
diff --git a/src/backend/utils/adt/pg_upgrade_support.c b/src/backend/utils/adt/pg_upgrade_support.c
index 0186636d9f..2f6fc86c3d 100644
--- a/src/backend/utils/adt/pg_upgrade_support.c
+++ b/src/backend/utils/adt/pg_upgrade_support.c
@@ -17,6 +17,7 @@
 #include "catalog/pg_type.h"
 #include "commands/extension.h"
 #include "miscadmin.h"
+#include "replication/logical.h"
 #include "utils/array.h"
 #include "utils/builtins.h"
 
@@ -261,3 +262,46 @@ binary_upgrade_set_missing_value(PG_FUNCTION_ARGS)
 
 	PG_RETURN_VOID();
 }
+
+/*
+ * Verify the given slot has already consumed all the WAL changes.
+ *
+ * Returns true if there are no decodable WAL records after the
+ * confirmed_flush_lsn. Otherwise false.
+ *
+ * This is a special purpose function to ensure that the given slot can be
+ * upgraded without data loss.
+ */
+Datum
+binary_upgrade_logical_slot_has_caught_up(PG_FUNCTION_ARGS)
+{
+	Name		slot_name;
+	XLogRecPtr	end_of_wal;
+	bool		found_pending_wal;
+
+	CHECK_IS_BINARY_UPGRADE;
+
+	/* We must check before dereferencing the argument */
+	if (PG_ARGISNULL(0))
+		elog(ERROR, "null argument to binary_upgrade_validate_wal_records is not allowed");
+
+	CheckSlotPermissions();
+
+	slot_name = PG_GETARG_NAME(0);
+
+	/* Acquire the given slot */
+	ReplicationSlotAcquire(NameStr(*slot_name), true);
+
+	Assert(SlotIsLogical(MyReplicationSlot));
+
+	/* Slots must be valid as otherwise we won't be able to scan the WAL */
+	Assert(MyReplicationSlot->data.invalidated == RS_INVAL_NONE);
+
+	end_of_wal = GetFlushRecPtr(NULL);
+	found_pending_wal = LogicalReplicationSlotHasPendingWal(end_of_wal);
+
+	/* Clean up */
+	ReplicationSlotRelease();
+
+	PG_RETURN_BOOL(!found_pending_wal);
+}
diff --git a/src/bin/pg_upgrade/Makefile b/src/bin/pg_upgrade/Makefile
index 5834513add..05e9299654 100644
--- a/src/bin/pg_upgrade/Makefile
+++ b/src/bin/pg_upgrade/Makefile
@@ -3,6 +3,9 @@
 PGFILEDESC = "pg_upgrade - an in-place binary upgrade utility"
 PGAPPICON = win32
 
+# required for 003_upgrade_logical_replication_slots.pl
+EXTRA_INSTALL=contrib/test_decoding
+
 subdir = src/bin/pg_upgrade
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 21a0ff9e42..179f85ae8a 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -33,6 +33,8 @@ static void check_for_jsonb_9_4_usage(ClusterInfo *cluster);
 static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(void);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
+static void check_new_cluster_logical_replication_slots(void);
+static void check_old_cluster_for_valid_slots(bool live_check);
 
 
 /*
@@ -89,8 +91,11 @@ check_and_dump_old_cluster(bool live_check)
 	if (!live_check)
 		start_postmaster(&old_cluster, true);
 
-	/* Extract a list of databases and tables from the old cluster */
-	get_db_and_rel_infos(&old_cluster);
+	/*
+	 * Extract a list of databases, tables, and logical replication slots from
+	 * the old cluster.
+	 */
+	get_db_rel_and_slot_infos(&old_cluster, live_check);
 
 	init_tablespaces();
 
@@ -107,6 +112,13 @@ check_and_dump_old_cluster(bool live_check)
 	check_for_reg_data_type_usage(&old_cluster);
 	check_for_isn_and_int8_passing_mismatch(&old_cluster);
 
+	/*
+	 * Logical replication slots can be migrated since PG17. See comments atop
+	 * get_old_cluster_logical_slot_infos().
+	 */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
+		check_old_cluster_for_valid_slots(live_check);
+
 	/*
 	 * PG 16 increased the size of the 'aclitem' type, which breaks the
 	 * on-disk format for existing data.
@@ -200,7 +212,7 @@ check_and_dump_old_cluster(bool live_check)
 void
 check_new_cluster(void)
 {
-	get_db_and_rel_infos(&new_cluster);
+	get_db_rel_and_slot_infos(&new_cluster, false);
 
 	check_new_cluster_is_empty();
 
@@ -223,6 +235,8 @@ check_new_cluster(void)
 	check_for_prepared_transactions(&new_cluster);
 
 	check_for_new_tablespace_dir();
+
+	check_new_cluster_logical_replication_slots();
 }
 
 
@@ -1451,3 +1465,151 @@ check_for_user_defined_encoding_conversions(ClusterInfo *cluster)
 	else
 		check_ok();
 }
+
+/*
+ * check_new_cluster_logical_replication_slots()
+ *
+ * Verify that there are no logical replication slots on the new cluster and
+ * that the parameter settings necessary for creating slots are sufficient.
+ */
+static void
+check_new_cluster_logical_replication_slots(void)
+{
+	PGresult   *res;
+	PGconn	   *conn;
+	int			nslots_on_old;
+	int			nslots_on_new;
+	int			max_replication_slots;
+	char	   *wal_level;
+
+	/* Logical slots can be migrated since PG17. */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+		return;
+
+	nslots_on_old = count_old_cluster_logical_slots();
+
+	/* Quick return if there are no logical slots to be migrated. */
+	if (nslots_on_old == 0)
+		return;
+
+	conn = connectToServer(&new_cluster, "template1");
+
+	prep_status("Checking for new cluster logical replication slots");
+
+	res = executeQueryOrDie(conn, "SELECT count(*) "
+							"FROM pg_catalog.pg_replication_slots "
+							"WHERE slot_type = 'logical' AND "
+							"temporary IS FALSE;");
+
+	if (PQntuples(res) != 1)
+		pg_fatal("could not count the number of logical replication slots");
+
+	nslots_on_new = atoi(PQgetvalue(res, 0, 0));
+
+	if (nslots_on_new)
+		pg_fatal("Expected 0 logical replication slots but found %d.",
+				 nslots_on_new);
+
+	PQclear(res);
+
+	res = executeQueryOrDie(conn, "SELECT setting FROM pg_settings "
+							"WHERE name IN ('wal_level', 'max_replication_slots') "
+							"ORDER BY name DESC;");
+
+	if (PQntuples(res) != 2)
+		pg_fatal("could not determine parameter settings on new cluster");
+
+	wal_level = PQgetvalue(res, 0, 0);
+
+	if (strcmp(wal_level, "logical") != 0)
+		pg_fatal("wal_level must be \"logical\", but is set to \"%s\"",
+				 wal_level);
+
+	max_replication_slots = atoi(PQgetvalue(res, 1, 0));
+
+	if (nslots_on_old > max_replication_slots)
+		pg_fatal("max_replication_slots (%d) must be greater than or equal to the number of "
+				 "logical replication slots (%d) on the old cluster",
+				 max_replication_slots, nslots_on_old);
+
+	PQclear(res);
+	PQfinish(conn);
+
+	check_ok();
+}
+
+/*
+ * check_old_cluster_for_valid_slots()
+ *
+ * Verify that all the logical slots are valid and have consumed all the WAL
+ * before shutdown.
+ */
+static void
+check_old_cluster_for_valid_slots(bool live_check)
+{
+	char		output_path[MAXPGPATH];
+	FILE	   *script = NULL;
+
+	prep_status("Checking for valid logical replication slots");
+
+	snprintf(output_path, sizeof(output_path), "%s/%s",
+			 log_opts.basedir,
+			 "invalid_logical_replication_slots.txt");
+
+	for (int dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		LogicalSlotInfoArr *slot_arr = &old_cluster.dbarr.dbs[dbnum].slot_arr;
+
+		for (int slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		{
+			LogicalSlotInfo *slot = &slot_arr->slots[slotnum];
+
+			/* Is the slot usable? */
+			if (slot->invalid)
+			{
+				if (script == NULL &&
+					(script = fopen_priv(output_path, "w")) == NULL)
+					pg_fatal("could not open file \"%s\": %s",
+							 output_path, strerror(errno));
+
+				fprintf(script, "The slot \"%s\" is invalid\n",
+						slot->slotname);
+
+				continue;
+			}
+
+			/*
+			 * Do additional check to ensure that all logical replication
+			 * slots have consumed all the WAL before shutdown.
+			 *
+			 * Note: This can be satisfied only when the old cluster has been
+			 * shut down, so we skip this for live checks.
+			 */
+			if (!live_check && !slot->caught_up)
+			{
+				if (script == NULL &&
+					(script = fopen_priv(output_path, "w")) == NULL)
+					pg_fatal("could not open file \"%s\": %s",
+							 output_path, strerror(errno));
+
+				fprintf(script,
+						"The slot \"%s\" has not consumed the WAL yet\n",
+						slot->slotname);
+			}
+		}
+	}
+
+	if (script)
+	{
+		fclose(script);
+
+		pg_log(PG_REPORT, "fatal");
+		pg_fatal("Your installation contains logical replication slots that can't be upgraded.\n"
+				 "You can remove invalid slots and/or consume the pending WAL for other slots,\n"
+				 "and then restart the upgrade.\n"
+				 "A list of the problematic slots is in the file:\n"
+				 "    %s", output_path);
+	}
+
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/function.c b/src/bin/pg_upgrade/function.c
index dc8800c7cd..5af936bd45 100644
--- a/src/bin/pg_upgrade/function.c
+++ b/src/bin/pg_upgrade/function.c
@@ -46,7 +46,9 @@ library_name_compare(const void *p1, const void *p2)
 /*
  * get_loadable_libraries()
  *
- *	Fetch the names of all old libraries containing C-language functions.
+ *	Fetch the names of all old libraries containing either C-language functions
+ *	or are corresponding to logical replication output plugins.
+ *
  *	We will later check that they all exist in the new installation.
  */
 void
@@ -55,6 +57,7 @@ get_loadable_libraries(void)
 	PGresult  **ress;
 	int			totaltups;
 	int			dbnum;
+	int			n_libinfos;
 
 	ress = (PGresult **) pg_malloc(old_cluster.dbarr.ndbs * sizeof(PGresult *));
 	totaltups = 0;
@@ -81,7 +84,12 @@ get_loadable_libraries(void)
 		PQfinish(conn);
 	}
 
-	os_info.libraries = (LibraryInfo *) pg_malloc(totaltups * sizeof(LibraryInfo));
+	/*
+	 * Allocate memory for required libraries and logical replication output
+	 * plugins.
+	 */
+	n_libinfos = totaltups + count_old_cluster_logical_slots();
+	os_info.libraries = (LibraryInfo *) pg_malloc(sizeof(LibraryInfo) * n_libinfos);
 	totaltups = 0;
 
 	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
@@ -89,6 +97,7 @@ get_loadable_libraries(void)
 		PGresult   *res = ress[dbnum];
 		int			ntups;
 		int			rowno;
+		LogicalSlotInfoArr *slot_arr = &old_cluster.dbarr.dbs[dbnum].slot_arr;
 
 		ntups = PQntuples(res);
 		for (rowno = 0; rowno < ntups; rowno++)
@@ -101,6 +110,23 @@ get_loadable_libraries(void)
 			totaltups++;
 		}
 		PQclear(res);
+
+		/*
+		 * Store the names of output plugins as well. There is a possibility
+		 * that duplicated plugins are set, but the consumer function
+		 * check_loadable_libraries() will avoid checking the same library, so
+		 * we do not have to consider their uniqueness here.
+		 */
+		for (int slotno = 0; slotno < slot_arr->nslots; slotno++)
+		{
+			if (slot_arr->slots[slotno].invalid)
+				continue;
+
+			os_info.libraries[totaltups].name = pg_strdup(slot_arr->slots[slotno].plugin);
+			os_info.libraries[totaltups].dbnum = dbnum;
+
+			totaltups++;
+		}
 	}
 
 	pg_free(ress);
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index aa5faca4d6..7f21d26fd2 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -26,6 +26,8 @@ static void get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo);
 static void free_rel_infos(RelInfoArr *rel_arr);
 static void print_db_infos(DbInfoArr *db_arr);
 static void print_rel_infos(RelInfoArr *rel_arr);
+static void print_slot_infos(LogicalSlotInfoArr *slot_arr);
+static void get_old_cluster_logical_slot_infos(DbInfo *dbinfo, bool live_check);
 
 
 /*
@@ -266,13 +268,15 @@ report_unmatched_relation(const RelInfo *rel, const DbInfo *db, bool is_new_db)
 }
 
 /*
- * get_db_and_rel_infos()
+ * get_db_rel_and_slot_infos()
  *
  * higher level routine to generate dbinfos for the database running
  * on the given "port". Assumes that server is already running.
+ *
+ * live_check would be used only when the target is the old cluster.
  */
 void
-get_db_and_rel_infos(ClusterInfo *cluster)
+get_db_rel_and_slot_infos(ClusterInfo *cluster, bool live_check)
 {
 	int			dbnum;
 
@@ -283,7 +287,17 @@ get_db_and_rel_infos(ClusterInfo *cluster)
 	get_db_infos(cluster);
 
 	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
-		get_rel_infos(cluster, &cluster->dbarr.dbs[dbnum]);
+	{
+		DbInfo	   *pDbInfo = &cluster->dbarr.dbs[dbnum];
+
+		get_rel_infos(cluster, pDbInfo);
+
+		/*
+		 * Retrieve the logical replication slots infos for the old cluster.
+		 */
+		if (cluster == &old_cluster)
+			get_old_cluster_logical_slot_infos(pDbInfo, live_check);
+	}
 
 	if (cluster == &old_cluster)
 		pg_log(PG_VERBOSE, "\nsource databases:");
@@ -600,6 +614,125 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 	dbinfo->rel_arr.nrels = num_rels;
 }
 
+/*
+ * get_old_cluster_logical_slot_infos()
+ *
+ * Gets the LogicalSlotInfos for all the logical replication slots of the
+ * database referred to by "dbinfo". The status of each logical slot is gotten
+ * here, but they are used at the checking phase. See
+ * check_old_cluster_for_valid_slots().
+ *
+ * Note: This function will not do anything if the old cluster is pre-PG17.
+ * This is because before that the logical slots are not saved at shutdown, so
+ * there is no guarantee that the latest confirmed_flush_lsn is saved to disk
+ * which can lead to data loss. It is still not guaranteed for manually created
+ * slots in PG17, so subsequent checks done in
+ * check_old_cluster_for_valid_slots() would raise a FATAL error if such slots
+ * are included.
+ */
+static void
+get_old_cluster_logical_slot_infos(DbInfo *dbinfo, bool live_check)
+{
+	PGconn	   *conn;
+	PGresult   *res;
+	LogicalSlotInfo *slotinfos = NULL;
+	int			num_slots = 0;
+
+	/* Logical slots can be migrated since PG17. */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+	{
+		dbinfo->slot_arr.slots = slotinfos;
+		dbinfo->slot_arr.nslots = num_slots;
+		return;
+	}
+
+	conn = connectToServer(&old_cluster, dbinfo->db_name);
+
+	/*
+	 * Fetch the logical replication slot information. The check whether the
+	 * slot is considered caught up is done by an upgrade function. This
+	 * regards the slot as caught up if we don't find any decodable changes.
+	 * See binary_upgrade_logical_slot_has_caught_up().
+	 *
+	 * Note that we can't ensure whether the slot is caught up during
+	 * live_check as the new WAL records could be generated.
+	 *
+	 * We intentionally skip checking the WALs for invalidated slots as the
+	 * corresponding WALs could have been removed for such slots.
+	 *
+	 * The temporary slots are explicitly ignored while checking because such
+	 * slots cannot exist after the upgrade. During the upgrade, clusters are
+	 * started and stopped several times causing any temporary slots to be
+	 * removed.
+	 */
+	res = executeQueryOrDie(conn, "SELECT slot_name, plugin, two_phase, "
+							"%s as caught_up, conflicting as invalid "
+							"FROM pg_catalog.pg_replication_slots "
+							"WHERE slot_type = 'logical' AND "
+							"database = current_database() AND "
+							"temporary IS FALSE;",
+							live_check ? "FALSE" :
+							"(CASE WHEN conflicting THEN FALSE "
+							"ELSE (SELECT pg_catalog.binary_upgrade_logical_slot_has_caught_up(slot_name)) "
+							"END)");
+
+	num_slots = PQntuples(res);
+
+	if (num_slots)
+	{
+		int			i_slotname;
+		int			i_plugin;
+		int			i_twophase;
+		int			i_caught_up;
+		int			i_invalid;
+
+		slotinfos = (LogicalSlotInfo *) pg_malloc(sizeof(LogicalSlotInfo) * num_slots);
+
+		i_slotname = PQfnumber(res, "slot_name");
+		i_plugin = PQfnumber(res, "plugin");
+		i_twophase = PQfnumber(res, "two_phase");
+		i_caught_up = PQfnumber(res, "caught_up");
+		i_invalid = PQfnumber(res, "invalid");
+
+		for (int slotnum = 0; slotnum < num_slots; slotnum++)
+		{
+			LogicalSlotInfo *curr = &slotinfos[slotnum];
+
+			curr->slotname = pg_strdup(PQgetvalue(res, slotnum, i_slotname));
+			curr->plugin = pg_strdup(PQgetvalue(res, slotnum, i_plugin));
+			curr->two_phase = (strcmp(PQgetvalue(res, slotnum, i_twophase), "t") == 0);
+			curr->caught_up = (strcmp(PQgetvalue(res, slotnum, i_caught_up), "t") == 0);
+			curr->invalid = (strcmp(PQgetvalue(res, slotnum, i_invalid), "t") == 0);
+		}
+	}
+
+	PQclear(res);
+	PQfinish(conn);
+
+	dbinfo->slot_arr.slots = slotinfos;
+	dbinfo->slot_arr.nslots = num_slots;
+}
+
+
+/*
+ * count_old_cluster_logical_slots()
+ *
+ * Returns the number of logical replication slots for all databases.
+ *
+ * Note: this function always returns 0 if the old_cluster is PG16 and prior
+ * because we gather slot information only for cluster versions greater than or
+ * equal to PG17. See get_old_cluster_logical_slot_infos().
+ */
+int
+count_old_cluster_logical_slots(void)
+{
+	int			slot_count = 0;
+
+	for (int dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+		slot_count += old_cluster.dbarr.dbs[dbnum].slot_arr.nslots;
+
+	return slot_count;
+}
 
 static void
 free_db_and_rel_infos(DbInfoArr *db_arr)
@@ -642,8 +775,11 @@ print_db_infos(DbInfoArr *db_arr)
 
 	for (dbnum = 0; dbnum < db_arr->ndbs; dbnum++)
 	{
-		pg_log(PG_VERBOSE, "Database: \"%s\"", db_arr->dbs[dbnum].db_name);
-		print_rel_infos(&db_arr->dbs[dbnum].rel_arr);
+		DbInfo	   *pDbInfo = &db_arr->dbs[dbnum];
+
+		pg_log(PG_VERBOSE, "Database: \"%s\"", pDbInfo->db_name);
+		print_rel_infos(&pDbInfo->rel_arr);
+		print_slot_infos(&pDbInfo->slot_arr);
 	}
 }
 
@@ -660,3 +796,23 @@ print_rel_infos(RelInfoArr *rel_arr)
 			   rel_arr->rels[relnum].reloid,
 			   rel_arr->rels[relnum].tablespace);
 }
+
+static void
+print_slot_infos(LogicalSlotInfoArr *slot_arr)
+{
+	/* Quick return if there are no logical slots. */
+	if (slot_arr->nslots == 0)
+		return;
+
+	pg_log(PG_VERBOSE, "Logical replication slots within the database:");
+
+	for (int slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+	{
+		LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+
+		pg_log(PG_VERBOSE, "slot_name: \"%s\", plugin: \"%s\", two_phase: %s",
+			   slot_info->slotname,
+			   slot_info->plugin,
+			   slot_info->two_phase ? "true" : "false");
+	}
+}
diff --git a/src/bin/pg_upgrade/meson.build b/src/bin/pg_upgrade/meson.build
index 12a97f84e2..2c4f38d865 100644
--- a/src/bin/pg_upgrade/meson.build
+++ b/src/bin/pg_upgrade/meson.build
@@ -42,6 +42,7 @@ tests += {
     'tests': [
       't/001_basic.pl',
       't/002_pg_upgrade.pl',
+      't/003_upgrade_logical_replication_slots.pl',
     ],
     'test_kwargs': {'priority': 40}, # pg_upgrade tests are slow
   },
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 96bfb67167..3960af4036 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -59,6 +59,7 @@ static void copy_xact_xlog_xid(void);
 static void set_frozenxids(bool minmxid_only);
 static void make_outputdirs(char *pgdata);
 static void setup(char *argv0, bool *live_check);
+static void create_logical_replication_slots(void);
 
 ClusterInfo old_cluster,
 			new_cluster;
@@ -188,6 +189,21 @@ main(int argc, char **argv)
 			  new_cluster.pgdata);
 	check_ok();
 
+	/*
+	 * Migrate the logical slots to the new cluster.  Note that we need to do
+	 * this after resetting WAL because otherwise the required WAL would be
+	 * removed and slots would become unusable.  There is a possibility that
+	 * background processes might generate some WAL before we could create the
+	 * slots in the new cluster but we can ignore that WAL as that won't be
+	 * required downstream.
+	 */
+	if (count_old_cluster_logical_slots())
+	{
+		start_postmaster(&new_cluster, true);
+		create_logical_replication_slots();
+		stop_postmaster(false);
+	}
+
 	if (user_opts.do_sync)
 	{
 		prep_status("Sync data directory to disk");
@@ -593,7 +609,7 @@ create_new_objects(void)
 		set_frozenxids(true);
 
 	/* update new_cluster info now that we have objects in the databases */
-	get_db_and_rel_infos(&new_cluster);
+	get_db_rel_and_slot_infos(&new_cluster, false);
 }
 
 /*
@@ -862,3 +878,59 @@ set_frozenxids(bool minmxid_only)
 
 	check_ok();
 }
+
+/*
+ * create_logical_replication_slots()
+ *
+ * Similar to create_new_objects() but only restores logical replication slots.
+ */
+static void
+create_logical_replication_slots(void)
+{
+	prep_status_progress("Restoring logical replication slots in the new cluster");
+
+	for (int dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		DbInfo	   *old_db = &old_cluster.dbarr.dbs[dbnum];
+		LogicalSlotInfoArr *slot_arr = &old_db->slot_arr;
+		PGconn	   *conn;
+		PQExpBuffer query;
+
+		/* Skip this database if there are no slots */
+		if (slot_arr->nslots == 0)
+			continue;
+
+		conn = connectToServer(&new_cluster, old_db->db_name);
+		query = createPQExpBuffer();
+
+		pg_log(PG_STATUS, "%s", old_db->db_name);
+
+		for (int slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		{
+			LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+
+			/* Constructs a query for creating logical replication slots */
+			appendPQExpBuffer(query,
+							  "SELECT * FROM "
+							  "pg_catalog.pg_create_logical_replication_slot(");
+			appendStringLiteralConn(query, slot_info->slotname, conn);
+			appendPQExpBuffer(query, ", ");
+			appendStringLiteralConn(query, slot_info->plugin, conn);
+			appendPQExpBuffer(query, ", false, %s);",
+							  slot_info->two_phase ? "true" : "false");
+
+			PQclear(executeQueryOrDie(conn, "%s", query->data));
+
+			resetPQExpBuffer(query);
+		}
+
+		PQfinish(conn);
+
+		destroyPQExpBuffer(query);
+	}
+
+	end_progress_output();
+	check_ok();
+
+	return;
+}
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 842f3b6cd3..ba8129d135 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -150,6 +150,24 @@ typedef struct
 	int			nrels;
 } RelInfoArr;
 
+/*
+ * Structure to store logical replication slot information.
+ */
+typedef struct
+{
+	char	   *slotname;		/* slot name */
+	char	   *plugin;			/* plugin */
+	bool		two_phase;		/* can the slot decode 2PC? */
+	bool		caught_up;		/* has the slot caught up to latest changes? */
+	bool		invalid;		/* if true, the slot is unusable */
+} LogicalSlotInfo;
+
+typedef struct
+{
+	int			nslots;			/* number of logical slot infos */
+	LogicalSlotInfo *slots;		/* array of logical slot infos */
+} LogicalSlotInfoArr;
+
 /*
  * The following structure represents a relation mapping.
  */
@@ -176,6 +194,7 @@ typedef struct
 	char		db_tablespace[MAXPGPATH];	/* database default tablespace
 											 * path */
 	RelInfoArr	rel_arr;		/* array of all user relinfos */
+	LogicalSlotInfoArr slot_arr;	/* array of all LogicalSlotInfo */
 } DbInfo;
 
 /*
@@ -400,7 +419,8 @@ void		check_loadable_libraries(void);
 FileNameMap *gen_db_file_maps(DbInfo *old_db,
 							  DbInfo *new_db, int *nmaps, const char *old_pgdata,
 							  const char *new_pgdata);
-void		get_db_and_rel_infos(ClusterInfo *cluster);
+void		get_db_rel_and_slot_infos(ClusterInfo *cluster, bool live_check);
+int			count_old_cluster_logical_slots(void);
 
 /* option.c */
 
diff --git a/src/bin/pg_upgrade/server.c b/src/bin/pg_upgrade/server.c
index 0bc3d2806b..d7f6c268ef 100644
--- a/src/bin/pg_upgrade/server.c
+++ b/src/bin/pg_upgrade/server.c
@@ -201,6 +201,7 @@ start_postmaster(ClusterInfo *cluster, bool report_and_exit_on_error)
 	PGconn	   *conn;
 	bool		pg_ctl_return = false;
 	char		socket_string[MAXPGPATH + 200];
+	PQExpBufferData pgoptions;
 
 	static bool exit_hook_registered = false;
 
@@ -227,23 +228,41 @@ start_postmaster(ClusterInfo *cluster, bool report_and_exit_on_error)
 				 cluster->sockdir);
 #endif
 
+	initPQExpBuffer(&pgoptions);
+
 	/*
-	 * Use -b to disable autovacuum.
+	 * Construct a parameter string which is passed to the server process.
 	 *
 	 * Turn off durability requirements to improve object creation speed, and
 	 * we only modify the new cluster, so only use it there.  If there is a
 	 * crash, the new cluster has to be recreated anyway.  fsync=off is a big
 	 * win on ext4.
 	 */
+	if (cluster == &new_cluster)
+		appendPQExpBufferStr(&pgoptions, " -c synchronous_commit=off -c fsync=off -c full_page_writes=off");
+
+	/*
+	 * Use max_slot_wal_keep_size as -1 to prevent the WAL removal by the
+	 * checkpointer process.  If WALs required by logical replication slots
+	 * are removed, the slots are unusable.  This setting prevents the
+	 * invalidation of slots during the upgrade. We set this option when
+	 * cluster is PG17 or later because logical replication slots can only be
+	 * migrated since then. Besides, max_slot_wal_keep_size is added in PG13.
+	 */
+	if (GET_MAJOR_VERSION(cluster->major_version) >= 1700)
+		appendPQExpBufferStr(&pgoptions, " -c max_slot_wal_keep_size=-1");
+
+	/* Use -b to disable autovacuum. */
 	snprintf(cmd, sizeof(cmd),
 			 "\"%s/pg_ctl\" -w -l \"%s/%s\" -D \"%s\" -o \"-p %d -b%s %s%s\" start",
 			 cluster->bindir,
 			 log_opts.logdir,
 			 SERVER_LOG_FILE, cluster->pgconfig, cluster->port,
-			 (cluster == &new_cluster) ?
-			 " -c synchronous_commit=off -c fsync=off -c full_page_writes=off" : "",
+			 pgoptions.data,
 			 cluster->pgopts ? cluster->pgopts : "", socket_string);
 
+	termPQExpBuffer(&pgoptions);
+
 	/*
 	 * Don't throw an error right away, let connecting throw the error because
 	 * it might supply a reason for the failure.
diff --git a/src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl
new file mode 100644
index 0000000000..5d7f11fb09
--- /dev/null
+++ b/src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl
@@ -0,0 +1,281 @@
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+# Tests for upgrading logical replication slots
+
+use strict;
+use warnings;
+
+use File::Find qw(find);
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Verify that logical replication slots can be migrated.  This function will
+# be executed when the old cluster is PG17 and later.
+sub test_upgrade_from_PG17_and_later
+{
+	my ($old_publisher, $new_publisher, @pg_upgrade_cmd) = @_;
+
+	# ------------------------------
+	# TEST: Confirm pg_upgrade fails when the new cluster has wrong GUC values
+
+	# Preparations for the subsequent test:
+	# 1. Create two slots on the old cluster
+	$old_publisher->start;
+	$old_publisher->safe_psql(
+		'postgres', qq[
+		SELECT pg_create_logical_replication_slot('test_slot1', 'test_decoding');
+		SELECT pg_create_logical_replication_slot('test_slot2', 'test_decoding');
+	]);
+	$old_publisher->stop();
+
+	# 2. Set 'max_replication_slots' to be less than the number of slots (2)
+	#	 present on the old cluster.
+	$new_publisher->append_conf('postgresql.conf',
+		"max_replication_slots = 1");
+
+	# pg_upgrade will fail because the new cluster has insufficient
+	# max_replication_slots
+	command_checks_all(
+		[@pg_upgrade_cmd],
+		1,
+		[
+			qr/max_replication_slots \(1\) must be greater than or equal to the number of logical replication slots \(2\) on the old cluster/
+		],
+		[qr//],
+		'run of pg_upgrade where the new cluster has insufficient max_replication_slots'
+	);
+	ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+		"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+	# Set 'max_replication_slots' to match the number of slots (2) present
+	# on the old cluster. Both slots will be used for subsequent tests.
+	$new_publisher->append_conf('postgresql.conf',
+		"max_replication_slots = 2");
+
+
+	# ------------------------------
+	# TEST: Confirm pg_upgrade fails when the slot still has unconsumed WAL
+	# records
+
+	# Preparations for the subsequent test:
+	# 1. Generate extra WAL records. At this point neither test_slot1 nor
+	#	 test_slot2 has consumed them.
+	#
+	# 2. Advance the slot test_slot2 up to the current WAL location, but
+	#	 test_slot1 still has unconsumed WAL records.
+	#
+	# 3. Emit a non-transactional message. This will cause test_slot2 to detect
+	#	 the unconsumed WAL record.
+	$old_publisher->start;
+	$old_publisher->safe_psql(
+		'postgres', qq[
+			CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;
+			SELECT pg_replication_slot_advance('test_slot2', pg_current_wal_lsn());
+			SELECT count(*) FROM pg_logical_emit_message('false', 'prefix', 'This is a non-transactional message');
+	]);
+	$old_publisher->stop;
+
+	# pg_upgrade will fail because there are slots still having unconsumed WAL
+	# records
+	command_checks_all(
+		[@pg_upgrade_cmd],
+		1,
+		[
+			qr/Your installation contains logical replication slots that can't be upgraded./
+		],
+		[qr//],
+		'run of pg_upgrade of old cluster with slots having unconsumed WAL records'
+	);
+
+	# Verify the reason why the logical replication slot cannot be upgraded
+	my $slots_filename;
+
+	# Find a txt file that contains a list of logical replication slots that
+	# cannot be upgraded. We cannot predict the file's path because the output
+	# directory contains a milliseconds timestamp. File::Find::find must be
+	# used.
+	find(
+		sub {
+			if ($File::Find::name =~
+				m/invalid_logical_replication_slots\.txt/)
+			{
+				$slots_filename = $File::Find::name;
+			}
+		},
+		$new_publisher->data_dir . "/pg_upgrade_output.d");
+
+	# Check the file content. Both slots should be reporting that they have
+	# unconsumed WAL records.
+	like(
+		slurp_file($slots_filename),
+		qr/The slot \"test_slot1\" has not consumed the WAL yet/m,
+		'the previous test failed due to unconsumed WALs');
+	like(
+		slurp_file($slots_filename),
+		qr/The slot \"test_slot2\" has not consumed the WAL yet/m,
+		'the previous test failed due to unconsumed WALs');
+
+
+	# ------------------------------
+	# TEST: Successful upgrade
+
+	# Preparations for the subsequent test:
+	# 1. Setup logical replication (first, cleanup slots from the previous
+	#	 tests)
+	my $old_connstr = $old_publisher->connstr . ' dbname=postgres';
+
+	$old_publisher->start;
+	$old_publisher->safe_psql(
+		'postgres', qq[
+		SELECT * FROM pg_drop_replication_slot('test_slot1');
+		SELECT * FROM pg_drop_replication_slot('test_slot2');
+		CREATE PUBLICATION regress_pub FOR ALL TABLES;
+	]);
+
+	# Initialize subscriber cluster
+	my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
+	$subscriber->init();
+
+	$subscriber->start;
+	$subscriber->safe_psql(
+		'postgres', qq[
+		CREATE TABLE tbl (a int);
+		CREATE SUBSCRIPTION regress_sub CONNECTION '$old_connstr' PUBLICATION regress_pub WITH (two_phase = 'true')
+	]);
+	$subscriber->wait_for_subscription_sync($old_publisher, 'regress_sub');
+
+	# 2. Temporarily disable the subscription
+	$subscriber->safe_psql('postgres',
+		"ALTER SUBSCRIPTION regress_sub DISABLE");
+	$old_publisher->stop;
+
+	# pg_upgrade should be successful
+	command_ok([@pg_upgrade_cmd], 'run of pg_upgrade of old cluster');
+
+	# Check that the slot 'regress_sub' has migrated to the new cluster
+	$new_publisher->start;
+	my $result = $new_publisher->safe_psql('postgres',
+		"SELECT slot_name, two_phase FROM pg_replication_slots");
+	is($result, qq(regress_sub|t), 'check the slot exists on new cluster');
+
+	# Update the connection
+	my $new_connstr = $new_publisher->connstr . ' dbname=postgres';
+	$subscriber->safe_psql(
+		'postgres', qq[
+		ALTER SUBSCRIPTION regress_sub CONNECTION '$new_connstr';
+		ALTER SUBSCRIPTION regress_sub ENABLE;
+	]);
+
+	# Check whether changes on the new publisher get replicated to the
+	# subscriber
+	$new_publisher->safe_psql('postgres',
+		"INSERT INTO tbl VALUES (generate_series(11, 20))");
+	$new_publisher->wait_for_catchup('regress_sub');
+	$result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
+	is($result, qq(20), 'check changes are replicated to the subscriber');
+
+	# Clean up
+	$subscriber->stop();
+	$new_publisher->stop();
+}
+
+# Verify that logical replication slots cannot be migrated.  This function will
+# be executed when the old cluster version is prior to PG17.
+sub test_upgrade_from_pre_PG17
+{
+	my ($old_publisher, $new_publisher, @pg_upgrade_cmd) = @_;
+
+	# ------------------------------
+	# TEST: Confirm logical replication slots cannot be migrated
+
+	# Preparations for the subsequent test:
+	# 1. Create a slot on the old cluster
+	$old_publisher->start;
+	$old_publisher->safe_psql('postgres',
+		"SELECT pg_create_logical_replication_slot('test_slot', 'test_decoding');"
+	);
+	$old_publisher->stop;
+
+	# pg_upgrade should be successful, but any logical replication slots will
+	# be not migrated.
+	command_ok([@pg_upgrade_cmd], 'run of pg_upgrade of old cluster');
+	ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
+		"pg_upgrade_output.d/ removed after pg_upgrade success");
+
+	# Check that the slot 'test_slot' has not migrated to the new cluster
+	$new_publisher->start;
+	my $result = $new_publisher->safe_psql('postgres',
+		"SELECT count(*) FROM pg_replication_slots");
+	is($result, qq(0), 'check the slot does not exist on new cluster');
+
+	# Clean up
+	$new_publisher->stop();
+}
+
+# Can be changed to test the other modes
+my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
+
+# Initialize old cluster. Cross-version checks are also supported.
+my $old_publisher =
+  PostgreSQL::Test::Cluster->new('old_publisher',
+	install_path => $ENV{oldinstall});
+
+my %node_params = ();
+$node_params{allows_streaming} = 'logical';
+
+# To prevent node->init() from using a previously initialized cluster that
+# could be of a different version, it is essential to configure specific
+# settings for the old cluster. This can ensure that initdb will be done.
+my @initdb_params = ();
+push @initdb_params, ('--encoding', 'UTF-8');
+push @initdb_params, ('--locale',   'C');
+$node_params{extra} = \@initdb_params;
+
+$old_publisher->init(%node_params);
+
+# XXX: Older PG version had different rules for the inter-dependency of
+# 'max_wal_senders' and 'max_connections', so assign values which will work for
+# all PG versions. If Cluster.pm is fixed this code is not needed.
+$old_publisher->append_conf(
+	'postgresql.conf', qq[
+max_wal_senders = 5
+max_connections = 10
+]);
+
+# Initialize new cluster
+my $new_publisher = PostgreSQL::Test::Cluster->new('new_publisher');
+$new_publisher->init(allows_streaming => 'logical');
+
+my $oldbindir = $old_publisher->config_data('--bindir');
+my $newbindir = $new_publisher->config_data('--bindir');
+
+# Setup a pg_upgrade command. This will be used anywhere.
+my @pg_upgrade_cmd = (
+	'pg_upgrade', '--no-sync',
+	'-d',         $old_publisher->data_dir,
+	'-D',         $new_publisher->data_dir,
+	'-b',         $oldbindir,
+	'-B',         $newbindir,
+	'-s',         $new_publisher->host,
+	'-p',         $old_publisher->port,
+	'-P',         $new_publisher->port,
+	$mode);
+
+# Test according to the major version of the old cluster.
+# Upgrading logical replication slots from versions older than PG17 is not
+# supported.
+if ($old_publisher->pg_version->major >= 17)
+{
+	test_upgrade_from_PG17_and_later($old_publisher, $new_publisher,
+		@pg_upgrade_cmd);
+}
+else
+{
+	test_upgrade_from_pre_PG17($old_publisher, $new_publisher,
+		@pg_upgrade_cmd);
+}
+
+
+done_testing();
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index c92d0631a0..06435e8b92 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11379,6 +11379,11 @@
   proname => 'binary_upgrade_set_next_pg_tablespace_oid', provolatile => 'v',
   proparallel => 'u', prorettype => 'void', proargtypes => 'oid',
   prosrc => 'binary_upgrade_set_next_pg_tablespace_oid' },
+{ oid => '8046', descr => 'for use by pg_upgrade',
+  proname => 'binary_upgrade_logical_slot_has_caught_up', proisstrict => 'f',
+  provolatile => 'v', proparallel => 'u', prorettype => 'bool',
+  proargtypes => 'name',
+  prosrc => 'binary_upgrade_logical_slot_has_caught_up' },
 
 # conversion functions
 { oid => '4302',
diff --git a/src/include/replication/logical.h b/src/include/replication/logical.h
index 5f49554ea0..dffc0d1564 100644
--- a/src/include/replication/logical.h
+++ b/src/include/replication/logical.h
@@ -109,6 +109,9 @@ typedef struct LogicalDecodingContext
 	TransactionId write_xid;
 	/* Are we processing the end LSN of a transaction? */
 	bool		end_xact;
+
+	/* Do we need to process any change in fast_forward mode? */
+	bool		processing_required;
 } LogicalDecodingContext;
 
 
@@ -145,4 +148,6 @@ extern bool filter_by_origin_cb_wrapper(LogicalDecodingContext *ctx, RepOriginId
 extern void ResetLogicalStreamingState(void);
 extern void UpdateDecodingStats(LogicalDecodingContext *ctx);
 
+extern bool LogicalReplicationSlotHasPendingWal(XLogRecPtr end_of_wal);
+
 #endif
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 06b25617bc..8c3f20dcae 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1503,6 +1503,8 @@ LogicalRepTyp
 LogicalRepWorker
 LogicalRepWorkerType
 LogicalRewriteMappingData
+LogicalSlotInfo
+LogicalSlotInfoArr
 LogicalTape
 LogicalTapeSet
 LsnReadQueue
-- 
2.28.0.windows.1
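
As a reading aid for the hunk that constructs the slot-creation query above
(create_logical_replication_slots()), the statement it assembles would come out
roughly as follows for a hypothetical slot 'sub1' using the pgoutput plugin with
two_phase enabled; the slot and plugin names here are illustrative only, not
taken from the patch:

-- third argument: temporary = false; fourth argument: two_phase from the old slot
SELECT * FROM pg_catalog.pg_create_logical_replication_slot('sub1', 'pgoutput', false, true);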

#355 Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Amit Kapila (#354)
1 attachment(s)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Amit,

Based on your advice, I revised the patch again.

>> I spent some time on the v57 patch and it looks good to me - tests are
>> passing, no complaints from pgindent and pgperltidy. I turned the CF
>> entry https://commitfest.postgresql.org/45/4273/ to RfC.
>
> Thanks, the patch looks mostly good to me but I am not convinced of
> keeping the tests across versions in this form. I don't think they are
> tested in BF; one can only create a setup manually to test them.

I analyzed this and agree that the current BF client does not use the TAP
test framework for cross-version checks.

> Shall we remove it for now and then consider it separately?

OK, the parts for cross-version checks have been removed.

> Apart from that, I have made minor modifications in the docs to adjust
> the order of various prerequisites.

Thanks, included.
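
To make the documented prerequisites and the re-enable step concrete, a minimal
SQL sketch of the user-facing flow might look like the following; the
subscription name 'mysub' and the connection string are placeholders, not taken
from the patch:

-- On the old publisher: confirm the logical slots look migratable
-- (non-temporary and not conflicting; conflicting slots block the upgrade).
SELECT slot_name, plugin, two_phase, conflicting
FROM pg_replication_slots
WHERE slot_type = 'logical' AND temporary IS FALSE;

-- On the subscriber: pause before shutting down the old publisher
ALTER SUBSCRIPTION mysub DISABLE;

-- ... run pg_upgrade on the publisher ...

-- Point the subscription at the new publisher and resume
ALTER SUBSCRIPTION mysub CONNECTION 'host=new_publisher port=5432 dbname=postgres';
ALTER SUBSCRIPTION mysub ENABLE;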

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:

v59-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patch (application/octet-stream)
From d6a192054399c4c75785e198b32226c78d6824e2 Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Tue, 4 Apr 2023 05:49:34 +0000
Subject: [PATCH v59] pg_upgrade: Allow to replicate logical replication slots
 to new node

This commit allows nodes with logical replication slots to be upgraded. While
reading information from the old cluster, a list of logical replication slots
is fetched. Later in the upgrade, pg_upgrade revisits the list and restores the
slots by executing pg_create_logical_replication_slot() on the new cluster.
Migration of logical replication slots is only supported when the old cluster
is version 17.0 or later.

If the old node has slots with the status 'lost' or with unconsumed WAL
records, pg_upgrade fails. These checks are needed to prevent data loss.

Note that the pg_resetwal command removes WAL files, which are required for
restart_lsn. If the WALs required by logical replication slots are removed, the
slots become unusable. Therefore, during the upgrade, slot restoration is done
after the final pg_resetwal command. This workflow ensures that the required
WALs are retained.

The significant advantage of this commit is that it makes it easy to continue
logical replication even after upgrading the publisher node. Previously,
pg_upgrade allowed copying publications to a new node. With this commit,
adjusting the connection string to the new publisher causes the apply worker
on the subscriber to connect to the new publisher automatically, enabling
seamless continuation of logical replication after an upgrade.

Author: Hayato Kuroda
Co-authored-by: Hou Zhijie
Reviewed-by: Peter Smith, Julien Rouhaud, Vignesh C, Wang Wei, Masahiko Sawada,
             Dilip Kumar, Bharath Rupireddy, Shlok Kyal
---
 doc/src/sgml/ref/pgupgrade.sgml               |  78 ++++++-
 src/backend/replication/logical/decode.c      |  48 ++++-
 src/backend/replication/logical/logical.c     |  75 +++++++
 src/backend/replication/slot.c                |  14 ++
 src/backend/utils/adt/pg_upgrade_support.c    |  44 ++++
 src/bin/pg_upgrade/Makefile                   |   3 +
 src/bin/pg_upgrade/check.c                    | 168 ++++++++++++++-
 src/bin/pg_upgrade/function.c                 |  30 ++-
 src/bin/pg_upgrade/info.c                     | 166 ++++++++++++++-
 src/bin/pg_upgrade/meson.build                |   1 +
 src/bin/pg_upgrade/pg_upgrade.c               |  74 ++++++-
 src/bin/pg_upgrade/pg_upgrade.h               |  22 +-
 src/bin/pg_upgrade/server.c                   |  25 ++-
 .../003_upgrade_logical_replication_slots.pl  | 192 ++++++++++++++++++
 src/include/catalog/pg_proc.dat               |   5 +
 src/include/replication/logical.h             |   5 +
 src/tools/pgindent/typedefs.list              |   2 +
 17 files changed, 926 insertions(+), 26 deletions(-)
 create mode 100644 src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 608193b307..b7fe740fde 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -383,6 +383,79 @@ make prefix=/usr/local/pgsql.new install
     </para>
    </step>
 
+   <step>
+    <title>Prepare for publisher upgrades</title>
+
+    <para>
+     <application>pg_upgrade</application> attempts to migrate logical
+     replication slots. This helps avoid the need for manually defining the
+     same replication slots on the new publisher. Migration of logical
+     replication slots is only supported when the old cluster is version 17.0
+     or later. Logical replication slots on clusters before version 17.0 will
+     silently be ignored.
+    </para>
+
+    <para>
+     Before you start upgrading the publisher cluster, ensure that the
+     subscription is temporarily disabled, by executing
+     <link linkend="sql-altersubscription"><command>ALTER SUBSCRIPTION ... DISABLE</command></link>.
+     Re-enable the subscription after the upgrade.
+    </para>
+
+    <para>
+     There are some prerequisites for <application>pg_upgrade</application> to
+     be able to upgrade the replication slots. If these are not met an error
+     will be reported.
+    </para>
+
+    <itemizedlist>
+     <listitem>
+      <para>
+       The new cluster must have
+       <link linkend="guc-wal-level"><varname>wal_level</varname></link> as
+       <literal>logical</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must have
+       <link linkend="guc-max-replication-slots"><varname>max_replication_slots</varname></link>
+       configured to a value greater than or equal to the number of slots
+       present in the old cluster.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The output plugins referenced by the slots on the old cluster must be
+       installed in the new PostgreSQL executable directory.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The old cluster has replicated all the transactions and logical decoding
+       messages to subscribers.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       All slots on the old cluster must be usable, i.e., there are no slots
+       whose
+       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>conflicting</structfield>
+       is <literal>true</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The new cluster must not have permanent logical replication slots, i.e.,
+       there must be no slots where
+       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>temporary</structfield>
+       is <literal>false</literal>.
+      </para>
+     </listitem>
+    </itemizedlist>
+
+   </step>
+
    <step>
     <title>Stop both servers</title>
 
@@ -650,8 +723,9 @@ rsync --archive --delete --hard-links --size-only --no-inc-recursive /vol1/pg_tb
        Configure the servers for log shipping.  (You do not need to run
        <function>pg_backup_start()</function> and <function>pg_backup_stop()</function>
        or take a file system backup as the standbys are still synchronized
-       with the primary.)  Replication slots are not copied and must
-       be recreated.
+       with the primary.)  Only logical slots on the primary are copied to the
+       new standby, but other slots on the old standby are not copied so must
+       be recreated manually.
       </para>
      </step>
 
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 24b712aa66..1237118e84 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -600,12 +600,8 @@ logicalmsg_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 
 	ReorderBufferProcessXid(ctx->reorder, XLogRecGetXid(r), buf->origptr);
 
-	/*
-	 * If we don't have snapshot or we are just fast-forwarding, there is no
-	 * point in decoding messages.
-	 */
-	if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT ||
-		ctx->fast_forward)
+	/* If we don't have snapshot, there is no point in decoding messages */
+	if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT)
 		return;
 
 	message = (xl_logical_message *) XLogRecGetData(r);
@@ -622,6 +618,26 @@ logicalmsg_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 			  SnapBuildXactNeedsSkip(builder, buf->origptr)))
 		return;
 
+	/*
+	 * We also skip decoding in fast_forward mode. This check must be last
+	 * because we don't want to set the processing_required flag unless we
+	 * have a decodable message.
+	 */
+	if (ctx->fast_forward)
+	{
+		/*
+		 * We need to set processing_required flag to notify the message's
+		 * existence to the caller. Usually, the flag is set when either the
+		 * COMMIT or ABORT records are decoded, but this must be turned on
+		 * here because the non-transactional logical message is decoded
+		 * without waiting for these records.
+		 */
+		if (!message->transactional)
+			ctx->processing_required = true;
+
+		return;
+	}
+
 	/*
 	 * If this is a non-transactional change, get the snapshot we're expected
 	 * to use. We only get here when the snapshot is consistent, and the
@@ -1286,7 +1302,21 @@ static bool
 DecodeTXNNeedSkip(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
 				  Oid txn_dbid, RepOriginId origin_id)
 {
-	return (SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) ||
-			(txn_dbid != InvalidOid && txn_dbid != ctx->slot->data.database) ||
-			ctx->fast_forward || FilterByOrigin(ctx, origin_id));
+	if (SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) ||
+		(txn_dbid != InvalidOid && txn_dbid != ctx->slot->data.database) ||
+		FilterByOrigin(ctx, origin_id))
+		return true;
+
+	/*
+	 * We also skip decoding in fast_forward mode. In passing set the
+	 * processing_required flag to indicate that if it were not for
+	 * fast_forward mode, processing would have been required.
+	 */
+	if (ctx->fast_forward)
+	{
+		ctx->processing_required = true;
+		return true;
+	}
+
+	return false;
 }
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 41243d0187..8288da5277 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -29,6 +29,7 @@
 #include "postgres.h"
 
 #include "access/xact.h"
+#include "access/xlogutils.h"
 #include "access/xlog_internal.h"
 #include "fmgr.h"
 #include "miscadmin.h"
@@ -41,6 +42,7 @@
 #include "storage/proc.h"
 #include "storage/procarray.h"
 #include "utils/builtins.h"
+#include "utils/inval.h"
 #include "utils/memutils.h"
 
 /* data for errcontext callback */
@@ -1949,3 +1951,76 @@ UpdateDecodingStats(LogicalDecodingContext *ctx)
 	rb->totalTxns = 0;
 	rb->totalBytes = 0;
 }
+
+/*
+ * Read up to the end of WAL starting from the decoding slot's restart_lsn.
+ * Return true if any meaningful/decodable WAL records are encountered,
+ * otherwise false.
+ */
+bool
+LogicalReplicationSlotHasPendingWal(XLogRecPtr end_of_wal)
+{
+	bool		has_pending_wal = false;
+
+	Assert(MyReplicationSlot);
+
+	PG_TRY();
+	{
+		LogicalDecodingContext *ctx;
+
+		/*
+		 * Create our decoding context in fast_forward mode, passing start_lsn
+		 * as InvalidXLogRecPtr, so that we start processing from the slot's
+		 * confirmed_flush.
+		 */
+		ctx = CreateDecodingContext(InvalidXLogRecPtr,
+									NIL,
+									true,	/* fast_forward */
+									XL_ROUTINE(.page_read = read_local_xlog_page,
+											   .segment_open = wal_segment_open,
+											   .segment_close = wal_segment_close),
+									NULL, NULL, NULL);
+
+		/*
+		 * Start reading at the slot's restart_lsn, which we know points to a
+		 * valid record.
+		 */
+		XLogBeginRead(ctx->reader, MyReplicationSlot->data.restart_lsn);
+
+		/* Invalidate non-timetravel entries */
+		InvalidateSystemCaches();
+
+		/* Loop until the end of WAL or some changes are processed */
+		while (!has_pending_wal && ctx->reader->EndRecPtr < end_of_wal)
+		{
+			XLogRecord *record;
+			char	   *errm = NULL;
+
+			record = XLogReadRecord(ctx->reader, &errm);
+
+			if (errm)
+				elog(ERROR, "could not find record for logical decoding: %s", errm);
+
+			if (record != NULL)
+				LogicalDecodingProcessRecord(ctx, ctx->reader);
+
+			has_pending_wal = ctx->processing_required;
+
+			CHECK_FOR_INTERRUPTS();
+		}
+
+		/* Clean up */
+		FreeDecodingContext(ctx);
+		InvalidateSystemCaches();
+	}
+	PG_CATCH();
+	{
+		/* clear all timetravel entries */
+		InvalidateSystemCaches();
+
+		PG_RE_THROW();
+	}
+	PG_END_TRY();
+
+	return has_pending_wal;
+}
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 7e5ec500d8..99823df3c7 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1423,6 +1423,20 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause,
 
 		SpinLockRelease(&s->mutex);
 
+		/*
+		 * The logical replication slots shouldn't be invalidated as
+		 * max_slot_wal_keep_size GUC is set to -1 during the upgrade.
+		 *
+		 * The following is just a sanity check.
+		 */
+		if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade)
+		{
+			ereport(ERROR,
+					errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					errmsg("replication slots must not be invalidated during the upgrade"),
+					errhint("\"max_slot_wal_keep_size\" must be set to -1 during the upgrade"));
+		}
+
 		if (active_pid != 0)
 		{
 			/*
diff --git a/src/backend/utils/adt/pg_upgrade_support.c b/src/backend/utils/adt/pg_upgrade_support.c
index 0186636d9f..2f6fc86c3d 100644
--- a/src/backend/utils/adt/pg_upgrade_support.c
+++ b/src/backend/utils/adt/pg_upgrade_support.c
@@ -17,6 +17,7 @@
 #include "catalog/pg_type.h"
 #include "commands/extension.h"
 #include "miscadmin.h"
+#include "replication/logical.h"
 #include "utils/array.h"
 #include "utils/builtins.h"
 
@@ -261,3 +262,46 @@ binary_upgrade_set_missing_value(PG_FUNCTION_ARGS)
 
 	PG_RETURN_VOID();
 }
+
+/*
+ * Verify the given slot has already consumed all the WAL changes.
+ *
+ * Returns true if there are no decodable WAL records after the
+ * confirmed_flush_lsn. Otherwise false.
+ *
+ * This is a special purpose function to ensure that the given slot can be
+ * upgraded without data loss.
+ */
+Datum
+binary_upgrade_logical_slot_has_caught_up(PG_FUNCTION_ARGS)
+{
+	Name		slot_name;
+	XLogRecPtr	end_of_wal;
+	bool		found_pending_wal;
+
+	CHECK_IS_BINARY_UPGRADE;
+
+	/* We must check before dereferencing the argument */
+	if (PG_ARGISNULL(0))
+		elog(ERROR, "null argument to binary_upgrade_validate_wal_records is not allowed");
+
+	CheckSlotPermissions();
+
+	slot_name = PG_GETARG_NAME(0);
+
+	/* Acquire the given slot */
+	ReplicationSlotAcquire(NameStr(*slot_name), true);
+
+	Assert(SlotIsLogical(MyReplicationSlot));
+
+	/* Slots must be valid as otherwise we won't be able to scan the WAL */
+	Assert(MyReplicationSlot->data.invalidated == RS_INVAL_NONE);
+
+	end_of_wal = GetFlushRecPtr(NULL);
+	found_pending_wal = LogicalReplicationSlotHasPendingWal(end_of_wal);
+
+	/* Clean up */
+	ReplicationSlotRelease();
+
+	PG_RETURN_BOOL(!found_pending_wal);
+}
diff --git a/src/bin/pg_upgrade/Makefile b/src/bin/pg_upgrade/Makefile
index 5834513add..05e9299654 100644
--- a/src/bin/pg_upgrade/Makefile
+++ b/src/bin/pg_upgrade/Makefile
@@ -3,6 +3,9 @@
 PGFILEDESC = "pg_upgrade - an in-place binary upgrade utility"
 PGAPPICON = win32
 
+# required for 003_upgrade_logical_replication_slots.pl
+EXTRA_INSTALL=contrib/test_decoding
+
 subdir = src/bin/pg_upgrade
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 21a0ff9e42..179f85ae8a 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -33,6 +33,8 @@ static void check_for_jsonb_9_4_usage(ClusterInfo *cluster);
 static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(void);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
+static void check_new_cluster_logical_replication_slots(void);
+static void check_old_cluster_for_valid_slots(bool live_check);
 
 
 /*
@@ -89,8 +91,11 @@ check_and_dump_old_cluster(bool live_check)
 	if (!live_check)
 		start_postmaster(&old_cluster, true);
 
-	/* Extract a list of databases and tables from the old cluster */
-	get_db_and_rel_infos(&old_cluster);
+	/*
+	 * Extract a list of databases, tables, and logical replication slots from
+	 * the old cluster.
+	 */
+	get_db_rel_and_slot_infos(&old_cluster, live_check);
 
 	init_tablespaces();
 
@@ -107,6 +112,13 @@ check_and_dump_old_cluster(bool live_check)
 	check_for_reg_data_type_usage(&old_cluster);
 	check_for_isn_and_int8_passing_mismatch(&old_cluster);
 
+	/*
+	 * Logical replication slots can be migrated since PG17. See comments atop
+	 * get_old_cluster_logical_slot_infos().
+	 */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
+		check_old_cluster_for_valid_slots(live_check);
+
 	/*
 	 * PG 16 increased the size of the 'aclitem' type, which breaks the
 	 * on-disk format for existing data.
@@ -200,7 +212,7 @@ check_and_dump_old_cluster(bool live_check)
 void
 check_new_cluster(void)
 {
-	get_db_and_rel_infos(&new_cluster);
+	get_db_rel_and_slot_infos(&new_cluster, false);
 
 	check_new_cluster_is_empty();
 
@@ -223,6 +235,8 @@ check_new_cluster(void)
 	check_for_prepared_transactions(&new_cluster);
 
 	check_for_new_tablespace_dir();
+
+	check_new_cluster_logical_replication_slots();
 }
 
 
@@ -1451,3 +1465,151 @@ check_for_user_defined_encoding_conversions(ClusterInfo *cluster)
 	else
 		check_ok();
 }
+
+/*
+ * check_new_cluster_logical_replication_slots()
+ *
+ * Verify that there are no logical replication slots on the new cluster and
+ * that the parameter settings necessary for creating slots are sufficient.
+ */
+static void
+check_new_cluster_logical_replication_slots(void)
+{
+	PGresult   *res;
+	PGconn	   *conn;
+	int			nslots_on_old;
+	int			nslots_on_new;
+	int			max_replication_slots;
+	char	   *wal_level;
+
+	/* Logical slots can be migrated since PG17. */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+		return;
+
+	nslots_on_old = count_old_cluster_logical_slots();
+
+	/* Quick return if there are no logical slots to be migrated. */
+	if (nslots_on_old == 0)
+		return;
+
+	conn = connectToServer(&new_cluster, "template1");
+
+	prep_status("Checking for new cluster logical replication slots");
+
+	res = executeQueryOrDie(conn, "SELECT count(*) "
+							"FROM pg_catalog.pg_replication_slots "
+							"WHERE slot_type = 'logical' AND "
+							"temporary IS FALSE;");
+
+	if (PQntuples(res) != 1)
+		pg_fatal("could not count the number of logical replication slots");
+
+	nslots_on_new = atoi(PQgetvalue(res, 0, 0));
+
+	if (nslots_on_new)
+		pg_fatal("Expected 0 logical replication slots but found %d.",
+				 nslots_on_new);
+
+	PQclear(res);
+
+	res = executeQueryOrDie(conn, "SELECT setting FROM pg_settings "
+							"WHERE name IN ('wal_level', 'max_replication_slots') "
+							"ORDER BY name DESC;");
+
+	if (PQntuples(res) != 2)
+		pg_fatal("could not determine parameter settings on new cluster");
+
+	wal_level = PQgetvalue(res, 0, 0);
+
+	if (strcmp(wal_level, "logical") != 0)
+		pg_fatal("wal_level must be \"logical\", but is set to \"%s\"",
+				 wal_level);
+
+	max_replication_slots = atoi(PQgetvalue(res, 1, 0));
+
+	if (nslots_on_old > max_replication_slots)
+		pg_fatal("max_replication_slots (%d) must be greater than or equal to the number of "
+				 "logical replication slots (%d) on the old cluster",
+				 max_replication_slots, nslots_on_old);
+
+	PQclear(res);
+	PQfinish(conn);
+
+	check_ok();
+}
+
+/*
+ * check_old_cluster_for_valid_slots()
+ *
+ * Verify that all the logical slots are valid and have consumed all the WAL
+ * before shutdown.
+ */
+static void
+check_old_cluster_for_valid_slots(bool live_check)
+{
+	char		output_path[MAXPGPATH];
+	FILE	   *script = NULL;
+
+	prep_status("Checking for valid logical replication slots");
+
+	snprintf(output_path, sizeof(output_path), "%s/%s",
+			 log_opts.basedir,
+			 "invalid_logical_replication_slots.txt");
+
+	for (int dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		LogicalSlotInfoArr *slot_arr = &old_cluster.dbarr.dbs[dbnum].slot_arr;
+
+		for (int slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		{
+			LogicalSlotInfo *slot = &slot_arr->slots[slotnum];
+
+			/* Is the slot usable? */
+			if (slot->invalid)
+			{
+				if (script == NULL &&
+					(script = fopen_priv(output_path, "w")) == NULL)
+					pg_fatal("could not open file \"%s\": %s",
+							 output_path, strerror(errno));
+
+				fprintf(script, "The slot \"%s\" is invalid\n",
+						slot->slotname);
+
+				continue;
+			}
+
+			/*
+			 * Do additional check to ensure that all logical replication
+			 * slots have consumed all the WAL before shutdown.
+			 *
+			 * Note: This can be satisfied only when the old cluster has been
+			 * shut down, so we skip this for live checks.
+			 */
+			if (!live_check && !slot->caught_up)
+			{
+				if (script == NULL &&
+					(script = fopen_priv(output_path, "w")) == NULL)
+					pg_fatal("could not open file \"%s\": %s",
+							 output_path, strerror(errno));
+
+				fprintf(script,
+						"The slot \"%s\" has not consumed the WAL yet\n",
+						slot->slotname);
+			}
+		}
+	}
+
+	if (script)
+	{
+		fclose(script);
+
+		pg_log(PG_REPORT, "fatal");
+		pg_fatal("Your installation contains logical replication slots that can't be upgraded.\n"
+				 "You can remove invalid slots and/or consume the pending WAL for other slots,\n"
+				 "and then restart the upgrade.\n"
+				 "A list of the problematic slots is in the file:\n"
+				 "    %s", output_path);
+	}
+
+	check_ok();
+}
diff --git a/src/bin/pg_upgrade/function.c b/src/bin/pg_upgrade/function.c
index dc8800c7cd..5af936bd45 100644
--- a/src/bin/pg_upgrade/function.c
+++ b/src/bin/pg_upgrade/function.c
@@ -46,7 +46,9 @@ library_name_compare(const void *p1, const void *p2)
 /*
  * get_loadable_libraries()
  *
- *	Fetch the names of all old libraries containing C-language functions.
+ *	Fetch the names of all old libraries containing either C-language functions
+ *	or are corresponding to logical replication output plugins.
+ *
  *	We will later check that they all exist in the new installation.
  */
 void
@@ -55,6 +57,7 @@ get_loadable_libraries(void)
 	PGresult  **ress;
 	int			totaltups;
 	int			dbnum;
+	int			n_libinfos;
 
 	ress = (PGresult **) pg_malloc(old_cluster.dbarr.ndbs * sizeof(PGresult *));
 	totaltups = 0;
@@ -81,7 +84,12 @@ get_loadable_libraries(void)
 		PQfinish(conn);
 	}
 
-	os_info.libraries = (LibraryInfo *) pg_malloc(totaltups * sizeof(LibraryInfo));
+	/*
+	 * Allocate memory for required libraries and logical replication output
+	 * plugins.
+	 */
+	n_libinfos = totaltups + count_old_cluster_logical_slots();
+	os_info.libraries = (LibraryInfo *) pg_malloc(sizeof(LibraryInfo) * n_libinfos);
 	totaltups = 0;
 
 	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
@@ -89,6 +97,7 @@ get_loadable_libraries(void)
 		PGresult   *res = ress[dbnum];
 		int			ntups;
 		int			rowno;
+		LogicalSlotInfoArr *slot_arr = &old_cluster.dbarr.dbs[dbnum].slot_arr;
 
 		ntups = PQntuples(res);
 		for (rowno = 0; rowno < ntups; rowno++)
@@ -101,6 +110,23 @@ get_loadable_libraries(void)
 			totaltups++;
 		}
 		PQclear(res);
+
+		/*
+		 * Store the names of output plugins as well. There is a possibility
+		 * that duplicated plugins are set, but the consumer function
+		 * check_loadable_libraries() will avoid checking the same library, so
+		 * we do not have to consider their uniqueness here.
+		 */
+		for (int slotno = 0; slotno < slot_arr->nslots; slotno++)
+		{
+			if (slot_arr->slots[slotno].invalid)
+				continue;
+
+			os_info.libraries[totaltups].name = pg_strdup(slot_arr->slots[slotno].plugin);
+			os_info.libraries[totaltups].dbnum = dbnum;
+
+			totaltups++;
+		}
 	}
 
 	pg_free(ress);
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index aa5faca4d6..7f21d26fd2 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -26,6 +26,8 @@ static void get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo);
 static void free_rel_infos(RelInfoArr *rel_arr);
 static void print_db_infos(DbInfoArr *db_arr);
 static void print_rel_infos(RelInfoArr *rel_arr);
+static void print_slot_infos(LogicalSlotInfoArr *slot_arr);
+static void get_old_cluster_logical_slot_infos(DbInfo *dbinfo, bool live_check);
 
 
 /*
@@ -266,13 +268,15 @@ report_unmatched_relation(const RelInfo *rel, const DbInfo *db, bool is_new_db)
 }
 
 /*
- * get_db_and_rel_infos()
+ * get_db_rel_and_slot_infos()
  *
  * higher level routine to generate dbinfos for the database running
  * on the given "port". Assumes that server is already running.
+ *
+ * live_check would be used only when the target is the old cluster.
  */
 void
-get_db_and_rel_infos(ClusterInfo *cluster)
+get_db_rel_and_slot_infos(ClusterInfo *cluster, bool live_check)
 {
 	int			dbnum;
 
@@ -283,7 +287,17 @@ get_db_and_rel_infos(ClusterInfo *cluster)
 	get_db_infos(cluster);
 
 	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
-		get_rel_infos(cluster, &cluster->dbarr.dbs[dbnum]);
+	{
+		DbInfo	   *pDbInfo = &cluster->dbarr.dbs[dbnum];
+
+		get_rel_infos(cluster, pDbInfo);
+
+		/*
+		 * Retrieve the logical replication slots infos for the old cluster.
+		 */
+		if (cluster == &old_cluster)
+			get_old_cluster_logical_slot_infos(pDbInfo, live_check);
+	}
 
 	if (cluster == &old_cluster)
 		pg_log(PG_VERBOSE, "\nsource databases:");
@@ -600,6 +614,125 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 	dbinfo->rel_arr.nrels = num_rels;
 }
 
+/*
+ * get_old_cluster_logical_slot_infos()
+ *
+ * Gets the LogicalSlotInfos for all the logical replication slots of the
+ * database referred to by "dbinfo". The status of each logical slot is gotten
+ * here, but they are used at the checking phase. See
+ * check_old_cluster_for_valid_slots().
+ *
+ * Note: This function will not do anything if the old cluster is pre-PG17.
+ * This is because before that the logical slots are not saved at shutdown, so
+ * there is no guarantee that the latest confirmed_flush_lsn is saved to disk
+ * which can lead to data loss. It is still not guaranteed for manually created
+ * slots in PG17, so subsequent checks done in
+ * check_old_cluster_for_valid_slots() would raise a FATAL error if such slots
+ * are included.
+ */
+static void
+get_old_cluster_logical_slot_infos(DbInfo *dbinfo, bool live_check)
+{
+	PGconn	   *conn;
+	PGresult   *res;
+	LogicalSlotInfo *slotinfos = NULL;
+	int			num_slots = 0;
+
+	/* Logical slots can be migrated since PG17. */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+	{
+		dbinfo->slot_arr.slots = slotinfos;
+		dbinfo->slot_arr.nslots = num_slots;
+		return;
+	}
+
+	conn = connectToServer(&old_cluster, dbinfo->db_name);
+
+	/*
+	 * Fetch the logical replication slot information. The check whether the
+	 * slot is considered caught up is done by an upgrade function. This
+	 * regards the slot as caught up if we don't find any decodable changes.
+	 * See binary_upgrade_logical_slot_has_caught_up().
+	 *
+	 * Note that we can't ensure whether the slot is caught up during
+	 * live_check as the new WAL records could be generated.
+	 *
+	 * We intentionally skip checking the WALs for invalidated slots as the
+	 * corresponding WALs could have been removed for such slots.
+	 *
+	 * The temporary slots are explicitly ignored while checking because such
+	 * slots cannot exist after the upgrade. During the upgrade, clusters are
+	 * started and stopped several times causing any temporary slots to be
+	 * removed.
+	 */
+	res = executeQueryOrDie(conn, "SELECT slot_name, plugin, two_phase, "
+							"%s as caught_up, conflicting as invalid "
+							"FROM pg_catalog.pg_replication_slots "
+							"WHERE slot_type = 'logical' AND "
+							"database = current_database() AND "
+							"temporary IS FALSE;",
+							live_check ? "FALSE" :
+							"(CASE WHEN conflicting THEN FALSE "
+							"ELSE (SELECT pg_catalog.binary_upgrade_logical_slot_has_caught_up(slot_name)) "
+							"END)");
+
+	num_slots = PQntuples(res);
+
+	if (num_slots)
+	{
+		int			i_slotname;
+		int			i_plugin;
+		int			i_twophase;
+		int			i_caught_up;
+		int			i_invalid;
+
+		slotinfos = (LogicalSlotInfo *) pg_malloc(sizeof(LogicalSlotInfo) * num_slots);
+
+		i_slotname = PQfnumber(res, "slot_name");
+		i_plugin = PQfnumber(res, "plugin");
+		i_twophase = PQfnumber(res, "two_phase");
+		i_caught_up = PQfnumber(res, "caught_up");
+		i_invalid = PQfnumber(res, "invalid");
+
+		for (int slotnum = 0; slotnum < num_slots; slotnum++)
+		{
+			LogicalSlotInfo *curr = &slotinfos[slotnum];
+
+			curr->slotname = pg_strdup(PQgetvalue(res, slotnum, i_slotname));
+			curr->plugin = pg_strdup(PQgetvalue(res, slotnum, i_plugin));
+			curr->two_phase = (strcmp(PQgetvalue(res, slotnum, i_twophase), "t") == 0);
+			curr->caught_up = (strcmp(PQgetvalue(res, slotnum, i_caught_up), "t") == 0);
+			curr->invalid = (strcmp(PQgetvalue(res, slotnum, i_invalid), "t") == 0);
+		}
+	}
+
+	PQclear(res);
+	PQfinish(conn);
+
+	dbinfo->slot_arr.slots = slotinfos;
+	dbinfo->slot_arr.nslots = num_slots;
+}
+
+
+/*
+ * count_old_cluster_logical_slots()
+ *
+ * Returns the number of logical replication slots for all databases.
+ *
+ * Note: this function always returns 0 if the old_cluster is PG16 and prior
+ * because we gather slot information only for cluster versions greater than or
+ * equal to PG17. See get_old_cluster_logical_slot_infos().
+ */
+int
+count_old_cluster_logical_slots(void)
+{
+	int			slot_count = 0;
+
+	for (int dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+		slot_count += old_cluster.dbarr.dbs[dbnum].slot_arr.nslots;
+
+	return slot_count;
+}
 
 static void
 free_db_and_rel_infos(DbInfoArr *db_arr)
@@ -642,8 +775,11 @@ print_db_infos(DbInfoArr *db_arr)
 
 	for (dbnum = 0; dbnum < db_arr->ndbs; dbnum++)
 	{
-		pg_log(PG_VERBOSE, "Database: \"%s\"", db_arr->dbs[dbnum].db_name);
-		print_rel_infos(&db_arr->dbs[dbnum].rel_arr);
+		DbInfo	   *pDbInfo = &db_arr->dbs[dbnum];
+
+		pg_log(PG_VERBOSE, "Database: \"%s\"", pDbInfo->db_name);
+		print_rel_infos(&pDbInfo->rel_arr);
+		print_slot_infos(&pDbInfo->slot_arr);
 	}
 }
 
@@ -660,3 +796,23 @@ print_rel_infos(RelInfoArr *rel_arr)
 			   rel_arr->rels[relnum].reloid,
 			   rel_arr->rels[relnum].tablespace);
 }
+
+static void
+print_slot_infos(LogicalSlotInfoArr *slot_arr)
+{
+	/* Quick return if there are no logical slots. */
+	if (slot_arr->nslots == 0)
+		return;
+
+	pg_log(PG_VERBOSE, "Logical replication slots within the database:");
+
+	for (int slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+	{
+		LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+
+		pg_log(PG_VERBOSE, "slot_name: \"%s\", plugin: \"%s\", two_phase: %s",
+			   slot_info->slotname,
+			   slot_info->plugin,
+			   slot_info->two_phase ? "true" : "false");
+	}
+}
diff --git a/src/bin/pg_upgrade/meson.build b/src/bin/pg_upgrade/meson.build
index 12a97f84e2..2c4f38d865 100644
--- a/src/bin/pg_upgrade/meson.build
+++ b/src/bin/pg_upgrade/meson.build
@@ -42,6 +42,7 @@ tests += {
     'tests': [
       't/001_basic.pl',
       't/002_pg_upgrade.pl',
+      't/003_upgrade_logical_replication_slots.pl',
     ],
     'test_kwargs': {'priority': 40}, # pg_upgrade tests are slow
   },
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 96bfb67167..3960af4036 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -59,6 +59,7 @@ static void copy_xact_xlog_xid(void);
 static void set_frozenxids(bool minmxid_only);
 static void make_outputdirs(char *pgdata);
 static void setup(char *argv0, bool *live_check);
+static void create_logical_replication_slots(void);
 
 ClusterInfo old_cluster,
 			new_cluster;
@@ -188,6 +189,21 @@ main(int argc, char **argv)
 			  new_cluster.pgdata);
 	check_ok();
 
+	/*
+	 * Migrate the logical slots to the new cluster.  Note that we need to do
+	 * this after resetting WAL because otherwise the required WAL would be
+	 * removed and slots would become unusable.  There is a possibility that
+	 * background processes might generate some WAL before we could create the
+	 * slots in the new cluster but we can ignore that WAL as that won't be
+	 * required downstream.
+	 */
+	if (count_old_cluster_logical_slots())
+	{
+		start_postmaster(&new_cluster, true);
+		create_logical_replication_slots();
+		stop_postmaster(false);
+	}
+
 	if (user_opts.do_sync)
 	{
 		prep_status("Sync data directory to disk");
@@ -593,7 +609,7 @@ create_new_objects(void)
 		set_frozenxids(true);
 
 	/* update new_cluster info now that we have objects in the databases */
-	get_db_and_rel_infos(&new_cluster);
+	get_db_rel_and_slot_infos(&new_cluster, false);
 }
 
 /*
@@ -862,3 +878,59 @@ set_frozenxids(bool minmxid_only)
 
 	check_ok();
 }
+
+/*
+ * create_logical_replication_slots()
+ *
+ * Similar to create_new_objects() but only restores logical replication slots.
+ */
+static void
+create_logical_replication_slots(void)
+{
+	prep_status_progress("Restoring logical replication slots in the new cluster");
+
+	for (int dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		DbInfo	   *old_db = &old_cluster.dbarr.dbs[dbnum];
+		LogicalSlotInfoArr *slot_arr = &old_db->slot_arr;
+		PGconn	   *conn;
+		PQExpBuffer query;
+
+		/* Skip this database if there are no slots */
+		if (slot_arr->nslots == 0)
+			continue;
+
+		conn = connectToServer(&new_cluster, old_db->db_name);
+		query = createPQExpBuffer();
+
+		pg_log(PG_STATUS, "%s", old_db->db_name);
+
+		for (int slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+		{
+			LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+
+			/* Constructs a query for creating logical replication slots */
+			appendPQExpBuffer(query,
+							  "SELECT * FROM "
+							  "pg_catalog.pg_create_logical_replication_slot(");
+			appendStringLiteralConn(query, slot_info->slotname, conn);
+			appendPQExpBuffer(query, ", ");
+			appendStringLiteralConn(query, slot_info->plugin, conn);
+			appendPQExpBuffer(query, ", false, %s);",
+							  slot_info->two_phase ? "true" : "false");
+
+			PQclear(executeQueryOrDie(conn, "%s", query->data));
+
+			resetPQExpBuffer(query);
+		}
+
+		PQfinish(conn);
+
+		destroyPQExpBuffer(query);
+	}
+
+	end_progress_output();
+	check_ok();
+
+	return;
+}
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 842f3b6cd3..ba8129d135 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -150,6 +150,24 @@ typedef struct
 	int			nrels;
 } RelInfoArr;
 
+/*
+ * Structure to store logical replication slot information.
+ */
+typedef struct
+{
+	char	   *slotname;		/* slot name */
+	char	   *plugin;			/* plugin */
+	bool		two_phase;		/* can the slot decode 2PC? */
+	bool		caught_up;		/* has the slot caught up to latest changes? */
+	bool		invalid;		/* if true, the slot is unusable */
+} LogicalSlotInfo;
+
+typedef struct
+{
+	int			nslots;			/* number of logical slot infos */
+	LogicalSlotInfo *slots;		/* array of logical slot infos */
+} LogicalSlotInfoArr;
+
 /*
  * The following structure represents a relation mapping.
  */
@@ -176,6 +194,7 @@ typedef struct
 	char		db_tablespace[MAXPGPATH];	/* database default tablespace
 											 * path */
 	RelInfoArr	rel_arr;		/* array of all user relinfos */
+	LogicalSlotInfoArr slot_arr;	/* array of all LogicalSlotInfo */
 } DbInfo;
 
 /*
@@ -400,7 +419,8 @@ void		check_loadable_libraries(void);
 FileNameMap *gen_db_file_maps(DbInfo *old_db,
 							  DbInfo *new_db, int *nmaps, const char *old_pgdata,
 							  const char *new_pgdata);
-void		get_db_and_rel_infos(ClusterInfo *cluster);
+void		get_db_rel_and_slot_infos(ClusterInfo *cluster, bool live_check);
+int			count_old_cluster_logical_slots(void);
 
 /* option.c */
 
diff --git a/src/bin/pg_upgrade/server.c b/src/bin/pg_upgrade/server.c
index 0bc3d2806b..d7f6c268ef 100644
--- a/src/bin/pg_upgrade/server.c
+++ b/src/bin/pg_upgrade/server.c
@@ -201,6 +201,7 @@ start_postmaster(ClusterInfo *cluster, bool report_and_exit_on_error)
 	PGconn	   *conn;
 	bool		pg_ctl_return = false;
 	char		socket_string[MAXPGPATH + 200];
+	PQExpBufferData pgoptions;
 
 	static bool exit_hook_registered = false;
 
@@ -227,23 +228,41 @@ start_postmaster(ClusterInfo *cluster, bool report_and_exit_on_error)
 				 cluster->sockdir);
 #endif
 
+	initPQExpBuffer(&pgoptions);
+
 	/*
-	 * Use -b to disable autovacuum.
+	 * Construct a parameter string which is passed to the server process.
 	 *
 	 * Turn off durability requirements to improve object creation speed, and
 	 * we only modify the new cluster, so only use it there.  If there is a
 	 * crash, the new cluster has to be recreated anyway.  fsync=off is a big
 	 * win on ext4.
 	 */
+	if (cluster == &new_cluster)
+		appendPQExpBufferStr(&pgoptions, " -c synchronous_commit=off -c fsync=off -c full_page_writes=off");
+
+	/*
+	 * Use max_slot_wal_keep_size as -1 to prevent the WAL removal by the
+	 * checkpointer process.  If WALs required by logical replication slots
+	 * are removed, the slots are unusable.  This setting prevents the
+	 * invalidation of slots during the upgrade. We set this option when
+	 * cluster is PG17 or later because logical replication slots can only be
+	 * migrated since then. Besides, max_slot_wal_keep_size is added in PG13.
+	 */
+	if (GET_MAJOR_VERSION(cluster->major_version) >= 1700)
+		appendPQExpBufferStr(&pgoptions, " -c max_slot_wal_keep_size=-1");
+
+	/* Use -b to disable autovacuum. */
 	snprintf(cmd, sizeof(cmd),
 			 "\"%s/pg_ctl\" -w -l \"%s/%s\" -D \"%s\" -o \"-p %d -b%s %s%s\" start",
 			 cluster->bindir,
 			 log_opts.logdir,
 			 SERVER_LOG_FILE, cluster->pgconfig, cluster->port,
-			 (cluster == &new_cluster) ?
-			 " -c synchronous_commit=off -c fsync=off -c full_page_writes=off" : "",
+			 pgoptions.data,
 			 cluster->pgopts ? cluster->pgopts : "", socket_string);
 
+	termPQExpBuffer(&pgoptions);
+
 	/*
 	 * Don't throw an error right away, let connecting throw the error because
 	 * it might supply a reason for the failure.
diff --git a/src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl
new file mode 100644
index 0000000000..5e416f553d
--- /dev/null
+++ b/src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl
@@ -0,0 +1,192 @@
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+# Tests for upgrading logical replication slots
+
+use strict;
+use warnings;
+
+use File::Find qw(find);
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Can be changed to test the other modes
+my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
+
+# Initialize old cluster
+my $old_publisher = PostgreSQL::Test::Cluster->new('old_publisher');
+$old_publisher->init(allows_streaming => 'logical');
+
+# Initialize new cluster
+my $new_publisher = PostgreSQL::Test::Cluster->new('new_publisher');
+$new_publisher->init(allows_streaming => 'logical');
+
+# Setup a pg_upgrade command. This will be used anywhere.
+my @pg_upgrade_cmd = (
+	'pg_upgrade', '--no-sync',
+	'-d', $old_publisher->data_dir,
+	'-D', $new_publisher->data_dir,
+	'-b', $old_publisher->config_data('--bindir'),
+	'-B', $new_publisher->config_data('--bindir'),
+	'-s', $new_publisher->host,
+	'-p', $old_publisher->port,
+	'-P', $new_publisher->port,
+	$mode);
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when the new cluster has wrong GUC values
+
+# Preparations for the subsequent test:
+# 1. Create two slots on the old cluster
+$old_publisher->start;
+$old_publisher->safe_psql(
+	'postgres', qq[
+	SELECT pg_create_logical_replication_slot('test_slot1', 'test_decoding');
+	SELECT pg_create_logical_replication_slot('test_slot2', 'test_decoding');
+]);
+$old_publisher->stop();
+
+# 2. Set 'max_replication_slots' to be less than the number of slots (2)
+#	 present on the old cluster.
+$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 1");
+
+# pg_upgrade will fail because the new cluster has insufficient
+# max_replication_slots
+command_checks_all(
+	[@pg_upgrade_cmd],
+	1,
+	[
+		qr/max_replication_slots \(1\) must be greater than or equal to the number of logical replication slots \(2\) on the old cluster/
+	],
+	[qr//],
+	'run of pg_upgrade where the new cluster has insufficient max_replication_slots'
+);
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# Set 'max_replication_slots' to match the number of slots (2) present on the
+# old cluster. Both slots will be used for subsequent tests.
+$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 2");
+
+
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when the slot still has unconsumed WAL records
+
+# Preparations for the subsequent test:
+# 1. Generate extra WAL records. At this point neither test_slot1 nor
+#	 test_slot2 has consumed them.
+#
+# 2. Advance the slot test_slot2 up to the current WAL location, but test_slot1
+#	 still has unconsumed WAL records.
+#
+# 3. Emit a non-transactional message. This will cause test_slot2 to detect the
+#	 unconsumed WAL record.
+$old_publisher->start;
+$old_publisher->safe_psql(
+	'postgres', qq[
+		CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;
+		SELECT pg_replication_slot_advance('test_slot2', pg_current_wal_lsn());
+		SELECT count(*) FROM pg_logical_emit_message('false', 'prefix', 'This is a non-transactional message');
+]);
+$old_publisher->stop;
+
+# pg_upgrade will fail because there are slots still having unconsumed WAL
+# records
+command_checks_all(
+	[@pg_upgrade_cmd],
+	1,
+	[
+		qr/Your installation contains logical replication slots that can't be upgraded./
+	],
+	[qr//],
+	'run of pg_upgrade of old cluster with slots having unconsumed WAL records'
+);
+
+# Verify the reason why the logical replication slot cannot be upgraded
+my $slots_filename;
+
+# Find a txt file that contains a list of logical replication slots that cannot
+# be upgraded. We cannot predict the file's path because the output directory
+# contains a milliseconds timestamp. File::Find::find must be used.
+find(
+	sub {
+		if ($File::Find::name =~ m/invalid_logical_replication_slots\.txt/)
+		{
+			$slots_filename = $File::Find::name;
+		}
+	},
+	$new_publisher->data_dir . "/pg_upgrade_output.d");
+
+# Check the file content. Both slots should be reporting that they have
+# unconsumed WAL records.
+like(
+	slurp_file($slots_filename),
+	qr/The slot \"test_slot1\" has not consumed the WAL yet/m,
+	'the previous test failed due to unconsumed WALs');
+like(
+	slurp_file($slots_filename),
+	qr/The slot \"test_slot2\" has not consumed the WAL yet/m,
+	'the previous test failed due to unconsumed WALs');
+
+
+# ------------------------------
+# TEST: Successful upgrade
+
+# Preparations for the subsequent test:
+# 1. Setup logical replication (first, cleanup slots from the previous tests)
+my $old_connstr = $old_publisher->connstr . ' dbname=postgres';
+
+$old_publisher->start;
+$old_publisher->safe_psql(
+	'postgres', qq[
+	SELECT * FROM pg_drop_replication_slot('test_slot1');
+	SELECT * FROM pg_drop_replication_slot('test_slot2');
+	CREATE PUBLICATION regress_pub FOR ALL TABLES;
+]);
+
+# Initialize subscriber cluster
+my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
+$subscriber->init();
+
+$subscriber->start;
+$subscriber->safe_psql(
+	'postgres', qq[
+	CREATE TABLE tbl (a int);
+	CREATE SUBSCRIPTION regress_sub CONNECTION '$old_connstr' PUBLICATION regress_pub WITH (two_phase = 'true')
+]);
+$subscriber->wait_for_subscription_sync($old_publisher, 'regress_sub');
+
+# 2. Temporarily disable the subscription
+$subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION regress_sub DISABLE");
+$old_publisher->stop;
+
+# pg_upgrade should be successful
+command_ok([@pg_upgrade_cmd], 'run of pg_upgrade of old cluster');
+
+# Check that the slot 'regress_sub' has migrated to the new cluster
+$new_publisher->start;
+my $result = $new_publisher->safe_psql('postgres',
+	"SELECT slot_name, two_phase FROM pg_replication_slots");
+is($result, qq(regress_sub|t), 'check the slot exists on new cluster');
+
+# Update the connection
+my $new_connstr = $new_publisher->connstr . ' dbname=postgres';
+$subscriber->safe_psql(
+	'postgres', qq[
+	ALTER SUBSCRIPTION regress_sub CONNECTION '$new_connstr';
+	ALTER SUBSCRIPTION regress_sub ENABLE;
+]);
+
+# Check whether changes on the new publisher get replicated to the subscriber
+$new_publisher->safe_psql('postgres',
+	"INSERT INTO tbl VALUES (generate_series(11, 20))");
+$new_publisher->wait_for_catchup('regress_sub');
+$result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
+is($result, qq(20), 'check changes are replicated to the subscriber');
+
+# Clean up
+$subscriber->stop();
+$new_publisher->stop();
+
+done_testing();
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index c92d0631a0..06435e8b92 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11379,6 +11379,11 @@
   proname => 'binary_upgrade_set_next_pg_tablespace_oid', provolatile => 'v',
   proparallel => 'u', prorettype => 'void', proargtypes => 'oid',
   prosrc => 'binary_upgrade_set_next_pg_tablespace_oid' },
+{ oid => '8046', descr => 'for use by pg_upgrade',
+  proname => 'binary_upgrade_logical_slot_has_caught_up', proisstrict => 'f',
+  provolatile => 'v', proparallel => 'u', prorettype => 'bool',
+  proargtypes => 'name',
+  prosrc => 'binary_upgrade_logical_slot_has_caught_up' },
 
 # conversion functions
 { oid => '4302',
diff --git a/src/include/replication/logical.h b/src/include/replication/logical.h
index 5f49554ea0..dffc0d1564 100644
--- a/src/include/replication/logical.h
+++ b/src/include/replication/logical.h
@@ -109,6 +109,9 @@ typedef struct LogicalDecodingContext
 	TransactionId write_xid;
 	/* Are we processing the end LSN of a transaction? */
 	bool		end_xact;
+
+	/* Do we need to process any change in fast_forward mode? */
+	bool		processing_required;
 } LogicalDecodingContext;
 
 
@@ -145,4 +148,6 @@ extern bool filter_by_origin_cb_wrapper(LogicalDecodingContext *ctx, RepOriginId
 extern void ResetLogicalStreamingState(void);
 extern void UpdateDecodingStats(LogicalDecodingContext *ctx);
 
+extern bool LogicalReplicationSlotHasPendingWal(XLogRecPtr end_of_wal);
+
 #endif
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 06b25617bc..8c3f20dcae 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1503,6 +1503,8 @@ LogicalRepTyp
 LogicalRepWorker
 LogicalRepWorkerType
 LogicalRewriteMappingData
+LogicalSlotInfo
+LogicalSlotInfoArr
 LogicalTape
 LogicalTapeSet
 LsnReadQueue
-- 
2.27.0

#356Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Amit Kapila (#354)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Wed, Oct 25, 2023 at 11:39 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Oct 24, 2023 at 1:20 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

I spent some time on the v57 patch and it looks good to me - tests are
passing, no complaints from pgindent and pgperltidy. I turned the CF
entry https://commitfest.postgresql.org/45/4273/ to RfC.

Thanks, the patch looks mostly good to me but I am not convinced of
keeping the tests across versions in this form. I don't think they are
tested in BF, only one can manually create a setup to test. Shall we
remove it for now and then consider it separately?

I think we can retain the test_upgrade_from_pre_PG17 because it is
possible not only to trigger it manually but also to write a CI
workflow to trigger it.

Apart from that, I have made minor modifications in the docs to adjust
the order of various prerequisites.

+    <para>
+     <application>pg_upgrade</application> attempts to migrate logical
+     replication slots. This helps avoid the need for manually defining the
+     same replication slots on the new publisher. Migration of logical
+     replication slots is only supported when the old cluster is version 17.0
+     or later. Logical replication slots on clusters before version 17.0 will
+     silently be ignored.
+    </para>

+ The new cluster must not have permanent logical replication slots, i.e.,

How about using "logical slots" in place of "logical replication
slots" to be more generic? We agreed and changed the function name to

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#357Amit Kapila
amit.kapila16@gmail.com
In reply to: Bharath Rupireddy (#356)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Wed, Oct 25, 2023 at 1:39 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Wed, Oct 25, 2023 at 11:39 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Oct 24, 2023 at 1:20 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

I spent some time on the v57 patch and it looks good to me - tests are
passing, no complaints from pgindent and pgperltidy. I turned the CF
entry https://commitfest.postgresql.org/45/4273/ to RfC.

Thanks, the patch looks mostly good to me but I am not convinced of
keeping the tests across versions in this form. I don't think they are
tested in BF, only one can manually create a setup to test. Shall we
remove it for now and then consider it separately?

I think we can retain the test_upgrade_from_pre_PG17 because it is not
only possible to trigger it manually but also one can write a CI
workflow to trigger it.

It would be better to gauge its value separately and add it once the
main patch is committed. I am slightly unhappy even with the hack used
for pre-version testing in the previous patch, which is as follows:
+# XXX: Older PG version had different rules for the inter-dependency of
+# 'max_wal_senders' and 'max_connections', so assign values which will work for
+# all PG versions. If Cluster.pm is fixed this code is not needed.
+$old_publisher->append_conf(
+ 'postgresql.conf', qq[
+max_wal_senders = 5
+max_connections = 10
+]);

There should be a way to avoid this but we can decide it afterwards. I
don't want to hold the main patch for this point. What do you think?

Apart from that, I have made minor modifications in the docs to adjust
the order of various prerequisites.

+    <para>
+     <application>pg_upgrade</application> attempts to migrate logical
+     replication slots. This helps avoid the need for manually defining the
+     same replication slots on the new publisher. Migration of logical
+     replication slots is only supported when the old cluster is version 17.0
+     or later. Logical replication slots on clusters before version 17.0 will
+     silently be ignored.
+    </para>

+ The new cluster must not have permanent logical replication slots, i.e.,

How about using "logical slots" in place of "logical replication
slots" to be more generic? We agreed and changed the function name to

Yeah, I am fine with that and I can take care of it before committing
unless there is more to change.

--
With Regards,
Amit Kapila.

#358Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Amit Kapila (#357)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Wed, Oct 25, 2023 at 1:50 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

It would be better to gauge its value separately and add it once the
main patch is committed.
There should be a way to avoid this but we can decide it afterwards. I
don't want to hold the main patch for this point. What do you think?

+1 to go with the main patch first. We also have another thing to take
care of - pg_upgrade option to not migrate logical slots.

How about using "logical slots" in place of "logical replication
slots" to be more generic? We agreed and changed the function name to

Yeah, I am fine with that and I can take care of it before committing
unless there is more to change.

+1. I have no other comments.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#359Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Amit Kapila (#357)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Hi,

The BF animal fairywren[1] failed when testing
003_upgrade_logical_replication_slots.pl.

From the log, I can see pg_upgrade failed to open the
invalid_logical_replication_slots.txt:

# Checking for valid logical replication slots
# could not open file "C:/tools/nmsys64/home/pgrunner/bf/root/HEAD/pgsql.build/testrun/pg_upgrade/003_upgrade_logical_replication_slots/data/t_003_upgrade_logical_replication_slots_new_publisher_data/pgdata/pg_upgrade_output.d/20231026T112558.309/invalid_logical_replication_slots.txt": No such file or directory
# Failure, exiting

The reason could be that the length of this path (262) exceeds the Windows path
limit (260, IIRC). If so, I recall we fixed similar things before (e213de8e7) by
shortening the path somehow.

In this case, I think one approach is to shorten the file name and test name to
xxx_logical_slots instead of xxx_logical_replication_slots. But we will analyze more
and share a fix soon.

[1]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&dt=2023-10-26%2009%3A04%3A54

Best Regards,
Hou zj

#360Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Zhijie Hou (Fujitsu) (#359)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Thu, Oct 26, 2023 at 8:11 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

The BF animal fairywren[1] failed when testing
003_upgrade_logical_replication_slots.pl.

From the log, I can see pg_upgrade failed to open the
invalid_logical_replication_slots.txt:

# Checking for valid logical replication slots
# could not open file "C:/tools/nmsys64/home/pgrunner/bf/root/HEAD/pgsql.build/testrun/pg_upgrade/003_upgrade_logical_replication_slots/data/t_003_upgrade_logical_replication_slots_new_publisher_data/pgdata/pg_upgrade_output.d/20231026T112558.309/invalid_logical_replication_slots.txt": No such file or directory
# Failure, exiting

The reason could be the length of this path(262) exceed the windows path
limit(260 IIRC). If so, I recall we fixed similar things before (e213de8e7) by
reducing the path somehow.

Nice catch. Windows docs say that the file/directory path name can't
exceed MAX_PATH, which is defined as 260 characters. However, one must
opt-in to enable longer path names -
https://learn.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation?tabs=registry
and https://learn.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation?tabs=registry#enable-long-paths-in-windows-10-version-1607-and-later.
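
As a quick illustration (my own standalone sketch, not part of any patch), the
path reported in the fairywren log can be checked against that limit directly:

```c
#include <stdio.h>
#include <string.h>

int
main(void)
{
	/* Path copied verbatim from the fairywren failure log above. */
	const char *path =
		"C:/tools/nmsys64/home/pgrunner/bf/root/HEAD/pgsql.build/testrun/"
		"pg_upgrade/003_upgrade_logical_replication_slots/data/"
		"t_003_upgrade_logical_replication_slots_new_publisher_data/pgdata/"
		"pg_upgrade_output.d/20231026T112558.309/"
		"invalid_logical_replication_slots.txt";

	/* Prints a length just over the 260-character MAX_PATH limit. */
	printf("%zu\n", strlen(path));
	return 0;
}
```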

In this case, I think one approach is to reduce the file and testname to
xxx_logical_slots instead of xxx_logical_replication_slots. But we will analyze more
and share fix soon.

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&dt=2023-10-26%2009%3A04%3A54

+1 for s/003_upgrade_logical_replication_slots.pl/003_upgrade_logical_slots.pl
and s/invalid_logical_replication_slots.txt/invalid_logical_slots.txt.
In fact, we've used "logical slots" instead of "logical replication
slots" in the docs to be generic. By looking at the generated
directory path name, I think we can use shorter node names - instead
of old_publisher, new_publisher, subscriber - either use node1 (for
old publisher), node2 (for subscriber), node3 (for new publisher) or
use alpha (for old publisher), bravo (for subscriber), charlie (for
new publisher) or such shorter names. We don't have to be that
descriptive and long in node names, one can look at the test file to
know which one is what.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#361Peter Smith
smithpb2250@gmail.com
In reply to: Bharath Rupireddy (#360)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Fri, Oct 27, 2023 at 2:26 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Thu, Oct 26, 2023 at 8:11 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

The BF animal fairywren[1] failed when testing
003_upgrade_logical_replication_slots.pl.

From the log, I can see pg_upgrade failed to open the
invalid_logical_replication_slots.txt:

# Checking for valid logical replication slots
# could not open file "C:/tools/nmsys64/home/pgrunner/bf/root/HEAD/pgsql.build/testrun/pg_upgrade/003_upgrade_logical_replication_slots/data/t_003_upgrade_logical_replication_slots_new_publisher_data/pgdata/pg_upgrade_output.d/20231026T112558.309/invalid_logical_replication_slots.txt": No such file or directory
# Failure, exiting

The reason could be the length of this path(262) exceed the windows path
limit(260 IIRC). If so, I recall we fixed similar things before (e213de8e7) by
reducing the path somehow.

Nice catch. Windows docs say that the file/directory path name can't
exceed MAX_PATH, which is defined as 260 characters. However, one must
opt-in to enable longer path names -
https://learn.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation?tabs=registry
and https://learn.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation?tabs=registry#enable-long-paths-in-windows-10-version-1607-and-later.

In this case, I think one approach is to reduce the file and testname to
xxx_logical_slots instead of xxx_logical_replication_slots. But we will analyze more
and share fix soon.

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&dt=2023-10-26%2009%3A04%3A54

+1 for s/003_upgrade_logical_replication_slots.pl/003_upgrade_logical_slots.pl
and s/invalid_logical_replication_slots.txt/invalid_logical_slots.txt.
In fact, we've used "logical slots" instead of "logical replication
slots" in the docs to be generic. By looking at the generated
directory path name, I think we can use shorter node names - instead
of old_publisher, new_publisher, subscriber - either use node1 (for
old publisher), node2 (for subscriber), node3 (for new publisher) or
use alpha (for old publisher), bravo (for subscriber), charlie (for
new publisher) or such shorter names. We don't have to be that
descriptive and long in node names, one can look at the test file to
know which one is what.

Some more ideas for shortening the filename:

1. "003_upgrade_logical_replication_slots.pl" -- IMO the word
"upgrade" is redundant in that filename (earlier patches never had
this). The test file lives under "pg_upgrade/t" so I felt that
upgrading is already implied.

2. If the node names will be shortened they should still retain *some*
meaning if possible:
old_publisher/subscriber/new_publisher --> node1/node2/node3 (means
nothing without studying the tests)
old_publisher/subscriber/new_publisher --> alpha/bravo/charlie (means
nothing without studying the tests)
How about:
old_publisher/subscriber/new_publisher --> node_p1/node_s/node_p2
or similar...

======
Kind Regards,
Peter Smith.
Fujitsu Australia

#362Amit Kapila
amit.kapila16@gmail.com
In reply to: Peter Smith (#361)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Fri, Oct 27, 2023 at 3:28 AM Peter Smith <smithpb2250@gmail.com> wrote:

On Fri, Oct 27, 2023 at 2:26 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Thu, Oct 26, 2023 at 8:11 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

The BF animal fairywren[1] failed when testing
003_upgrade_logical_replication_slots.pl.

From the log, I can see pg_upgrade failed to open the
invalid_logical_replication_slots.txt:

# Checking for valid logical replication slots
# could not open file "C:/tools/nmsys64/home/pgrunner/bf/root/HEAD/pgsql.build/testrun/pg_upgrade/003_upgrade_logical_replication_slots/data/t_003_upgrade_logical_replication_slots_new_publisher_data/pgdata/pg_upgrade_output.d/20231026T112558.309/invalid_logical_replication_slots.txt": No such file or directory
# Failure, exiting

The reason could be the length of this path(262) exceed the windows path
limit(260 IIRC). If so, I recall we fixed similar things before (e213de8e7) by
reducing the path somehow.

Nice catch. Windows docs say that the file/directory path name can't
exceed MAX_PATH, which is defined as 260 characters. However, one must
opt-in to enable longer path names -
https://learn.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation?tabs=registry
and https://learn.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation?tabs=registry#enable-long-paths-in-windows-10-version-1607-and-later.

In this case, I think one approach is to reduce the file and testname to
xxx_logical_slots instead of xxx_logical_replication_slots. But we will analyze more
and share fix soon.

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&dt=2023-10-26%2009%3A04%3A54

+1 for s/003_upgrade_logical_replication_slots.pl/003_upgrade_logical_slots.pl
and s/invalid_logical_replication_slots.txt/invalid_logical_slots.txt.

+1. The proposed file name sounds reasonable.

In fact, we've used "logical slots" instead of "logical replication
slots" in the docs to be generic. By looking at the generated
directory path name, I think we can use shorter node names - instead
of old_publisher, new_publisher, subscriber - either use node1 (for
old publisher), node2 (for subscriber), node3 (for new publisher) or
use alpha (for old publisher), bravo (for subscriber), charlie (for
new publisher) or such shorter names. We don't have to be that
descriptive and long in node names, one can look at the test file to
know which one is what.

Some more ideas for shortening the filename:

1. "003_upgrade_logical_replication_slots.pl" -- IMO the word
"upgrade" is redundant in that filename (earlier patches never had
this). The test file lives under "pg_upgrade/t" so I felt that
upgrading is already implied.

Agreed. So, how about 003_upgrade_logical_slots.pl or simply
003_upgrade_slots.pl?

2. If the node names will be shortened they should still retain *some*
meaning if possible:
old_publisher/subscriber/new_publisher --> node1/node2/node3 (means
nothing without studying the tests)
old_publisher/subscriber/new_publisher --> alpha/bravo/charlie (means
nothing without studying the tests)
How about:
old_publisher/subscriber/new_publisher --> node_p1/node_s/node_p2
or similar...

Why not simply oldpub/sub/newpub or old_pub/sub/new_pub?

--
With Regards,
Amit Kapila.

#363Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Amit Kapila (#362)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Fri, Oct 27, 2023 at 8:06 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

+1 for s/003_upgrade_logical_replication_slots.pl/003_upgrade_logical_slots.pl
and s/invalid_logical_replication_slots.txt/invalid_logical_slots.txt.

+1. The proposed file name sounds reasonable.

Agreed. So, how about 003_upgrade_logical_slots.pl or simply
003_upgrade_slots.pl?

Why not simply oldpub/sub/newpub or old_pub/sub/new_pub?

+1 for invalid_logical_slots.txt, 003_upgrade_logical_slots.pl and
oldpub/sub/newpub. With these changes, the path name is brought down
to ~220 chars. These names look good to me, provided the other parts of the
path name aren't dynamic and don't push it over the MAX_PATH limit (260 chars).

C:/tools/nmsys64/home/pgrunner/bf/root/HEAD/pgsql.build/testrun/pg_upgrade/003_upgrade_logical_slots/data/t_003_upgrade_logical_slots_newpub_data/pgdata/pg_upgrade_output.d/20231026T112558.309/invalid_logical_slots.txt

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#364Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Zhijie Hou (Fujitsu) (#359)
1 attachment(s)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Hou,

The BF animal fairywren[1] failed when testing
003_upgrade_logical_replication_slots.pl.

Good catch!

The reason could be the length of this path(262) exceed the windows path
limit(260 IIRC). If so, I recall we fixed similar things before (e213de8e7) by
reducing the path somehow.

Yeah, Bharath has already reported it; I agree that the reason was [1].

```
In the Windows API (with some exceptions discussed in the following paragraphs),
the maximum length for a path is MAX_PATH, which is defined as 260 characters.
```

In this case, I think one approach is to reduce the file and testname to
xxx_logical_slots instead of xxx_logical_replication_slots. But we will analyze
more
and share fix soon.

Here is a patch that does the renaming to 003_logical_slots and the other shortenings. Also, I got a comment off-list, so it is included as well.

```
-# Setup a pg_upgrade command. This will be used anywhere.
+# Setup a common pg_upgrade command to be used by all the test cases
```

[1]: https://learn.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation?tabs=registry

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:

0001-Shorten-some-files.patchapplication/octet-stream; name=0001-Shorten-some-files.patchDownload
From 704e1944e17591ccccc0f9116f89829b37516ba8 Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Fri, 27 Oct 2023 03:08:23 +0000
Subject: [PATCH] Shorten some files

---
 src/bin/pg_upgrade/check.c                    |  2 +-
 ...lication_slots.pl => 003_logical_slots.pl} | 86 +++++++++----------
 2 files changed, 44 insertions(+), 44 deletions(-)
 rename src/bin/pg_upgrade/t/{003_upgrade_logical_replication_slots.pl => 003_logical_slots.pl} (70%)

diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 179f85ae8a..fa52aa2c22 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -1554,7 +1554,7 @@ check_old_cluster_for_valid_slots(bool live_check)
 
 	snprintf(output_path, sizeof(output_path), "%s/%s",
 			 log_opts.basedir,
-			 "invalid_logical_replication_slots.txt");
+			 "invalid_logical_slots.txt");
 
 	for (int dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
 	{
diff --git a/src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl b/src/bin/pg_upgrade/t/003_logical_slots.pl
similarity index 70%
rename from src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl
rename to src/bin/pg_upgrade/t/003_logical_slots.pl
index 5e416f553d..af9f350431 100644
--- a/src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl
+++ b/src/bin/pg_upgrade/t/003_logical_slots.pl
@@ -15,23 +15,23 @@ use Test::More;
 my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
 
 # Initialize old cluster
-my $old_publisher = PostgreSQL::Test::Cluster->new('old_publisher');
-$old_publisher->init(allows_streaming => 'logical');
+my $oldpub = PostgreSQL::Test::Cluster->new('oldpub');
+$oldpub->init(allows_streaming => 'logical');
 
 # Initialize new cluster
-my $new_publisher = PostgreSQL::Test::Cluster->new('new_publisher');
-$new_publisher->init(allows_streaming => 'logical');
+my $newpub = PostgreSQL::Test::Cluster->new('newpub');
+$newpub->init(allows_streaming => 'logical');
 
-# Setup a pg_upgrade command. This will be used anywhere.
+# Setup a common pg_upgrade command to be used by all the test cases
 my @pg_upgrade_cmd = (
 	'pg_upgrade', '--no-sync',
-	'-d', $old_publisher->data_dir,
-	'-D', $new_publisher->data_dir,
-	'-b', $old_publisher->config_data('--bindir'),
-	'-B', $new_publisher->config_data('--bindir'),
-	'-s', $new_publisher->host,
-	'-p', $old_publisher->port,
-	'-P', $new_publisher->port,
+	'-d', $oldpub->data_dir,
+	'-D', $newpub->data_dir,
+	'-b', $oldpub->config_data('--bindir'),
+	'-B', $newpub->config_data('--bindir'),
+	'-s', $newpub->host,
+	'-p', $oldpub->port,
+	'-P', $newpub->port,
 	$mode);
 
 # ------------------------------
@@ -39,17 +39,17 @@ my @pg_upgrade_cmd = (
 
 # Preparations for the subsequent test:
 # 1. Create two slots on the old cluster
-$old_publisher->start;
-$old_publisher->safe_psql(
+$oldpub->start;
+$oldpub->safe_psql(
 	'postgres', qq[
 	SELECT pg_create_logical_replication_slot('test_slot1', 'test_decoding');
 	SELECT pg_create_logical_replication_slot('test_slot2', 'test_decoding');
 ]);
-$old_publisher->stop();
+$oldpub->stop();
 
 # 2. Set 'max_replication_slots' to be less than the number of slots (2)
 #	 present on the old cluster.
-$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 1");
+$newpub->append_conf('postgresql.conf', "max_replication_slots = 1");
 
 # pg_upgrade will fail because the new cluster has insufficient
 # max_replication_slots
@@ -62,12 +62,12 @@ command_checks_all(
 	[qr//],
 	'run of pg_upgrade where the new cluster has insufficient max_replication_slots'
 );
-ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+ok( -d $newpub->data_dir . "/pg_upgrade_output.d",
 	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
 
 # Set 'max_replication_slots' to match the number of slots (2) present on the
 # old cluster. Both slots will be used for subsequent tests.
-$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 2");
+$newpub->append_conf('postgresql.conf', "max_replication_slots = 2");
 
 
 # ------------------------------
@@ -82,14 +82,14 @@ $new_publisher->append_conf('postgresql.conf', "max_replication_slots = 2");
 #
 # 3. Emit a non-transactional message. This will cause test_slot2 to detect the
 #	 unconsumed WAL record.
-$old_publisher->start;
-$old_publisher->safe_psql(
+$oldpub->start;
+$oldpub->safe_psql(
 	'postgres', qq[
 		CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;
 		SELECT pg_replication_slot_advance('test_slot2', pg_current_wal_lsn());
 		SELECT count(*) FROM pg_logical_emit_message('false', 'prefix', 'This is a non-transactional message');
 ]);
-$old_publisher->stop;
+$oldpub->stop;
 
 # pg_upgrade will fail because there are slots still having unconsumed WAL
 # records
@@ -111,12 +111,12 @@ my $slots_filename;
 # contains a milliseconds timestamp. File::Find::find must be used.
 find(
 	sub {
-		if ($File::Find::name =~ m/invalid_logical_replication_slots\.txt/)
+		if ($File::Find::name =~ m/invalid_logical_slots\.txt/)
 		{
 			$slots_filename = $File::Find::name;
 		}
 	},
-	$new_publisher->data_dir . "/pg_upgrade_output.d");
+	$newpub->data_dir . "/pg_upgrade_output.d");
 
 # Check the file content. Both slots should be reporting that they have
 # unconsumed WAL records.
@@ -135,10 +135,10 @@ like(
 
 # Preparations for the subsequent test:
 # 1. Setup logical replication (first, cleanup slots from the previous tests)
-my $old_connstr = $old_publisher->connstr . ' dbname=postgres';
+my $old_connstr = $oldpub->connstr . ' dbname=postgres';
 
-$old_publisher->start;
-$old_publisher->safe_psql(
+$oldpub->start;
+$oldpub->safe_psql(
 	'postgres', qq[
 	SELECT * FROM pg_drop_replication_slot('test_slot1');
 	SELECT * FROM pg_drop_replication_slot('test_slot2');
@@ -146,47 +146,47 @@ $old_publisher->safe_psql(
 ]);
 
 # Initialize subscriber cluster
-my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
-$subscriber->init();
+my $sub = PostgreSQL::Test::Cluster->new('sub');
+$sub->init();
 
-$subscriber->start;
-$subscriber->safe_psql(
+$sub->start;
+$sub->safe_psql(
 	'postgres', qq[
 	CREATE TABLE tbl (a int);
 	CREATE SUBSCRIPTION regress_sub CONNECTION '$old_connstr' PUBLICATION regress_pub WITH (two_phase = 'true')
 ]);
-$subscriber->wait_for_subscription_sync($old_publisher, 'regress_sub');
+$sub->wait_for_subscription_sync($oldpub, 'regress_sub');
 
 # 2. Temporarily disable the subscription
-$subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION regress_sub DISABLE");
-$old_publisher->stop;
+$sub->safe_psql('postgres', "ALTER SUBSCRIPTION regress_sub DISABLE");
+$oldpub->stop;
 
 # pg_upgrade should be successful
 command_ok([@pg_upgrade_cmd], 'run of pg_upgrade of old cluster');
 
 # Check that the slot 'regress_sub' has migrated to the new cluster
-$new_publisher->start;
-my $result = $new_publisher->safe_psql('postgres',
+$newpub->start;
+my $result = $newpub->safe_psql('postgres',
 	"SELECT slot_name, two_phase FROM pg_replication_slots");
 is($result, qq(regress_sub|t), 'check the slot exists on new cluster');
 
 # Update the connection
-my $new_connstr = $new_publisher->connstr . ' dbname=postgres';
-$subscriber->safe_psql(
+my $new_connstr = $newpub->connstr . ' dbname=postgres';
+$sub->safe_psql(
 	'postgres', qq[
 	ALTER SUBSCRIPTION regress_sub CONNECTION '$new_connstr';
 	ALTER SUBSCRIPTION regress_sub ENABLE;
 ]);
 
 # Check whether changes on the new publisher get replicated to the subscriber
-$new_publisher->safe_psql('postgres',
+$newpub->safe_psql('postgres',
 	"INSERT INTO tbl VALUES (generate_series(11, 20))");
-$new_publisher->wait_for_catchup('regress_sub');
-$result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
-is($result, qq(20), 'check changes are replicated to the subscriber');
+$newpub->wait_for_catchup('regress_sub');
+$result = $sub->safe_psql('postgres', "SELECT count(*) FROM tbl");
+is($result, qq(20), 'check changes are replicated to the sub');
 
 # Clean up
-$subscriber->stop();
-$new_publisher->stop();
+$sub->stop();
+$newpub->stop();
 
 done_testing();
-- 
2.27.0

#365Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Bharath Rupireddy (#363)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Bharath, Amit, Peter,

Thank you for the discussion! A patch is available in [1].

+1 for

s/003_upgrade_logical_replication_slots.pl/003_upgrade_logical_slots.pl

and s/invalid_logical_replication_slots.txt/invalid_logical_slots.txt.

+1. The proposed file name sounds reasonable.

Agreed. So, how about 003_upgrade_logical_slots.pl or simply
003_upgrade_slots.pl?

Why not simply oldpub/sub/newpub or old_pub/sub/new_pub?

+1 for invalid_logical_slots.txt, 003_upgrade_logical_slots.pl and
oldpub/sub/newpub. With these changes, the path name is brought down
to ~220 chars. These names look good to me iff other things in the
path name aren't dynamic crossing MAX_PATH limit (260 chars).

C:/tools/nmsys64/home/pgrunner/bf/root/HEAD/pgsql.build/testrun/pg_upgra
de/003_upgrade_logical_slots/data/t_003_upgrade_logical_slots_newpub_data/
pgdata/pg_upgrade_output.d/20231026T112558.309/invalid_logical_slots.txt

Renamed to invalid_logical_slots.txt, 003_logical_slots.pl, and oldpub/sub/newpub.
Regarding the test filename, some client apps (e.g., pg_ctl) do not use a program-name prefix,
while some others (e.g., pg_dump) do. Either way seems acceptable.
Hence I chose to drop the prefix.

```
$ ls pg_ctl/t/
001_start_stop.pl 002_status.pl 003_promote.pl 004_logrotate.pl

$ ls pg_dump/t/
001_basic.pl 002_pg_dump.pl 003_pg_dump_with_server.pl 004_pg_dump_parallel.pl 010_dump_connstr.pl
```

[1]: /messages/by-id/TYCPR01MB5870A6A8FBB23554EDE8F5F3F5DCA@TYCPR01MB5870.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#366Michael Paquier
michael@paquier.xyz
In reply to: Hayato Kuroda (Fujitsu) (#364)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Fri, Oct 27, 2023 at 04:40:43AM +0000, Hayato Kuroda (Fujitsu) wrote:

Yeah, Bharath has already reported, I agreed that the reason was [1].

```
In the Windows API (with some exceptions discussed in the following paragraphs),
the maximum length for a path is MAX_PATH, which is defined as 260 characters.
```

-                        "invalid_logical_replication_slots.txt");
+                        "invalid_logical_slots.txt");

Or you could do something even shorter, with "invalid_slots.txt".
--
Michael

#367Amit Kapila
amit.kapila16@gmail.com
In reply to: Michael Paquier (#366)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Fri, Oct 27, 2023 at 10:43 AM Michael Paquier <michael@paquier.xyz> wrote:

On Fri, Oct 27, 2023 at 04:40:43AM +0000, Hayato Kuroda (Fujitsu) wrote:

Yeah, Bharath has already reported, I agreed that the reason was [1].

```
In the Windows API (with some exceptions discussed in the following paragraphs),
the maximum length for a path is MAX_PATH, which is defined as 260 characters.
```

-                        "invalid_logical_replication_slots.txt");
+                        "invalid_logical_slots.txt");

Or you could do something even shorter, with "invalid_slots.txt".

I also thought of it, but if we want to keep it that way, we should
slightly adjust messages like "The slot \"%s\" is invalid" to
include the slot type. This file will contain only logical slots, so the
current name probably seems okay.

--
With Regards,
Amit Kapila.

#368Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Michael Paquier (#366)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Michael,

Or you could do something even shorter, with "invalid_slots.txt".

I think the current one seems better, because we only support logical replication
slots for now. We can extend it as you said when we support physical slots as well.
Also, the proposed length is sufficient for fairywren [1].

[1]: /messages/by-id/CALj2ACVc-WSx_fvfynt-G3j8rjhNTMZ8DHu2wiKgCEiV9EO86g@mail.gmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#369Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Amit Kapila (#367)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Fri, Oct 27, 2023 at 11:09 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Oct 27, 2023 at 10:43 AM Michael Paquier <michael@paquier.xyz> wrote:

-                        "invalid_logical_replication_slots.txt");
+                        "invalid_logical_slots.txt");

Or you could do something even shorter, with "invalid_slots.txt".

I also thought of it but if we want to keep it that way, we should
slightly adjust the messages like: "The slot \"%s\" is invalid" to
include slot_type. This will contain only logical slots, so the
current one probably seems okay.

+1 for invalid_logical_slots.txt as file name (which can fix Windows
path name issue) and contents as-is "The slot \"%s\" is invalid\n" and
"The slot \"%s\" has not consumed the WAL yet\n".

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#370Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#364)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Fri, Oct 27, 2023 at 10:10 AM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

Here is a patch for fixing to 003_logical_slots. Also, I got a comment off list so that it was included.

```
-# Setup a pg_upgrade command. This will be used anywhere.
+# Setup a common pg_upgrade command to be used by all the test cases
```

The patch LGTM.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#371Amit Kapila
amit.kapila16@gmail.com
In reply to: Bharath Rupireddy (#370)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Fri, Oct 27, 2023 at 11:24 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Fri, Oct 27, 2023 at 10:10 AM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

Here is a patch for fixing to 003_logical_slots. Also, I got a comment off list so that it was included.

```
-# Setup a pg_upgrade command. This will be used anywhere.
+# Setup a common pg_upgrade command to be used by all the test cases
```

The patch LGTM.

Thanks, I'll push it in some time.

--
With Regards,
Amit Kapila.

#372Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Amit Kapila (#371)
1 attachment(s)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Amit,

I found that several machines on the BF got angry (e.g. [1]) because of a missing update to meson.build. Sorry for that.
PSA the patch to fix it.

[1]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rorqual&dt=2023-10-27%2006%3A08%3A31

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:

fix_meson.patchapplication/octet-stream; name=fix_meson.patchDownload
diff --git a/src/bin/pg_upgrade/meson.build b/src/bin/pg_upgrade/meson.build
index 2c4f38d865..3e8a08e062 100644
--- a/src/bin/pg_upgrade/meson.build
+++ b/src/bin/pg_upgrade/meson.build
@@ -42,7 +42,7 @@ tests += {
     'tests': [
       't/001_basic.pl',
       't/002_pg_upgrade.pl',
-      't/003_upgrade_logical_replication_slots.pl',
+      't/003_logical_slots.pl',
     ],
     'test_kwargs': {'priority': 40}, # pg_upgrade tests are slow
   },
#373Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Hayato Kuroda (Fujitsu) (#372)
1 attachment(s)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear hackers,

PSA the patch to solve the issue [1].

Peter E. and Andrew kindly raised an issue that delete_old_cluster.sh is
generated in the source directory, even with a VPATH/meson build.
This can be avoided by changing the directory explicitly.

[1]: /messages/by-id/7b8a9460-5668-b372-04e6-7b52e9308493@dunslane.net

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:

change_dir.patchapplication/octet-stream; name=change_dir.patchDownload
diff --git a/src/bin/pg_upgrade/t/003_logical_slots.pl b/src/bin/pg_upgrade/t/003_logical_slots.pl
index af9f350431..5b01cf8c40 100644
--- a/src/bin/pg_upgrade/t/003_logical_slots.pl
+++ b/src/bin/pg_upgrade/t/003_logical_slots.pl
@@ -34,6 +34,11 @@ my @pg_upgrade_cmd = (
 	'-P', $newpub->port,
 	$mode);
 
+# In a VPATH build, we'll be started in the source directory, but we want
+# to run pg_upgrade in the build directory so that any files generated finish
+# in it, like delete_old_cluster.{sh,bat}.
+chdir ${PostgreSQL::Test::Utils::tmp_check};
+
 # ------------------------------
 # TEST: Confirm pg_upgrade fails when the new cluster has wrong GUC values
 
#374Peter Smith
smithpb2250@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#373)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Tue, Nov 7, 2023 at 3:14 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

Dear hackers,

PSA the patch to solve the issue [1].

Kindly Peter E. and Andrew raised an issue that delete_old_cluster.sh is
generated in the source directory, even when the VPATH/meson build.
This can avoid by changing the directory explicitly.

Hi Kuroda-san,

Thanks for the patch.

I reproduced the bug, then after applying your patch, I confirmed the
problem is fixed. I used a VPATH build.

~~~

BEFORE
t/001_basic.pl .......... ok
t/002_pg_upgrade.pl ..... ok
t/003_logical_slots.pl .. ok
All tests successful.
Files=3, Tests=39, 128 wallclock secs ( 0.05 usr 0.01 sys + 12.90
cusr 7.43 csys = 20.39 CPU)
Result: PASS

OBSERVE THE BUG
Look in the source folder and notice the file that should not be there.

[postgres@CentOS7-x64 pg_upgrade]$ pwd
/home/postgres/oss_postgres_misc/src/bin/pg_upgrade
[postgres@CentOS7-x64 pg_upgrade]$ ls *.sh
delete_old_cluster.sh

~~~

AFTER
# +++ tap check in src/bin/pg_upgrade +++
t/001_basic.pl .......... ok
t/002_pg_upgrade.pl ..... ok
t/003_logical_slots.pl .. ok
All tests successful.
Files=3, Tests=39, 128 wallclock secs ( 0.06 usr 0.01 sys + 13.02
cusr 7.28 csys = 20.37 CPU)
Result: PASS

CONFIRM THE FIX
Check the offending file is no longer in the src folder

[postgres@CentOS7-x64 pg_upgrade]$ pwd
/home/postgres/oss_postgres_misc/src/bin/pg_upgrade
[postgres@CentOS7-x64 pg_upgrade]$ ls *.sh
ls: cannot access *.sh: No such file or directory

Instead, it is found in the VPATH folder
[postgres@CentOS7-x64 pg_upgrade]$ pwd
/home/postgres/vpath_dir/src/bin/pg_upgrade
[postgres@CentOS7-x64 pg_upgrade]$ ls tmp_check/
delete_old_cluster.sh log results

======
Kind Regards,
Peter Smith.
Fujitsu Australia

#375Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Hayato Kuroda (Fujitsu) (#373)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

On Tuesday, November 7, 2023 12:14 PM Kuroda, Hayato/黒田 隼人 <kuroda.hayato@fujitsu.com> wrote:

Dear hackers,

PSA the patch to solve the issue [1].

Kindly Peter E. and Andrew raised an issue that delete_old_cluster.sh is
generated in the source directory, even when the VPATH/meson build.
This can avoid by changing the directory explicitly.

[1]:
/messages/by-id/7b8a9460-5668-b372-04e6-7b
52e9308493%40dunslane.net#554090099bbbd12c94bf570665a6badf

Thanks for the patch. I have confirmed that the files won't be generated
in the source directory after applying the patch.

After running: "meson test -C build/ --suite pg_upgrade",
The files are in the test directory:
./build/testrun/pg_upgrade/003_logical_slots/data/delete_old_cluster.sh

Best regards,
Hou zj

#376Amit Kapila
amit.kapila16@gmail.com
In reply to: Zhijie Hou (Fujitsu) (#375)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Tue, Nov 7, 2023 at 10:01 AM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

On Tuesday, November 7, 2023 12:14 PM Kuroda, Hayato/黒田 隼人 <kuroda.hayato@fujitsu.com> wrote:

Dear hackers,

PSA the patch to solve the issue [1].

Kindly Peter E. and Andrew raised an issue that delete_old_cluster.sh is
generated in the source directory, even when the VPATH/meson build.
This can avoid by changing the directory explicitly.

[1]:
/messages/by-id/7b8a9460-5668-b372-04e6-7b
52e9308493%40dunslane.net#554090099bbbd12c94bf570665a6badf

Thanks for the patch, I have confirmed that the files won't be generated
in source directory after applying the patch.

After running: "meson test -C build/ --suite pg_upgrade",
The files are in the test directory:
./build/testrun/pg_upgrade/003_logical_slots/data/delete_old_cluster.sh

Thanks for the patch and verification. Pushed the fix.

--
With Regards,
Amit Kapila.

#377vignesh C
vignesh21@gmail.com
In reply to: Amit Kapila (#376)
2 attachment(s)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Tue, 7 Nov 2023 at 13:25, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Nov 7, 2023 at 10:01 AM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

On Tuesday, November 7, 2023 12:14 PM Kuroda, Hayato/黒田 隼人 <kuroda.hayato@fujitsu.com> wrote:

Dear hackers,

PSA the patch to solve the issue [1].

Kindly Peter E. and Andrew raised an issue that delete_old_cluster.sh is
generated in the source directory, even when the VPATH/meson build.
This can avoid by changing the directory explicitly.

[1]:
/messages/by-id/7b8a9460-5668-b372-04e6-7b
52e9308493%40dunslane.net#554090099bbbd12c94bf570665a6badf

Thanks for the patch, I have confirmed that the files won't be generated
in source directory after applying the patch.

After running: "meson test -C build/ --suite pg_upgrade",
The files are in the test directory:
./build/testrun/pg_upgrade/003_logical_slots/data/delete_old_cluster.sh

Thanks for the patch and verification. Pushed the fix.

While verifying the upgrade-of-subscriber patch, I found one issue with
upgrade in verbose mode.
I was able to reproduce this issue by performing an upgrade with the
verbose option.

The trace for the same is given below:
Program received signal SIGSEGV, Segmentation fault.
__strlen_sse2 () at ../sysdeps/x86_64/multiarch/strlen-vec.S:126
126 ../sysdeps/x86_64/multiarch/strlen-vec.S: No such file or directory.
(gdb) bt
#0 __strlen_sse2 () at ../sysdeps/x86_64/multiarch/strlen-vec.S:126
#1 0x000055555556f572 in dopr (target=0x7fffffffbb90,
format=0x55555557859e "\", plugin: \"%s\", two_phase: %s",
args=0x7fffffffdc40) at snprintf.c:444
#2 0x000055555556ed95 in pg_vsnprintf (str=0x7fffffffbc10 "slot_name:
\"ication slots within the database:", count=8192, fmt=0x555555578590
"slot_name: \"%s\", plugin: \"%s\", two_phase: %s",
args=0x7fffffffdc40) at snprintf.c:195
#3 0x00005555555667e3 in pg_log_v (type=PG_VERBOSE,
fmt=0x555555578590 "slot_name: \"%s\", plugin: \"%s\", two_phase: %s",
ap=0x7fffffffdc40) at util.c:184
#4 0x0000555555566b38 in pg_log (type=PG_VERBOSE, fmt=0x555555578590
"slot_name: \"%s\", plugin: \"%s\", two_phase: %s") at util.c:264
#5 0x0000555555561a06 in print_slot_infos (slot_arr=0x555555595ed0)
at info.c:813
#6 0x000055555556186e in print_db_infos (db_arr=0x555555587518
<new_cluster+120>) at info.c:782
#7 0x00005555555606da in get_db_rel_and_slot_infos
(cluster=0x5555555874a0 <new_cluster>, live_check=false) at info.c:308
#8 0x000055555555839a in check_new_cluster () at check.c:215
#9 0x0000555555563010 in main (argc=13, argv=0x7fffffffdf08) at
pg_upgrade.c:136

This issue occurs because we are accessing uninitialized slot array information.

We could fix it in a couple of ways: a) initialize the whole of
dbinfos by using pg_malloc0 instead of pg_malloc, which will ensure
that the slot information is set to 0; b) set only the slot
information. The attached patches have the changes for both approaches.
Thoughts?

Regards,
Vignesh

Attachments:

Upgrade_verbose_issue_fix.patchtext/x-patch; charset=US-ASCII; name=Upgrade_verbose_issue_fix.patchDownload
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index 7f21d26fd2..d2a1815fef 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -408,7 +408,7 @@ get_db_infos(ClusterInfo *cluster)
 	i_spclocation = PQfnumber(res, "spclocation");
 
 	ntups = PQntuples(res);
-	dbinfos = (DbInfo *) pg_malloc(sizeof(DbInfo) * ntups);
+	dbinfos = (DbInfo *) pg_malloc0(sizeof(DbInfo) * ntups);
 
 	for (tupnum = 0; tupnum < ntups; tupnum++)
 	{
@@ -640,11 +640,7 @@ get_old_cluster_logical_slot_infos(DbInfo *dbinfo, bool live_check)
 
 	/* Logical slots can be migrated since PG17. */
 	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
-	{
-		dbinfo->slot_arr.slots = slotinfos;
-		dbinfo->slot_arr.nslots = num_slots;
 		return;
-	}
 
 	conn = connectToServer(&old_cluster, dbinfo->db_name);
 
Upgrade_verbose_issue_alternate_fix.patchtext/x-patch; charset=US-ASCII; name=Upgrade_verbose_issue_alternate_fix.patchDownload
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index 7f21d26fd2..21a0b0551a 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -297,6 +297,11 @@ get_db_rel_and_slot_infos(ClusterInfo *cluster, bool live_check)
 		 */
 		if (cluster == &old_cluster)
 			get_old_cluster_logical_slot_infos(pDbInfo, live_check);
+		else
+		{
+			pDbInfo->slot_arr.slots = NULL;
+			pDbInfo->slot_arr.nslots = 0;
+		}
 	}
 
 	if (cluster == &old_cluster)
#378Amit Kapila
amit.kapila16@gmail.com
In reply to: vignesh C (#377)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Wed, Nov 8, 2023 at 8:44 AM vignesh C <vignesh21@gmail.com> wrote:

While verifying upgrade of subscriber patch, I found one issue with
upgrade in verbose mode.
I was able to reproduce this issue by performing a upgrade with a
verbose option.

The trace for the same is given below:
Program received signal SIGSEGV, Segmentation fault.
__strlen_sse2 () at ../sysdeps/x86_64/multiarch/strlen-vec.S:126
126 ../sysdeps/x86_64/multiarch/strlen-vec.S: No such file or directory.
(gdb) bt
#0 __strlen_sse2 () at ../sysdeps/x86_64/multiarch/strlen-vec.S:126
#1 0x000055555556f572 in dopr (target=0x7fffffffbb90,
format=0x55555557859e "\", plugin: \"%s\", two_phase: %s",
args=0x7fffffffdc40) at snprintf.c:444
#2 0x000055555556ed95 in pg_vsnprintf (str=0x7fffffffbc10 "slot_name:
\"ication slots within the database:", count=8192, fmt=0x555555578590
"slot_name: \"%s\", plugin: \"%s\", two_phase: %s",
args=0x7fffffffdc40) at snprintf.c:195
#3 0x00005555555667e3 in pg_log_v (type=PG_VERBOSE,
fmt=0x555555578590 "slot_name: \"%s\", plugin: \"%s\", two_phase: %s",
ap=0x7fffffffdc40) at util.c:184
#4 0x0000555555566b38 in pg_log (type=PG_VERBOSE, fmt=0x555555578590
"slot_name: \"%s\", plugin: \"%s\", two_phase: %s") at util.c:264
#5 0x0000555555561a06 in print_slot_infos (slot_arr=0x555555595ed0)
at info.c:813
#6 0x000055555556186e in print_db_infos (db_arr=0x555555587518
<new_cluster+120>) at info.c:782
#7 0x00005555555606da in get_db_rel_and_slot_infos
(cluster=0x5555555874a0 <new_cluster>, live_check=false) at info.c:308
#8 0x000055555555839a in check_new_cluster () at check.c:215
#9 0x0000555555563010 in main (argc=13, argv=0x7fffffffdf08) at
pg_upgrade.c:136

This issue occurs because we are accessing uninitialized slot array information.

Thanks for the report. I'll review your proposed fix.

--
With Regards,
Amit Kapila.

#379Amit Kapila
amit.kapila16@gmail.com
In reply to: vignesh C (#377)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Wed, Nov 8, 2023 at 8:44 AM vignesh C <vignesh21@gmail.com> wrote:

While verifying upgrade of subscriber patch, I found one issue with
upgrade in verbose mode.
I was able to reproduce this issue by performing a upgrade with a
verbose option.

The trace for the same is given below:
Program received signal SIGSEGV, Segmentation fault.
__strlen_sse2 () at ../sysdeps/x86_64/multiarch/strlen-vec.S:126
126 ../sysdeps/x86_64/multiarch/strlen-vec.S: No such file or directory.
(gdb) bt
#0 __strlen_sse2 () at ../sysdeps/x86_64/multiarch/strlen-vec.S:126
#1 0x000055555556f572 in dopr (target=0x7fffffffbb90,
format=0x55555557859e "\", plugin: \"%s\", two_phase: %s",
args=0x7fffffffdc40) at snprintf.c:444
#2 0x000055555556ed95 in pg_vsnprintf (str=0x7fffffffbc10 "slot_name:
\"ication slots within the database:", count=8192, fmt=0x555555578590
"slot_name: \"%s\", plugin: \"%s\", two_phase: %s",
args=0x7fffffffdc40) at snprintf.c:195
#3 0x00005555555667e3 in pg_log_v (type=PG_VERBOSE,
fmt=0x555555578590 "slot_name: \"%s\", plugin: \"%s\", two_phase: %s",
ap=0x7fffffffdc40) at util.c:184
#4 0x0000555555566b38 in pg_log (type=PG_VERBOSE, fmt=0x555555578590
"slot_name: \"%s\", plugin: \"%s\", two_phase: %s") at util.c:264
#5 0x0000555555561a06 in print_slot_infos (slot_arr=0x555555595ed0)
at info.c:813
#6 0x000055555556186e in print_db_infos (db_arr=0x555555587518
<new_cluster+120>) at info.c:782
#7 0x00005555555606da in get_db_rel_and_slot_infos
(cluster=0x5555555874a0 <new_cluster>, live_check=false) at info.c:308
#8 0x000055555555839a in check_new_cluster () at check.c:215
#9 0x0000555555563010 in main (argc=13, argv=0x7fffffffdf08) at
pg_upgrade.c:136

This issue occurs because we are accessing uninitialized slot array information.

We could fix it by a couple of ways: a) Initialize the whole of
dbinfos by using pg_malloc0 instead of pg_malloc which will ensure
that the slot information is set to 0.

I would prefer this fix instead of initializing the slot array at
multiple places. I'll push this tomorrow unless someone thinks
otherwise.

--
With Regards,
Amit Kapila.

#380vignesh C
vignesh21@gmail.com
In reply to: vignesh C (#377)
1 attachment(s)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Wed, 8 Nov 2023 at 08:43, vignesh C <vignesh21@gmail.com> wrote:

On Tue, 7 Nov 2023 at 13:25, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Nov 7, 2023 at 10:01 AM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

On Tuesday, November 7, 2023 12:14 PM Kuroda, Hayato/黒田 隼人 <kuroda.hayato@fujitsu.com> wrote:

Dear hackers,

PSA the patch to solve the issue [1].

Kindly Peter E. and Andrew raised an issue that delete_old_cluster.sh is
generated in the source directory, even when the VPATH/meson build.
This can avoid by changing the directory explicitly.

[1]:
/messages/by-id/7b8a9460-5668-b372-04e6-7b
52e9308493%40dunslane.net#554090099bbbd12c94bf570665a6badf

Thanks for the patch, I have confirmed that the files won't be generated
in source directory after applying the patch.

After running: "meson test -C build/ --suite pg_upgrade",
The files are in the test directory:
./build/testrun/pg_upgrade/003_logical_slots/data/delete_old_cluster.sh

Thanks for the patch and verification. Pushed the fix.

While verifying upgrade of subscriber patch, I found one issue with
upgrade in verbose mode.
I was able to reproduce this issue by performing a upgrade with a
verbose option.

The trace for the same is given below:
Program received signal SIGSEGV, Segmentation fault.
__strlen_sse2 () at ../sysdeps/x86_64/multiarch/strlen-vec.S:126
126 ../sysdeps/x86_64/multiarch/strlen-vec.S: No such file or directory.
(gdb) bt
#0 __strlen_sse2 () at ../sysdeps/x86_64/multiarch/strlen-vec.S:126
#1 0x000055555556f572 in dopr (target=0x7fffffffbb90,
format=0x55555557859e "\", plugin: \"%s\", two_phase: %s",
args=0x7fffffffdc40) at snprintf.c:444
#2 0x000055555556ed95 in pg_vsnprintf (str=0x7fffffffbc10 "slot_name:
\"ication slots within the database:", count=8192, fmt=0x555555578590
"slot_name: \"%s\", plugin: \"%s\", two_phase: %s",
args=0x7fffffffdc40) at snprintf.c:195
#3 0x00005555555667e3 in pg_log_v (type=PG_VERBOSE,
fmt=0x555555578590 "slot_name: \"%s\", plugin: \"%s\", two_phase: %s",
ap=0x7fffffffdc40) at util.c:184
#4 0x0000555555566b38 in pg_log (type=PG_VERBOSE, fmt=0x555555578590
"slot_name: \"%s\", plugin: \"%s\", two_phase: %s") at util.c:264
#5 0x0000555555561a06 in print_slot_infos (slot_arr=0x555555595ed0)
at info.c:813
#6 0x000055555556186e in print_db_infos (db_arr=0x555555587518
<new_cluster+120>) at info.c:782
#7 0x00005555555606da in get_db_rel_and_slot_infos
(cluster=0x5555555874a0 <new_cluster>, live_check=false) at info.c:308
#8 0x000055555555839a in check_new_cluster () at check.c:215
#9 0x0000555555563010 in main (argc=13, argv=0x7fffffffdf08) at
pg_upgrade.c:136

This issue occurs because we are accessing uninitialized slot array information.

We could fix it by a couple of ways: a) Initialize the whole of
dbinfos by using pg_malloc0 instead of pg_malloc which will ensure
that the slot information is set to 0. b) Setting only slot
information. Attached patch has the changes for both the approaches.
Thoughts?

Here is a small improvement: num_slots need not be initialized,
as it is now used only after the result is assigned. The attached
patch has the changes for the same.

Regards,
Vignesh

Attachments:

Upgrade_verbose_issue_fix_v2.patchtext/x-patch; charset=US-ASCII; name=Upgrade_verbose_issue_fix_v2.patchDownload
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index 7f21d26fd2..4878aa22bf 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -408,7 +408,7 @@ get_db_infos(ClusterInfo *cluster)
 	i_spclocation = PQfnumber(res, "spclocation");
 
 	ntups = PQntuples(res);
-	dbinfos = (DbInfo *) pg_malloc(sizeof(DbInfo) * ntups);
+	dbinfos = (DbInfo *) pg_malloc0(sizeof(DbInfo) * ntups);
 
 	for (tupnum = 0; tupnum < ntups; tupnum++)
 	{
@@ -636,15 +636,11 @@ get_old_cluster_logical_slot_infos(DbInfo *dbinfo, bool live_check)
 	PGconn	   *conn;
 	PGresult   *res;
 	LogicalSlotInfo *slotinfos = NULL;
-	int			num_slots = 0;
+	int			num_slots;
 
 	/* Logical slots can be migrated since PG17. */
 	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
-	{
-		dbinfo->slot_arr.slots = slotinfos;
-		dbinfo->slot_arr.nslots = num_slots;
 		return;
-	}
 
 	conn = connectToServer(&old_cluster, dbinfo->db_name);
 
#381Amit Kapila
amit.kapila16@gmail.com
In reply to: vignesh C (#380)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Wed, Nov 8, 2023 at 11:05 PM vignesh C <vignesh21@gmail.com> wrote:

On Wed, 8 Nov 2023 at 08:43, vignesh C <vignesh21@gmail.com> wrote:

Here is a small improvisation where num_slots need not be initialized
as it will be used only after assigning the result now. The attached
patch has the changes for the same.

Pushed!

--
With Regards,
Amit Kapila.

#382John Naylor
johncnaylorls@gmail.com
In reply to: Amit Kapila (#381)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Thu, Nov 9, 2023 at 5:07 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Nov 8, 2023 at 11:05 PM vignesh C <vignesh21@gmail.com> wrote:

On Wed, 8 Nov 2023 at 08:43, vignesh C <vignesh21@gmail.com> wrote:

Here is a small improvisation where num_slots need not be initialized
as it will be used only after assigning the result now. The attached
patch has the changes for the same.

Pushed!

Hi all, the CF entry for this is marked RfC, and CI is trying to apply
the last patch committed. Is there further work that needs to be
re-attached and/or rebased?

#383Amit Kapila
amit.kapila16@gmail.com
In reply to: John Naylor (#382)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Wed, Nov 22, 2023 at 1:30 PM John Naylor <johncnaylorls@gmail.com> wrote:

On Thu, Nov 9, 2023 at 5:07 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Nov 8, 2023 at 11:05 PM vignesh C <vignesh21@gmail.com> wrote:

On Wed, 8 Nov 2023 at 08:43, vignesh C <vignesh21@gmail.com> wrote:

Here is a small improvisation where num_slots need not be initialized
as it will be used only after assigning the result now. The attached
patch has the changes for the same.

Pushed!

Hi all, the CF entry for this is marked RfC, and CI is trying to apply
the last patch committed. Is there further work that needs to be
re-attached and/or rebased?

No. I have marked it as committed.

--
With Regards,
Amit Kapila.

#384Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#381)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Thu, Nov 9, 2023 at 7:07 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Nov 8, 2023 at 11:05 PM vignesh C <vignesh21@gmail.com> wrote:

On Wed, 8 Nov 2023 at 08:43, vignesh C <vignesh21@gmail.com> wrote:

Here is a small improvisation where num_slots need not be initialized
as it will be used only after assigning the result now. The attached
patch has the changes for the same.

Pushed!

Thank you for your work on this feature!

One month has already been passed since this main patch got committed
but reading this change, I have some questions on new
binary_upgrade_logical_slot_has_caught_up() function:

Is there any reason why this function can be executed only in binary
upgrade mode? It seems to me that other functions in
pg_upgrade_support.c must be called only in binary upgrade mode
because it does some hacky changes internally. On the other hand,
binary_upgrade_logical_slot_has_caught_up() just calls
LogicalReplicationSlotHasPendingWal(), which doesn't change anything
internally. If we make this function usable in normal mode, the user
would be able to check each slot's upgradability without pg_upgrade
--check command (or without stopping the server if the user can ensure
no more meaningful WAL records are generated).
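
For illustration, the kind of per-slot check that would enable (hypothetical
today, since outside pg_upgrade the function currently errors out, IIRC with
"function can only be called when server is in binary upgrade mode"):

```
SELECT slot_name,
       pg_catalog.binary_upgrade_logical_slot_has_caught_up(slot_name) AS caught_up
FROM pg_catalog.pg_replication_slots
WHERE slot_type = 'logical' AND temporary IS FALSE;
```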

---
Also, the function checks if the user has the REPLICATION privilege
but I think that only superuser can connect to the server in binary
upgrade mode in the first place.

---
The following error message doesn't match the function name:

/* We must check before dereferencing the argument */
if (PG_ARGISNULL(0))
elog(ERROR, "null argument to
binary_upgrade_validate_wal_records is not allowed");

---
{ oid => '8046', descr => 'for use by pg_upgrade',
proname => 'binary_upgrade_logical_slot_has_caught_up', proisstrict => 'f',
provolatile => 'v', proparallel => 'u', prorettype => 'bool',
proargtypes => 'name',
prosrc => 'binary_upgrade_logical_slot_has_caught_up' },

The function is not a strict function but we check in the function if
the passed argument is not null. I think it would be clearer to make
it a strict function.
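
To illustrate with a made-up function (the name here is hypothetical): a strict
function is simply not invoked when any argument is NULL, so the call yields
NULL without needing an explicit in-function check:

```
CREATE FUNCTION slot_check_demo(slot name) RETURNS bool
    LANGUAGE sql STRICT AS 'SELECT true';

SELECT slot_check_demo(NULL);  -- returns NULL; the function body never runs
```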

---
LogicalReplicationSlotHasPendingWal() is defined in logical.c but I
guess it's more suitable to be in slotfunc.s where similar functions
such as pg_logical_replication_slot_advance() is also defined.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#385Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Masahiko Sawada (#384)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Tue, Nov 28, 2023 at 11:06 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

One month has already been passed since this main patch got committed
but reading this change, I have some questions on new
binary_upgrade_logical_slot_has_caught_up() function:

Is there any reason why this function can be executed only in binary
upgrade mode? It seems to me that other functions in
pg_upgrade_support.c must be called only in binary upgrade mode
because it does some hacky changes internally. On the other hand,
binary_upgrade_logical_slot_has_caught_up() just calls
LogicalReplicationSlotHasPendingWal(), which doesn't change anything
internally. If we make this function usable in normal mode, the user
would be able to check each slot's upgradability without pg_upgrade
--check command (or without stopping the server if the user can ensure
no more meaningful WAL records are generated).

It may happen that such a user-facing function tells there's no
unconsumed WAL, but later on the WAL gets generated during pg_upgrade.
Therefore, the information the function gives turns out to be
incorrect. I don't see a real-world use-case for such a function right
now. If there's one, it's not a big change to turn it into a
user-facing function.

---
Also, the function checks if the user has the REPLICATION privilege
but I think that only superuser can connect to the server in binary
upgrade mode in the first place.

If that were true, I don't see a problem in having
CheckSlotPermissions() there, in fact it can act as an assertion.

---
The following error message doesn't match the function name:

/* We must check before dereferencing the argument */
if (PG_ARGISNULL(0))
elog(ERROR, "null argument to
binary_upgrade_validate_wal_records is not allowed");

---
{ oid => '8046', descr => 'for use by pg_upgrade',
proname => 'binary_upgrade_logical_slot_has_caught_up', proisstrict => 'f',
provolatile => 'v', proparallel => 'u', prorettype => 'bool',
proargtypes => 'name',
prosrc => 'binary_upgrade_logical_slot_has_caught_up' },

The function is not a strict function but we check in the function if
the passed argument is not null. I think it would be clearer to make
it a strict function.

I think it has been done that way similar to
binary_upgrade_create_empty_extension().

---
LogicalReplicationSlotHasPendingWal() is defined in logical.c but I
guess it's more suitable to be in slotfunc.s where similar functions
such as pg_logical_replication_slot_advance() is also defined.

Why not in logicalfuncs.c?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#386Amit Kapila
amit.kapila16@gmail.com
In reply to: Bharath Rupireddy (#385)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Tue, Nov 28, 2023 at 1:32 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Tue, Nov 28, 2023 at 11:06 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

One month has already been passed since this main patch got committed
but reading this change, I have some questions on new
binary_upgrade_logical_slot_has_caught_up() function:

Is there any reason why this function can be executed only in binary
upgrade mode? It seems to me that other functions in
pg_upgrade_support.c must be called only in binary upgrade mode
because it does some hacky changes internally. On the other hand,
binary_upgrade_logical_slot_has_caught_up() just calls
LogicalReplicationSlotHasPendingWal(), which doesn't change anything
internally. If we make this function usable in normal mode, the user
would be able to check each slot's upgradability without pg_upgrade
--check command (or without stopping the server if the user can ensure
no more meaningful WAL records are generated).

It may happen that such a user-facing function tells there's no
unconsumed WAL, but later on the WAL gets generated during pg_upgrade.
Therefore, the information the function gives turns out to be
incorrect. I don't see a real-world use-case for such a function right
now. If there's one, it's not a big change to turn it into a
user-facing function.

Yeah, as of now, I don't see a use case for it and in fact, it could
lead to unpredictable results. Immediately after calling the function,
there could be more activity on the server which could make the
results incorrect. I think to check the slot's upgradeability, one can
rely on the results of the pg_upgrade --check functionality.

---
Also, the function checks if the user has the REPLICATION privilege
but I think that only superuser can connect to the server in binary
upgrade mode in the first place.

If that were true, I don't see a problem in having
CheckSlotPermissions() there, in fact it can act as an assertion.

I think we can change it to assertion or may elog(ERROR, ...) with a
comment as to why we don't expect this can happen.

---
The following error message doesn't match the function name:

/* We must check before dereferencing the argument */
if (PG_ARGISNULL(0))
elog(ERROR, "null argument to
binary_upgrade_validate_wal_records is not allowed");

This should be fixed.

---
{ oid => '8046', descr => 'for use by pg_upgrade',
proname => 'binary_upgrade_logical_slot_has_caught_up', proisstrict => 'f',
provolatile => 'v', proparallel => 'u', prorettype => 'bool',
proargtypes => 'name',
prosrc => 'binary_upgrade_logical_slot_has_caught_up' },

The function is not a strict function but we check in the function if
the passed argument is not null. I think it would be clearer to make
it a strict function.

I think it has been done that way similar to
binary_upgrade_create_empty_extension().

---
LogicalReplicationSlotHasPendingWal() is defined in logical.c but I
guess it's more suitable to be in slotfunc.s where similar functions
such as pg_logical_replication_slot_advance() is also defined.

Why not in logicalfuncs.c?

I am not sure if either of those is better than logical.c. IIRC, I
thought it was okay to keep in logical.c as others primarily deal with
exposed SQL functions and I felt it somewhat matches with the intent
of logical.c ("The goal is to encapsulate most of the internal
complexity for consumers of logical decoding, so they can create and
consume a changestream with a low amount of code..").

--
With Regards,
Amit Kapila.

#387Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Bharath Rupireddy (#385)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Bharath, Sawada-san,

Welcome back!

---
{ oid => '8046', descr => 'for use by pg_upgrade',
proname => 'binary_upgrade_logical_slot_has_caught_up', proisstrict => 'f',
provolatile => 'v', proparallel => 'u', prorettype => 'bool',
proargtypes => 'name',
prosrc => 'binary_upgrade_logical_slot_has_caught_up' },

The function is not a strict function but we check in the function if
the passed argument is not null. I think it would be clearer to make
it a strict function.

I think it has been done that way similar to
binary_upgrade_create_empty_extension().

Yeah, we followed binary_upgrade_create_empty_extension(). Also, we set as
un-strict to keep a caller function simpler.

Currently get_old_cluster_logical_slot_infos() executes a query and it contains
binary_upgrade_logical_slot_has_caught_up(). In pg_upgrade layer, we assumed
either true or false is returned.
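
Roughly of this shape (a simplified sketch, not the exact statement used in
info.c):

```
SELECT slot_name, plugin, two_phase,
       pg_catalog.binary_upgrade_logical_slot_has_caught_up(slot_name) AS caught_up
FROM pg_catalog.pg_replication_slots
WHERE slot_type = 'logical'
  AND database = current_database()
  AND temporary IS FALSE;
```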

But if proisstrict is changed true, we must handle the case when NULL is returned.
It is small but backseat operation.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#388Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#386)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Tue, Nov 28, 2023 at 6:50 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Nov 28, 2023 at 1:32 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Tue, Nov 28, 2023 at 11:06 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

One month has already been passed since this main patch got committed
but reading this change, I have some questions on new
binary_upgrade_logical_slot_has_caught_up() function:

Is there any reason why this function can be executed only in binary
upgrade mode? It seems to me that other functions in
pg_upgrade_support.c must be called only in binary upgrade mode
because it does some hacky changes internally. On the other hand,
binary_upgrade_logical_slot_has_caught_up() just calls
LogicalReplicationSlotHasPendingWal(), which doesn't change anything
internally. If we make this function usable in normal mode, the user
would be able to check each slot's upgradability without pg_upgrade
--check command (or without stopping the server if the user can ensure
no more meaningful WAL records are generated).

It may happen that such a user-facing function tells there's no
unconsumed WAL, but later on the WAL gets generated during pg_upgrade.
Therefore, the information the function gives turns out to be
incorrect. I don't see a real-world use-case for such a function right
now. If there's one, it's not a big change to turn it into a
user-facing function.

Yeah, as of now, I don't see a use case for it and in fact, it could
lead to unpredictable results. Immediately after calling the function,
there could be more activity on the server which could make the
results incorrect. I think to check the slot's upgradeability, one can
rely on the results of the pg_upgrade --check functionality.

Fair point.

This function is already a user-executable function as it's in
pg_catalog but is restricted to be executed only in binary upgrade
even though it doesn't change anything internally. So it wasn't clear
to me why we put such a restriction.

---
Also, the function checks if the user has the REPLICATION privilege
but I think that only superuser can connect to the server in binary
upgrade mode in the first place.

If that were true, I don't see a problem in having
CheckSlotPermissions() there, in fact it can act as an assertion.

I think we can change it to assertion or may elog(ERROR, ...) with a
comment as to why we don't expect this can happen.

+1 for an assertion, to match other checks in the function.

---
The following error message doesn't match the function name:

/* We must check before dereferencing the argument */
if (PG_ARGISNULL(0))
elog(ERROR, "null argument to
binary_upgrade_validate_wal_records is not allowed");

This should be fixed.

---
{ oid => '8046', descr => 'for use by pg_upgrade',
proname => 'binary_upgrade_logical_slot_has_caught_up', proisstrict => 'f',
provolatile => 'v', proparallel => 'u', prorettype => 'bool',
proargtypes => 'name',
prosrc => 'binary_upgrade_logical_slot_has_caught_up' },

The function is not a strict function but we check in the function if
the passed argument is not null. I think it would be clearer to make
it a strict function.

I think it has been done that way similar to
binary_upgrade_create_empty_extension().

binary_upgrade_create_empty_extension() needs to be a non-strict
function since it needs to accept NULL in some arguments such as
extConfig. On the other hand,
binary_upgrade_logical_slot_has_caught_up() doesn't handle NULL and
it's conventional to make such a function a strict function.
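
(One can see the difference in the catalog; just an illustrative query:)

```
SELECT proname, proisstrict
FROM pg_proc
WHERE proname IN ('binary_upgrade_create_empty_extension',
                  'binary_upgrade_logical_slot_has_caught_up');
```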

---
LogicalReplicationSlotHasPendingWal() is defined in logical.c but I
guess it's more suitable to be in slotfunc.s where similar functions
such as pg_logical_replication_slot_advance() is also defined.

Why not in logicalfuncs.c?

I am not sure if either of those is better than logical.c. IIRC, I
thought it was okay to keep in logical.c as others primarily deal with
exposed SQL functions and I felt it somewhat matches with the intent
of logical.c ("The goal is to encapsulate most of the internal
complexity for consumers of logical decoding, so they can create and
consume a changestream with a low amount of code..").

I see your point. To me it looks that the functions in logical.c are
APIs and internal functions to manage logical decoding context and
replication slot (e.g., restart_lsn). On the other hand,
LogicalReplicationSlotHasPendingWal() seems to be a user of the
logical decoding. But anyway, it seems that three hackers have
different opinions. So we can keep it unless someone has a good reason
to change it.

On Tue, Nov 28, 2023 at 7:04 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

Yeah, we followed binary_upgrade_create_empty_extension(). Also, we set as
un-strict to keep a caller function simpler.

Currently get_old_cluster_logical_slot_infos() executes a query and it contains
binary_upgrade_logical_slot_has_caught_up(). In pg_upgrade layer, we assumed
either true or false is returned.

But if proisstrict is changed true, we must handle the case when NULL is returned.
It is small but backseat operation.

Which cases are you concerned pg_upgrade could pass NULL to
binary_upgrade_logical_slot_has_caught_up()?

I've not tested it yet but even if it returns NULL, perhaps
get_old_cluster_logical_slot_infos() would still set curr->caught_up
to false, no?

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#389Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Masahiko Sawada (#388)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Sawada-san,

On Tue, Nov 28, 2023 at 7:04 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

Yeah, we followed binary_upgrade_create_empty_extension(). Also, we set as
un-strict to keep a caller function simpler.

Currently get_old_cluster_logical_slot_infos() executes a query and it contains
binary_upgrade_logical_slot_has_caught_up(). In pg_upgrade layer, we assumed
either true or false is returned.

But if proisstrict is changed true, we must handle the case when NULL is returned.
It is small but backseat operation.

Which cases are you concerned pg_upgrade could pass NULL to
binary_upgrade_logical_slot_has_caught_up()?

Actually, we do not expect that it won't input NULL. IIUC all of slots have
slot_name, and subquery uses its name. But will it be kept forever? I think we
can avoid any risk.

I've not tested it yet but even if it returns NULL, perhaps
get_old_cluster_logical_slot_infos() would still set curr->caught_up
to false, no?

Hmm. I checked the C99 specification [1] of strcmp, but it does not define the
case when the NULL is input. So it depends implementation.

[1]: https://www.dii.uchile.cl/~daespino/files/Iso_C_1999_definition.pdf

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#390Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#389)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Tue, Nov 28, 2023 at 10:58 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

Dear Sawada-san,

On Tue, Nov 28, 2023 at 7:04 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

Yeah, we followed binary_upgrade_create_empty_extension(). Also, we set as
un-strict to keep a caller function simpler.

Currently get_old_cluster_logical_slot_infos() executes a query and it contains
binary_upgrade_logical_slot_has_caught_up(). In pg_upgrade layer, we assumed
either true or false is returned.

But if proisstrict is changed true, we must handle the case when NULL is returned.
It is small but backseat operation.

Which cases are you concerned pg_upgrade could pass NULL to
binary_upgrade_logical_slot_has_caught_up()?

Actually, we do not expect that it won't input NULL. IIUC all of slots have
slot_name, and subquery uses its name. But will it be kept forever? I think we
can avoid any risk.

I've not tested it yet but even if it returns NULL, perhaps
get_old_cluster_logical_slot_infos() would still set curr->caught_up
to false, no?

Hmm. I checked the C99 specification [1] of strcmp, but it does not define the
case when the NULL is input. So it depends implementation.

I think PQgetvalue() returns an empty string if the result value is null.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#391Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Masahiko Sawada (#390)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Sawada-san,

Actually, we do not expect that it won't input NULL. IIUC all of slots have
slot_name, and subquery uses its name. But will it be kept forever? I think we
can avoid any risk.

I've not tested it yet but even if it returns NULL, perhaps
get_old_cluster_logical_slot_infos() would still set curr->caught_up
to false, no?

Hmm. I checked the C99 specification [1] of strcmp, but it does not define the
case when the NULL is input. So it depends implementation.

I think PQgetvalue() returns an empty string if the result value is null.

Oh, you are right... I found below paragraph from [1].

An empty string is returned if the field value is null. See PQgetisnull to distinguish
null values from empty-string values.

So I agree what you said - current code can accept NULL.
But still not sure the error message is really good or not.
If we regard an empty string as false, the slot which has empty name will be reported like:
"The slot \"\" has not consumed the WAL yet" in check_old_cluster_for_valid_slots().
Isn't it inappropriate?

(Note again - currently we do not find such a case, so it may be overkill)
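
FWIW, as far as I know a slot with an empty name cannot even be created, which
is why this case should not arise:

```
SELECT pg_create_logical_replication_slot('', 'test_decoding');
-- fails, IIRC with: ERROR:  replication slot name "" is too short
```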

[1]: https://www.postgresql.org/docs/devel/libpq-exec.html#LIBPQ-PQGETVALUE

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#392Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Amit Kapila (#383)
2 attachment(s)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear hackers,

Pushed!

Hi all, the CF entry for this is marked RfC, and CI is trying to apply
the last patch committed. Is there further work that needs to be
re-attached and/or rebased?

No. I have marked it as committed.

I found another failure related with the commit [1]. I think it is caused by the
autovacuum. I want to propose a patch which disables the feature for old publisher.

More detail, please see below.

# Analysis of the failure

Summary: this failure occurs when the autovacuum starts after the subscription
is disabled but before doing pg_upgrade.

According to the regress file, it unexpectedly failed the pg_upgrade [2]. There are
no possibilities for slots are invalidated, so some WALs seemed to be generated
after disabling the subscriber.

Also, server log caused by oldpub said that autovacuum worker was terminated when
it stopped. This was occurred after walsender released the logical slots. WAL records
caused by autovacuum workers could not be consumed by the slots, so that upgrading
function returned false.

# How to reproduce

I made a small file for reproducing the failure. Please see reproduce.txt. This contains
changes for launching autovacuum worker very often and for ensuring actual works are
done. After applying it, I could reproduce the same failure every time.

# How to fix

I think it is sufficient to fix only the test code.
The easiest way is to disable the autovacuum on old publisher. PSA the patch file.

How do you think?

[1]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2023-11-27%2020%3A52%3A10
[2]:
```
...
Checking for contrib/isn with bigint-passing mismatch ok
Checking for valid logical replication slots fatal

Your installation contains logical replication slots that can't be upgraded.
You can remove invalid slots and/or consume the pending WAL for other slots,
and then restart the upgrade.
A list of the problematic slots is in the file:
/home/bf/bf-build/skink-master/HEAD/pgsql.build/src/bin/pg_upgrade/tmp_check/t_003_logical_slots_newpub_data/pgdata/pg_upgrade_output.d/20231127T220024.480/invalid_logical_slots.txt
Failure, exiting
[22:01:20.362](86.645s) not ok 10 - run of pg_upgrade of old cluster
...
```
[3]:
```
...
2023-11-27 22:00:23.546 UTC [3567962][walsender][4/0:0] LOG: released logical replication slot "regress_sub"
2023-11-27 22:00:23.549 UTC [3559042][postmaster][:0] LOG: received fast shutdown request
2023-11-27 22:00:23.552 UTC [3559042][postmaster][:0] LOG: aborting any active transactions
*2023-11-27 22:00:23.663 UTC [3568793][autovacuum worker][5/3:738] FATAL: terminating autovacuum process due to administrator command*
2023-11-27 22:00:23.775 UTC [3559042][postmaster][:0] LOG: background worker "logical replication launcher" (PID 3560674) exited with exit code 1
...
```

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:

disable_autovacuum.patchapplication/octet-stream; name=disable_autovacuum.patchDownload
diff --git a/src/bin/pg_upgrade/t/003_logical_slots.pl b/src/bin/pg_upgrade/t/003_logical_slots.pl
index 5b01cf8c40..087a4cd6e8 100644
--- a/src/bin/pg_upgrade/t/003_logical_slots.pl
+++ b/src/bin/pg_upgrade/t/003_logical_slots.pl
@@ -17,6 +17,7 @@ my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
 # Initialize old cluster
 my $oldpub = PostgreSQL::Test::Cluster->new('oldpub');
 $oldpub->init(allows_streaming => 'logical');
+$oldpub->append_conf('postgresql.conf', 'autovacuum = off');
 
 # Initialize new cluster
 my $newpub = PostgreSQL::Test::Cluster->new('newpub');
reproduce.txttext/plain; name=reproduce.txtDownload
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 86a3b3d8be..406c588a1d 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -662,7 +662,7 @@ AutoVacLauncherMain(int argc, char *argv[])
 		 */
 		(void) WaitLatch(MyLatch,
 						 WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
-						 (nap.tv_sec * 1000L) + (nap.tv_usec / 1000L),
+						 100L,
 						 WAIT_EVENT_AUTOVACUUM_MAIN);
 
 		ResetLatch(MyLatch);
@@ -769,6 +769,9 @@ AutoVacLauncherMain(int argc, char *argv[])
 		}
 		LWLockRelease(AutovacuumLock);	/* either shared or exclusive */
 
+		/* force launch */
+		can_launch = true;
+
 		/* if we can't do anything, just go back to sleep */
 		if (!can_launch)
 			continue;
@@ -1267,38 +1270,6 @@ do_start_worker(void)
 		if (!tmp->adw_entry)
 			continue;
 
-		/*
-		 * Also, skip a database that appears on the database list as having
-		 * been processed recently (less than autovacuum_naptime seconds ago).
-		 * We do this so that we don't select a database which we just
-		 * selected, but that pgstat hasn't gotten around to updating the last
-		 * autovacuum time yet.
-		 */
-		skipit = false;
-
-		dlist_reverse_foreach(iter, &DatabaseList)
-		{
-			avl_dbase  *dbp = dlist_container(avl_dbase, adl_node, iter.cur);
-
-			if (dbp->adl_datid == tmp->adw_datid)
-			{
-				/*
-				 * Skip this database if its next_worker value falls between
-				 * the current time and the current time plus naptime.
-				 */
-				if (!TimestampDifferenceExceeds(dbp->adl_next_worker,
-												current_time, 0) &&
-					!TimestampDifferenceExceeds(current_time,
-												dbp->adl_next_worker,
-												autovacuum_naptime * 1000))
-					skipit = true;
-
-				break;
-			}
-		}
-		if (skipit)
-			continue;
-
 		/*
 		 * Remember the db with oldest autovac time.  (If we are here, both
 		 * tmp->entry and db->entry must be non-null.)
@@ -3198,6 +3169,9 @@ relation_needs_vacanalyze(Oid relid,
 	/* ANALYZE refuses to work with pg_statistic */
 	if (relid == StatisticRelationId)
 		*doanalyze = false;
+
+	*dovacuum = true;
+	*doanalyze = true;
 }
 
 /*
diff --git a/src/bin/pg_upgrade/t/003_logical_slots.pl b/src/bin/pg_upgrade/t/003_logical_slots.pl
index 5b01cf8c40..5c181375a4 100644
--- a/src/bin/pg_upgrade/t/003_logical_slots.pl
+++ b/src/bin/pg_upgrade/t/003_logical_slots.pl
@@ -17,6 +17,7 @@ my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
 # Initialize old cluster
 my $oldpub = PostgreSQL::Test::Cluster->new('oldpub');
 $oldpub->init(allows_streaming => 'logical');
+$oldpub->append_conf('postgresql.conf', 'autovacuum_naptime = 3');
 
 # Initialize new cluster
 my $newpub = PostgreSQL::Test::Cluster->new('newpub');
@@ -164,6 +165,7 @@ $sub->wait_for_subscription_sync($oldpub, 'regress_sub');
 
 # 2. Temporarily disable the subscription
 $sub->safe_psql('postgres', "ALTER SUBSCRIPTION regress_sub DISABLE");
+sleep 4;
 $oldpub->stop;
 
 # pg_upgrade should be successful
#393Amit Kapila
amit.kapila16@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#392)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Wed, Nov 29, 2023 at 2:56 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

Pushed!

Hi all, the CF entry for this is marked RfC, and CI is trying to apply
the last patch committed. Is there further work that needs to be
re-attached and/or rebased?

No. I have marked it as committed.

I found another failure related with the commit [1]. I think it is caused by the
autovacuum. I want to propose a patch which disables the feature for old publisher.

More detail, please see below.

# Analysis of the failure

Summary: this failure occurs when the autovacuum starts after the subscription
is disabled but before doing pg_upgrade.

According to the regress file, it unexpectedly failed the pg_upgrade [2]. There are
no possibilities for slots are invalidated, so some WALs seemed to be generated
after disabling the subscriber.

Also, server log caused by oldpub said that autovacuum worker was terminated when
it stopped. This was occurred after walsender released the logical slots. WAL records
caused by autovacuum workers could not be consumed by the slots, so that upgrading
function returned false.

# How to reproduce

I made a small file for reproducing the failure. Please see reproduce.txt. This contains
changes for launching autovacuum worker very often and for ensuring actual works are
done. After applying it, I could reproduce the same failure every time.

# How to fix

I think it is sufficient to fix only the test code.
The easiest way is to disable the autovacuum on old publisher. PSA the patch file.

Agreed, for now, we should change the test as you proposed. I'll take
care of that. However, I wonder, if we should also ensure that
autovacuum or any other worker is shut down before walsender processes
the last set of WAL before shutdown. We can analyze more on this and
probably start a separate thread to discuss this point.

--
With Regards,
Amit Kapila.

#394Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#393)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Thu, Nov 30, 2023 at 8:40 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Nov 29, 2023 at 2:56 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

Pushed!

Hi all, the CF entry for this is marked RfC, and CI is trying to apply
the last patch committed. Is there further work that needs to be
re-attached and/or rebased?

No. I have marked it as committed.

I found another failure related with the commit [1]. I think it is caused by the
autovacuum. I want to propose a patch which disables the feature for old publisher.

More detail, please see below.

# Analysis of the failure

Summary: this failure occurs when the autovacuum starts after the subscription
is disabled but before doing pg_upgrade.

According to the regress file, it unexpectedly failed the pg_upgrade [2]. There are
no possibilities for slots are invalidated, so some WALs seemed to be generated
after disabling the subscriber.

Also, server log caused by oldpub said that autovacuum worker was terminated when
it stopped. This was occurred after walsender released the logical slots. WAL records
caused by autovacuum workers could not be consumed by the slots, so that upgrading
function returned false.

# How to reproduce

I made a small file for reproducing the failure. Please see reproduce.txt. This contains
changes for launching autovacuum worker very often and for ensuring actual works are
done. After applying it, I could reproduce the same failure every time.

# How to fix

I think it is sufficient to fix only the test code.
The easiest way is to disable the autovacuum on old publisher. PSA the patch file.

Agreed, for now, we should change the test as you proposed. I'll take
care of that. However, I wonder, if we should also ensure that
autovacuum or any other worker is shut down before walsender processes
the last set of WAL before shutdown. We can analyze more on this and
probably start a separate thread to discuss this point.

Sorry, my analysis was not complete. On looking closely, I think the
reason is that we are allowed to upgrade the slot iff there is no
pending WAL to be processed. The test first disables the subscription
to avoid unnecessary LOGs on the subscriber and then stops the
publisher node. It is quite possible that just before the shutdown of
the server, autovacuum generates some WAL record that needs to be
processed, so you propose just disabling the autovacuum for this test.

--
With Regards,
Amit Kapila.

#395Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Amit Kapila (#394)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Amit,

Sorry, my analysis was not complete. On looking closely, I think the
reason is that we are allowed to upgrade the slot iff there is no
pending WAL to be processed.

Yes, the guard will strongly protect against data loss, but I did not take care of that in the test.

The test first disables the subscription
to avoid unnecessary LOGs on the subscriber and then stops the
publisher node.

Right. Unnecessary ERRORs would appear if we did not disable it.

It is quite possible that just before the shutdown of
the server, autovacuum generates some WAL record that needs to be
processed,

Yeah, pg_upgrade does not ensure that autovacuum is not running *before* the
upgrade.

so you propose just disabling the autovacuum for this test.

Absolutely correct.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#396Amit Kapila
amit.kapila16@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#391)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Wed, Nov 29, 2023 at 7:33 AM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

Actually, we do not expect that it won't input NULL. IIUC all of slots have
slot_name, and subquery uses its name. But will it be kept forever? I think we
can avoid any risk.

I've not tested it yet but even if it returns NULL, perhaps
get_old_cluster_logical_slot_infos() would still set curr->caught_up
to false, no?

Hmm. I checked the C99 specification [1] of strcmp, but it does not define the
case when the NULL is input. So it depends implementation.

I think PQgetvalue() returns an empty string if the result value is null.

Oh, you are right... I found below paragraph from [1].

An empty string is returned if the field value is null. See PQgetisnull to distinguish
null values from empty-string values.

So I agree what you said - current code can accept NULL.
But still not sure the error message is really good or not.
If we regard an empty string as false, the slot which has empty name will be reported like:
"The slot \"\" has not consumed the WAL yet" in check_old_cluster_for_valid_slots().
Isn't it inappropriate?

I see your point that giving a better message (which would tell the
actual problem) to the user in this case also has a value. OTOH, as
you said, this case won't happen in practical scenarios, so I am fine
either way with a slight tilt toward retaining a better error message
(aka the current way). Sawada-San/Bharath, do you have any suggestions
on this?

--
With Regards,
Amit Kapila.

#397Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#396)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Thu, Nov 30, 2023 at 6:49 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Nov 29, 2023 at 7:33 AM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

Actually, we do not expect that it won't input NULL. IIUC all of slots have
slot_name, and subquery uses its name. But will it be kept forever? I think we
can avoid any risk.

I've not tested it yet but even if it returns NULL, perhaps
get_old_cluster_logical_slot_infos() would still set curr->caught_up
to false, no?

Hmm. I checked the C99 specification [1] of strcmp, but it does not define the
case when the NULL is input. So it depends implementation.

I think PQgetvalue() returns an empty string if the result value is null.

Oh, you are right... I found below paragraph from [1].

An empty string is returned if the field value is null. See PQgetisnull to distinguish
null values from empty-string values.

So I agree what you said - current code can accept NULL.
But still not sure the error message is really good or not.
If we regard an empty string as false, the slot which has empty name will be reported like:
"The slot \"\" has not consumed the WAL yet" in check_old_cluster_for_valid_slots().
Isn't it inappropriate?

I see your point that giving a better message (which would tell the
actual problem) to the user in this case also has a value. OTOH, as
you said, this case won't happen in practical scenarios, so I am fine
either way with a slight tilt toward retaining a better error message
(aka the current way). Sawada-San/Bharath, do you have any suggestions
on this?

TBH I'm not sure the error message is much more helpful for users than
the message "The slot \"\" has not consumed the WAL yet" in practice.
In either case, the messages just tell the user the slot name passed
to the function was not appropriate. Rather, I'm a bit concerned that
we create a precedent that we make a function non-strict to produce an
error message only for unrealistic cases. Please point out if we
already have such precedents. Other functions in pg_upgrade_support.c
such as binary_upgrade_set_next_pg_tablespace_oid() are not called if
the argument is NULL since it's a strict function. But if null was
passed in (where should not happen in practice), pg_upgrade would fail
with an error message or would finish while leaving the cluster in an
inconsistent state, I've not tested. Why do we want to care about the
argument being NULL only in
binary_upgrade_logical_slot_has_caught_up()?

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#398Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Hayato Kuroda (Fujitsu) (#392)
1 attachment(s)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear hackers,

I found another failure related with the commit [1]. This is caused by missing
wait on the test code. Amit helped me for this analysis and fix.

# Analysis of the failure

The failure is that the restored slot has two_phase = false, whereas the slot was
created with two_phase = true. This is because pg_upgrade was executed before all
tables were in the ready state.

# How to fix

I think the test is not sufficient. Other subscription tests related to 2PC
additionally wait until subtwophasestate becomes 'e'. The same wait should be
added here as well. PSA the patch.
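
For reference, the state the test waits for can also be checked directly on the
subscriber (as I understand it, 'd' = two_phase disabled, 'p' = pending
enablement, 'e' = enabled):

```
SELECT subname, subtwophasestate FROM pg_subscription;
```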

[1]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rorqual&dt=2023-12-01%2016%3A59%3A30

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:

add_wait.patchapplication/octet-stream; name=add_wait.patchDownload
diff --git a/src/bin/pg_upgrade/t/003_logical_slots.pl b/src/bin/pg_upgrade/t/003_logical_slots.pl
index 087a4cd6e8..020e7aa1cc 100644
--- a/src/bin/pg_upgrade/t/003_logical_slots.pl
+++ b/src/bin/pg_upgrade/t/003_logical_slots.pl
@@ -163,6 +163,12 @@ $sub->safe_psql(
 ]);
 $sub->wait_for_subscription_sync($oldpub, 'regress_sub');
 
+# Also wait for two-phase to be enabled
+my $twophase_query =
+  "SELECT count(1) = 0 FROM pg_subscription WHERE subtwophasestate NOT IN ('e');";
+$sub->poll_query_until('postgres', $twophase_query)
+  or die "Timed out while waiting for subscriber to enable twophase";
+
 # 2. Temporarily disable the subscription
 $sub->safe_psql('postgres', "ALTER SUBSCRIPTION regress_sub DISABLE");
 $oldpub->stop;
#399Amit Kapila
amit.kapila16@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#398)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Mon, Dec 4, 2023 at 11:59 AM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

Dear hackers,

I found another failure related with the commit [1]. This is caused by missing
wait on the test code. Amit helped me for this analysis and fix.

Pushed!

--
With Regards,
Amit Kapila.

#400Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Masahiko Sawada (#384)
1 attachment(s)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Sawada-san, hackers,

Based on comments I made a fix. PSA the patch.

Is there any reason why this function can be executed only in binary
upgrade mode? It seems to me that other functions in
pg_upgrade_support.c must be called only in binary upgrade mode
because it does some hacky changes internally. On the other hand,
binary_upgrade_logical_slot_has_caught_up() just calls
LogicalReplicationSlotHasPendingWal(), which doesn't change anything
internally. If we make this function usable in normal mode, the user
would be able to check each slot's upgradability without pg_upgrade
--check command (or without stopping the server if the user can ensure
no more meaningful WAL records are generated).

I kept the function as upgrade-only because subsequent operations might generate
WALs. See [1].

Also, the function checks if the user has the REPLICATION privilege
but I think that only superuser can connect to the server in binary
upgrade mode in the first place.

CheckSlotPermissions() was replaced with an Assert().

The following error message doesn't match the function name:

/* We must check before dereferencing the argument */
if (PG_ARGISNULL(0))
elog(ERROR, "null argument to
binary_upgrade_validate_wal_records is not allowed");

Per below comment, this elog(ERROR) was not needed anymore. Removed.

{ oid => '8046', descr => 'for use by pg_upgrade',
proname => 'binary_upgrade_logical_slot_has_caught_up', proisstrict => 'f',
provolatile => 'v', proparallel => 'u', prorettype => 'bool',
proargtypes => 'name',
prosrc => 'binary_upgrade_logical_slot_has_caught_up' },

The function is not a strict function but we check in the function if
the passed argument is not null. I think it would be clearer to make
it a strict function.

Per the conclusion [2], I changed the function to a strict one. As shown below,
binary_upgrade_logical_slot_has_caught_up() returned NULL when the input was NULL.

```
postgres=# SELECT * FROM pg_create_logical_replication_slot('slot', 'test_decoding');
slot_name | lsn
-----------+-----------
slot | 0/152E7E0
(1 row)

postgres=# SELECT * FROM binary_upgrade_logical_slot_has_caught_up(NULL);
binary_upgrade_logical_slot_has_caught_up
-------------------------------------------

(1 row)
```

LogicalReplicationSlotHasPendingWal() is defined in logical.c but I
guess it's more suitable to be in slotfunc.s where similar functions
such as pg_logical_replication_slot_advance() is also defined.

Committers had different opinions about it, so I kept the current style [3].

[1]: /messages/by-id/CALj2ACW7H-kAHia=vCbmdWDueGA_3pQfyzARfAQX0aGzHY57Zw@mail.gmail.com
[2]: /messages/by-id/CAA4eK1LzK0NvMkWAY6RJ6yN+YYUgMg1f=mNOGV8CPXLT43FHMw@mail.gmail.com
[3]: /messages/by-id/CAD21AoDkyyC=wa2=1Ruo_L8g16xf_W5Xyhp-=3j9urT916b9gA@mail.gmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:

followup_for_upgrade.patchapplication/octet-stream; name=followup_for_upgrade.patchDownload
diff --git a/src/backend/utils/adt/pg_upgrade_support.c b/src/backend/utils/adt/pg_upgrade_support.c
index 2f6fc86c3d..7c6d25edf4 100644
--- a/src/backend/utils/adt/pg_upgrade_support.c
+++ b/src/backend/utils/adt/pg_upgrade_support.c
@@ -281,11 +281,7 @@ binary_upgrade_logical_slot_has_caught_up(PG_FUNCTION_ARGS)
 
 	CHECK_IS_BINARY_UPGRADE;
 
-	/* We must check before dereferencing the argument */
-	if (PG_ARGISNULL(0))
-		elog(ERROR, "null argument to binary_upgrade_validate_wal_records is not allowed");
-
-	CheckSlotPermissions();
+	Assert(has_rolreplication(GetUserId()));
 
 	slot_name = PG_GETARG_NAME(0);
 
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index fb58dee3bc..77e8b13764 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11392,9 +11392,8 @@
   proparallel => 'u', prorettype => 'void', proargtypes => 'oid',
   prosrc => 'binary_upgrade_set_next_pg_tablespace_oid' },
 { oid => '8046', descr => 'for use by pg_upgrade',
-  proname => 'binary_upgrade_logical_slot_has_caught_up', proisstrict => 'f',
-  provolatile => 'v', proparallel => 'u', prorettype => 'bool',
-  proargtypes => 'name',
+  proname => 'binary_upgrade_logical_slot_has_caught_up', provolatile => 'v',
+  proparallel => 'u', prorettype => 'bool', proargtypes => 'name',
   prosrc => 'binary_upgrade_logical_slot_has_caught_up' },
 
 # conversion functions
#401vignesh C
vignesh21@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#400)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Tue, 5 Dec 2023 at 11:11, Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

Dear Sawada-san, hackers,

Based on comments I made a fix. PSA the patch.

Thanks for the patch, the changes look good to me.

Regards,
Vignesh

#402Amit Kapila
amit.kapila16@gmail.com
In reply to: vignesh C (#401)
1 attachment(s)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Wed, Dec 6, 2023 at 9:40 AM vignesh C <vignesh21@gmail.com> wrote:

On Tue, 5 Dec 2023 at 11:11, Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

Dear Sawada-san, hackers,

Based on comments I made a fix. PSA the patch.

Thanks for the patch, the changes look good to me.

Thanks, I have added a comment and updated the commit message. I'll
push this tomorrow unless there are more comments.

--
With Regards,
Amit Kapila.

Attachments:

v2-0001-Fix-issues-in-binary_upgrade_logical_slot_has_cau.patchapplication/octet-stream; name=v2-0001-Fix-issues-in-binary_upgrade_logical_slot_has_cau.patchDownload
From 66ab58415f0e24c0a39f1f47837e7ccb6aff5da0 Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Wed, 6 Dec 2023 09:37:15 +0530
Subject: [PATCH v2] Fix issues in binary_upgrade_logical_slot_has_caught_up().

The commit 29d0a77fa6 labelled binary_upgrade_logical_slot_has_caught_up()
as a non-strict function to allow providing a better error message to callers
in case the passed slot_name is NULL. On further discussion, it seems that
it is not helpful to have a different error message for NULL input in this
function, so this patch marks the function as strict.

This patch also removes the explicit permission check to use replication
slots as this function is invoked only by superusers and instead adds an
Assert.

Reported-by: Masahiko Sawada
Author: Hayato Kuroda
Reviewed-by: Vignesh C
Discussion: https://postgr.es/m/CAD21AoDSyiBKkMXBxN_gUayZZUCOgyHnG8Ge8rcPXNP3Tf6B4g@mail.gmail.com
---
 src/backend/utils/adt/pg_upgrade_support.c | 10 +++++-----
 src/include/catalog/pg_proc.dat            |  5 ++---
 2 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/src/backend/utils/adt/pg_upgrade_support.c b/src/backend/utils/adt/pg_upgrade_support.c
index 2f6fc86c3d..d0beea3601 100644
--- a/src/backend/utils/adt/pg_upgrade_support.c
+++ b/src/backend/utils/adt/pg_upgrade_support.c
@@ -281,11 +281,11 @@ binary_upgrade_logical_slot_has_caught_up(PG_FUNCTION_ARGS)
 
 	CHECK_IS_BINARY_UPGRADE;
 
-	/* We must check before dereferencing the argument */
-	if (PG_ARGISNULL(0))
-		elog(ERROR, "null argument to binary_upgrade_validate_wal_records is not allowed");
-
-	CheckSlotPermissions();
+	/*
+	* Binary upgrades only allowed super-user connections so we must have
+	* permission to use replication slots.
+	*/
+	Assert(has_rolreplication(GetUserId()));
 
 	slot_name = PG_GETARG_NAME(0);
 
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index fb58dee3bc..77e8b13764 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11392,9 +11392,8 @@
   proparallel => 'u', prorettype => 'void', proargtypes => 'oid',
   prosrc => 'binary_upgrade_set_next_pg_tablespace_oid' },
 { oid => '8046', descr => 'for use by pg_upgrade',
-  proname => 'binary_upgrade_logical_slot_has_caught_up', proisstrict => 'f',
-  provolatile => 'v', proparallel => 'u', prorettype => 'bool',
-  proargtypes => 'name',
+  proname => 'binary_upgrade_logical_slot_has_caught_up', provolatile => 'v',
+  proparallel => 'u', prorettype => 'bool', proargtypes => 'name',
   prosrc => 'binary_upgrade_logical_slot_has_caught_up' },
 
 # conversion functions
-- 
2.28.0.windows.1

#403Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#402)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

On Wed, Dec 6, 2023 at 10:02 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Dec 6, 2023 at 9:40 AM vignesh C <vignesh21@gmail.com> wrote:

On Tue, 5 Dec 2023 at 11:11, Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

Dear Sawada-san, hackers,

Based on comments I made a fix. PSA the patch.

Thanks for the patch, the changes look good to me.

Thanks, I have added a comment and updated the commit message. I'll
push this tomorrow unless there are more comments.

Pushed.

--
With Regards,
Amit Kapila.

#404Thomas Munro
thomas.munro@gmail.com
In reply to: Amit Kapila (#403)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

FYI fairywren failed in this test:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&dt=2023-12-16%2022%3A03%3A06

===8<===
Restoring database schemas in the new cluster
*failure*

Consult the last few lines of
"C:/tools/nmsys64/home/pgrunner/bf/root/HEAD/pgsql.build/testrun/pg_upgrade/003_logical_slots/data/t_003_logical_slots_newpub_data/pgdata/pg_upgrade_output.d/20231216T221418.035/log/pg_upgrade_dump_1.log"
for
the probable cause of the failure.
Failure, exiting
[22:14:34.598](22.801s) not ok 10 - run of pg_upgrade of old cluster
[22:14:34.600](0.001s) # Failed test 'run of pg_upgrade of old cluster'
# at C:/tools/nmsys64/home/pgrunner/bf/root/HEAD/pgsql/src/bin/pg_upgrade/t/003_logical_slots.pl
line 177.
===8<===

Without that log it might be hard to figure out what went wrong though :-/

#405Alexander Lakhin
exclusion@gmail.com
In reply to: Thomas Munro (#404)
Re: [PoC] pg_upgrade: allow to upgrade publisher node

17.12.2023 07:02, Thomas Munro wrote:

FYI fairywren failed in this test:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&dt=2023-12-16%2022%3A03%3A06

===8<===
Restoring database schemas in the new cluster
*failure*

Consult the last few lines of
"C:/tools/nmsys64/home/pgrunner/bf/root/HEAD/pgsql.build/testrun/pg_upgrade/003_logical_slots/data/t_003_logical_slots_newpub_data/pgdata/pg_upgrade_output.d/20231216T221418.035/log/pg_upgrade_dump_1.log"
for
the probable cause of the failure.
Failure, exiting
[22:14:34.598](22.801s) not ok 10 - run of pg_upgrade of old cluster
[22:14:34.600](0.001s) # Failed test 'run of pg_upgrade of old cluster'
# at C:/tools/nmsys64/home/pgrunner/bf/root/HEAD/pgsql/src/bin/pg_upgrade/t/003_logical_slots.pl
line 177.
===8<===

Without that log it might be hard to figure out what went wrong though :-/

Yes, but most probably it's the same failure as
/messages/by-id/TYAPR01MB5866AB7FD922CE30A2565B8BF5A8A@TYAPR01MB5866.jpnprd01.prod.outlook.com

Best regards,
Alexander

#406Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Alexander Lakhin (#405)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Thomas, Alexander,

17.12.2023 07:02, Thomas Munro wrote:

FYI fairywren failed in this test:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&dt=2023-12-16%2022%3A03%3A06

===8<===
Restoring database schemas in the new cluster
*failure*

Consult the last few lines of
"C:/tools/nmsys64/home/pgrunner/bf/root/HEAD/pgsql.build/testrun/pg_upgrade/003_logical_slots/data/t_003_logical_slots_newpub_data/pgdata/pg_upgrade_output.d/20231216T221418.035/log/pg_upgrade_dump_1.log"
for the probable cause of the failure.
Failure, exiting
[22:14:34.598](22.801s) not ok 10 - run of pg_upgrade of old cluster
[22:14:34.600](0.001s) # Failed test 'run of pg_upgrade of old cluster'
# at C:/tools/nmsys64/home/pgrunner/bf/root/HEAD/pgsql/src/bin/pg_upgrade/t/003_logical_slots.pl line 177.
===8<===

Without that log it might be hard to figure out what went wrong though :-/

Yes, but most probably it's the same failure as

Thanks for reporting. Yes, it has been already reported by me [1], and the server
log was provided by Andrew [2]. The issue was that a file creation was failed
because the same one was unlink()'d just before but it was in STATUS_DELETE_PENDING
status. Kindly Alexander proposed a fix [3] and it looks good to me, but
confirmations by senior and windows-friendly developers are needed to move forward.
(at first we thought the issue was solved by updating, but it was not correct)

I know that you have developed that region of the code, so I would be very happy
if you could check the forked thread.

[1]: /messages/by-id/TYAPR01MB5866AB7FD922CE30A2565B8BF5A8A@TYAPR01MB5866.jpnprd01.prod.outlook.com
[2]: /messages/by-id/TYAPR01MB5866A4E7342088E91362BEF0F5BBA@TYAPR01MB5866.jpnprd01.prod.outlook.com
[3]: /messages/by-id/976479cf-dd66-ca19-f40c-5640e30700cb@gmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#407Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Hayato Kuroda (Fujitsu) (#406)
RE: [PoC] pg_upgrade: allow to upgrade publisher node

Dear Thomas, Alexander,

Thanks for reporting. Yes, it has been already reported by me [1], and the server
log was provided by Andrew [2]. The issue was that a file creation was failed
because the same one was unlink()'d just before but it was in
STATUS_DELETE_PENDING
status. Kindly Alexander proposed a fix [3] and it looks good to me, but
confirmations by senior and windows-friendly developers are needed to move
forward.
(at first we thought the issue was solved by updating, but it was not correct)

I know that you have developed that region of the code, so I would be very happy
if you could check the forked thread.

I forgot to mention an important point. The issue was not introduced by the feature.
It just surfaced a pre-existing possible failure, which occurs only in the Windows environment.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED