Reducing connection overhead in pg_upgrade compat check phase

Started by Daniel Gustafsson, almost 3 years ago, 34 messages

daniel@yesql.se

almost 3 years ago

1 attachment(s)

When adding a check to pg_upgrade a while back I noticed in a profile that the
cluster compatibility check phase spends a lot of time in connectToServer. Some
of this can be attributed to the data type checks, each of which runs serially
and connects to every database in turn to run its query, and this seemed like a
place where we can do better.

The attached patch moves the checks from individual functions, each of which
loops over all databases, into a struct which is consumed by a single umbrella
check where all data type queries are executed against a database using the
same connection. This way we can amortize the connectToServer overhead across
more accesses to the database.
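
To make the restructuring concrete, here is a minimal sketch of the two loop
orders. It is not the patch itself: SketchCheck, pre_16, pre_12 and both
counting functions are hypothetical names invented for illustration, and the
counters merely stand in for connectToServer() and for running a query.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not pg_upgrade's real types. */
typedef struct
{
	const char *status;					/* what the check looks for */
	bool		(*applies) (int version);	/* NULL: run for every version */
} SketchCheck;

static bool pre_16(int version) { return version <= 1500; }
static bool pre_12(int version) { return version <= 1100; }

static const SketchCheck sketch_checks[] = {
	{"composite types", NULL},
	{"aclitem", pre_16},
	{"sql_identifier", pre_12},
	{NULL, NULL}				/* sentinel terminates the table */
};

static bool
check_applies(const SketchCheck *c, int version)
{
	return c->applies == NULL || c->applies(version);
}

/* Old layout: every applicable check connects to every database itself. */
static int
connections_check_major(int ndbs, int version)
{
	int			connections = 0;

	assert(ndbs >= 0);
	for (const SketchCheck *c = sketch_checks; c->status != NULL; c++)
	{
		if (!check_applies(c, version))
			continue;
		for (int db = 0; db < ndbs; db++)
			connections++;		/* stands in for connectToServer() */
	}
	return connections;
}

/* New layout: connect once per database, then run all applicable checks. */
static int
connections_db_major(int ndbs, int version, int *queries_run)
{
	int			connections = 0;

	assert(ndbs >= 0);
	*queries_run = 0;
	for (int db = 0; db < ndbs; db++)
	{
		connections++;			/* stands in for connectToServer() */
		for (const SketchCheck *c = sketch_checks; c->status != NULL; c++)
		{
			if (check_applies(c, version))
				(*queries_run)++;	/* stands in for running the query */
		}
	}
	return connections;
}
```

With 100 databases and an old cluster at version 15, two of the three sketch
checks apply, so connections_check_major(100, 1500) counts 200 connections
while connections_db_major() counts 100 connections for the same 200 queries.
The real patch has more checks in its table, so the ratio there is larger.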

In the trivial case, a single database, I don't see a performance regression
compared to the current approach. In a cluster with 100 (empty) databases there
is a ~15% reduction in time to run a --check pass. While it won't move the
earth in terms of wallclock time, consuming fewer resources on the old cluster,
and thereby making --check cheaper, might be the bigger win.

--
Daniel Gustafsson

Attachments:

0001-pg_upgrade-run-all-data-type-checks-per-connection.patch (application/octet-stream)

From 393f49f28ad8e1a6740e745339c790b2f2b14723 Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson <dgustafsson@postgresql.org>
Date: Mon, 19 Dec 2022 23:22:07 +0100
Subject: [PATCH] pg_upgrade: run all data type checks per connection

The checks for data type usage were each connecting to all databases
in the cluster and running their query. On clusters which have a lot
of databases this can become unnecessarily expensive. This moves the
checks to run in a single connection instead to minimize connection
setup/teardown overhead.
---
 src/bin/pg_upgrade/check.c      | 371 +++++++++++++++---------------
 src/bin/pg_upgrade/pg_upgrade.h |  28 ++-
 src/bin/pg_upgrade/version.c    | 394 ++++++++++++++------------------
 3 files changed, 373 insertions(+), 420 deletions(-)

diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 7cf68dc9af..7f52d4727c 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -26,15 +26,177 @@ static void check_for_isn_and_int8_passing_mismatch(ClusterInfo *cluster);
 static void check_for_user_defined_postfix_ops(ClusterInfo *cluster);
 static void check_for_incompatible_polymorphics(ClusterInfo *cluster);
 static void check_for_tables_with_oids(ClusterInfo *cluster);
-static void check_for_composite_data_type_usage(ClusterInfo *cluster);
-static void check_for_reg_data_type_usage(ClusterInfo *cluster);
-static void check_for_aclitem_data_type_usage(ClusterInfo *cluster);
-static void check_for_jsonb_9_4_usage(ClusterInfo *cluster);
+static bool check_for_aclitem_data_type_usage(ClusterInfo *cluster);
+static bool check_for_jsonb_9_4_usage(ClusterInfo *cluster);
 static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(ClusterInfo *new_cluster);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
 static char *get_canonical_locale_name(int category, const char *locale);
 
+static DataTypesUsageChecks data_types_usage_checks[] = {
+	/*
+	 * Look for composite types that were made during initdb *or* belong to
+	 * information_schema; that's important in case information_schema was
+	 * dropped and reloaded.
+	 *
+	 * The cutoff OID here should match the source cluster's value of
+	 * FirstNormalObjectId.  We hardcode it rather than using that C #define
+	 * because, if that #define is ever changed, our own version's value is
+	 * NOT what to use.  Eventually we may need a test on the source cluster's
+	 * version to select the correct value.
+	 */
+	{"Checking for system-defined composite types in user tables",
+	 "tables_using_composite.txt",
+
+	 "SELECT t.oid FROM pg_catalog.pg_type t "
+	 "LEFT JOIN pg_catalog.pg_namespace n ON t.typnamespace = n.oid "
+	 " WHERE typtype = 'c' AND (t.oid < 16384 OR nspname = 'information_schema')",
+
+	 "Your installation contains system-defined composite type(s) in user tables.\n"
+	 "These type OIDs are not stable across PostgreSQL versions,\n"
+	 "so this cluster cannot currently be upgraded.  You can\n"
+	 "drop the problem columns and restart the upgrade.\n"
+	 "A list of the problem columns is in the file:\n",
+	 NULL},
+
+	/*
+	 * 9.3 -> 9.4
+	 *	Fully implement the 'line' data type in 9.4, which previously returned
+	 *	"not enabled" by default and was only functionally enabled with a
+	 *	compile-time switch; as of 9.4 "line" has a different on-disk
+	 *	representation format.
+	 */
+	{"Checking for incompatible \"line\" data type",
+	 "tables_using_line.txt",
+
+	 "SELECT 'pg_catalog.line'::pg_catalog.regtype AS oid",
+
+	 "Your installation contains the \"line\" data type in user tables.\n"
+	 "This data type changed its internal and input/output format\n"
+	 "between your old and new versions so this\n"
+	 "cluster cannot currently be upgraded.  You can\n"
+	 "drop the problem columns and restart the upgrade.\n"
+	 "A list of the problem columns is in the file:\n",
+	 old_9_3_check_for_line_data_type_usage},
+
+	/*
+	 *	pg_upgrade only preserves these system values:
+	 *		pg_class.oid
+	 *		pg_type.oid
+	 *		pg_enum.oid
+	 *
+	 *	Many of the reg* data types reference system catalog info that is
+	 *	not preserved, and hence these data types cannot be used in user
+	 *	tables upgraded by pg_upgrade.
+	 */
+	{"Checking for reg* data types in user tables",
+	 "tables_using_reg.txt",
+	 /*
+	  * Note: older servers will not have all of these reg* types, so we have
+	  * to write the query like this rather than depending on casts to regtype.
+	  */
+	 "SELECT oid FROM pg_catalog.pg_type t "
+	 "WHERE t.typnamespace = "
+	 "        (SELECT oid FROM pg_catalog.pg_namespace "
+	 "         WHERE nspname = 'pg_catalog') "
+	 "  AND t.typname IN ( "
+	 /* pg_class.oid is preserved, so 'regclass' is OK */
+	 "           'regcollation', "
+	 "           'regconfig', "
+	 "           'regdictionary', "
+	 "           'regnamespace', "
+	 "           'regoper', "
+	 "           'regoperator', "
+	 "           'regproc', "
+	 "           'regprocedure' "
+	 /* pg_authid.oid is preserved, so 'regrole' is OK */
+	 /* pg_type.oid is (mostly) preserved, so 'regtype' is OK */
+	 "         )",
+
+	 "Your installation contains one of the reg* data types in user tables.\n"
+	 "These data types reference system OIDs that are not preserved by\n"
+	 "pg_upgrade, so this cluster cannot currently be upgraded.  You can\n"
+	 "drop the problem columns and restart the upgrade.\n"
+	 "A list of the problem columns is in the file:\n",
+	 NULL},
+
+	/*
+	 * PG 16 increased the size of the 'aclitem' type, which breaks the on-disk
+	 * format for existing data.
+	 */
+	{"Checking for incompatible aclitem data type in user tables",
+	 "tables_using_aclitem.txt",
+
+	 "SELECT 'pg_catalog.aclitem'::pg_catalog.regtype AS oid",
+
+	 "Your installation contains the \"aclitem\" data type in user tables.\n"
+	 "The internal format of \"aclitem\" changed in PostgreSQL version 16\n"
+	 "so this cluster cannot currently be upgraded.  You can drop the\n"
+	 "problem columns and restart the upgrade.  A list of the problem\n"
+	 "columns is in the file:\n",
+	 check_for_aclitem_data_type_usage},
+
+	/*
+	 * It's no longer allowed to create tables or views with "unknown"-type
+	 * columns.  We do not complain about views with such columns, because
+	 * they should get silently converted to "text" columns during the DDL
+	 * dump and reload; it seems unlikely to be worth making users do that
+	 * by hand.  However, if there's a table with such a column, the DDL
+	 * reload will fail, so we should pre-detect that rather than failing
+	 * mid-upgrade.  Worse, if there's a matview with such a column, the
+	 * DDL reload will silently change it to "text" which won't match the
+	 * on-disk storage (which is like "cstring").  So we *must* reject that.
+	 */
+	{"Checking for invalid \"unknown\" user columns",
+	 "tables_using_unknown.txt",
+
+	 "SELECT 'pg_catalog.unknown'::pg_catalog.regtype AS oid",
+
+	 "Your installation contains the \"unknown\" data type in user tables.\n"
+	 "This data type is no longer allowed in tables, so this\n"
+	 "cluster cannot currently be upgraded.  You can\n"
+	 "drop the problem columns and restart the upgrade.\n"
+	 "A list of the problem columns is in the file:\n",
+	 old_9_6_check_for_unknown_data_type_usage},
+
+	/*
+	 * PG 12 changed the 'sql_identifier' type storage to be based on name,
+	 * not varchar, which breaks on-disk format for existing data. So we need
+	 * to prevent upgrade when used in user objects (tables, indexes, ...).
+	 * In 12, the sql_identifier data type was switched from varchar to name,
+	 * which does affect the storage (name is by-ref, but not varlena). This
+	 * means user tables using sql_identifier for columns are broken because
+	 * the on-disk format is different.
+	 */
+	{"Checking for invalid \"sql_identifier\" user columns",
+	 "tables_using_sql_identifier.txt",
+
+	 "SELECT 'information_schema.sql_identifier'::pg_catalog.regtype AS oid",
+
+	 "Your installation contains the \"sql_identifier\" data type in user tables.\n"
+	 "The on-disk format for this data type has changed, so this\n"
+	 "cluster cannot currently be upgraded.  You can\n"
+	 "drop the problem columns and restart the upgrade.\n"
+	 "A list of the problem columns is in the file:\n",
+	 old_11_check_for_sql_identifier_data_type_usage},
+
+	/*
+	 * JSONB changed its storage format during 9.4 beta, so check for it.
+	 */
+	{"Checking for incompatible \"jsonb\" data type",
+	 "tables_using_jsonb.txt",
+
+	 "SELECT 'pg_catalog.jsonb'::pg_catalog.regtype AS oid",
+
+	 "Your installation contains the \"jsonb\" data type in user tables.\n"
+	 "The internal format of \"jsonb\" changed during 9.4 beta so this\n"
+	 "cluster cannot currently be upgraded.  You can\n"
+	 "drop the problem columns and restart the upgrade.\n"
+	 "A list of the problem columns is in the file:\n",
+	 check_for_jsonb_9_4_usage},
+
+	 {NULL, NULL, NULL, NULL, NULL}
+};
 
 /*
  * fix_path_separator
@@ -104,16 +266,9 @@ check_and_dump_old_cluster(bool live_check)
 	check_is_install_user(&old_cluster);
 	check_proper_datallowconn(&old_cluster);
 	check_for_prepared_transactions(&old_cluster);
-	check_for_composite_data_type_usage(&old_cluster);
-	check_for_reg_data_type_usage(&old_cluster);
 	check_for_isn_and_int8_passing_mismatch(&old_cluster);
 
-	/*
-	 * PG 16 increased the size of the 'aclitem' type, which breaks the on-disk
-	 * format for existing data.
-	 */
-	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1500)
-		check_for_aclitem_data_type_usage(&old_cluster);
+	check_for_data_types_usage(&old_cluster, data_types_usage_checks);
 
 	/*
 	 * PG 14 changed the function signature of encoding conversion functions.
@@ -145,21 +300,12 @@ check_and_dump_old_cluster(bool live_check)
 	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1100)
 		check_for_tables_with_oids(&old_cluster);
 
-	/*
-	 * PG 12 changed the 'sql_identifier' type storage to be based on name,
-	 * not varchar, which breaks on-disk format for existing data. So we need
-	 * to prevent upgrade when used in user objects (tables, indexes, ...).
-	 */
-	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1100)
-		old_11_check_for_sql_identifier_data_type_usage(&old_cluster);
-
 	/*
 	 * Pre-PG 10 allowed tables with 'unknown' type columns and non WAL logged
 	 * hash indexes
 	 */
 	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 906)
 	{
-		old_9_6_check_for_unknown_data_type_usage(&old_cluster);
 		if (user_opts.check)
 			old_9_6_invalidate_hash_indexes(&old_cluster, true);
 	}
@@ -168,14 +314,6 @@ check_and_dump_old_cluster(bool live_check)
 	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 905)
 		check_for_pg_role_prefix(&old_cluster);
 
-	if (GET_MAJOR_VERSION(old_cluster.major_version) == 904 &&
-		old_cluster.controldata.cat_ver < JSONB_FORMAT_CHANGE_CAT_VER)
-		check_for_jsonb_9_4_usage(&old_cluster);
-
-	/* Pre-PG 9.4 had a different 'line' data type internal format */
-	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 903)
-		old_9_3_check_for_line_data_type_usage(&old_cluster);
-
 	/*
 	 * While not a check option, we do this now because this is the only time
 	 * the old server is running.
@@ -1207,182 +1345,37 @@ check_for_tables_with_oids(ClusterInfo *cluster)
 }
 
 
-/*
- * check_for_composite_data_type_usage()
- *	Check for system-defined composite types used in user tables.
- *
- *	The OIDs of rowtypes of system catalogs and information_schema views
- *	can change across major versions; unlike user-defined types, we have
- *	no mechanism for forcing them to be the same in the new cluster.
- *	Hence, if any user table uses one, that's problematic for pg_upgrade.
- */
-static void
-check_for_composite_data_type_usage(ClusterInfo *cluster)
-{
-	bool		found;
-	Oid			firstUserOid;
-	char		output_path[MAXPGPATH];
-	char	   *base_query;
-
-	prep_status("Checking for system-defined composite types in user tables");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_composite.txt");
-
-	/*
-	 * Look for composite types that were made during initdb *or* belong to
-	 * information_schema; that's important in case information_schema was
-	 * dropped and reloaded.
-	 *
-	 * The cutoff OID here should match the source cluster's value of
-	 * FirstNormalObjectId.  We hardcode it rather than using that C #define
-	 * because, if that #define is ever changed, our own version's value is
-	 * NOT what to use.  Eventually we may need a test on the source cluster's
-	 * version to select the correct value.
-	 */
-	firstUserOid = 16384;
-
-	base_query = psprintf("SELECT t.oid FROM pg_catalog.pg_type t "
-						  "LEFT JOIN pg_catalog.pg_namespace n ON t.typnamespace = n.oid "
-						  " WHERE typtype = 'c' AND (t.oid < %u OR nspname = 'information_schema')",
-						  firstUserOid);
-
-	found = check_for_data_types_usage(cluster, base_query, output_path);
-
-	free(base_query);
-
-	if (found)
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains system-defined composite type(s) in user tables.\n"
-				 "These type OIDs are not stable across PostgreSQL versions,\n"
-				 "so this cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
-}
-
-/*
- * check_for_reg_data_type_usage()
- *	pg_upgrade only preserves these system values:
- *		pg_class.oid
- *		pg_type.oid
- *		pg_enum.oid
- *
- *	Many of the reg* data types reference system catalog info that is
- *	not preserved, and hence these data types cannot be used in user
- *	tables upgraded by pg_upgrade.
- */
-static void
-check_for_reg_data_type_usage(ClusterInfo *cluster)
-{
-	bool		found;
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for reg* data types in user tables");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_reg.txt");
-
-	/*
-	 * Note: older servers will not have all of these reg* types, so we have
-	 * to write the query like this rather than depending on casts to regtype.
-	 */
-	found = check_for_data_types_usage(cluster,
-									   "SELECT oid FROM pg_catalog.pg_type t "
-									   "WHERE t.typnamespace = "
-									   "        (SELECT oid FROM pg_catalog.pg_namespace "
-									   "         WHERE nspname = 'pg_catalog') "
-									   "  AND t.typname IN ( "
-	/* pg_class.oid is preserved, so 'regclass' is OK */
-									   "           'regcollation', "
-									   "           'regconfig', "
-									   "           'regdictionary', "
-									   "           'regnamespace', "
-									   "           'regoper', "
-									   "           'regoperator', "
-									   "           'regproc', "
-									   "           'regprocedure' "
-	/* pg_authid.oid is preserved, so 'regrole' is OK */
-	/* pg_type.oid is (mostly) preserved, so 'regtype' is OK */
-									   "         )",
-									   output_path);
-
-	if (found)
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains one of the reg* data types in user tables.\n"
-				 "These data types reference system OIDs that are not preserved by\n"
-				 "pg_upgrade, so this cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
-}
-
 /*
  * check_for_aclitem_data_type_usage
  *
- *	aclitem changed its storage format in 16, so check for it.
+ *     aclitem changed its storage format in 16, so check for it.
  */
-static void
+static bool
 check_for_aclitem_data_type_usage(ClusterInfo *cluster)
 {
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for incompatible aclitem data type in user tables");
-
-	snprintf(output_path, sizeof(output_path), "tables_using_aclitem.txt");
+	/*
+	 * PG 16 increased the size of the 'aclitem' type, which breaks the on-disk
+	 * format for existing data.
+	 */
+	if (GET_MAJOR_VERSION(cluster->major_version) <= 1500)
+		return true;
 
-	if (check_for_data_type_usage(cluster, "pg_catalog.aclitem", output_path))
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains the \"aclitem\" data type in user tables.\n"
-				 "The internal format of \"aclitem\" changed in PostgreSQL version 16\n"
-				 "so this cluster cannot currently be upgraded.  You can drop the\n"
-				 "problem columns and restart the upgrade.  A list of the problem\n"
-				 "columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
+	return false;
 }
 
 /*
  * check_for_jsonb_9_4_usage()
  *
- *	JSONB changed its storage format during 9.4 beta, so check for it.
+ *     JSONB changed its storage format during 9.4 beta, so check for it.
  */
-static void
+static bool
 check_for_jsonb_9_4_usage(ClusterInfo *cluster)
 {
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for incompatible \"jsonb\" data type");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_jsonb.txt");
+	if (GET_MAJOR_VERSION(cluster->major_version) == 904 &&
+		cluster->controldata.cat_ver < JSONB_FORMAT_CHANGE_CAT_VER)
+		return true;
 
-	if (check_for_data_type_usage(cluster, "pg_catalog.jsonb", output_path))
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains the \"jsonb\" data type in user tables.\n"
-				 "The internal format of \"jsonb\" changed during 9.4 beta so this\n"
-				 "cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
+	return false;
 }
 
 /*
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 5f2a116f23..18c3217d61 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -320,6 +320,21 @@ typedef struct
 } OSInfo;
 
 
+/* Function signature for data type check version hook */
+typedef bool (*DataTypesUsageVersionCheck)(ClusterInfo *cluster);
+
+/*
+ * DataTypesUsageChecks
+ */
+typedef struct
+{
+	const char *status;			/* status line to print to the user */
+	const char *script_filename;	/* filename to store report to */
+	const char *base_query;		/* Query to extract the oid of the datatype */
+	const char *fatal_check;	/* Text to store to report in case of error */
+	DataTypesUsageVersionCheck version_hook;
+} DataTypesUsageChecks;
+
 /*
  * Global variables
  */
@@ -442,18 +457,13 @@ unsigned int str2uint(const char *str);
 
 /* version.c */
 
-bool		check_for_data_types_usage(ClusterInfo *cluster,
-									   const char *base_query,
-									   const char *output_path);
-bool		check_for_data_type_usage(ClusterInfo *cluster,
-									  const char *type_name,
-									  const char *output_path);
-void		old_9_3_check_for_line_data_type_usage(ClusterInfo *cluster);
-void		old_9_6_check_for_unknown_data_type_usage(ClusterInfo *cluster);
+void		check_for_data_types_usage(ClusterInfo *cluster, DataTypesUsageChecks *checks);
+bool		old_9_3_check_for_line_data_type_usage(ClusterInfo *cluster);
+bool		old_9_6_check_for_unknown_data_type_usage(ClusterInfo *cluster);
+bool		old_11_check_for_sql_identifier_data_type_usage(ClusterInfo *cluster);
 void		old_9_6_invalidate_hash_indexes(ClusterInfo *cluster,
 											bool check_mode);
 
-void		old_11_check_for_sql_identifier_data_type_usage(ClusterInfo *cluster);
 void		report_extension_updates(ClusterInfo *cluster);
 
 /* parallel.c */
diff --git a/src/bin/pg_upgrade/version.c b/src/bin/pg_upgrade/version.c
index 403a6d7cfa..64dec909fa 100644
--- a/src/bin/pg_upgrade/version.c
+++ b/src/bin/pg_upgrade/version.c
@@ -13,12 +13,17 @@
 #include "fe_utils/string_utils.h"
 #include "pg_upgrade.h"
 
-
 /*
  * check_for_data_types_usage()
  *	Detect whether there are any stored columns depending on given type(s)
  *
- * If so, write a report to the given file name, and return true.
+ * If so, write a report to the given file name and signal a failure to the
+ * user.
+ *
+ * The checks to run are defined in a DataTypesUsageChecks structure where
+ * each check has metadata for explaining errors to the user, a base_query,
+ * a report filename and a function pointer hook for validating if the check
+ * should be executed given the cluster at hand.
  *
  * base_query should be a SELECT yielding a single column named "oid",
  * containing the pg_type OIDs of one or more types that are known to have
@@ -27,218 +32,184 @@
  * We check for the type(s) in tables, matviews, and indexes, but not views;
  * there's no storage involved in a view.
  */
-bool
-check_for_data_types_usage(ClusterInfo *cluster,
-						   const char *base_query,
-						   const char *output_path)
+void
+check_for_data_types_usage(ClusterInfo *cluster, DataTypesUsageChecks *checks)
 {
-	bool		found = false;
-	FILE	   *script = NULL;
-	int			dbnum;
 
-	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
+	prep_status("Checking for data type usage");
+
+	/*
+	 * Connect to each database in the cluster and run all defined checks
+	 * against that database before trying the next one.
+	 */
+	for (int dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
 	{
-		DbInfo	   *active_db = &cluster->dbarr.dbs[dbnum];
-		PGconn	   *conn = connectToServer(cluster, active_db->db_name);
-		PQExpBufferData querybuf;
-		PGresult   *res;
-		bool		db_used = false;
-		int			ntups;
-		int			rowno;
-		int			i_nspname,
-					i_relname,
-					i_attname;
-
-		/*
-		 * The type(s) of interest might be wrapped in a domain, array,
-		 * composite, or range, and these container types can be nested (to
-		 * varying extents depending on server version, but that's not of
-		 * concern here).  To handle all these cases we need a recursive CTE.
-		 */
-		initPQExpBuffer(&querybuf);
-		appendPQExpBuffer(&querybuf,
-						  "WITH RECURSIVE oids AS ( "
-		/* start with the type(s) returned by base_query */
-						  "	%s "
-						  "	UNION ALL "
-						  "	SELECT * FROM ( "
-		/* inner WITH because we can only reference the CTE once */
-						  "		WITH x AS (SELECT oid FROM oids) "
-		/* domains on any type selected so far */
-						  "			SELECT t.oid FROM pg_catalog.pg_type t, x WHERE typbasetype = x.oid AND typtype = 'd' "
-						  "			UNION ALL "
-		/* arrays over any type selected so far */
-						  "			SELECT t.oid FROM pg_catalog.pg_type t, x WHERE typelem = x.oid AND typtype = 'b' "
-						  "			UNION ALL "
-		/* composite types containing any type selected so far */
-						  "			SELECT t.oid FROM pg_catalog.pg_type t, pg_catalog.pg_class c, pg_catalog.pg_attribute a, x "
-						  "			WHERE t.typtype = 'c' AND "
-						  "				  t.oid = c.reltype AND "
-						  "				  c.oid = a.attrelid AND "
-						  "				  NOT a.attisdropped AND "
-						  "				  a.atttypid = x.oid "
-						  "			UNION ALL "
-		/* ranges containing any type selected so far */
-						  "			SELECT t.oid FROM pg_catalog.pg_type t, pg_catalog.pg_range r, x "
-						  "			WHERE t.typtype = 'r' AND r.rngtypid = t.oid AND r.rngsubtype = x.oid"
-						  "	) foo "
-						  ") "
-		/* now look for stored columns of any such type */
-						  "SELECT n.nspname, c.relname, a.attname "
-						  "FROM	pg_catalog.pg_class c, "
-						  "		pg_catalog.pg_namespace n, "
-						  "		pg_catalog.pg_attribute a "
-						  "WHERE	c.oid = a.attrelid AND "
-						  "		NOT a.attisdropped AND "
-						  "		a.atttypid IN (SELECT oid FROM oids) AND "
-						  "		c.relkind IN ("
-						  CppAsString2(RELKIND_RELATION) ", "
-						  CppAsString2(RELKIND_MATVIEW) ", "
-						  CppAsString2(RELKIND_INDEX) ") AND "
-						  "		c.relnamespace = n.oid AND "
-		/* exclude possible orphaned temp tables */
-						  "		n.nspname !~ '^pg_temp_' AND "
-						  "		n.nspname !~ '^pg_toast_temp_' AND "
-		/* exclude system catalogs, too */
-						  "		n.nspname NOT IN ('pg_catalog', 'information_schema')",
-						  base_query);
-
-		res = executeQueryOrDie(conn, "%s", querybuf.data);
+		DbInfo *active_db = &cluster->dbarr.dbs[dbnum];
+		PGconn *conn = connectToServer(cluster, active_db->db_name);
+		DataTypesUsageChecks *cur_check = checks;
 
-		ntups = PQntuples(res);
-		i_nspname = PQfnumber(res, "nspname");
-		i_relname = PQfnumber(res, "relname");
-		i_attname = PQfnumber(res, "attname");
-		for (rowno = 0; rowno < ntups; rowno++)
+		while (cur_check->status)
 		{
-			found = true;
-			if (script == NULL && (script = fopen_priv(output_path, "w")) == NULL)
-				pg_fatal("could not open file \"%s\": %s", output_path,
-						 strerror(errno));
-			if (!db_used)
+			PGresult *res;
+			int 	ntups;
+			int 	i_nspname;
+			int 	i_relname;
+			int 	i_attname;
+			FILE   *script = NULL;
+			bool 	db_used = false;
+			char	output_path[MAXPGPATH];
+			bool	found = false;
+
+			/*
+			 * Make sure that the check applies to the current cluster version
+			 * and skip if not. If no check hook has been defined we run the
+			 * check for all versions.
+			 */
+			if (cur_check->version_hook && !cur_check->version_hook(cluster))
 			{
-				fprintf(script, "In database: %s\n", active_db->db_name);
-				db_used = true;
-			}
-			fprintf(script, "  %s.%s.%s\n",
-					PQgetvalue(res, rowno, i_nspname),
-					PQgetvalue(res, rowno, i_relname),
-					PQgetvalue(res, rowno, i_attname));
-		}
-
-		PQclear(res);
-
-		termPQExpBuffer(&querybuf);
+				cur_check++;
+				continue;
+			}
+
+			snprintf(output_path, sizeof(output_path), "%s/%s",
+					 log_opts.basedir,
+					 cur_check->script_filename);
+
+			/*
+			 * The type(s) of interest might be wrapped in a domain, array,
+			 * composite, or range, and these container types can be nested (to
+			 * varying extents depending on server version, but that's not of
+			 * concern here).  To handle all these cases we need a recursive CTE.
+			 */
+			res = executeQueryOrDie(conn,
+							  "WITH RECURSIVE oids AS ( "
+			/* start with the type(s) returned by base_query */
+							  "	%s "
+					  "	UNION ALL "
+					  "	SELECT * FROM ( "
+	/* inner WITH because we can only reference the CTE once */
+					  "		WITH x AS (SELECT oid FROM oids) "
+	/* domains on any type selected so far */
+					  "			SELECT t.oid FROM pg_catalog.pg_type t, x WHERE typbasetype = x.oid AND typtype = 'd' "
+					  "			UNION ALL "
+	/* arrays over any type selected so far */
+					  "			SELECT t.oid FROM pg_catalog.pg_type t, x WHERE typelem = x.oid AND typtype = 'b' "
+					  "			UNION ALL "
+	/* composite types containing any type selected so far */
+					  "			SELECT t.oid FROM pg_catalog.pg_type t, pg_catalog.pg_class c, pg_catalog.pg_attribute a, x "
+					  "			WHERE t.typtype = 'c' AND "
+					  "				  t.oid = c.reltype AND "
+					  "				  c.oid = a.attrelid AND "
+					  "				  NOT a.attisdropped AND "
+					  "				  a.atttypid = x.oid "
+					  "			UNION ALL "
+	/* ranges containing any type selected so far */
+					  "			SELECT t.oid FROM pg_catalog.pg_type t, pg_catalog.pg_range r, x "
+					  "			WHERE t.typtype = 'r' AND r.rngtypid = t.oid AND r.rngsubtype = x.oid"
+					  "	) foo "
+					  ") "
+	/* now look for stored columns of any such type */
+					  "SELECT n.nspname, c.relname, a.attname "
+					  "FROM	pg_catalog.pg_class c, "
+					  "		pg_catalog.pg_namespace n, "
+					  "		pg_catalog.pg_attribute a "
+					  "WHERE	c.oid = a.attrelid AND "
+					  "		NOT a.attisdropped AND "
+					  "		a.atttypid IN (SELECT oid FROM oids) AND "
+					  "		c.relkind IN ("
+					  CppAsString2(RELKIND_RELATION) ", "
+					  CppAsString2(RELKIND_MATVIEW) ", "
+					  CppAsString2(RELKIND_INDEX) ") AND "
+					  "		c.relnamespace = n.oid AND "
+	/* exclude possible orphaned temp tables */
+					  "		n.nspname !~ '^pg_temp_' AND "
+					  "		n.nspname !~ '^pg_toast_temp_' AND "
+	/* exclude system catalogs, too */
+					  "		n.nspname NOT IN ('pg_catalog', 'information_schema')",
+					  cur_check->base_query);
+
+			ntups = PQntuples(res);
+
+			/*
+			 * The datatype was found, so extract the data and log to the
+			 * requested filename. We need to open the file for appending
+			 * since the check might have already found the type in another
+			 * database earlier in the loop.
+			 */
+			if (ntups)
+			{
+				i_nspname = PQfnumber(res, "nspname");
+				i_relname = PQfnumber(res, "relname");
+				i_attname = PQfnumber(res, "attname");
 
-		PQfinish(conn);
-	}
+				for (int rowno = 0; rowno < ntups; rowno++)
+				{
+					found = true;
+					if (script == NULL && (script = fopen_priv(output_path, "a")) == NULL)
+						pg_fatal("could not open file \"%s\": %s",
+								 output_path,
+								 strerror(errno));
+					if (!db_used)
+					{
+						fprintf(script, "In database: %s\n", active_db->db_name);
+						db_used = true;
+					}
+					fprintf(script, "  %s.%s.%s\n",
+							PQgetvalue(res, rowno, i_nspname),
+							PQgetvalue(res, rowno, i_relname),
+							PQgetvalue(res, rowno, i_attname));
+				}
 
-	if (script)
-		fclose(script);
+				if (script)
+				{
+					fclose(script);
+					script = NULL;
+				}
+			}
 
-	return found;
-}
+			PQclear(res);
 
-/*
- * check_for_data_type_usage()
- *	Detect whether there are any stored columns depending on the given type
- *
- * If so, write a report to the given file name, and return true.
- *
- * type_name should be a fully qualified type name.  This is just a
- * trivial wrapper around check_for_data_types_usage() to convert a
- * type name into a base query.
- */
-bool
-check_for_data_type_usage(ClusterInfo *cluster,
-						  const char *type_name,
-						  const char *output_path)
-{
-	bool		found;
-	char	   *base_query;
+			/*
+			 * If the check failed, terminate the umbrella status and print
+			 * the specific status line of the check to indicate which it was
+			 * before terminating with the detailed error message.
+			 */
+			if (found)
+			{
+				PQfinish(conn);
 
-	base_query = psprintf("SELECT '%s'::pg_catalog.regtype AS oid",
-						  type_name);
+				report_status(PG_REPORT, "failed");
+				prep_status("%s", cur_check->status);
+				pg_log(PG_REPORT, "fatal");
+				pg_fatal("%s    %s", cur_check->fatal_check, output_path);
+			}
 
-	found = check_for_data_types_usage(cluster, base_query, output_path);
+			cur_check++;
+		}
 
-	free(base_query);
+		PQfinish(conn);
+	}
 
-	return found;
+	check_ok();
 }
 
-
-/*
- * old_9_3_check_for_line_data_type_usage()
- *	9.3 -> 9.4
- *	Fully implement the 'line' data type in 9.4, which previously returned
- *	"not enabled" by default and was only functionally enabled with a
- *	compile-time switch; as of 9.4 "line" has a different on-disk
- *	representation format.
- */
-void
+bool
 old_9_3_check_for_line_data_type_usage(ClusterInfo *cluster)
 {
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for incompatible \"line\" data type");
+	/* Pre-PG 9.4 had a different 'line' data type internal format */
+	if (GET_MAJOR_VERSION(cluster->major_version) <= 903)
+		return true;
 
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_line.txt");
-
-	if (check_for_data_type_usage(cluster, "pg_catalog.line", output_path))
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains the \"line\" data type in user tables.\n"
-				 "This data type changed its internal and input/output format\n"
-				 "between your old and new versions so this\n"
-				 "cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
+	return false;
 }
 
-
-/*
- * old_9_6_check_for_unknown_data_type_usage()
- *	9.6 -> 10
- *	It's no longer allowed to create tables or views with "unknown"-type
- *	columns.  We do not complain about views with such columns, because
- *	they should get silently converted to "text" columns during the DDL
- *	dump and reload; it seems unlikely to be worth making users do that
- *	by hand.  However, if there's a table with such a column, the DDL
- *	reload will fail, so we should pre-detect that rather than failing
- *	mid-upgrade.  Worse, if there's a matview with such a column, the
- *	DDL reload will silently change it to "text" which won't match the
- *	on-disk storage (which is like "cstring").  So we *must* reject that.
- */
-void
+bool
 old_9_6_check_for_unknown_data_type_usage(ClusterInfo *cluster)
 {
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for invalid \"unknown\" user columns");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_unknown.txt");
-
-	if (check_for_data_type_usage(cluster, "pg_catalog.unknown", output_path))
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains the \"unknown\" data type in user tables.\n"
-				 "This data type is no longer allowed in tables, so this\n"
-				 "cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
+	/* Pre-PG 10 allowed tables with 'unknown' type columns */
+	if (GET_MAJOR_VERSION(cluster->major_version) <= 906)
+		return true;
+	return false;
 }
 
 /*
@@ -353,41 +324,20 @@ old_9_6_invalidate_hash_indexes(ClusterInfo *cluster, bool check_mode)
 		check_ok();
 }
 
-/*
- * old_11_check_for_sql_identifier_data_type_usage()
- *	11 -> 12
- *	In 12, the sql_identifier data type was switched from name to varchar,
- *	which does affect the storage (name is by-ref, but not varlena). This
- *	means user tables using sql_identifier for columns are broken because
- *	the on-disk format is different.
- */
-void
+bool
 old_11_check_for_sql_identifier_data_type_usage(ClusterInfo *cluster)
 {
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for invalid \"sql_identifier\" user columns");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_sql_identifier.txt");
-
-	if (check_for_data_type_usage(cluster, "information_schema.sql_identifier",
-								  output_path))
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains the \"sql_identifier\" data type in user tables.\n"
-				 "The on-disk format for this data type has changed, so this\n"
-				 "cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
+	/*
+	 * PG 12 changed the 'sql_identifier' type storage to be based on name,
+	 * not varchar, which breaks on-disk format for existing data. So we need
+	 * to prevent upgrade when used in user objects (tables, indexes, ...).
+	 */
+	if (GET_MAJOR_VERSION(cluster->major_version) <= 1100)
+		return true;
+
+	return false;
 }
 
-
 /*
  * report_extension_updates()
  *	Report extensions that should be updated.
-- 
2.32.1 (Apple Git-133)

Nathan Bossart

nathandbossart@gmail.com

almost 3 years ago

In reply to: Daniel Gustafsson (#1)

Re: Reducing connection overhead in pg_upgrade compat check phase

On Fri, Feb 17, 2023 at 10:44:49PM +0100, Daniel Gustafsson wrote:

In the trivial case, a single database, I don't see a reduction of performance
over the current approach. In a cluster with 100 (empty) databases there is a
~15% reduction in time to run a --check pass. While it won't move the earth in
terms of wallclock time, consuming less resources on the old cluster allowing
--check to be cheaper might be the bigger win.

Nice! This has actually been on my list of things to look into, so I
intend to help review the patch. In any case, +1 for making pg_upgrade
faster.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Nathan Bossart

nathandbossart@gmail.com

almost 3 years ago

In reply to: Daniel Gustafsson (#1)

Re: Reducing connection overhead in pg_upgrade compat check phase

On Fri, Feb 17, 2023 at 10:44:49PM +0100, Daniel Gustafsson wrote:

When adding a check to pg_upgrade a while back I noticed in a profile that the
cluster compatibility check phase spend a lot of time in connectToServer. Some
of this can be attributed to data type checks which each run serially in turn
connecting to each database to run the check, and this seemed like a place
where we can do better.

The attached patch moves the checks from individual functions, which each loops
over all databases, into a struct which is consumed by a single umbrella check
where all data type queries are executed against a database using the same
connection. This way we can amortize the connectToServer overhead across more
accesses to the database.

This change consolidates all the data type checks, so instead of 7 separate
loops through all the databases, there is just one. However, I wonder if
we are leaving too much on the table, as there are a number of other
functions that also loop over all the databases:

* get_loadable_libraries
* get_db_and_rel_infos
* report_extension_updates
* old_9_6_invalidate_hash_indexes
* check_for_isn_and_int8_passing_mismatch
* check_for_user_defined_postfix_ops
* check_for_incompatible_polymorphics
* check_for_tables_with_oids
* check_for_user_defined_encoding_conversions

I suspect consolidating get_loadable_libraries, get_db_and_rel_infos, and
report_extension_updates would be prohibitively complicated and not worth
the effort. old_9_6_invalidate_hash_indexes is only needed for unsupported
versions, so that might not be worth consolidating.
check_for_isn_and_int8_passing_mismatch only loops through all databases
when float8_pass_by_value in the control data differs, so that might not be
worth it, either. The last 4 are for supported versions and, from a very
quick glance, seem possible to consolidate. That would bring us to a total
of 11 separate loops that we could consolidate into one. However, the data
type checks seem to follow a nice pattern, so perhaps this is easier said
than done.
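
As a rough illustration of why the loop order matters (this is not code from the patch; fake_connect() is a hypothetical stand-in for connectToServer() that only counts calls), compare one connection per check per database with one connection per database shared by all checks:

```c
#include <assert.h>

/* Hypothetical stand-in for connectToServer(): it only counts calls. */
static int	connect_calls = 0;

static void
fake_connect(void)
{
	connect_calls++;
}

/* Current shape: every check opens its own connection to every database. */
static int
connections_check_major(int nchecks, int ndbs)
{
	assert(nchecks >= 0 && ndbs >= 0);
	connect_calls = 0;
	for (int c = 0; c < nchecks; c++)
		for (int d = 0; d < ndbs; d++)
			fake_connect();
	return connect_calls;
}

/* Consolidated shape: one connection per database, shared by all checks. */
static int
connections_db_major(int nchecks, int ndbs)
{
	assert(nchecks >= 0 && ndbs >= 0);
	connect_calls = 0;
	for (int d = 0; d < ndbs; d++)
	{
		fake_connect();
		for (int c = 0; c < nchecks; c++)
			(void) c;			/* check c would run on the shared connection */
	}
	return connect_calls;
}
```

With 11 checks and 100 databases, the first shape makes 1100 connection attempts and the second makes 100, assuming every check applies to the old cluster's version.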

IIUC with the patch, pg_upgrade will immediately fail as soon as a single
check in a database fails. I believe this differs from the current
behavior where all matches for a given check in the cluster are logged
before failing. I wonder if it'd be better to perform all of the data type
checks in all databases before failing so that all of the violations are
reported. Else, users would have to run pg_upgrade, fix a violation, run
pg_upgrade again, fix another one, etc.
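
A minimal sketch of that alternative, assuming a made-up fixture in place of real catalog queries (the violations table and run_all_checks() are hypothetical names, not part of pg_upgrade): every check runs in every database, failures are recorded, and the caller decides what to do once the whole pass is complete.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define NCHECKS 3
#define NDBS 2

/*
 * Hypothetical fixture: violations[db][check] is the number of offending
 * objects a check would find in that database.  In pg_upgrade this would
 * come from running the check's query.
 */
static const int violations[NDBS][NCHECKS] = {
	{0, 2, 0},
	{1, 0, 0},
};

/*
 * Visit every database and every check, recording failures instead of
 * exiting at the first one.  Returns the number of checks that failed in
 * at least one database; failed[] says which ones.
 */
static int
run_all_checks(bool failed[NCHECKS])
{
	int			nfailed = 0;

	assert(failed != NULL);
	for (int c = 0; c < NCHECKS; c++)
		failed[c] = false;

	for (int d = 0; d < NDBS; d++)
	{
		for (int c = 0; c < NCHECKS; c++)
		{
			if (violations[d][c] == 0)
				continue;
			printf("check %d: %d violation(s) in database %d\n",
				   c, violations[d][c], d);
			if (!failed[c])
			{
				failed[c] = true;
				nfailed++;
			}
		}
	}
	return nfailed;
}
```

The caller would terminate only after run_all_checks() returns a non-zero count, so a single --check pass surfaces every violation.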

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Justin Pryzby

pryzby@telsasoft.com

almost 3 years ago

In reply to: Daniel Gustafsson (#1)

1 attachment(s)

Re: Reducing connection overhead in pg_upgrade compat check phase

On Fri, Feb 17, 2023 at 10:44:49PM +0100, Daniel Gustafsson wrote:

When adding a check to pg_upgrade a while back I noticed in a profile that the
cluster compatibility check phase spend a lot of time in connectToServer. Some
of this can be attributed to data type checks which each run serially in turn
connecting to each database to run the check, and this seemed like a place
where we can do better.

src/bin/pg_upgrade/check.c | 371 +++++++++++++++---------------
src/bin/pg_upgrade/pg_upgrade.h | 28 ++-
src/bin/pg_upgrade/version.c | 394 ++++++++++++++------------------
3 files changed, 373 insertions(+), 420 deletions(-)

And saves 50 LOC.

The stated goal of the patch is to reduce overhead. But it only updates
a couple functions, and there are (I think) nine functions which loop
around all DBs. If you want to reduce the overhead, I assumed you'd
cache the DB connection for all tests ... but then I tried it, and first
ran into max_connections, and then ran into EMFILE. Which is probably
enough to kill my idea.

But maybe the existing patch could be phrased in terms of moving all the
per-db checks from functions to data structures (which has its own
merits). Then, there could be a single loop around DBs which executes
all the functions. The test runner can also test the major version and
handle the textfile output.
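
To make the idea concrete, here is a small sketch of a check table whose runner does the version test itself. The type and function names are hypothetical; the thresholds are the ones used in the attached WIP patch, with 0 meaning "always run".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct VersionedCheck
{
	const char *name;
	int			max_major;		/* last old-cluster major that needs the
								 * check; 0 means it always runs */
} VersionedCheck;

static const VersionedCheck demo_checks[] = {
	{.name = "composite types", .max_major = 0},
	{.name = "aclitem", .max_major = 1500},
	{.name = "tables WITH OIDS", .max_major = 1100},
};

/* The runner, not each check, decides whether a check applies. */
static bool
check_applies(const VersionedCheck *check, int old_major)
{
	assert(check != NULL);
	return check->max_major == 0 || old_major <= check->max_major;
}

/* Count how many table entries would run for a given old-cluster major. */
static size_t
count_applicable(int old_major)
{
	size_t		n = 0;

	for (size_t i = 0; i < sizeof(demo_checks) / sizeof(demo_checks[0]); i++)
	{
		if (check_applies(&demo_checks[i], old_major))
			n++;
	}
	return n;
}
```

Here old_major uses the same scale as GET_MAJOR_VERSION() in the patch (1300 for v13, 1100 for v11).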

However (as Nathan mentioned) what's currently done shows *all* the
problems of a given type - if there were 9 DBs with 99 relations with
OIDs, it'd show all of them at once. It'd be a big step backwards to
only show problems for the first problematic DB.

But maybe that's another opportunity to do better. Right now, if I
run pg_upgrade, it'll show all the failing objects, but only for the first
check that fails. After fixing them, it might tell me about a 2nd
failing check. I've never run into multiple types of failing checks,
but I do know that needing to re-run pg_upgrade is annoying (see
3c0471b5f).

You talked about improving the two data type tests, which aren't
conditional on a maximum server version. The smallest improvement
comes when only those two checks are run (like on a developer upgrade
v16=>v16). But when more checks are run during a production upgrade
like v13=>v16, you'd see a larger gain.

I fooled around with that idea in the attached patch. I have no
particular interest in optimizing --check for large numbers of DBs, so
I'm not planning to pursue it further, but maybe it'll be useful to you.

About your original patch:

+static DataTypesUsageChecks data_types_usage_checks[] = {
+	/*
+	 * Look for composite types that were made during initdb *or* belong to
+	 * information_schema; that's important in case information_schema was
+	 * dropped and reloaded.
+	 *
+	 * The cutoff OID here should match the source cluster's value of
+	 * FirstNormalObjectId.  We hardcode it rather than using that C #define
+	 * because, if that #define is ever changed, our own version's value is
+	 * NOT what to use.  Eventually we may need a test on the source cluster's
+	 * version to select the correct value.
+	 */
+	{"Checking for system-defined composite types in user tables",
+	 "tables_using_composite.txt",

I think this might be cleaner using "named initializer" (designated
initializer) struct initialization, rather than a positional
comma-separated list.
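
For comparison, a short sketch of the two styles. DemoCheck and its members are hypothetical and are not the patch's DataTypesUsageChecks definition; only the two string literals are taken from the patch.

```c
#include <assert.h>
#include <string.h>

typedef struct DemoCheck
{
	const char *status;
	const char *report_filename;
	const char *report_text;
} DemoCheck;

/* Positional: each value's meaning depends on the member order above. */
static const DemoCheck positional = {
	"Checking for system-defined composite types in user tables",
	"tables_using_composite.txt",
	"A list of the problem columns is in the file:",
};

/*
 * Designated (C99): every value is labeled, order is free, and members
 * that are left out are zero-initialized (NULL for pointers).
 */
static const DemoCheck designated = {
	.report_filename = "tables_using_composite.txt",
	.status = "Checking for system-defined composite types in user tables",
};

/* Both definitions describe the same report file. */
static int
same_filename(void)
{
	assert(positional.report_filename != NULL);
	assert(designated.report_filename != NULL);
	return strcmp(positional.report_filename,
				  designated.report_filename) == 0;
}
```

With the designated form, adding or reordering struct members does not silently shift values into the wrong fields.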

Maybe instead of putting all checks into an array of
DataTypesUsageChecks, they should be defined in separate arrays, and
then an array defined with the list of checks?

+			 * If the check failed, terminate the umbrella status and print
+			 * the specific status line of the check to indicate which it was
+			 * before terminating with the detailed error message.
+			 */
+			if (found)
+			{
+				PQfinish(conn);

-	base_query = psprintf("SELECT '%s'::pg_catalog.regtype AS oid",
-						  type_name);
+				report_status(PG_REPORT, "failed");
+				prep_status("%s", cur_check->status);
+				pg_log(PG_REPORT, "fatal");
+				pg_fatal("%s    %s", cur_check->fatal_check, output_path);
+			}

I think this loses the message localization/translation that currently
exists. It could be written like prep_status(cur_check->status) or
prep_status("%s", _(cur_check->status)). And _(cur_check->fatal_check).
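
A sketch of the usual pattern for strings that live in a table: mark the literal for extraction where it is defined and translate it where it is displayed. The macros below are toy stand-ins so the example compiles on its own; in PostgreSQL, gettext_noop() and _() come from the tree's own headers, and the "[xx]" translation is invented.

```c
#include <assert.h>
#include <string.h>

/* Stand-in: only marks a literal for extraction, no run-time effect. */
#define gettext_noop(x) (x)

/* Toy one-entry "message catalog"; a real build would call gettext(). */
static const char *
toy_translate(const char *msgid)
{
	assert(msgid != NULL);
	if (strcmp(msgid, "Checking for incompatible \"line\" data type") == 0)
		return "[xx] Checking for incompatible \"line\" data type";
	return msgid;				/* untranslated messages pass through */
}

#define _(x) toy_translate(x)

typedef struct DemoCheck
{
	const char *status;
} DemoCheck;

/* The literal is marked where it is defined ... */
static const DemoCheck line_check = {
	.status = gettext_noop("Checking for incompatible \"line\" data type"),
};

/* ... and translated where it is shown to the user. */
static const char *
status_for_display(const DemoCheck *check)
{
	assert(check != NULL);
	return _(check->status);
}
```

Passing the struct member straight through a "%s" format, without _(), would print the untranslated English text even when a translation exists.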

--
Justin

Attachments:

0001-wip-pg_upgrade-data-structure.patch (text/x-diff; charset=us-ascii)

From 18f406c16e5ebeaaf4a24c5b5a57a8358a91afb4 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Fri, 17 Feb 2023 19:51:42 -0600
Subject: [PATCH] wip: pg_upgrade data structure

---
 src/bin/pg_upgrade/check.c      | 929 ++++++++++++++------------------
 src/bin/pg_upgrade/pg_upgrade.h |  10 +-
 src/bin/pg_upgrade/version.c    | 256 ++++-----
 3 files changed, 517 insertions(+), 678 deletions(-)

diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 189aa51c4f8..5a5f69e789b 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -23,17 +23,17 @@ static bool equivalent_locale(int category, const char *loca, const char *locb);
 static void check_is_install_user(ClusterInfo *cluster);
 static void check_proper_datallowconn(ClusterInfo *cluster);
 static void check_for_prepared_transactions(ClusterInfo *cluster);
-static void check_for_isn_and_int8_passing_mismatch(ClusterInfo *cluster);
-static void check_for_user_defined_postfix_ops(ClusterInfo *cluster);
-static void check_for_incompatible_polymorphics(ClusterInfo *cluster);
-static void check_for_tables_with_oids(ClusterInfo *cluster);
-static void check_for_composite_data_type_usage(ClusterInfo *cluster);
-static void check_for_reg_data_type_usage(ClusterInfo *cluster);
-static void check_for_aclitem_data_type_usage(ClusterInfo *cluster);
-static void check_for_jsonb_9_4_usage(ClusterInfo *cluster);
+static bool check_for_isn_and_int8_passing_mismatch(ClusterInfo *cluster, DbInfo *active_db, PGconn *conn, FILE *script);
+static bool check_for_user_defined_postfix_ops(ClusterInfo *cluster, DbInfo *active_db, PGconn *conn, FILE *script);
+static bool check_for_incompatible_polymorphics(ClusterInfo *cluster, DbInfo *active_db, PGconn *conn, FILE *script);
+static bool check_for_tables_with_oids(ClusterInfo *cluster, DbInfo *active_db, PGconn *conn, FILE *script);
+static bool check_for_composite_data_type_usage(ClusterInfo *cluster, DbInfo *active_db, PGconn *conn, FILE *script);
+static bool check_for_reg_data_type_usage(ClusterInfo *cluster, DbInfo *active_db, PGconn *conn, FILE *script);
+static bool check_for_aclitem_data_type_usage(ClusterInfo *cluster, DbInfo *active_db, PGconn *conn, FILE *script);
+static bool check_for_jsonb_9_4_usage(ClusterInfo *cluster, DbInfo *active_db, PGconn *conn, FILE *script);
 static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(ClusterInfo *new_cluster);
-static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
+static bool check_for_user_defined_encoding_conversions(ClusterInfo *cluster, DbInfo *active_db, PGconn *conn, FILE *script);
 static char *get_canonical_locale_name(int category, const char *locale);
 
 
@@ -88,6 +88,127 @@ check_and_dump_old_cluster(bool live_check)
 {
 	/* -- OLD -- */
 
+	struct checks {
+		int maxmajor; /* Last major version that needs this test */
+		bool (*fn)(ClusterInfo *cluster, DbInfo *active_db, PGconn *conn, FILE *script);
+		char *scriptname;
+		char *fataltext;
+		/* filled in later: */
+		FILE *scriptfp;
+		bool failedtests;
+	} checks[] = {
+		{0, check_for_composite_data_type_usage,
+			 "tables_using_composite.txt",
+			 .fataltext = "Your installation contains system-defined composite type(s) in user tables.\n"
+				 "These type OIDs are not stable across PostgreSQL versions,\n"
+				 "so this cluster cannot currently be upgraded.  You can\n"
+				 "drop the problem columns and restart the upgrade.\n"
+				 "A list of the problem columns is in the file:\n",
+		},
+
+		{0, check_for_reg_data_type_usage,
+			 "tables_using_reg.txt",
+			 .fataltext = "Your installation contains one of the reg* data types in user tables.\n"
+				 "These data types reference system OIDs that are not preserved by\n"
+				 "pg_upgrade, so this cluster cannot currently be upgraded.  You can\n"
+				 "drop the problem columns and restart the upgrade.\n"
+				 "A list of the problem columns is in the file:\n",
+		},
+
+		{1500, check_for_aclitem_data_type_usage,
+			"tables_using_aclitem.txt",
+			.fataltext = "Your installation contains the \"aclitem\" data type in user tables.\n"
+				 "The internal format of \"aclitem\" changed in PostgreSQL version 16\n"
+				 "so this cluster cannot currently be upgraded.  You can drop the\n"
+				 "problem columns and restart the upgrade.  A list of the problem\n"
+				 "columns is in the file:\n",
+		},
+
+		{1300, check_for_user_defined_encoding_conversions,
+			 "encoding_conversions.txt",
+			 .fataltext = "Your installation contains user-defined encoding conversions.\n"
+				 "The conversion function parameters changed in PostgreSQL version 14\n"
+				 "so this cluster cannot currently be upgraded.  You can remove the\n"
+				 "encoding conversions in the old cluster and restart the upgrade.\n"
+				 "A list of user-defined encoding conversions is in the file:\n"
+		},
+
+		{1300, check_for_user_defined_postfix_ops,
+			"postfix_ops.txt",
+			.fataltext = "Your installation contains user-defined postfix operators, which are not\n"
+				"supported anymore.  Consider dropping the postfix operators and replacing\n"
+				"them with prefix operators or function calls.\n"
+				"A list of user-defined postfix operators is in the file:\n",
+		},
+
+		{1300, check_for_incompatible_polymorphics,
+			 "incompatible_polymorphics.txt",
+			 .fataltext = "Your installation contains user-defined objects that refer to internal\n"
+				 "polymorphic functions with arguments of type \"anyarray\" or \"anyelement\".\n"
+				 "These user-defined objects must be dropped before upgrading and restored\n"
+				 "afterwards, changing them to refer to the new corresponding functions with\n"
+				 "arguments of type \"anycompatiblearray\" and \"anycompatible\".\n"
+				 "A list of the problematic objects is in the file:\n",
+		},
+
+		{1100, check_for_tables_with_oids,
+			 "tables_with_oids.txt",
+			 .fataltext = "Your installation contains tables declared WITH OIDS, which is not\n"
+				 "supported anymore.  Consider removing the oid column using\n"
+				 "    ALTER TABLE ... SET WITHOUT OIDS;\n"
+				 "A list of tables with the problem is in the file:\n",
+		},
+
+		{1100, old_11_check_for_sql_identifier_data_type_usage,
+			 "tables_using_sql_identifier.txt",
+			 .fataltext = "Your installation contains the \"sql_identifier\" data type in user tables.\n"
+				 "The on-disk format for this data type has changed, so this\n"
+				 "cluster cannot currently be upgraded.  You can\n"
+				 "drop the problem columns and restart the upgrade.\n"
+				 "A list of the problem columns is in the file:\n",
+		},
+
+		{906, old_9_6_check_for_unknown_data_type_usage,
+			 "tables_using_unknown.txt",
+			 .fataltext = "Your installation contains the \"unknown\" data type in user tables.\n"
+				 "This data type is no longer allowed in tables, so this\n"
+				 "cluster cannot currently be upgraded.  You can\n"
+				 "drop the problem columns and restart the upgrade.\n"
+				 "A list of the problem columns is in the file:\n",
+		},
+
+		{904, check_for_jsonb_9_4_usage,
+			 "tables_using_jsonb.txt",
+			 .fataltext = "Your installation contains the \"jsonb\" data type in user tables.\n"
+				 "The internal format of \"jsonb\" changed during 9.4 beta so this\n"
+				 "cluster cannot currently be upgraded.  You can\n"
+				 "drop the problem columns and restart the upgrade.\n"
+				 "A list of the problem columns is in the file:\n",
+		},
+
+		{903, old_9_3_check_for_line_data_type_usage,
+			 "tables_using_line.txt",
+			 .fataltext = "Your installation contains the \"line\" data type in user tables.\n"
+				 "This data type changed its internal and input/output format\n"
+				 "between your old and new versions so this\n"
+				 "cluster cannot currently be upgraded.  You can\n"
+				 "drop the problem columns and restart the upgrade.\n"
+				 "A list of the problem columns is in the file:\n",
+		},
+
+		{0000, check_for_isn_and_int8_passing_mismatch,
+			"contrib_isn_and_int8_pass_by_value.txt",
+			.fataltext = "Your installation contains \"contrib/isn\" functions which rely on the\n"
+				"bigint data type.  Your old and new clusters pass bigint values\n"
+				"differently so this cluster cannot currently be upgraded.  You can\n"
+				"manually dump databases in the old cluster that use \"contrib/isn\"\n"
+				"facilities, drop them, perform the upgrade, and then restore them.  A\n"
+				"list of the problem functions is in the file:\n"
+		},
+
+		{ .fn = NULL, } /* sentinel */
+	};
+
 	if (!live_check)
 		start_postmaster(&old_cluster, true);
 
@@ -98,61 +219,76 @@ check_and_dump_old_cluster(bool live_check)
 
 	get_loadable_libraries();
 
-
 	/*
 	 * Check for various failure cases
 	 */
 	check_is_install_user(&old_cluster);
 	check_proper_datallowconn(&old_cluster);
 	check_for_prepared_transactions(&old_cluster);
-	check_for_composite_data_type_usage(&old_cluster);
-	check_for_reg_data_type_usage(&old_cluster);
-	check_for_isn_and_int8_passing_mismatch(&old_cluster);
 
 	/*
-	 * PG 16 increased the size of the 'aclitem' type, which breaks the on-disk
-	 * format for existing data.
+	 * For each DB, run all checks.  This amortizes the cost of opening new
+	 * DB connections in case there are many DBs.
 	 */
-	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1500)
-		check_for_aclitem_data_type_usage(&old_cluster);
 
-	/*
-	 * PG 14 changed the function signature of encoding conversion functions.
-	 * Conversions from older versions cannot be upgraded automatically
-	 * because the user-defined functions used by the encoding conversions
-	 * need to be changed to match the new signature.
-	 */
-	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1300)
-		check_for_user_defined_encoding_conversions(&old_cluster);
+	for (int dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		DbInfo		*active_db = &old_cluster.dbarr.dbs[dbnum];
+		PGconn	    *conn = connectToServer(&old_cluster, active_db->db_name);
 
-	/*
-	 * Pre-PG 14 allowed user defined postfix operators, which are not
-	 * supported anymore.  Verify there are none, iff applicable.
-	 */
-	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1300)
-		check_for_user_defined_postfix_ops(&old_cluster);
+		for (struct checks *check = checks; check->fn != NULL; check++)
+		{
+			bool		ret;
 
-	/*
-	 * PG 14 changed polymorphic functions from anyarray to
-	 * anycompatiblearray.
-	 */
-	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1300)
-		check_for_incompatible_polymorphics(&old_cluster);
+			if (GET_MAJOR_VERSION(old_cluster.major_version) > check->maxmajor &&
+					check->maxmajor != 0)
+				continue;
 
-	/*
-	 * Pre-PG 12 allowed tables to be declared WITH OIDS, which is not
-	 * supported anymore. Verify there are none, iff applicable.
-	 */
-	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1100)
-		check_for_tables_with_oids(&old_cluster);
+			if (check->scriptfp == NULL)
+			{
+				char		output_path[MAXPGPATH];
+				snprintf(output_path, sizeof(output_path), "%s/%s",
+						log_opts.basedir, check->scriptname);
 
-	/*
-	 * PG 12 changed the 'sql_identifier' type storage to be based on name,
-	 * not varchar, which breaks on-disk format for existing data. So we need
-	 * to prevent upgrade when used in user objects (tables, indexes, ...).
-	 */
-	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1100)
-		old_11_check_for_sql_identifier_data_type_usage(&old_cluster);
+				if ((check->scriptfp = fopen_priv(output_path, "w")) == NULL)
+					pg_fatal("could not open file \"%s\": %m", output_path);
+			}
+
+			ret = check->fn(&old_cluster, active_db, conn, check->scriptfp);
+			if (!ret)
+			{
+				if (!check->failedtests)
+					pg_log(PG_WARNING, "%s  %s", _(check->fataltext),
+							check->scriptname);
+				check->failedtests = true;
+			}
+		}
+
+		PQfinish(conn);
+	}
+
+	/* Remove empty script files for successful tests */
+	for (struct checks *check = checks; check->fn != NULL; check++)
+	{
+		char		output_path[MAXPGPATH];
+
+		if (check->failedtests)
+			continue;
+
+		/* This ought to be empty */
+		snprintf(output_path, sizeof(output_path), "%s/%s",
+				log_opts.basedir, check->scriptname);
+		unlink(output_path);
+	}
+
+	for (struct checks *check = checks; check->fn != NULL; check++)
+	{
+		if (check->failedtests)
+		{
+			pg_log(PG_REPORT, "fatal");
+			exit(1);
+		}
+	}
 
 	/*
 	 * Pre-PG 10 allowed tables with 'unknown' type columns and non WAL logged
@@ -160,7 +296,6 @@ check_and_dump_old_cluster(bool live_check)
 	 */
 	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 906)
 	{
-		old_9_6_check_for_unknown_data_type_usage(&old_cluster);
 		if (user_opts.check)
 			old_9_6_invalidate_hash_indexes(&old_cluster, true);
 	}
@@ -169,14 +304,6 @@ check_and_dump_old_cluster(bool live_check)
 	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 905)
 		check_for_pg_role_prefix(&old_cluster);
 
-	if (GET_MAJOR_VERSION(old_cluster.major_version) == 904 &&
-		old_cluster.controldata.cat_ver < JSONB_FORMAT_CHANGE_CAT_VER)
-		check_for_jsonb_9_4_usage(&old_cluster);
-
-	/* Pre-PG 9.4 had a different 'line' data type internal format */
-	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 903)
-		old_9_3_check_for_line_data_type_usage(&old_cluster);
-
 	/*
 	 * While not a check option, we do this now because this is the only time
 	 * the old server is running.
@@ -833,179 +960,116 @@ check_for_prepared_transactions(ClusterInfo *cluster)
  *	by value.  The schema dumps the CREATE TYPE PASSEDBYVALUE setting so
  *	it must match for the old and new servers.
  */
-static void
-check_for_isn_and_int8_passing_mismatch(ClusterInfo *cluster)
+static bool
+check_for_isn_and_int8_passing_mismatch(ClusterInfo *cluster, DbInfo *active_db, PGconn *conn, FILE *script)
 {
-	int			dbnum;
-	FILE	   *script = NULL;
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for contrib/isn with bigint-passing mismatch");
+	PGresult   *res;
+	bool		db_used = false;
+	int			ntups;
+	int			rowno;
+	int			i_nspname,
+				i_proname;
 
 	if (old_cluster.controldata.float8_pass_by_value ==
 		new_cluster.controldata.float8_pass_by_value)
 	{
 		/* no mismatch */
-		check_ok();
-		return;
+		return true;
 	}
 
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "contrib_isn_and_int8_pass_by_value.txt");
+	/* Find any functions coming from contrib/isn */
+	res = executeQueryOrDie(conn,
+							"SELECT n.nspname, p.proname "
+							"FROM	pg_catalog.pg_proc p, "
+							"		pg_catalog.pg_namespace n "
+							"WHERE	p.pronamespace = n.oid AND "
+							"		p.probin = '$libdir/isn'");
 
-	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
+	ntups = PQntuples(res);
+	i_nspname = PQfnumber(res, "nspname");
+	i_proname = PQfnumber(res, "proname");
+	for (rowno = 0; rowno < ntups; rowno++)
 	{
-		PGresult   *res;
-		bool		db_used = false;
-		int			ntups;
-		int			rowno;
-		int			i_nspname,
-					i_proname;
-		DbInfo	   *active_db = &cluster->dbarr.dbs[dbnum];
-		PGconn	   *conn = connectToServer(cluster, active_db->db_name);
-
-		/* Find any functions coming from contrib/isn */
-		res = executeQueryOrDie(conn,
-								"SELECT n.nspname, p.proname "
-								"FROM	pg_catalog.pg_proc p, "
-								"		pg_catalog.pg_namespace n "
-								"WHERE	p.pronamespace = n.oid AND "
-								"		p.probin = '$libdir/isn'");
-
-		ntups = PQntuples(res);
-		i_nspname = PQfnumber(res, "nspname");
-		i_proname = PQfnumber(res, "proname");
-		for (rowno = 0; rowno < ntups; rowno++)
+		if (!db_used)
 		{
-			if (script == NULL && (script = fopen_priv(output_path, "w")) == NULL)
-				pg_fatal("could not open file \"%s\": %s",
-						 output_path, strerror(errno));
-			if (!db_used)
-			{
-				fprintf(script, "In database: %s\n", active_db->db_name);
-				db_used = true;
-			}
-			fprintf(script, "  %s.%s\n",
-					PQgetvalue(res, rowno, i_nspname),
-					PQgetvalue(res, rowno, i_proname));
+			fprintf(script, "In database: %s\n", active_db->db_name);
+			db_used = true;
 		}
 
-		PQclear(res);
-
-		PQfinish(conn);
+		fprintf(script, "  %s.%s\n",
+				PQgetvalue(res, rowno, i_nspname),
+				PQgetvalue(res, rowno, i_proname));
 	}
 
-	if (script)
-	{
-		fclose(script);
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains \"contrib/isn\" functions which rely on the\n"
-				 "bigint data type.  Your old and new clusters pass bigint values\n"
-				 "differently so this cluster cannot currently be upgraded.  You can\n"
-				 "manually dump databases in the old cluster that use \"contrib/isn\"\n"
-				 "facilities, drop them, perform the upgrade, and then restore them.  A\n"
-				 "list of the problem functions is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
+	PQclear(res);
+	return ntups == 0;
 }
 
 /*
  * Verify that no user defined postfix operators exist.
- */
-static void
-check_for_user_defined_postfix_ops(ClusterInfo *cluster)
-{
-	int			dbnum;
-	FILE	   *script = NULL;
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for user-defined postfix operators");
+ *
+ * Pre-PG 14 allowed user defined postfix operators, which are not
+ * supported anymore.
+ */
+static bool
+check_for_user_defined_postfix_ops(ClusterInfo *cluster, DbInfo *active_db, PGconn *conn, FILE *script)
 
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "postfix_ops.txt");
+{
+	PGresult   *res;
+	bool		db_used = false;
+	int			ntups;
+	int			rowno;
+	int			i_oproid,
+				i_oprnsp,
+				i_oprname,
+				i_typnsp,
+				i_typname;
 
-	/* Find any user defined postfix operators */
-	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
+	/*
+	 * The query below hardcodes FirstNormalObjectId as 16384 rather than
+	 * interpolating that C #define into the query because, if that
+	 * #define is ever changed, the cutoff we want to use is the value
+	 * used by pre-version 14 servers, not that of some future version.
+	 */
+	res = executeQueryOrDie(conn,
+							"SELECT o.oid AS oproid, "
+							"       n.nspname AS oprnsp, "
+							"       o.oprname, "
+							"       tn.nspname AS typnsp, "
+							"       t.typname "
+							"FROM pg_catalog.pg_operator o, "
+							"     pg_catalog.pg_namespace n, "
+							"     pg_catalog.pg_type t, "
+							"     pg_catalog.pg_namespace tn "
+							"WHERE o.oprnamespace = n.oid AND "
+							"      o.oprleft = t.oid AND "
+							"      t.typnamespace = tn.oid AND "
+							"      o.oprright = 0 AND "
+							"      o.oid >= 16384");
+	ntups = PQntuples(res);
+	i_oproid = PQfnumber(res, "oproid");
+	i_oprnsp = PQfnumber(res, "oprnsp");
+	i_oprname = PQfnumber(res, "oprname");
+	i_typnsp = PQfnumber(res, "typnsp");
+	i_typname = PQfnumber(res, "typname");
+	for (rowno = 0; rowno < ntups; rowno++)
 	{
-		PGresult   *res;
-		bool		db_used = false;
-		int			ntups;
-		int			rowno;
-		int			i_oproid,
-					i_oprnsp,
-					i_oprname,
-					i_typnsp,
-					i_typname;
-		DbInfo	   *active_db = &cluster->dbarr.dbs[dbnum];
-		PGconn	   *conn = connectToServer(cluster, active_db->db_name);
-
-		/*
-		 * The query below hardcodes FirstNormalObjectId as 16384 rather than
-		 * interpolating that C #define into the query because, if that
-		 * #define is ever changed, the cutoff we want to use is the value
-		 * used by pre-version 14 servers, not that of some future version.
-		 */
-		res = executeQueryOrDie(conn,
-								"SELECT o.oid AS oproid, "
-								"       n.nspname AS oprnsp, "
-								"       o.oprname, "
-								"       tn.nspname AS typnsp, "
-								"       t.typname "
-								"FROM pg_catalog.pg_operator o, "
-								"     pg_catalog.pg_namespace n, "
-								"     pg_catalog.pg_type t, "
-								"     pg_catalog.pg_namespace tn "
-								"WHERE o.oprnamespace = n.oid AND "
-								"      o.oprleft = t.oid AND "
-								"      t.typnamespace = tn.oid AND "
-								"      o.oprright = 0 AND "
-								"      o.oid >= 16384");
-		ntups = PQntuples(res);
-		i_oproid = PQfnumber(res, "oproid");
-		i_oprnsp = PQfnumber(res, "oprnsp");
-		i_oprname = PQfnumber(res, "oprname");
-		i_typnsp = PQfnumber(res, "typnsp");
-		i_typname = PQfnumber(res, "typname");
-		for (rowno = 0; rowno < ntups; rowno++)
+		if (!db_used)
 		{
-			if (script == NULL &&
-				(script = fopen_priv(output_path, "w")) == NULL)
-				pg_fatal("could not open file \"%s\": %s",
-						 output_path, strerror(errno));
-			if (!db_used)
-			{
-				fprintf(script, "In database: %s\n", active_db->db_name);
-				db_used = true;
-			}
-			fprintf(script, "  (oid=%s) %s.%s (%s.%s, NONE)\n",
-					PQgetvalue(res, rowno, i_oproid),
-					PQgetvalue(res, rowno, i_oprnsp),
-					PQgetvalue(res, rowno, i_oprname),
-					PQgetvalue(res, rowno, i_typnsp),
-					PQgetvalue(res, rowno, i_typname));
+			fprintf(script, "In database: %s\n", active_db->db_name);
+			db_used = true;
 		}
-
-		PQclear(res);
-
-		PQfinish(conn);
+		fprintf(script, "  (oid=%s) %s.%s (%s.%s, NONE)\n",
+				PQgetvalue(res, rowno, i_oproid),
+				PQgetvalue(res, rowno, i_oprnsp),
+				PQgetvalue(res, rowno, i_oprname),
+				PQgetvalue(res, rowno, i_typnsp),
+				PQgetvalue(res, rowno, i_typname));
 	}
 
-	if (script)
-	{
-		fclose(script);
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains user-defined postfix operators, which are not\n"
-				 "supported anymore.  Consider dropping the postfix operators and replacing\n"
-				 "them with prefix operators or function calls.\n"
-				 "A list of user-defined postfix operators is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
+	PQclear(res);
+
+	return ntups == 0;
 }
 
 /*
@@ -1014,19 +1078,16 @@ check_for_user_defined_postfix_ops(ClusterInfo *cluster)
  *	Make sure nothing is using old polymorphic functions with
  *	anyarray/anyelement rather than the new anycompatible variants.
  */
-static void
-check_for_incompatible_polymorphics(ClusterInfo *cluster)
+static bool
+check_for_incompatible_polymorphics(ClusterInfo *cluster, DbInfo *active_db, PGconn *conn, FILE *script)
 {
 	PGresult   *res;
-	FILE	   *script = NULL;
-	char		output_path[MAXPGPATH];
 	PQExpBufferData old_polymorphics;
 
-	prep_status("Checking for incompatible polymorphic functions");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "incompatible_polymorphics.txt");
+	bool		db_used = false;
+	int			ntups;
+	int			i_objkind,
+				i_objname;
 
 	/* The set of problematic functions varies a bit in different versions */
 	initPQExpBuffer(&old_polymorphics);
@@ -1048,167 +1109,109 @@ check_for_incompatible_polymorphics(ClusterInfo *cluster)
 							 ", 'array_positions(anyarray,anyelement)'"
 							 ", 'width_bucket(anyelement,anyarray)'");
 
-	for (int dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
-	{
-		bool		db_used = false;
-		DbInfo	   *active_db = &cluster->dbarr.dbs[dbnum];
-		PGconn	   *conn = connectToServer(cluster, active_db->db_name);
-		int			ntups;
-		int			i_objkind,
-					i_objname;
-
-		/*
-		 * The query below hardcodes FirstNormalObjectId as 16384 rather than
-		 * interpolating that C #define into the query because, if that
-		 * #define is ever changed, the cutoff we want to use is the value
-		 * used by pre-version 14 servers, not that of some future version.
-		 */
-		res = executeQueryOrDie(conn,
-		/* Aggregate transition functions */
-								"SELECT 'aggregate' AS objkind, p.oid::regprocedure::text AS objname "
-								"FROM pg_proc AS p "
-								"JOIN pg_aggregate AS a ON a.aggfnoid=p.oid "
-								"JOIN pg_proc AS transfn ON transfn.oid=a.aggtransfn "
-								"WHERE p.oid >= 16384 "
-								"AND a.aggtransfn = ANY(ARRAY[%s]::regprocedure[]) "
-								"AND a.aggtranstype = ANY(ARRAY['anyarray', 'anyelement']::regtype[]) "
-
-		/* Aggregate final functions */
-								"UNION ALL "
-								"SELECT 'aggregate' AS objkind, p.oid::regprocedure::text AS objname "
-								"FROM pg_proc AS p "
-								"JOIN pg_aggregate AS a ON a.aggfnoid=p.oid "
-								"JOIN pg_proc AS finalfn ON finalfn.oid=a.aggfinalfn "
-								"WHERE p.oid >= 16384 "
-								"AND a.aggfinalfn = ANY(ARRAY[%s]::regprocedure[]) "
-								"AND a.aggtranstype = ANY(ARRAY['anyarray', 'anyelement']::regtype[]) "
-
-		/* Operators */
-								"UNION ALL "
-								"SELECT 'operator' AS objkind, op.oid::regoperator::text AS objname "
-								"FROM pg_operator AS op "
-								"WHERE op.oid >= 16384 "
-								"AND oprcode = ANY(ARRAY[%s]::regprocedure[]) "
-								"AND oprleft = ANY(ARRAY['anyarray', 'anyelement']::regtype[]);",
-								old_polymorphics.data,
-								old_polymorphics.data,
-								old_polymorphics.data);
-
-		ntups = PQntuples(res);
-
-		i_objkind = PQfnumber(res, "objkind");
-		i_objname = PQfnumber(res, "objname");
-
-		for (int rowno = 0; rowno < ntups; rowno++)
-		{
-			if (script == NULL &&
-				(script = fopen_priv(output_path, "w")) == NULL)
-				pg_fatal("could not open file \"%s\": %s",
-						 output_path, strerror(errno));
-			if (!db_used)
-			{
-				fprintf(script, "In database: %s\n", active_db->db_name);
-				db_used = true;
-			}
+	/*
+	 * The query below hardcodes FirstNormalObjectId as 16384 rather than
+	 * interpolating that C #define into the query because, if that
+	 * #define is ever changed, the cutoff we want to use is the value
+	 * used by pre-version 14 servers, not that of some future version.
+	 */
+	res = executeQueryOrDie(conn,
+	/* Aggregate transition functions */
+							"SELECT 'aggregate' AS objkind, p.oid::regprocedure::text AS objname "
+							"FROM pg_proc AS p "
+							"JOIN pg_aggregate AS a ON a.aggfnoid=p.oid "
+							"JOIN pg_proc AS transfn ON transfn.oid=a.aggtransfn "
+							"WHERE p.oid >= 16384 "
+							"AND a.aggtransfn = ANY(ARRAY[%s]::regprocedure[]) "
+							"AND a.aggtranstype = ANY(ARRAY['anyarray', 'anyelement']::regtype[]) "
+
+	/* Aggregate final functions */
+							"UNION ALL "
+							"SELECT 'aggregate' AS objkind, p.oid::regprocedure::text AS objname "
+							"FROM pg_proc AS p "
+							"JOIN pg_aggregate AS a ON a.aggfnoid=p.oid "
+							"JOIN pg_proc AS finalfn ON finalfn.oid=a.aggfinalfn "
+							"WHERE p.oid >= 16384 "
+							"AND a.aggfinalfn = ANY(ARRAY[%s]::regprocedure[]) "
+							"AND a.aggtranstype = ANY(ARRAY['anyarray', 'anyelement']::regtype[]) "
+
+	/* Operators */
+							"UNION ALL "
+							"SELECT 'operator' AS objkind, op.oid::regoperator::text AS objname "
+							"FROM pg_operator AS op "
+							"WHERE op.oid >= 16384 "
+							"AND oprcode = ANY(ARRAY[%s]::regprocedure[]) "
+							"AND oprleft = ANY(ARRAY['anyarray', 'anyelement']::regtype[]);",
+							old_polymorphics.data,
+							old_polymorphics.data,
+							old_polymorphics.data);
 
-			fprintf(script, "  %s: %s\n",
-					PQgetvalue(res, rowno, i_objkind),
-					PQgetvalue(res, rowno, i_objname));
-		}
+	ntups = PQntuples(res);
 
-		PQclear(res);
-		PQfinish(conn);
-	}
+	i_objkind = PQfnumber(res, "objkind");
+	i_objname = PQfnumber(res, "objname");
 
-	if (script)
+	for (int rowno = 0; rowno < ntups; rowno++)
 	{
-		fclose(script);
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains user-defined objects that refer to internal\n"
-				 "polymorphic functions with arguments of type \"anyarray\" or \"anyelement\".\n"
-				 "These user-defined objects must be dropped before upgrading and restored\n"
-				 "afterwards, changing them to refer to the new corresponding functions with\n"
-				 "arguments of type \"anycompatiblearray\" and \"anycompatible\".\n"
-				 "A list of the problematic objects is in the file:\n"
-				 "    %s", output_path);
+		if (!db_used)
+		{
+			fprintf(script, "In database: %s\n", active_db->db_name);
+			db_used = true;
+		}
+
+		fprintf(script, "  %s: %s\n",
+				PQgetvalue(res, rowno, i_objkind),
+				PQgetvalue(res, rowno, i_objname));
 	}
-	else
-		check_ok();
 
+	PQclear(res);
 	termPQExpBuffer(&old_polymorphics);
+
+	return ntups == 0;
 }
 
 /*
  * Verify that no tables are declared WITH OIDS.
+ *
+ * Pre-PG 12 allowed tables to be declared WITH OIDS, which is not
+ * supported anymore.
  */
-static void
-check_for_tables_with_oids(ClusterInfo *cluster)
+static bool
+check_for_tables_with_oids(ClusterInfo *cluster, DbInfo *active_db, PGconn *conn, FILE *script)
 {
-	int			dbnum;
-	FILE	   *script = NULL;
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for tables WITH OIDS");
+	PGresult   *res;
+	bool		db_used = false;
+	int			ntups;
+	int			rowno;
+	int			i_nspname,
+				i_relname;
 
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_with_oids.txt");
+	res = executeQueryOrDie(conn,
+							"SELECT n.nspname, c.relname "
+							"FROM	pg_catalog.pg_class c, "
+							"		pg_catalog.pg_namespace n "
+							"WHERE	c.relnamespace = n.oid AND "
+							"		c.relhasoids AND"
+							"       n.nspname NOT IN ('pg_catalog')");
 
-	/* Find any tables declared WITH OIDS */
-	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
+	ntups = PQntuples(res);
+	i_nspname = PQfnumber(res, "nspname");
+	i_relname = PQfnumber(res, "relname");
+	for (rowno = 0; rowno < ntups; rowno++)
 	{
-		PGresult   *res;
-		bool		db_used = false;
-		int			ntups;
-		int			rowno;
-		int			i_nspname,
-					i_relname;
-		DbInfo	   *active_db = &cluster->dbarr.dbs[dbnum];
-		PGconn	   *conn = connectToServer(cluster, active_db->db_name);
-
-		res = executeQueryOrDie(conn,
-								"SELECT n.nspname, c.relname "
-								"FROM	pg_catalog.pg_class c, "
-								"		pg_catalog.pg_namespace n "
-								"WHERE	c.relnamespace = n.oid AND "
-								"		c.relhasoids AND"
-								"       n.nspname NOT IN ('pg_catalog')");
-
-		ntups = PQntuples(res);
-		i_nspname = PQfnumber(res, "nspname");
-		i_relname = PQfnumber(res, "relname");
-		for (rowno = 0; rowno < ntups; rowno++)
+		if (!db_used)
 		{
-			if (script == NULL && (script = fopen_priv(output_path, "w")) == NULL)
-				pg_fatal("could not open file \"%s\": %s",
-						 output_path, strerror(errno));
-			if (!db_used)
-			{
-				fprintf(script, "In database: %s\n", active_db->db_name);
-				db_used = true;
-			}
-			fprintf(script, "  %s.%s\n",
-					PQgetvalue(res, rowno, i_nspname),
-					PQgetvalue(res, rowno, i_relname));
+			fprintf(script, "In database: %s\n", active_db->db_name);
+			db_used = true;
 		}
-
-		PQclear(res);
-
-		PQfinish(conn);
+		fprintf(script, "  %s.%s\n",
+				PQgetvalue(res, rowno, i_nspname),
+				PQgetvalue(res, rowno, i_relname));
 	}
 
-	if (script)
-	{
-		fclose(script);
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains tables declared WITH OIDS, which is not\n"
-				 "supported anymore.  Consider removing the oid column using\n"
-				 "    ALTER TABLE ... SET WITHOUT OIDS;\n"
-				 "A list of tables with the problem is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
+	PQclear(res);
+
+	return ntups == 0;
 }
 
 
@@ -1221,20 +1224,13 @@ check_for_tables_with_oids(ClusterInfo *cluster)
  *	no mechanism for forcing them to be the same in the new cluster.
  *	Hence, if any user table uses one, that's problematic for pg_upgrade.
  */
-static void
-check_for_composite_data_type_usage(ClusterInfo *cluster)
+static bool
+check_for_composite_data_type_usage(ClusterInfo *cluster, DbInfo *active_db, PGconn *conn, FILE *script)
 {
 	bool		found;
 	Oid			firstUserOid;
-	char		output_path[MAXPGPATH];
 	char	   *base_query;
 
-	prep_status("Checking for system-defined composite types in user tables");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_composite.txt");
-
 	/*
 	 * Look for composite types that were made during initdb *or* belong to
 	 * information_schema; that's important in case information_schema was
@@ -1253,22 +1249,11 @@ check_for_composite_data_type_usage(ClusterInfo *cluster)
 						  " WHERE typtype = 'c' AND (t.oid < %u OR nspname = 'information_schema')",
 						  firstUserOid);
 
-	found = check_for_data_types_usage(cluster, base_query, output_path);
+	found = check_for_data_types_usage(cluster, base_query, active_db, conn, script);
 
 	free(base_query);
 
-	if (found)
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains system-defined composite type(s) in user tables.\n"
-				 "These type OIDs are not stable across PostgreSQL versions,\n"
-				 "so this cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
+	return !found;
 }
 
 /*
@@ -1282,17 +1267,10 @@ check_for_composite_data_type_usage(ClusterInfo *cluster)
  *	not preserved, and hence these data types cannot be used in user
  *	tables upgraded by pg_upgrade.
  */
-static void
-check_for_reg_data_type_usage(ClusterInfo *cluster)
+static bool
+check_for_reg_data_type_usage(ClusterInfo *cluster, DbInfo *active_db, PGconn *conn, FILE *script)
 {
 	bool		found;
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for reg* data types in user tables");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_reg.txt");
 
 	/*
 	 * Note: older servers will not have all of these reg* types, so we have
@@ -1316,48 +1294,22 @@ check_for_reg_data_type_usage(ClusterInfo *cluster)
 	/* pg_authid.oid is preserved, so 'regrole' is OK */
 	/* pg_type.oid is (mostly) preserved, so 'regtype' is OK */
 									   "         )",
-									   output_path);
+									   active_db, conn, script);
 
-	if (found)
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains one of the reg* data types in user tables.\n"
-				 "These data types reference system OIDs that are not preserved by\n"
-				 "pg_upgrade, so this cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
+	return !found;
 }
 
 /*
  * check_for_aclitem_data_type_usage
  *
- *	aclitem changed its storage format in 16, so check for it.
+ * PG 16 increased the size of the 'aclitem' type, which breaks the on-disk
+ * format for existing data.
  */
-static void
-check_for_aclitem_data_type_usage(ClusterInfo *cluster)
+static bool
+check_for_aclitem_data_type_usage(ClusterInfo *cluster, DbInfo *active_db, PGconn *conn, FILE *script)
 {
-	char		output_path[MAXPGPATH];
-
 	prep_status("Checking for incompatible aclitem data type in user tables");
-
-	snprintf(output_path, sizeof(output_path), "tables_using_aclitem.txt");
-
-	if (check_for_data_type_usage(cluster, "pg_catalog.aclitem", output_path))
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains the \"aclitem\" data type in user tables.\n"
-				 "The internal format of \"aclitem\" changed in PostgreSQL version 16\n"
-				 "so this cluster cannot currently be upgraded.  You can drop the\n"
-				 "problem columns and restart the upgrade.  A list of the problem\n"
-				 "columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
+	return check_for_data_type_usage(cluster, "pg_catalog.aclitem", active_db, conn, script);
 }
 
 /*
@@ -1365,29 +1317,13 @@ check_for_aclitem_data_type_usage(ClusterInfo *cluster)
  *
  *	JSONB changed its storage format during 9.4 beta, so check for it.
  */
-static void
-check_for_jsonb_9_4_usage(ClusterInfo *cluster)
+static bool
+check_for_jsonb_9_4_usage(ClusterInfo *cluster, DbInfo *active_db, PGconn *conn, FILE *script)
 {
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for incompatible \"jsonb\" data type");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_jsonb.txt");
+	if (old_cluster.controldata.cat_ver >= JSONB_FORMAT_CHANGE_CAT_VER)
+		return true;
 
-	if (check_for_data_type_usage(cluster, "pg_catalog.jsonb", output_path))
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains the \"jsonb\" data type in user tables.\n"
-				 "The internal format of \"jsonb\" changed during 9.4 beta so this\n"
-				 "cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
+	return check_for_data_type_usage(cluster, "pg_catalog.jsonb", active_db, conn, script);
 }
 
 /*
@@ -1450,84 +1386,55 @@ check_for_pg_role_prefix(ClusterInfo *cluster)
 
 /*
  * Verify that no user-defined encoding conversions exist.
+ *
+ * PG 14 changed the function signature of encoding conversion functions.
+ * Conversions from older versions cannot be upgraded automatically
+ * because the user-defined functions used by the encoding conversions
+ * need to be changed to match the new signature.
  */
-static void
-check_for_user_defined_encoding_conversions(ClusterInfo *cluster)
+static bool
+check_for_user_defined_encoding_conversions(ClusterInfo *cluster, DbInfo *active_db, PGconn *conn, FILE *script)
 {
-	int			dbnum;
-	FILE	   *script = NULL;
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for user-defined encoding conversions");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "encoding_conversions.txt");
+	PGresult   *res;
+	bool		db_used = false;
+	int			ntups;
+	int			rowno;
+	int			i_conoid,
+				i_conname,
+				i_nspname;
 
-	/* Find any user defined encoding conversions */
-	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
+	/*
+	 * The query below hardcodes FirstNormalObjectId as 16384 rather than
+	 * interpolating that C #define into the query because, if that
+	 * #define is ever changed, the cutoff we want to use is the value
+	 * used by pre-version 14 servers, not that of some future version.
+	 */
+	res = executeQueryOrDie(conn,
+							"SELECT c.oid as conoid, c.conname, n.nspname "
+							"FROM pg_catalog.pg_conversion c, "
+							"     pg_catalog.pg_namespace n "
+							"WHERE c.connamespace = n.oid AND "
+							"      c.oid >= 16384");
+	ntups = PQntuples(res);
+	i_conoid = PQfnumber(res, "conoid");
+	i_conname = PQfnumber(res, "conname");
+	i_nspname = PQfnumber(res, "nspname");
+	for (rowno = 0; rowno < ntups; rowno++)
 	{
-		PGresult   *res;
-		bool		db_used = false;
-		int			ntups;
-		int			rowno;
-		int			i_conoid,
-					i_conname,
-					i_nspname;
-		DbInfo	   *active_db = &cluster->dbarr.dbs[dbnum];
-		PGconn	   *conn = connectToServer(cluster, active_db->db_name);
-
-		/*
-		 * The query below hardcodes FirstNormalObjectId as 16384 rather than
-		 * interpolating that C #define into the query because, if that
-		 * #define is ever changed, the cutoff we want to use is the value
-		 * used by pre-version 14 servers, not that of some future version.
-		 */
-		res = executeQueryOrDie(conn,
-								"SELECT c.oid as conoid, c.conname, n.nspname "
-								"FROM pg_catalog.pg_conversion c, "
-								"     pg_catalog.pg_namespace n "
-								"WHERE c.connamespace = n.oid AND "
-								"      c.oid >= 16384");
-		ntups = PQntuples(res);
-		i_conoid = PQfnumber(res, "conoid");
-		i_conname = PQfnumber(res, "conname");
-		i_nspname = PQfnumber(res, "nspname");
-		for (rowno = 0; rowno < ntups; rowno++)
+		if (!db_used)
 		{
-			if (script == NULL &&
-				(script = fopen_priv(output_path, "w")) == NULL)
-				pg_fatal("could not open file \"%s\": %s",
-						 output_path, strerror(errno));
-			if (!db_used)
-			{
-				fprintf(script, "In database: %s\n", active_db->db_name);
-				db_used = true;
-			}
-			fprintf(script, "  (oid=%s) %s.%s\n",
-					PQgetvalue(res, rowno, i_conoid),
-					PQgetvalue(res, rowno, i_nspname),
-					PQgetvalue(res, rowno, i_conname));
+			fprintf(script, "In database: %s\n", active_db->db_name);
+			db_used = true;
 		}
-
-		PQclear(res);
-
-		PQfinish(conn);
+		fprintf(script, "  (oid=%s) %s.%s\n",
+				PQgetvalue(res, rowno, i_conoid),
+				PQgetvalue(res, rowno, i_nspname),
+				PQgetvalue(res, rowno, i_conname));
 	}
 
-	if (script)
-	{
-		fclose(script);
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains user-defined encoding conversions.\n"
-				 "The conversion function parameters changed in PostgreSQL version 14\n"
-				 "so this cluster cannot currently be upgraded.  You can remove the\n"
-				 "encoding conversions in the old cluster and restart the upgrade.\n"
-				 "A list of user-defined encoding conversions is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
+	PQclear(res);
+
+	return ntups == 0;
 }
 
 
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 5f2a116f23e..39e8fd2c8d5 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -444,16 +444,16 @@ unsigned int str2uint(const char *str);
 
 bool		check_for_data_types_usage(ClusterInfo *cluster,
 									   const char *base_query,
-									   const char *output_path);
+									   DbInfo *active_db, PGconn *conn, FILE *script);
 bool		check_for_data_type_usage(ClusterInfo *cluster,
 									  const char *type_name,
-									  const char *output_path);
-void		old_9_3_check_for_line_data_type_usage(ClusterInfo *cluster);
-void		old_9_6_check_for_unknown_data_type_usage(ClusterInfo *cluster);
+									  DbInfo *active_db, PGconn *conn, FILE *script);
+bool		old_9_3_check_for_line_data_type_usage(ClusterInfo *cluster, DbInfo *active_db, PGconn *conn, FILE *script);
+bool		old_9_6_check_for_unknown_data_type_usage(ClusterInfo *cluster, DbInfo *active_db, PGconn *conn, FILE *script);
 void		old_9_6_invalidate_hash_indexes(ClusterInfo *cluster,
 											bool check_mode);
 
-void		old_11_check_for_sql_identifier_data_type_usage(ClusterInfo *cluster);
+bool		old_11_check_for_sql_identifier_data_type_usage(ClusterInfo *cluster, DbInfo *active_db, PGconn *conn, FILE *script);
 void		report_extension_updates(ClusterInfo *cluster);
 
 /* parallel.c */
diff --git a/src/bin/pg_upgrade/version.c b/src/bin/pg_upgrade/version.c
index 403a6d7cfaa..f2b885347a8 100644
--- a/src/bin/pg_upgrade/version.c
+++ b/src/bin/pg_upgrade/version.c
@@ -30,111 +30,96 @@
 bool
 check_for_data_types_usage(ClusterInfo *cluster,
 						   const char *base_query,
-						   const char *output_path)
+						   DbInfo *active_db, PGconn *conn, FILE *script)
 {
 	bool		found = false;
-	FILE	   *script = NULL;
-	int			dbnum;
 
-	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
+	PQExpBufferData querybuf;
+	PGresult   *res;
+	bool		db_used = false;
+	int			ntups;
+	int			rowno;
+	int			i_nspname,
+				i_relname,
+				i_attname;
+
+	/*
+	 * The type(s) of interest might be wrapped in a domain, array,
+	 * composite, or range, and these container types can be nested (to
+	 * varying extents depending on server version, but that's not of
+	 * concern here).  To handle all these cases we need a recursive CTE.
+	 */
+	initPQExpBuffer(&querybuf);
+	appendPQExpBuffer(&querybuf,
+					  "WITH RECURSIVE oids AS ( "
+	/* start with the type(s) returned by base_query */
+					  "	%s "
+					  "	UNION ALL "
+					  "	SELECT * FROM ( "
+	/* inner WITH because we can only reference the CTE once */
+					  "		WITH x AS (SELECT oid FROM oids) "
+	/* domains on any type selected so far */
+					  "			SELECT t.oid FROM pg_catalog.pg_type t, x WHERE typbasetype = x.oid AND typtype = 'd' "
+					  "			UNION ALL "
+	/* arrays over any type selected so far */
+					  "			SELECT t.oid FROM pg_catalog.pg_type t, x WHERE typelem = x.oid AND typtype = 'b' "
+					  "			UNION ALL "
+	/* composite types containing any type selected so far */
+					  "			SELECT t.oid FROM pg_catalog.pg_type t, pg_catalog.pg_class c, pg_catalog.pg_attribute a, x "
+					  "			WHERE t.typtype = 'c' AND "
+					  "				  t.oid = c.reltype AND "
+					  "				  c.oid = a.attrelid AND "
+					  "				  NOT a.attisdropped AND "
+					  "				  a.atttypid = x.oid "
+					  "			UNION ALL "
+	/* ranges containing any type selected so far */
+					  "			SELECT t.oid FROM pg_catalog.pg_type t, pg_catalog.pg_range r, x "
+					  "			WHERE t.typtype = 'r' AND r.rngtypid = t.oid AND r.rngsubtype = x.oid"
+					  "	) foo "
+					  ") "
+	/* now look for stored columns of any such type */
+					  "SELECT n.nspname, c.relname, a.attname "
+					  "FROM	pg_catalog.pg_class c, "
+					  "		pg_catalog.pg_namespace n, "
+					  "		pg_catalog.pg_attribute a "
+					  "WHERE	c.oid = a.attrelid AND "
+					  "		NOT a.attisdropped AND "
+					  "		a.atttypid IN (SELECT oid FROM oids) AND "
+					  "		c.relkind IN ("
+					  CppAsString2(RELKIND_RELATION) ", "
+					  CppAsString2(RELKIND_MATVIEW) ", "
+					  CppAsString2(RELKIND_INDEX) ") AND "
+					  "		c.relnamespace = n.oid AND "
+	/* exclude possible orphaned temp tables */
+					  "		n.nspname !~ '^pg_temp_' AND "
+					  "		n.nspname !~ '^pg_toast_temp_' AND "
+	/* exclude system catalogs, too */
+					  "		n.nspname NOT IN ('pg_catalog', 'information_schema')",
+					  base_query);
+
+	res = executeQueryOrDie(conn, "%s", querybuf.data);
+
+	ntups = PQntuples(res);
+	i_nspname = PQfnumber(res, "nspname");
+	i_relname = PQfnumber(res, "relname");
+	i_attname = PQfnumber(res, "attname");
+	for (rowno = 0; rowno < ntups; rowno++)
 	{
-		DbInfo	   *active_db = &cluster->dbarr.dbs[dbnum];
-		PGconn	   *conn = connectToServer(cluster, active_db->db_name);
-		PQExpBufferData querybuf;
-		PGresult   *res;
-		bool		db_used = false;
-		int			ntups;
-		int			rowno;
-		int			i_nspname,
-					i_relname,
-					i_attname;
-
-		/*
-		 * The type(s) of interest might be wrapped in a domain, array,
-		 * composite, or range, and these container types can be nested (to
-		 * varying extents depending on server version, but that's not of
-		 * concern here).  To handle all these cases we need a recursive CTE.
-		 */
-		initPQExpBuffer(&querybuf);
-		appendPQExpBuffer(&querybuf,
-						  "WITH RECURSIVE oids AS ( "
-		/* start with the type(s) returned by base_query */
-						  "	%s "
-						  "	UNION ALL "
-						  "	SELECT * FROM ( "
-		/* inner WITH because we can only reference the CTE once */
-						  "		WITH x AS (SELECT oid FROM oids) "
-		/* domains on any type selected so far */
-						  "			SELECT t.oid FROM pg_catalog.pg_type t, x WHERE typbasetype = x.oid AND typtype = 'd' "
-						  "			UNION ALL "
-		/* arrays over any type selected so far */
-						  "			SELECT t.oid FROM pg_catalog.pg_type t, x WHERE typelem = x.oid AND typtype = 'b' "
-						  "			UNION ALL "
-		/* composite types containing any type selected so far */
-						  "			SELECT t.oid FROM pg_catalog.pg_type t, pg_catalog.pg_class c, pg_catalog.pg_attribute a, x "
-						  "			WHERE t.typtype = 'c' AND "
-						  "				  t.oid = c.reltype AND "
-						  "				  c.oid = a.attrelid AND "
-						  "				  NOT a.attisdropped AND "
-						  "				  a.atttypid = x.oid "
-						  "			UNION ALL "
-		/* ranges containing any type selected so far */
-						  "			SELECT t.oid FROM pg_catalog.pg_type t, pg_catalog.pg_range r, x "
-						  "			WHERE t.typtype = 'r' AND r.rngtypid = t.oid AND r.rngsubtype = x.oid"
-						  "	) foo "
-						  ") "
-		/* now look for stored columns of any such type */
-						  "SELECT n.nspname, c.relname, a.attname "
-						  "FROM	pg_catalog.pg_class c, "
-						  "		pg_catalog.pg_namespace n, "
-						  "		pg_catalog.pg_attribute a "
-						  "WHERE	c.oid = a.attrelid AND "
-						  "		NOT a.attisdropped AND "
-						  "		a.atttypid IN (SELECT oid FROM oids) AND "
-						  "		c.relkind IN ("
-						  CppAsString2(RELKIND_RELATION) ", "
-						  CppAsString2(RELKIND_MATVIEW) ", "
-						  CppAsString2(RELKIND_INDEX) ") AND "
-						  "		c.relnamespace = n.oid AND "
-		/* exclude possible orphaned temp tables */
-						  "		n.nspname !~ '^pg_temp_' AND "
-						  "		n.nspname !~ '^pg_toast_temp_' AND "
-		/* exclude system catalogs, too */
-						  "		n.nspname NOT IN ('pg_catalog', 'information_schema')",
-						  base_query);
-
-		res = executeQueryOrDie(conn, "%s", querybuf.data);
-
-		ntups = PQntuples(res);
-		i_nspname = PQfnumber(res, "nspname");
-		i_relname = PQfnumber(res, "relname");
-		i_attname = PQfnumber(res, "attname");
-		for (rowno = 0; rowno < ntups; rowno++)
+		found = true;
+		if (!db_used)
 		{
-			found = true;
-			if (script == NULL && (script = fopen_priv(output_path, "w")) == NULL)
-				pg_fatal("could not open file \"%s\": %s", output_path,
-						 strerror(errno));
-			if (!db_used)
-			{
-				fprintf(script, "In database: %s\n", active_db->db_name);
-				db_used = true;
-			}
-			fprintf(script, "  %s.%s.%s\n",
-					PQgetvalue(res, rowno, i_nspname),
-					PQgetvalue(res, rowno, i_relname),
-					PQgetvalue(res, rowno, i_attname));
+			fprintf(script, "In database: %s\n", active_db->db_name);
+			db_used = true;
 		}
-
-		PQclear(res);
-
-		termPQExpBuffer(&querybuf);
-
-		PQfinish(conn);
+		fprintf(script, "  %s.%s.%s\n",
+				PQgetvalue(res, rowno, i_nspname),
+				PQgetvalue(res, rowno, i_relname),
+				PQgetvalue(res, rowno, i_attname));
 	}
 
-	if (script)
-		fclose(script);
+	PQclear(res);
+
+	termPQExpBuffer(&querybuf);
 
 	return found;
 }
@@ -152,7 +137,7 @@ check_for_data_types_usage(ClusterInfo *cluster,
 bool
 check_for_data_type_usage(ClusterInfo *cluster,
 						  const char *type_name,
-						  const char *output_path)
+						  DbInfo *active_db, PGconn *conn, FILE *script)
 {
 	bool		found;
 	char	   *base_query;
@@ -160,7 +145,7 @@ check_for_data_type_usage(ClusterInfo *cluster,
 	base_query = psprintf("SELECT '%s'::pg_catalog.regtype AS oid",
 						  type_name);
 
-	found = check_for_data_types_usage(cluster, base_query, output_path);
+	found = check_for_data_types_usage(cluster, base_query, active_db, conn, script);
 
 	free(base_query);
 
@@ -176,30 +161,12 @@ check_for_data_type_usage(ClusterInfo *cluster,
  *	compile-time switch; as of 9.4 "line" has a different on-disk
  *	representation format.
  */
-void
-old_9_3_check_for_line_data_type_usage(ClusterInfo *cluster)
+bool
+old_9_3_check_for_line_data_type_usage(ClusterInfo *cluster, DbInfo *active_db, PGconn *conn, FILE *script)
 {
-	char		output_path[MAXPGPATH];
-
 	prep_status("Checking for incompatible \"line\" data type");
 
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_line.txt");
-
-	if (check_for_data_type_usage(cluster, "pg_catalog.line", output_path))
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains the \"line\" data type in user tables.\n"
-				 "This data type changed its internal and input/output format\n"
-				 "between your old and new versions so this\n"
-				 "cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
+	return check_for_data_type_usage(cluster, "pg_catalog.line", active_db, conn, script);
 }
 
 
@@ -216,29 +183,12 @@ old_9_3_check_for_line_data_type_usage(ClusterInfo *cluster)
  *	DDL reload will silently change it to "text" which won't match the
  *	on-disk storage (which is like "cstring").  So we *must* reject that.
  */
-void
-old_9_6_check_for_unknown_data_type_usage(ClusterInfo *cluster)
+bool
+old_9_6_check_for_unknown_data_type_usage(ClusterInfo *cluster, DbInfo *active_db, PGconn *conn, FILE *script)
 {
-	char		output_path[MAXPGPATH];
-
 	prep_status("Checking for invalid \"unknown\" user columns");
 
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_unknown.txt");
-
-	if (check_for_data_type_usage(cluster, "pg_catalog.unknown", output_path))
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains the \"unknown\" data type in user tables.\n"
-				 "This data type is no longer allowed in tables, so this\n"
-				 "cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
+	return check_for_data_type_usage(cluster, "pg_catalog.unknown", active_db, conn, script);
 }
 
 /*
@@ -358,33 +308,15 @@ old_9_6_invalidate_hash_indexes(ClusterInfo *cluster, bool check_mode)
  *	11 -> 12
  *	In 12, the sql_identifier data type was switched from name to varchar,
  *	which does affect the storage (name is by-ref, but not varlena). This
- *	means user tables using sql_identifier for columns are broken because
+ *	means user objects using sql_identifier for columns are broken because
  *	the on-disk format is different.
  */
-void
-old_11_check_for_sql_identifier_data_type_usage(ClusterInfo *cluster)
+bool
+old_11_check_for_sql_identifier_data_type_usage(ClusterInfo *cluster, DbInfo *active_db, PGconn *conn, FILE *script)
 {
-	char		output_path[MAXPGPATH];
-
 	prep_status("Checking for invalid \"sql_identifier\" user columns");
 
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_sql_identifier.txt");
-
-	if (check_for_data_type_usage(cluster, "information_schema.sql_identifier",
-								  output_path))
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains the \"sql_identifier\" data type in user tables.\n"
-				 "The on-disk format for this data type has changed, so this\n"
-				 "cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
+	return check_for_data_type_usage(cluster, "information_schema.sql_identifier", active_db, conn, script);
 }
 
 
-- 
2.34.1

Daniel Gustafsson

daniel@yesql.se

almost 3 years ago

In reply to: Nathan Bossart (#3)

2 attachment(s)

Re: Reducing connection overhead in pg_upgrade compat check phase

On 18 Feb 2023, at 06:46, Nathan Bossart <nathandbossart@gmail.com> wrote:

On Fri, Feb 17, 2023 at 10:44:49PM +0100, Daniel Gustafsson wrote:

When adding a check to pg_upgrade a while back I noticed in a profile that the
cluster compatibility check phase spends a lot of time in connectToServer. Some
of this can be attributed to data type checks which each run serially in turn
connecting to each database to run the check, and this seemed like a place
where we can do better.

The attached patch moves the checks from individual functions, which each loop
over all databases, into a struct which is consumed by a single umbrella check
where all data type queries are executed against a database using the same
connection. This way we can amortize the connectToServer overhead across more
accesses to the database.
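
To make the amortization concrete, here is a minimal, self-contained C sketch
(not code from the patch) that counts connection attempts under both loop
shapes. The names fake_connect, run_checks_per_check and
run_checks_per_database are hypothetical; fake_connect only stands in for
connectToServer, and the check functions are dummies.

```c
#include <assert.h>
#include <stddef.h>

/* Counts simulated connections; a stand-in for calls to connectToServer. */
int connect_count = 0;

void fake_connect(void)
{
    connect_count++;
}

/* A dummy check: returns 1 if the (simulated) database has a hit, else 0. */
typedef int (*check_fn)(int dbnum);

int check_even(int dbnum)
{
    return dbnum % 2 == 0;
}

int check_never(int dbnum)
{
    (void) dbnum;
    return 0;
}

/* Old shape: every check loops over all databases and connects each time. */
int run_checks_per_check(const check_fn *checks, size_t nchecks, int ndbs)
{
    int hits = 0;

    for (size_t c = 0; c < nchecks; c++)
    {
        for (int db = 0; db < ndbs; db++)
        {
            fake_connect();
            hits += checks[c](db);
        }
    }
    return hits;
}

/* New shape: connect once per database, then run every check on it. */
int run_checks_per_database(const check_fn *checks, size_t nchecks, int ndbs)
{
    int hits = 0;

    for (int db = 0; db < ndbs; db++)
    {
        fake_connect();
        for (size_t c = 0; c < nchecks; c++)
            hits += checks[c](db);
    }
    return hits;
}
```

With 3 checks and 100 databases the first shape connects 300 times and the
second 100 times, while both evaluate the same check/database pairs.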

This change consolidates all the data type checks, so instead of 7 separate
loops through all the databases, there is just one. However, I wonder if
we are leaving too much on the table, as there are a number of other
functions that also loop over all the databases:

* get_loadable_libraries
* get_db_and_rel_infos
* report_extension_updates
* old_9_6_invalidate_hash_indexes
* check_for_isn_and_int8_passing_mismatch
* check_for_user_defined_postfix_ops
* check_for_incompatible_polymorphics
* check_for_tables_with_oids
* check_for_user_defined_encoding_conversions

I suspect consolidating get_loadable_libraries, get_db_and_rel_infos, and
report_extension_updates would be prohibitively complicated and not worth
the effort.

Agreed, the added complexity of the code seems hard to justify unless there are
actual reports of problems.

I did experiment with reducing the allocations of namespaces and tablespaces
with a hashtable, see the attached WIP diff. There is no measurable difference
in speed, but a synthetic benchmark where allocations cannot be reused shows
reduced memory pressure. This might help on very large schemas, but it's not
worth pursuing IMO.
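
The idea behind the WIP diff, storing each distinct name once and handing out
pointers to the shared copy, can be sketched in isolation. This is a minimal
illustration, not the simplehash-based code from nsphash.diff: InternTable,
intern_init, intern and intern_free are hypothetical names, and the table is a
fixed-size open-addressing table chosen only to keep the example short.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define INTERN_SLOTS 64         /* hypothetical fixed capacity, power of two */

typedef struct InternTable
{
    char   *slots[INTERN_SLOTS];    /* NULL means the slot is empty */
    int     used;                   /* number of distinct strings stored */
} InternTable;

/* FNV-1a string hash; any reasonable string hash would do here. */
static unsigned int hash_str(const char *s)
{
    unsigned int h = 2166136261u;

    for (; *s; s++)
    {
        h ^= (unsigned char) *s;
        h *= 16777619u;
    }
    return h;
}

void intern_init(InternTable *t)
{
    *t = (InternTable) {0};
}

/*
 * Return a pointer to the table's single copy of s, allocating it on first
 * sight.  Equal strings always yield the same pointer.
 */
const char *intern(InternTable *t, const char *s)
{
    unsigned int i = hash_str(s) & (INTERN_SLOTS - 1);
    size_t      len;
    char       *copy;

    /* Linear probing: stop at a matching string or at the first empty slot. */
    while (t->slots[i] != NULL)
    {
        if (strcmp(t->slots[i], s) == 0)
            return t->slots[i];
        i = (i + 1) & (INTERN_SLOTS - 1);
    }

    /* Keep one slot empty so the probe loop above always terminates. */
    assert(t->used < INTERN_SLOTS - 1);

    len = strlen(s) + 1;
    copy = malloc(len);
    if (copy == NULL)
        abort();
    memcpy(copy, s, len);
    t->slots[i] = copy;
    t->used++;
    return copy;
}

/* Free every stored string; pointers previously returned become invalid. */
void intern_free(InternTable *t)
{
    for (int i = 0; i < INTERN_SLOTS; i++)
    {
        free(t->slots[i]);
        t->slots[i] = NULL;
    }
    t->used = 0;
}
```

Unlike the "reuse the previous allocation" approach, this shares a name even
when equal names are not adjacent in the result set, which is presumably the
case the synthetic benchmark exercised. Ownership also moves to the table, so
the per-relation "should this be freed?" flags are no longer needed.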

old_9_6_invalidate_hash_indexes is only needed for unsupported
versions, so that might not be worth consolidating.
check_for_isn_and_int8_passing_mismatch only loops through all databases
when float8_pass_by_value in the control data differs, so that might not be
worth it, either.

Yeah, these two aren't all that interesting to spend cycles on IMO.

The last 4 are for supported versions and, from a very
quick glance, seem possible to consolidate. That would bring us to a total
of 11 separate loops that we could consolidate into one. However, the data
type checks seem to follow a nice pattern, so perhaps this is easier said
than done.

There is that, refactoring the data type checks leads to removal of duplicated
code and a slight performance improvement. Refactoring the other checks to
reduce overhead would be an interesting thing to look at, but this point in the
v16 cycle might not be ideal for that.

IIUC with the patch, pg_upgrade will immediately fail as soon as a single
check in a database fails. I believe this differs from the current
behavior where all matches for a given check in the cluster are logged
before failing.

Yeah, that's wrong. Fixed.

I wonder if it'd be better to perform all of the data type
checks in all databases before failing so that all of the violations are
reported. Else, users would have to run pg_upgrade, fix a violation, run
pg_upgrade again, fix another one, etc.

I think that's better, and have changed the patch to do it that way.
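
As a rough illustration of "run everything, then fail once", the following
self-contained C sketch accumulates one report line per failing check across
all databases and leaves the decision to abort to the caller. It is not the
patch's code: collect_failures, violations_fn and the uses_* functions are
hypothetical, and the dummy functions stand in for running a check's query
against a database.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MAX_CHECKS 16           /* hypothetical upper bound for this sketch */

/* Stand-in for a check's query: number of offending columns in a database. */
typedef int (*violations_fn)(int dbnum);

int uses_line(int dbnum)
{
    return dbnum == 1 ? 2 : 0;
}

int uses_unknown(int dbnum)
{
    (void) dbnum;
    return 0;
}

int uses_jsonb(int dbnum)
{
    return dbnum >= 2 ? 1 : 0;
}

/*
 * Run every check against every database.  The first time a check fails, one
 * line is appended to report; later failures of the same check are counted
 * only once.  Returns the number of distinct failing checks.
 */
int collect_failures(const violations_fn *checks, const char *const *names,
                     size_t nchecks, int ndbs,
                     char *report, size_t reportlen)
{
    bool    failed[MAX_CHECKS] = {false};
    int     nfailed = 0;
    size_t  off = 0;

    assert(nchecks <= MAX_CHECKS && reportlen > 0);
    report[0] = '\0';

    for (int db = 0; db < ndbs; db++)
    {
        for (size_t c = 0; c < nchecks; c++)
        {
            size_t  room;
            int     n;

            /* A real implementation would also log each offending column. */
            if (checks[c](db) == 0 || failed[c])
                continue;

            failed[c] = true;
            nfailed++;

            room = reportlen - off;
            n = snprintf(report + off, room, "failed check: %s\n", names[c]);
            if (n > 0)
                off += ((size_t) n < room) ? (size_t) n : room - 1;
        }
    }
    return nfailed;
}
```

The caller would abort only after the loops finish and only if the return value
is nonzero, so a single run reports every violation at once.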

One change this brings is that check.c contains version-specific checks in the
struct. Previously these were mostly contained in version.c (some, like the
9.4 jsonb check, were in check.c), which maintained some level of separation.
Splitting the array init is of course one option, but it also seems a tad messy.
Not sure what's best, but for now I've documented it in the array comment at
least.

This version also moves the main data types check to check.c, renames some
members in the struct, moves to named initializers (as commented on by Justin
downthread), and adds some more polish here and there.
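
For readers unfamiliar with the pattern, a table of checks with named
(designated) initializers and an optional version hook might look roughly like
the sketch below. It is a simplified illustration under assumptions: TypeCheck,
type_checks, N_TYPE_CHECKS, the applies_up_to_* hooks and
count_applicable_checks are hypothetical names, the hooks take a plain major
version number (903 for 9.3, 1500 for 15) instead of a ClusterInfo, and the
array length is derived with sizeof rather than kept in a separate counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if the check applies to the given old major version. */
typedef bool (*version_hook_fn)(int old_major_version);

typedef struct TypeCheck
{
    const char     *status;             /* progress message */
    const char     *report_filename;    /* where offending columns are listed */
    version_hook_fn version_hook;       /* NULL: run for all versions */
} TypeCheck;

bool applies_up_to_903(int old_major_version)
{
    return old_major_version <= 903;
}

bool applies_up_to_1500(int old_major_version)
{
    return old_major_version <= 1500;
}

/* Named initializers keep each entry readable and order-independent. */
static const TypeCheck type_checks[] = {
    {.status = "Checking for system-defined composite types in user tables",
     .report_filename = "tables_using_composite.txt",
     .version_hook = NULL},
    {.status = "Checking for incompatible \"line\" data type",
     .report_filename = "tables_using_line.txt",
     .version_hook = applies_up_to_903},
    {.status = "Checking for incompatible aclitem data type in user tables",
     .report_filename = "tables_using_aclitem.txt",
     .version_hook = applies_up_to_1500},
};

/* Deriving the count from the array avoids a counter that can go stale. */
#define N_TYPE_CHECKS (sizeof(type_checks) / sizeof(type_checks[0]))

/* Count the checks that would run for a given old cluster version. */
size_t count_applicable_checks(int old_major_version)
{
    size_t n = 0;

    for (size_t i = 0; i < N_TYPE_CHECKS; i++)
    {
        const TypeCheck *c = &type_checks[i];

        if (c->version_hook == NULL || c->version_hook(old_major_version))
            n++;
    }
    return n;
}
```

Adding a check then means adding one array entry; the driver loop and the
version gating stay untouched.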

--
Daniel Gustafsson

Attachments:

nsphash.diff (application/octet-stream; name=nsphash.diff; x-unix-mode=0644)

diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index c1399c09b9..5c47e4463c 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -11,6 +11,8 @@
 
 #include "access/transam.h"
 #include "catalog/pg_class_d.h"
+#include "common/hashfn.h"
+#include "common/logging.h"
 #include "pg_upgrade.h"
 
 static void create_rel_filename_map(const char *old_data, const char *new_data,
@@ -26,6 +28,28 @@ static void free_rel_infos(RelInfoArr *rel_arr);
 static void print_db_infos(DbInfoArr *db_arr);
 static void print_rel_infos(RelInfoArr *rel_arr);
 
+static uint32 hash_string_pointer(const char *s);
+#define SH_PREFIX		spacehash
+#define SH_ELEMENT_TYPE	space_entry_t
+#define SH_KEY_TYPE		const char *
+#define SH_KEY			name
+#define SH_HASH_KEY(tb, key)	hash_string_pointer(key)
+#define SH_EQUAL(tb, a, b)		(strcmp(a, b) == 0)
+#define SH_SCOPE		extern
+#define SH_RAW_ALLOCATOR	pg_malloc0
+#define SH_DEFINE
+#include "lib/simplehash.h"
+
+/*
+ * Helper function for spacehash hash table.
+ */
+static uint32
+hash_string_pointer(const char *s)
+{
+	unsigned char *ss = (unsigned char *) s;
+
+	return hash_bytes(ss, strlen(s));
+}
 
 /*
  * gen_db_file_maps()
@@ -402,11 +426,13 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 				i_relfilenumber,
 				i_reltablespace;
 	char		query[QUERY_ALLOC];
-	char	   *last_namespace = NULL,
-			   *last_tablespace = NULL;
 
 	query[0] = '\0';			/* initialize query string to empty */
 
+	/* TODO: What is a good initial size? */
+	dbinfo->db_namespaces = spacehash_create(16, NULL);
+	dbinfo->db_tablespaces = spacehash_create(16, NULL);
+
 	/*
 	 * Create a CTE that collects OIDs of regular user tables and matviews,
 	 * but excluding toast tables and indexes.  We assume that relations with
@@ -501,33 +527,29 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 	for (relnum = 0; relnum < ntups; relnum++)
 	{
 		RelInfo    *curr = &relinfos[num_rels++];
+		bool 		found;
+		space_entry_t *sp;
 
 		curr->reloid = atooid(PQgetvalue(res, relnum, i_reloid));
 		curr->indtable = atooid(PQgetvalue(res, relnum, i_indtable));
 		curr->toastheap = atooid(PQgetvalue(res, relnum, i_toastheap));
 
 		nspname = PQgetvalue(res, relnum, i_nspname);
-		curr->nsp_alloc = false;
 
 		/*
 		 * Many of the namespace and tablespace strings are identical, so we
-		 * try to reuse the allocated string pointers where possible to reduce
-		 * memory consumption.
+		 * try to minimize allocated space by storing the names in a hash-
+		 * table from which they can be referenced.
 		 */
-		/* Can we reuse the previous string allocation? */
-		if (last_namespace && strcmp(nspname, last_namespace) == 0)
-			curr->nspname = last_namespace;
-		else
-		{
-			last_namespace = curr->nspname = pg_strdup(nspname);
-			curr->nsp_alloc = true;
-		}
+		sp = spacehash_insert(dbinfo->db_namespaces, nspname, &found);
+		if (!found)
+			sp->name = pstrdup(nspname);
+		curr->nspname = sp->name;
 
 		relname = PQgetvalue(res, relnum, i_relname);
 		curr->relname = pg_strdup(relname);
 
 		curr->relfilenumber = atooid(PQgetvalue(res, relnum, i_relfilenumber));
-		curr->tblsp_alloc = false;
 
 		/* Is the tablespace oid non-default? */
 		if (atooid(PQgetvalue(res, relnum, i_reltablespace)) != 0)
@@ -538,14 +560,10 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 			 */
 			tablespace = PQgetvalue(res, relnum, i_spclocation);
 
-			/* Can we reuse the previous string allocation? */
-			if (last_tablespace && strcmp(tablespace, last_tablespace) == 0)
-				curr->tablespace = last_tablespace;
-			else
-			{
-				last_tablespace = curr->tablespace = pg_strdup(tablespace);
-				curr->tblsp_alloc = true;
-			}
+			sp = spacehash_insert(dbinfo->db_tablespaces, tablespace, &found);
+			if (!found)
+				sp->name = pstrdup(tablespace);
+			curr->tablespace = sp->name;
 		}
 		else
 			/* A zero reltablespace oid indicates the database tablespace. */
@@ -569,6 +587,9 @@ free_db_and_rel_infos(DbInfoArr *db_arr)
 	{
 		free_rel_infos(&db_arr->dbs[dbnum].rel_arr);
 		pg_free(db_arr->dbs[dbnum].db_name);
+
+		spacehash_destroy(db_arr->dbs[dbnum].db_namespaces);
+		spacehash_destroy(db_arr->dbs[dbnum].db_tablespaces);
 	}
 	pg_free(db_arr->dbs);
 	db_arr->dbs = NULL;
@@ -582,13 +603,7 @@ free_rel_infos(RelInfoArr *rel_arr)
 	int			relnum;
 
 	for (relnum = 0; relnum < rel_arr->nrels; relnum++)
-	{
-		if (rel_arr->rels[relnum].nsp_alloc)
-			pg_free(rel_arr->rels[relnum].nspname);
 		pg_free(rel_arr->rels[relnum].relname);
-		if (rel_arr->rels[relnum].tblsp_alloc)
-			pg_free(rel_arr->rels[relnum].tablespace);
-	}
 	pg_free(rel_arr->rels);
 	rel_arr->nrels = 0;
 }
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 5f2a116f23..239c5c468a 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -133,15 +133,13 @@ extern char *output_files[];
 typedef struct
 {
 	/* Can't use NAMEDATALEN; not guaranteed to be same on client */
-	char	   *nspname;		/* namespace name */
+	const char *nspname;		/* namespace name */
 	char	   *relname;		/* relation name */
 	Oid			reloid;			/* relation OID */
 	RelFileNumber relfilenumber;	/* relation file number */
 	Oid			indtable;		/* if index, OID of its table, else 0 */
 	Oid			toastheap;		/* if toast table, OID of base table, else 0 */
-	char	   *tablespace;		/* tablespace path; "" for cluster default */
-	bool		nsp_alloc;		/* should nspname be freed? */
-	bool		tblsp_alloc;	/* should tablespace be freed? */
+	const char *tablespace;		/* tablespace path; "" for cluster default */
 } RelInfo;
 
 typedef struct
@@ -162,10 +160,29 @@ typedef struct
 	Oid			db_oid;
 	RelFileNumber relfilenumber;
 	/* the rest are used only for logging and error reporting */
-	char	   *nspname;		/* namespaces */
+	const char *nspname;		/* namespaces */
 	char	   *relname;
 } FileNameMap;
 
+/*
+ * space_entry_t
+ *
+ * Hash table entries for recording namespace and tablespace names.
+ */
+typedef struct space_entry_t
+{
+	uint32		status;			/* hash status */
+	const char *name;
+} space_entry_t;
+
+#define SH_PREFIX		spacehash
+#define SH_ELEMENT_TYPE	space_entry_t
+#define SH_KEY_TYPE		const char *
+#define SH_SCOPE		extern
+#define SH_RAW_ALLOCATOR	pg_malloc0
+#define SH_DECLARE
+#include "lib/simplehash.h"
+
 /*
  * Structure to store database information
  */
@@ -181,6 +198,8 @@ typedef struct
 	char	   *db_iculocale;
 	int			db_encoding;
 	RelInfoArr	rel_arr;		/* array of all user relinfos */
+	spacehash_hash  *db_namespaces;		/* hash of namespaces */
+	spacehash_hash  *db_tablespaces;	/* hash of tablespaces */
 } DbInfo;
 
 typedef struct

v2-0001-pg_upgrade-run-all-data-type-checks-per-connectio.patch (application/octet-stream; name=v2-0001-pg_upgrade-run-all-data-type-checks-per-connectio.patch; x-unix-mode=0644)

From fe151dd41642ca866a68e51a8c9de4da4e0cce9a Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson <dgustafsson@postgresql.org>
Date: Wed, 22 Feb 2023 10:32:31 +0100
Subject: [PATCH v2] pg_upgrade: run all data type checks per connection

The checks for data type usage were each connecting to all databases
in the cluster and running their query. On clusters which have a lot
of databases this can become unnecessarily expensive. This moves the
checks to run in a single connection instead to minimize connection
setup/teardown overhead.

Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Discussion: https://postgr.es/m/BB4C76F-D416-4F9F-949E-DBE950D37787@yesql.se
---
 src/bin/pg_upgrade/check.c      | 573 +++++++++++++++++++++-----------
 src/bin/pg_upgrade/pg_upgrade.h |  27 +-
 src/bin/pg_upgrade/version.c    | 269 ++-------------
 3 files changed, 421 insertions(+), 448 deletions(-)

diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 7cf68dc9af..ca304a0101 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -10,6 +10,7 @@
 #include "postgres_fe.h"
 
 #include "catalog/pg_authid_d.h"
+#include "catalog/pg_class_d.h"
 #include "catalog/pg_collation.h"
 #include "fe_utils/string_utils.h"
 #include "mb/pg_wchar.h"
@@ -26,15 +27,378 @@ static void check_for_isn_and_int8_passing_mismatch(ClusterInfo *cluster);
 static void check_for_user_defined_postfix_ops(ClusterInfo *cluster);
 static void check_for_incompatible_polymorphics(ClusterInfo *cluster);
 static void check_for_tables_with_oids(ClusterInfo *cluster);
-static void check_for_composite_data_type_usage(ClusterInfo *cluster);
-static void check_for_reg_data_type_usage(ClusterInfo *cluster);
-static void check_for_aclitem_data_type_usage(ClusterInfo *cluster);
-static void check_for_jsonb_9_4_usage(ClusterInfo *cluster);
+static bool check_for_aclitem_data_type_usage(ClusterInfo *cluster);
+static bool check_for_jsonb_9_4_usage(ClusterInfo *cluster);
 static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(ClusterInfo *new_cluster);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
 static char *get_canonical_locale_name(int category, const char *locale);
 
+/*
+ * Data type usage checks. Each check for problematic data type usage is
+ * defined in this array with metadata, SQL query for finding the data type
+ * and a function pointer for determining if the check should be executed
+ * for the current version.
+ */
+static int n_data_types_usage_checks = 7;
+static DataTypesUsageChecks data_types_usage_checks[] = {
+	/*
+	 * Look for composite types that were made during initdb *or* belong to
+	 * information_schema; that's important in case information_schema was
+	 * dropped and reloaded.
+	 *
+	 * The cutoff OID here should match the source cluster's value of
+	 * FirstNormalObjectId.  We hardcode it rather than using that C #define
+	 * because, if that #define is ever changed, our own version's value is
+	 * NOT what to use.  Eventually we may need a test on the source cluster's
+	 * version to select the correct value.
+	 */
+	{.status = "Checking for system-defined composite types in user tables",
+	 .report_filename = "tables_using_composite.txt",
+	 .base_query =
+	 "SELECT t.oid FROM pg_catalog.pg_type t "
+	 "LEFT JOIN pg_catalog.pg_namespace n ON t.typnamespace = n.oid "
+	 " WHERE typtype = 'c' AND (t.oid < 16384 OR nspname = 'information_schema')",
+	 .report_text =
+	 "Your installation contains system-defined composite type(s) in user tables.\n"
+	 "These type OIDs are not stable across PostgreSQL versions,\n"
+	 "so this cluster cannot currently be upgraded.  You can\n"
+	 "drop the problem columns and restart the upgrade.\n"
+	 "A list of the problem columns is in the file:",
+	 .version_hook = NULL},
+
+	/*
+	 * 9.3 -> 9.4
+	 *	Fully implement the 'line' data type in 9.4, which previously returned
+	 *	"not enabled" by default and was only functionally enabled with a
+	 *	compile-time switch; as of 9.4 "line" has a different on-disk
+	 *	representation format.
+	 */
+	{.status = "Checking for incompatible \"line\" data type",
+	 .report_filename = "tables_using_line.txt",
+	 .base_query =
+	 "SELECT 'pg_catalog.line'::pg_catalog.regtype AS oid",
+	 .report_text =
+	 "Your installation contains the \"line\" data type in user tables.\n"
+	 "This data type changed its internal and input/output format\n"
+	 "between your old and new versions so this\n"
+	 "cluster cannot currently be upgraded.  You can\n"
+	 "drop the problem columns and restart the upgrade.\n"
+	 "A list of the problem columns is in the file:",
+	 .version_hook = old_9_3_check_for_line_data_type_usage},
+
+	/*
+	 *	pg_upgrade only preserves these system values:
+	 *		pg_class.oid
+	 *		pg_type.oid
+	 *		pg_enum.oid
+	 *
+	 *	Many of the reg* data types reference system catalog info that is
+	 *	not preserved, and hence these data types cannot be used in user
+	 *	tables upgraded by pg_upgrade.
+	 */
+	{.status = "Checking for reg* data types in user tables",
+	 .report_filename = "tables_using_reg.txt",
+	 /*
+	  * Note: older servers will not have all of these reg* types, so we have
+	  * to write the query like this rather than depending on casts to regtype.
+	  */
+	 .base_query =
+	 "SELECT oid FROM pg_catalog.pg_type t "
+	 "WHERE t.typnamespace = "
+	 "        (SELECT oid FROM pg_catalog.pg_namespace "
+	 "         WHERE nspname = 'pg_catalog') "
+	 "  AND t.typname IN ( "
+	 /* pg_class.oid is preserved, so 'regclass' is OK */
+	 "           'regcollation', "
+	 "           'regconfig', "
+	 "           'regdictionary', "
+	 "           'regnamespace', "
+	 "           'regoper', "
+	 "           'regoperator', "
+	 "           'regproc', "
+	 "           'regprocedure' "
+	 /* pg_authid.oid is preserved, so 'regrole' is OK */
+	 /* pg_type.oid is (mostly) preserved, so 'regtype' is OK */
+	 "         )",
+	 .report_text =
+	 "Your installation contains one of the reg* data types in user tables.\n"
+	 "These data types reference system OIDs that are not preserved by\n"
+	 "pg_upgrade, so this cluster cannot currently be upgraded.  You can\n"
+	 "drop the problem columns and restart the upgrade.\n"
+	 "A list of the problem columns is in the file:",
+	 .version_hook = NULL},
+
+	/*
+	 * PG 16 increased the size of the 'aclitem' type, which breaks the on-disk
+	 * format for existing data.
+	 */
+	{.status = "Checking for incompatible aclitem data type in user tables",
+	 .report_filename = "tables_using_aclitem.txt",
+	 .base_query =
+	 "SELECT 'pg_catalog.aclitem'::pg_catalog.regtype AS oid",
+	 .report_text =
+	 "Your installation contains the \"aclitem\" data type in user tables.\n"
+	 "The internal format of \"aclitem\" changed in PostgreSQL version 16\n"
+	 "so this cluster cannot currently be upgraded.  You can drop the\n"
+	 "problem columns and restart the upgrade.  A list of the problem\n"
+	 "columns is in the file:",
+	 .version_hook = check_for_aclitem_data_type_usage},
+
+	/*
+	 * It's no longer allowed to create tables or views with "unknown"-type
+	 * columns.  We do not complain about views with such columns, because
+	 * they should get silently converted to "text" columns during the DDL
+	 * dump and reload; it seems unlikely to be worth making users do that
+	 * by hand.  However, if there's a table with such a column, the DDL
+	 * reload will fail, so we should pre-detect that rather than failing
+	 * mid-upgrade.  Worse, if there's a matview with such a column, the
+	 * DDL reload will silently change it to "text" which won't match the
+	 * on-disk storage (which is like "cstring").  So we *must* reject that.
+	 */
+	{.status = "Checking for invalid \"unknown\" user columns",
+	 .report_filename = "tables_using_unknown.txt",
+	 .base_query =
+	 "SELECT 'pg_catalog.unknown'::pg_catalog.regtype AS oid",
+	 .report_text =
+	 "Your installation contains the \"unknown\" data type in user tables.\n"
+	 "This data type is no longer allowed in tables, so this\n"
+	 "cluster cannot currently be upgraded.  You can\n"
+	 "drop the problem columns and restart the upgrade.\n"
+	 "A list of the problem columns is in the file:",
+	 .version_hook = old_9_6_check_for_unknown_data_type_usage},
+
+	/*
+	 * PG 12 changed the 'sql_identifier' type storage to be based on name,
+	 * not varchar, which breaks on-disk format for existing data. So we need
+	 * to prevent upgrade when used in user objects (tables, indexes, ...).
+	 * In 12, the sql_identifier data type was switched from name to varchar,
+	 * which does affect the storage (name is by-ref, but not varlena). This
+	 * means user tables using sql_identifier for columns are broken because
+	 * the on-disk format is different.
+	 */
+	{.status = "Checking for invalid \"sql_identifier\" user columns",
+	 .report_filename = "tables_using_sql_identifier.txt",
+	 .base_query =
+	 "SELECT 'information_schema.sql_identifier'::pg_catalog.regtype AS oid",
+	 .report_text =
+	 "Your installation contains the \"sql_identifier\" data type in user tables.\n"
+	 "The on-disk format for this data type has changed, so this\n"
+	 "cluster cannot currently be upgraded.  You can\n"
+	 "drop the problem columns and restart the upgrade.\n"
+	 "A list of the problem columns is in the file:",
+	 .version_hook = old_11_check_for_sql_identifier_data_type_usage},
+
+	/*
+	 * JSONB changed its storage format during 9.4 beta, so check for it.
+	 */
+	{.status = "Checking for incompatible \"jsonb\" data type",
+	 .report_filename = "tables_using_jsonb.txt",
+	 .base_query =
+	 "SELECT 'pg_catalog.jsonb'::pg_catalog.regtype AS oid",
+	 .report_text =
+	 "Your installation contains the \"jsonb\" data type in user tables.\n"
+	 "The internal format of \"jsonb\" changed during 9.4 beta so this\n"
+	 "cluster cannot currently be upgraded.  You can\n"
+	 "drop the problem columns and restart the upgrade.\n"
+	 "A list of the problem columns is in the file:",
+	 .version_hook = check_for_jsonb_9_4_usage},
+};
+
+/*
+ * check_for_data_types_usage()
+ *	Detect whether there are any stored columns depending on given type(s)
+ *
+ * If so, write a report to the given file name and signal a failure to the
+ * user.
+ *
+ * The checks to run are defined in a DataTypesUsageChecks structure where
+ * each check has metadata for explaining errors to the user, a base_query,
+ * a report filename and a function pointer hook for validating if the check
+ * should be executed given the cluster at hand.
+ *
+ * base_query should be a SELECT yielding a single column named "oid",
+ * containing the pg_type OIDs of one or more types that are known to have
+ * inconsistent on-disk representations across server versions.
+ *
+ * We check for the type(s) in tables, matviews, and indexes, but not views;
+ * there's no storage involved in a view.
+ */
+static void
+check_for_data_types_usage(ClusterInfo *cluster, DataTypesUsageChecks *checks)
+{
+	bool	found = false;
+	bool   *results;
+	PQExpBufferData report;
+
+	prep_status("Checking for data type usage");
+
+	/* Prepare an array to store the results of checks in */
+	results = pg_malloc(sizeof(bool) * n_data_types_usage_checks);
+	memset(results, true, sizeof(bool) * n_data_types_usage_checks);
+
+	prep_status_progress("checking all databases");
+
+	/*
+	 * Connect to each database in the cluster and run all defined checks
+	 * against that database before trying the next one.
+	 */
+	for (int dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
+	{
+		DbInfo *active_db = &cluster->dbarr.dbs[dbnum];
+		PGconn *conn = connectToServer(cluster, active_db->db_name);
+
+		for (int checknum = 0; checknum < n_data_types_usage_checks; checknum++)
+		{
+			PGresult *res;
+			int 	ntups;
+			int 	i_nspname;
+			int 	i_relname;
+			int 	i_attname;
+			FILE   *script = NULL;
+			bool 	db_used = false;
+			char	output_path[MAXPGPATH];
+			DataTypesUsageChecks *cur_check = &checks[checknum];
+
+			/*
+			 * Make sure that the check applies to the current cluster version
+			 * and skip if not. If no check hook has been defined we run the
+			 * check for all versions.
+			 */
+			if (cur_check->version_hook && !cur_check->version_hook(cluster))
+			{
+				cur_check++;
+				continue;
+			}
+
+			snprintf(output_path, sizeof(output_path), "%s/%s",
+					 log_opts.basedir,
+					 cur_check->report_filename);
+
+			/*
+			 * The type(s) of interest might be wrapped in a domain, array,
+			 * composite, or range, and these container types can be nested (to
+			 * varying extents depending on server version, but that's not of
+			 * concern here).  To handle all these cases we need a recursive CTE.
+			 */
+			res = executeQueryOrDie(conn,
+							  "WITH RECURSIVE oids AS ( "
+			/* start with the type(s) returned by base_query */
+							  "	%s "
+					  "	UNION ALL "
+					  "	SELECT * FROM ( "
+	/* inner WITH because we can only reference the CTE once */
+					  "		WITH x AS (SELECT oid FROM oids) "
+	/* domains on any type selected so far */
+					  "			SELECT t.oid FROM pg_catalog.pg_type t, x WHERE typbasetype = x.oid AND typtype = 'd' "
+					  "			UNION ALL "
+	/* arrays over any type selected so far */
+					  "			SELECT t.oid FROM pg_catalog.pg_type t, x WHERE typelem = x.oid AND typtype = 'b' "
+					  "			UNION ALL "
+	/* composite types containing any type selected so far */
+					  "			SELECT t.oid FROM pg_catalog.pg_type t, pg_catalog.pg_class c, pg_catalog.pg_attribute a, x "
+					  "			WHERE t.typtype = 'c' AND "
+					  "				  t.oid = c.reltype AND "
+					  "				  c.oid = a.attrelid AND "
+					  "				  NOT a.attisdropped AND "
+					  "				  a.atttypid = x.oid "
+					  "			UNION ALL "
+	/* ranges containing any type selected so far */
+					  "			SELECT t.oid FROM pg_catalog.pg_type t, pg_catalog.pg_range r, x "
+					  "			WHERE t.typtype = 'r' AND r.rngtypid = t.oid AND r.rngsubtype = x.oid"
+					  "	) foo "
+					  ") "
+	/* now look for stored columns of any such type */
+					  "SELECT n.nspname, c.relname, a.attname "
+					  "FROM	pg_catalog.pg_class c, "
+					  "		pg_catalog.pg_namespace n, "
+					  "		pg_catalog.pg_attribute a "
+					  "WHERE	c.oid = a.attrelid AND "
+					  "		NOT a.attisdropped AND "
+					  "		a.atttypid IN (SELECT oid FROM oids) AND "
+					  "		c.relkind IN ("
+					  CppAsString2(RELKIND_RELATION) ", "
+					  CppAsString2(RELKIND_MATVIEW) ", "
+					  CppAsString2(RELKIND_INDEX) ") AND "
+					  "		c.relnamespace = n.oid AND "
+	/* exclude possible orphaned temp tables */
+					  "		n.nspname !~ '^pg_temp_' AND "
+					  "		n.nspname !~ '^pg_toast_temp_' AND "
+	/* exclude system catalogs, too */
+					  "		n.nspname NOT IN ('pg_catalog', 'information_schema')",
+					  cur_check->base_query);
+
+			ntups = PQntuples(res);
+
+			/*
+			 * The datatype was found, so extract the data and log to the
+			 * requested filename. We need to open the file for appending
+			 * since the check might have already found the type in another
+			 * database earlier in the loop.
+			 */
+			if (ntups)
+			{
+				/*
+				 * Make sure we have a buffer to save reports to now that we
+				 * found a first failing check.
+				 */
+				if (!found)
+					initPQExpBuffer(&report);
+				found = true;
+
+				/*
+				 * If this is the first time we see an error for the check in
+				 * question then print a status message of the failure.
+				 */
+				if (results[checknum])
+				{
+					pg_log(PG_STATUS, "failed check: %s", cur_check->status);
+					appendPQExpBuffer(&report, "\n%s\n    %s\n",
+									  cur_check->report_text, output_path);
+				}
+				results[checknum] = false;
+
+				i_nspname = PQfnumber(res, "nspname");
+				i_relname = PQfnumber(res, "relname");
+				i_attname = PQfnumber(res, "attname");
+
+				for (int rowno = 0; rowno < ntups; rowno++)
+				{
+					found = true;
+					if (script == NULL && (script = fopen_priv(output_path, "a")) == NULL)
+						pg_fatal("could not open file \"%s\": %s",
+								 output_path,
+								 strerror(errno));
+					if (!db_used)
+					{
+						fprintf(script, "In database: %s\n", active_db->db_name);
+						db_used = true;
+					}
+					fprintf(script, " %s.%s.%s\n",
+							PQgetvalue(res, rowno, i_nspname),
+							PQgetvalue(res, rowno, i_relname),
+							PQgetvalue(res, rowno, i_attname));
+				}
+
+				if (script)
+				{
+					fclose(script);
+					script = NULL;
+				}
+			}
+
+			PQclear(res);
+			cur_check++;
+		}
+
+		PQfinish(conn);
+	}
+
+	if (found)
+		pg_fatal("Data type checks failed: %s", report.data);
+
+	check_ok();
+}
 
 /*
  * fix_path_separator
@@ -104,16 +468,9 @@ check_and_dump_old_cluster(bool live_check)
 	check_is_install_user(&old_cluster);
 	check_proper_datallowconn(&old_cluster);
 	check_for_prepared_transactions(&old_cluster);
-	check_for_composite_data_type_usage(&old_cluster);
-	check_for_reg_data_type_usage(&old_cluster);
 	check_for_isn_and_int8_passing_mismatch(&old_cluster);
 
-	/*
-	 * PG 16 increased the size of the 'aclitem' type, which breaks the on-disk
-	 * format for existing data.
-	 */
-	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1500)
-		check_for_aclitem_data_type_usage(&old_cluster);
+	check_for_data_types_usage(&old_cluster, data_types_usage_checks);
 
 	/*
 	 * PG 14 changed the function signature of encoding conversion functions.
@@ -145,21 +502,12 @@ check_and_dump_old_cluster(bool live_check)
 	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1100)
 		check_for_tables_with_oids(&old_cluster);
 
-	/*
-	 * PG 12 changed the 'sql_identifier' type storage to be based on name,
-	 * not varchar, which breaks on-disk format for existing data. So we need
-	 * to prevent upgrade when used in user objects (tables, indexes, ...).
-	 */
-	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1100)
-		old_11_check_for_sql_identifier_data_type_usage(&old_cluster);
-
 	/*
 	 * Pre-PG 10 allowed tables with 'unknown' type columns and non WAL logged
 	 * hash indexes
 	 */
 	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 906)
 	{
-		old_9_6_check_for_unknown_data_type_usage(&old_cluster);
 		if (user_opts.check)
 			old_9_6_invalidate_hash_indexes(&old_cluster, true);
 	}
@@ -168,14 +516,6 @@ check_and_dump_old_cluster(bool live_check)
 	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 905)
 		check_for_pg_role_prefix(&old_cluster);
 
-	if (GET_MAJOR_VERSION(old_cluster.major_version) == 904 &&
-		old_cluster.controldata.cat_ver < JSONB_FORMAT_CHANGE_CAT_VER)
-		check_for_jsonb_9_4_usage(&old_cluster);
-
-	/* Pre-PG 9.4 had a different 'line' data type internal format */
-	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 903)
-		old_9_3_check_for_line_data_type_usage(&old_cluster);
-
 	/*
 	 * While not a check option, we do this now because this is the only time
 	 * the old server is running.
@@ -1207,182 +1547,37 @@ check_for_tables_with_oids(ClusterInfo *cluster)
 }
 
 
-/*
- * check_for_composite_data_type_usage()
- *	Check for system-defined composite types used in user tables.
- *
- *	The OIDs of rowtypes of system catalogs and information_schema views
- *	can change across major versions; unlike user-defined types, we have
- *	no mechanism for forcing them to be the same in the new cluster.
- *	Hence, if any user table uses one, that's problematic for pg_upgrade.
- */
-static void
-check_for_composite_data_type_usage(ClusterInfo *cluster)
-{
-	bool		found;
-	Oid			firstUserOid;
-	char		output_path[MAXPGPATH];
-	char	   *base_query;
-
-	prep_status("Checking for system-defined composite types in user tables");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_composite.txt");
-
-	/*
-	 * Look for composite types that were made during initdb *or* belong to
-	 * information_schema; that's important in case information_schema was
-	 * dropped and reloaded.
-	 *
-	 * The cutoff OID here should match the source cluster's value of
-	 * FirstNormalObjectId.  We hardcode it rather than using that C #define
-	 * because, if that #define is ever changed, our own version's value is
-	 * NOT what to use.  Eventually we may need a test on the source cluster's
-	 * version to select the correct value.
-	 */
-	firstUserOid = 16384;
-
-	base_query = psprintf("SELECT t.oid FROM pg_catalog.pg_type t "
-						  "LEFT JOIN pg_catalog.pg_namespace n ON t.typnamespace = n.oid "
-						  " WHERE typtype = 'c' AND (t.oid < %u OR nspname = 'information_schema')",
-						  firstUserOid);
-
-	found = check_for_data_types_usage(cluster, base_query, output_path);
-
-	free(base_query);
-
-	if (found)
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains system-defined composite type(s) in user tables.\n"
-				 "These type OIDs are not stable across PostgreSQL versions,\n"
-				 "so this cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
-}
-
-/*
- * check_for_reg_data_type_usage()
- *	pg_upgrade only preserves these system values:
- *		pg_class.oid
- *		pg_type.oid
- *		pg_enum.oid
- *
- *	Many of the reg* data types reference system catalog info that is
- *	not preserved, and hence these data types cannot be used in user
- *	tables upgraded by pg_upgrade.
- */
-static void
-check_for_reg_data_type_usage(ClusterInfo *cluster)
-{
-	bool		found;
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for reg* data types in user tables");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_reg.txt");
-
-	/*
-	 * Note: older servers will not have all of these reg* types, so we have
-	 * to write the query like this rather than depending on casts to regtype.
-	 */
-	found = check_for_data_types_usage(cluster,
-									   "SELECT oid FROM pg_catalog.pg_type t "
-									   "WHERE t.typnamespace = "
-									   "        (SELECT oid FROM pg_catalog.pg_namespace "
-									   "         WHERE nspname = 'pg_catalog') "
-									   "  AND t.typname IN ( "
-	/* pg_class.oid is preserved, so 'regclass' is OK */
-									   "           'regcollation', "
-									   "           'regconfig', "
-									   "           'regdictionary', "
-									   "           'regnamespace', "
-									   "           'regoper', "
-									   "           'regoperator', "
-									   "           'regproc', "
-									   "           'regprocedure' "
-	/* pg_authid.oid is preserved, so 'regrole' is OK */
-	/* pg_type.oid is (mostly) preserved, so 'regtype' is OK */
-									   "         )",
-									   output_path);
-
-	if (found)
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains one of the reg* data types in user tables.\n"
-				 "These data types reference system OIDs that are not preserved by\n"
-				 "pg_upgrade, so this cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
-}
-
 /*
  * check_for_aclitem_data_type_usage
  *
- *	aclitem changed its storage format in 16, so check for it.
+ *     aclitem changed its storage format in 16, so check for it.
  */
-static void
+static bool
 check_for_aclitem_data_type_usage(ClusterInfo *cluster)
 {
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for incompatible aclitem data type in user tables");
-
-	snprintf(output_path, sizeof(output_path), "tables_using_aclitem.txt");
+	/*
+	 * PG 16 increased the size of the 'aclitem' type, which breaks the on-disk
+	 * format for existing data.
+	 */
+	if (GET_MAJOR_VERSION(cluster->major_version) <= 1500)
+		return true;
 
-	if (check_for_data_type_usage(cluster, "pg_catalog.aclitem", output_path))
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains the \"aclitem\" data type in user tables.\n"
-				 "The internal format of \"aclitem\" changed in PostgreSQL version 16\n"
-				 "so this cluster cannot currently be upgraded.  You can drop the\n"
-				 "problem columns and restart the upgrade.  A list of the problem\n"
-				 "columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
+	return false;
 }
 
 /*
  * check_for_jsonb_9_4_usage()
  *
- *	JSONB changed its storage format during 9.4 beta, so check for it.
+ *     JSONB changed its storage format during 9.4 beta, so check for it.
  */
-static void
+static bool
 check_for_jsonb_9_4_usage(ClusterInfo *cluster)
 {
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for incompatible \"jsonb\" data type");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_jsonb.txt");
+	if (GET_MAJOR_VERSION(cluster->major_version) == 904 &&
+		cluster->controldata.cat_ver < JSONB_FORMAT_CHANGE_CAT_VER)
+		return true;
 
-	if (check_for_data_type_usage(cluster, "pg_catalog.jsonb", output_path))
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains the \"jsonb\" data type in user tables.\n"
-				 "The internal format of \"jsonb\" changed during 9.4 beta so this\n"
-				 "cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
+	return false;
 }
 
 /*
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 5f2a116f23..fcee6cee1f 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -320,6 +320,21 @@ typedef struct
 } OSInfo;
 
 
+/* Function signature for data type check version hook */
+typedef bool (*DataTypesUsageVersionCheck)(ClusterInfo *cluster);
+
+/*
+ * DataTypesUsageChecks
+ */
+typedef struct
+{
+	const char *status;			/* status line to print to the user */
+	const char *report_filename;	/* filename to store report to */
+	const char *base_query;		/* Query to extract the oid of the datatype */
+	const char *report_text;	/* Text to store to report in case of error */
+	DataTypesUsageVersionCheck version_hook;
+} DataTypesUsageChecks;
+
 /*
  * Global variables
  */
@@ -442,18 +457,12 @@ unsigned int str2uint(const char *str);
 
 /* version.c */
 
-bool		check_for_data_types_usage(ClusterInfo *cluster,
-									   const char *base_query,
-									   const char *output_path);
-bool		check_for_data_type_usage(ClusterInfo *cluster,
-									  const char *type_name,
-									  const char *output_path);
-void		old_9_3_check_for_line_data_type_usage(ClusterInfo *cluster);
-void		old_9_6_check_for_unknown_data_type_usage(ClusterInfo *cluster);
+bool		old_9_3_check_for_line_data_type_usage(ClusterInfo *cluster);
+bool		old_9_6_check_for_unknown_data_type_usage(ClusterInfo *cluster);
+bool		old_11_check_for_sql_identifier_data_type_usage(ClusterInfo *cluster);
 void		old_9_6_invalidate_hash_indexes(ClusterInfo *cluster,
 											bool check_mode);
 
-void		old_11_check_for_sql_identifier_data_type_usage(ClusterInfo *cluster);
 void		report_extension_updates(ClusterInfo *cluster);
 
 /* parallel.c */
diff --git a/src/bin/pg_upgrade/version.c b/src/bin/pg_upgrade/version.c
index 403a6d7cfa..ccf078f13d 100644
--- a/src/bin/pg_upgrade/version.c
+++ b/src/bin/pg_upgrade/version.c
@@ -9,236 +9,26 @@
 
 #include "postgres_fe.h"
 
-#include "catalog/pg_class_d.h"
 #include "fe_utils/string_utils.h"
 #include "pg_upgrade.h"
 
-
-/*
- * check_for_data_types_usage()
- *	Detect whether there are any stored columns depending on given type(s)
- *
- * If so, write a report to the given file name, and return true.
- *
- * base_query should be a SELECT yielding a single column named "oid",
- * containing the pg_type OIDs of one or more types that are known to have
- * inconsistent on-disk representations across server versions.
- *
- * We check for the type(s) in tables, matviews, and indexes, but not views;
- * there's no storage involved in a view.
- */
-bool
-check_for_data_types_usage(ClusterInfo *cluster,
-						   const char *base_query,
-						   const char *output_path)
-{
-	bool		found = false;
-	FILE	   *script = NULL;
-	int			dbnum;
-
-	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
-	{
-		DbInfo	   *active_db = &cluster->dbarr.dbs[dbnum];
-		PGconn	   *conn = connectToServer(cluster, active_db->db_name);
-		PQExpBufferData querybuf;
-		PGresult   *res;
-		bool		db_used = false;
-		int			ntups;
-		int			rowno;
-		int			i_nspname,
-					i_relname,
-					i_attname;
-
-		/*
-		 * The type(s) of interest might be wrapped in a domain, array,
-		 * composite, or range, and these container types can be nested (to
-		 * varying extents depending on server version, but that's not of
-		 * concern here).  To handle all these cases we need a recursive CTE.
-		 */
-		initPQExpBuffer(&querybuf);
-		appendPQExpBuffer(&querybuf,
-						  "WITH RECURSIVE oids AS ( "
-		/* start with the type(s) returned by base_query */
-						  "	%s "
-						  "	UNION ALL "
-						  "	SELECT * FROM ( "
-		/* inner WITH because we can only reference the CTE once */
-						  "		WITH x AS (SELECT oid FROM oids) "
-		/* domains on any type selected so far */
-						  "			SELECT t.oid FROM pg_catalog.pg_type t, x WHERE typbasetype = x.oid AND typtype = 'd' "
-						  "			UNION ALL "
-		/* arrays over any type selected so far */
-						  "			SELECT t.oid FROM pg_catalog.pg_type t, x WHERE typelem = x.oid AND typtype = 'b' "
-						  "			UNION ALL "
-		/* composite types containing any type selected so far */
-						  "			SELECT t.oid FROM pg_catalog.pg_type t, pg_catalog.pg_class c, pg_catalog.pg_attribute a, x "
-						  "			WHERE t.typtype = 'c' AND "
-						  "				  t.oid = c.reltype AND "
-						  "				  c.oid = a.attrelid AND "
-						  "				  NOT a.attisdropped AND "
-						  "				  a.atttypid = x.oid "
-						  "			UNION ALL "
-		/* ranges containing any type selected so far */
-						  "			SELECT t.oid FROM pg_catalog.pg_type t, pg_catalog.pg_range r, x "
-						  "			WHERE t.typtype = 'r' AND r.rngtypid = t.oid AND r.rngsubtype = x.oid"
-						  "	) foo "
-						  ") "
-		/* now look for stored columns of any such type */
-						  "SELECT n.nspname, c.relname, a.attname "
-						  "FROM	pg_catalog.pg_class c, "
-						  "		pg_catalog.pg_namespace n, "
-						  "		pg_catalog.pg_attribute a "
-						  "WHERE	c.oid = a.attrelid AND "
-						  "		NOT a.attisdropped AND "
-						  "		a.atttypid IN (SELECT oid FROM oids) AND "
-						  "		c.relkind IN ("
-						  CppAsString2(RELKIND_RELATION) ", "
-						  CppAsString2(RELKIND_MATVIEW) ", "
-						  CppAsString2(RELKIND_INDEX) ") AND "
-						  "		c.relnamespace = n.oid AND "
-		/* exclude possible orphaned temp tables */
-						  "		n.nspname !~ '^pg_temp_' AND "
-						  "		n.nspname !~ '^pg_toast_temp_' AND "
-		/* exclude system catalogs, too */
-						  "		n.nspname NOT IN ('pg_catalog', 'information_schema')",
-						  base_query);
-
-		res = executeQueryOrDie(conn, "%s", querybuf.data);
-
-		ntups = PQntuples(res);
-		i_nspname = PQfnumber(res, "nspname");
-		i_relname = PQfnumber(res, "relname");
-		i_attname = PQfnumber(res, "attname");
-		for (rowno = 0; rowno < ntups; rowno++)
-		{
-			found = true;
-			if (script == NULL && (script = fopen_priv(output_path, "w")) == NULL)
-				pg_fatal("could not open file \"%s\": %s", output_path,
-						 strerror(errno));
-			if (!db_used)
-			{
-				fprintf(script, "In database: %s\n", active_db->db_name);
-				db_used = true;
-			}
-			fprintf(script, "  %s.%s.%s\n",
-					PQgetvalue(res, rowno, i_nspname),
-					PQgetvalue(res, rowno, i_relname),
-					PQgetvalue(res, rowno, i_attname));
-		}
-
-		PQclear(res);
-
-		termPQExpBuffer(&querybuf);
-
-		PQfinish(conn);
-	}
-
-	if (script)
-		fclose(script);
-
-	return found;
-}
-
-/*
- * check_for_data_type_usage()
- *	Detect whether there are any stored columns depending on the given type
- *
- * If so, write a report to the given file name, and return true.
- *
- * type_name should be a fully qualified type name.  This is just a
- * trivial wrapper around check_for_data_types_usage() to convert a
- * type name into a base query.
- */
 bool
-check_for_data_type_usage(ClusterInfo *cluster,
-						  const char *type_name,
-						  const char *output_path)
-{
-	bool		found;
-	char	   *base_query;
-
-	base_query = psprintf("SELECT '%s'::pg_catalog.regtype AS oid",
-						  type_name);
-
-	found = check_for_data_types_usage(cluster, base_query, output_path);
-
-	free(base_query);
-
-	return found;
-}
-
-
-/*
- * old_9_3_check_for_line_data_type_usage()
- *	9.3 -> 9.4
- *	Fully implement the 'line' data type in 9.4, which previously returned
- *	"not enabled" by default and was only functionally enabled with a
- *	compile-time switch; as of 9.4 "line" has a different on-disk
- *	representation format.
- */
-void
 old_9_3_check_for_line_data_type_usage(ClusterInfo *cluster)
 {
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for incompatible \"line\" data type");
+	/* Pre-PG 9.4 had a different 'line' data type internal format */
+	if (GET_MAJOR_VERSION(cluster->major_version) <= 903)
+		return true;
 
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_line.txt");
-
-	if (check_for_data_type_usage(cluster, "pg_catalog.line", output_path))
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains the \"line\" data type in user tables.\n"
-				 "This data type changed its internal and input/output format\n"
-				 "between your old and new versions so this\n"
-				 "cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
+	return false;
 }
 
-
-/*
- * old_9_6_check_for_unknown_data_type_usage()
- *	9.6 -> 10
- *	It's no longer allowed to create tables or views with "unknown"-type
- *	columns.  We do not complain about views with such columns, because
- *	they should get silently converted to "text" columns during the DDL
- *	dump and reload; it seems unlikely to be worth making users do that
- *	by hand.  However, if there's a table with such a column, the DDL
- *	reload will fail, so we should pre-detect that rather than failing
- *	mid-upgrade.  Worse, if there's a matview with such a column, the
- *	DDL reload will silently change it to "text" which won't match the
- *	on-disk storage (which is like "cstring").  So we *must* reject that.
- */
-void
+bool
 old_9_6_check_for_unknown_data_type_usage(ClusterInfo *cluster)
 {
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for invalid \"unknown\" user columns");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_unknown.txt");
-
-	if (check_for_data_type_usage(cluster, "pg_catalog.unknown", output_path))
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains the \"unknown\" data type in user tables.\n"
-				 "This data type is no longer allowed in tables, so this\n"
-				 "cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
+	/* Pre-PG 10 allowed tables with 'unknown' type columns */
+	if (GET_MAJOR_VERSION(cluster->major_version) <= 906)
+		return true;
+	return false;
 }
 
 /*
@@ -353,41 +143,20 @@ old_9_6_invalidate_hash_indexes(ClusterInfo *cluster, bool check_mode)
 		check_ok();
 }
 
-/*
- * old_11_check_for_sql_identifier_data_type_usage()
- *	11 -> 12
- *	In 12, the sql_identifier data type was switched from name to varchar,
- *	which does affect the storage (name is by-ref, but not varlena). This
- *	means user tables using sql_identifier for columns are broken because
- *	the on-disk format is different.
- */
-void
+bool
 old_11_check_for_sql_identifier_data_type_usage(ClusterInfo *cluster)
 {
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for invalid \"sql_identifier\" user columns");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_sql_identifier.txt");
-
-	if (check_for_data_type_usage(cluster, "information_schema.sql_identifier",
-								  output_path))
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains the \"sql_identifier\" data type in user tables.\n"
-				 "The on-disk format for this data type has changed, so this\n"
-				 "cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
+	/*
+	 * PG 12 changed the 'sql_identifier' type storage to be based on name,
+	 * not varchar, which breaks on-disk format for existing data. So we need
+	 * to prevent upgrade when used in user objects (tables, indexes, ...).
+	 */
+	if (GET_MAJOR_VERSION(cluster->major_version) <= 1100)
+		return true;
+
+	return false;
 }
 
-
 /*
  * report_extension_updates()
  *	Report extensions that should be updated.
-- 
2.32.1 (Apple Git-133)

Nathan Bossart

nathandbossart@gmail.com

almost 3 years ago

In reply to: Daniel Gustafsson (#5)

Re: Reducing connection overhead in pg_upgrade compat check phase

On Wed, Feb 22, 2023 at 10:37:35AM +0100, Daniel Gustafsson wrote:

>> On 18 Feb 2023, at 06:46, Nathan Bossart <nathandbossart@gmail.com> wrote:
>> The last 4 are for supported versions and, from a very
>> quick glance, seem possible to consolidate. That would bring us to a total
>> of 11 separate loops that we could consolidate into one. However, the data
>> type checks seem to follow a nice pattern, so perhaps this is easier said
>> than done.
>
> There is that, refactoring the data type checks leads to removal of duplicated
> code and a slight performance improvement. Refactoring the other checks to
> reduce overhead would be an interesting thing to look at, but this point in the
> v16 cycle might not be ideal for that.

Makes sense.

>> I wonder if it'd be better to perform all of the data type
>> checks in all databases before failing so that all of the violations are
>> reported. Else, users would have to run pg_upgrade, fix a violation, run
>> pg_upgrade again, fix another one, etc.
>
> I think that's better, and have changed the patch to do it that way.

Thanks. This seems to work as intended. One thing I noticed is that the
"failed check" log is only printed once, even if multiple data type checks
failed. I believe this is because this message uses PG_STATUS. If I
change it to PG_REPORT, all of the "failed check" messages appear. TBH I'm
not sure we need this message at all since a more detailed explanation will
be printed afterwards. If we do keep it around, I think it should be
indented so that it looks more like this:

Checking for data type usage checking all databases
    failed check: incompatible aclitem data type in user tables
    failed check: reg* data types in user tables
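
A minimal sketch of that output shape, assuming a hypothetical `format_failures` helper and `CheckResult` struct rather than pg_upgrade's actual logging functions: every failing check is appended as an indented line, and the caller can decide to abort only after all checks have run.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical result of one data type check (not a pg_upgrade type). */
typedef struct
{
	const char *name;			/* short description of the check */
	bool		failed;			/* true if a problem column was found */
} CheckResult;

/*
 * Append one indented "failed check" line per failing entry to buf and
 * return the number of failures.  Output is truncated if buf is too small,
 * but the failure count is still complete.
 */
static int
format_failures(const CheckResult *results, size_t nresults,
				char *buf, size_t buflen)
{
	int			nfailed = 0;
	size_t		used = 0;

	assert(buf != NULL && buflen > 0);
	buf[0] = '\0';

	for (size_t i = 0; i < nresults; i++)
	{
		int			written;

		if (!results[i].failed)
			continue;
		nfailed++;

		if (used >= buflen - 1)
			continue;			/* buffer full, keep counting only */

		written = snprintf(buf + used, buflen - used,
						   "    failed check: %s\n", results[i].name);
		if (written < 0)
			continue;
		used += (size_t) written;
		if (used > buflen - 1)
			used = buflen - 1;	/* snprintf truncated the line */
	}
	return nfailed;
}
```

A caller would print the buffer under the status line and fail once if the returned count is non-zero, so all violations are visible in a single run.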

> One change this brings is that check.c contains version specific checks in the
> struct. Previously these were mostly contained in version.c (some, like the
> 9.4 jsonb check, were in check.c) which maintained some level of separation.
> Splitting the array init is of course one option but it also seems a tad messy.
> Not sure what's best, but for now I've documented it in the array comment at
> least.

Hm. We could move check_for_aclitem_data_type_usage() and
check_for_jsonb_9_4_usage() to version.c since those are only used for
determining whether the check applies now. Otherwise, IMO things are in
roughly the right place. I don't think it's necessary to split the array.
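
To illustrate the role these functions play after the refactoring, here is a minimal, self-contained sketch of a table of checks where each entry carries an optional version hook. The names (`Cluster`, `Check`, `applies_le_1500`, `applies_le_903`, `count_applicable`) are hypothetical stand-ins and not pg_upgrade's actual definitions; only the idea of "NULL hook means the check always applies" is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for ClusterInfo: only the major version matters. */
typedef struct
{
	int			major_version;	/* e.g. 1500 for PG 15, 903 for 9.3 */
} Cluster;

/* Returns true when the check should run against this cluster. */
typedef bool (*VersionHook) (const Cluster *cluster);

typedef struct
{
	const char *status;			/* status line shown to the user */
	VersionHook version_hook;	/* NULL means "applies to every version" */
} Check;

static bool
applies_le_1500(const Cluster *cluster)
{
	return cluster->major_version <= 1500;
}

static bool
applies_le_903(const Cluster *cluster)
{
	return cluster->major_version <= 903;
}

static const Check checks[] = {
	{"composite types", NULL},
	{"aclitem", applies_le_1500},
	{"line", applies_le_903},
};

/* Count how many checks in the table would run for the given cluster. */
static int
count_applicable(const Cluster *cluster)
{
	int			n = 0;

	assert(cluster != NULL);
	for (size_t i = 0; i < sizeof(checks) / sizeof(checks[0]); i++)
	{
		if (checks[i].version_hook == NULL || checks[i].version_hook(cluster))
			n++;
	}
	return n;
}
```

With this shape the hook functions contain nothing but a version test, which is why they could live in version.c while the table itself stays in check.c.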

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Daniel Gustafsson

daniel@yesql.se

almost 3 years ago

In reply to: Nathan Bossart (#6)

Re: Reducing connection overhead in pg_upgrade compat check phase

On 22 Feb 2023, at 20:20, Nathan Bossart <nathandbossart@gmail.com> wrote:

> One thing I noticed is that the
> "failed check" log is only printed once, even if multiple data type checks
> failed. I believe this is because this message uses PG_STATUS. If I
> change it to PG_REPORT, all of the "failed check" messages appear. TBH I'm
> not sure we need this message at all since a more detailed explanation will
> be printed afterwards. If we do keep it around, I think it should be
> indented so that it looks more like this:
>
> Checking for data type usage checking all databases
>     failed check: incompatible aclitem data type in user tables
>     failed check: reg* data types in user tables

That's a good point, that's better. I think it makes sense to keep it around.

>> One change this brings is that check.c contains version specific checks in the
>> struct. Previously these were mostly contained in version.c (some, like the
>> 9.4 jsonb check, were in check.c) which maintained some level of separation.
>> Splitting the array init is of course one option but it also seems a tad messy.
>> Not sure what's best, but for now I've documented it in the array comment at
>> least.
>
> Hm. We could move check_for_aclitem_data_type_usage() and
> check_for_jsonb_9_4_usage() to version.c since those are only used for
> determining whether the check applies now. Otherwise, IMO things are in
> roughly the right place. I don't think it's necessary to split the array.

Will do, thanks.

--
Daniel Gustafsson

Daniel Gustafsson

daniel@yesql.se

almost 3 years ago

In reply to: Daniel Gustafsson (#7)

1 attachment(s)

Re: Reducing connection overhead in pg_upgrade compat check phase

On 23 Feb 2023, at 15:12, Daniel Gustafsson <daniel@yesql.se> wrote:

> On 22 Feb 2023, at 20:20, Nathan Bossart <nathandbossart@gmail.com> wrote:

>> One thing I noticed is that the
>> "failed check" log is only printed once, even if multiple data type checks
>> failed. I believe this is because this message uses PG_STATUS. If I
>> change it to PG_REPORT, all of the "failed check" messages appear. TBH I'm
>> not sure we need this message at all since a more detailed explanation will
>> be printed afterwards. If we do keep it around, I think it should be
>> indented so that it looks more like this:
>>
>> Checking for data type usage checking all databases
>>     failed check: incompatible aclitem data type in user tables
>>     failed check: reg* data types in user tables
>
> That's a good point, that's better. I think it makes sense to keep it around.
>
>>> One change this brings is that check.c contains version specific checks in the
>>> struct. Previously these were mostly contained in version.c (some, like the
>>> 9.4 jsonb check, were in check.c) which maintained some level of separation.
>>> Splitting the array init is of course one option but it also seems a tad messy.
>>> Not sure what's best, but for now I've documented it in the array comment at
>>> least.
>>
>> Hm. We could move check_for_aclitem_data_type_usage() and
>> check_for_jsonb_9_4_usage() to version.c since those are only used for
>> determining whether the check applies now. Otherwise, IMO things are in
>> roughly the right place. I don't think it's necessary to split the array.
>
> Will do, thanks.

The attached v3 is a rebase to handle conflicts and with the above comments
addressed.
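
For readers following along, the core restructuring in the patch can be pictured as swapping the loop nesting. The following is a rough sketch that uses plain counters in place of real connections (no libpq calls are made); the function names `connections_per_check` and `connections_per_database` are made up for illustration.

```c
#include <assert.h>

/* Old shape: every check loops over all databases and connects each time. */
static int
connections_per_check(int nchecks, int ndbs)
{
	int			connections = 0;

	assert(nchecks >= 0 && ndbs >= 0);
	for (int check = 0; check < nchecks; check++)
		for (int db = 0; db < ndbs; db++)
			connections++;		/* connect, run one query, disconnect */
	return connections;
}

/* New shape: connect once per database, then run every check on it. */
static int
connections_per_database(int nchecks, int ndbs)
{
	int			connections = 0;

	assert(nchecks >= 0 && ndbs >= 0);
	for (int db = 0; db < ndbs; db++)
	{
		connections++;			/* connect once */
		for (int check = 0; check < nchecks; check++)
		{
			/* run this check's query on the already open connection */
		}
		/* disconnect */
	}
	return connections;
}
```

With the 7 entries in the patch's array and a 100-database cluster like the one measured at the start of the thread, this gives at most 700 connections in the old shape versus 100 in the new one. The real count for the old shape depends on which version-specific checks apply to the old cluster.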

--
Daniel Gustafsson

Attachments:

v3-0001-pg_upgrade-run-all-data-type-checks-per-connectio.patch (application/octet-stream)

From ec6d973c9570fac4bdc3ba58ba4d62914dd7e17f Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson <daniel@yesql.se>
Date: Mon, 13 Mar 2023 14:46:24 +0100
Subject: [PATCH v3] pg_upgrade: run all data type checks per connection

The checks for data type usage were each connecting to all databases
in the cluster and running their query. On clusters which have a lot
of databases this can become unnecessarily expensive. This moves the
checks to run in a single connection instead to minimize connection
setup/teardown overhead.

Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Discussion: https://postgr.es/m/BB4C76F-D416-4F9F-949E-DBE950D37787@yesql.se
---
 src/bin/pg_upgrade/check.c      | 575 ++++++++++++++++++++------------
 src/bin/pg_upgrade/pg_upgrade.h |  29 +-
 src/bin/pg_upgrade/version.c    | 289 +++-------------
 3 files changed, 433 insertions(+), 460 deletions(-)

diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index b71b00be37..77818f287f 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -10,6 +10,7 @@
 #include "postgres_fe.h"
 
 #include "catalog/pg_authid_d.h"
+#include "catalog/pg_class_d.h"
 #include "catalog/pg_collation.h"
 #include "fe_utils/string_utils.h"
 #include "mb/pg_wchar.h"
@@ -23,14 +24,375 @@ static void check_for_isn_and_int8_passing_mismatch(ClusterInfo *cluster);
 static void check_for_user_defined_postfix_ops(ClusterInfo *cluster);
 static void check_for_incompatible_polymorphics(ClusterInfo *cluster);
 static void check_for_tables_with_oids(ClusterInfo *cluster);
-static void check_for_composite_data_type_usage(ClusterInfo *cluster);
-static void check_for_reg_data_type_usage(ClusterInfo *cluster);
-static void check_for_aclitem_data_type_usage(ClusterInfo *cluster);
-static void check_for_jsonb_9_4_usage(ClusterInfo *cluster);
 static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(ClusterInfo *new_cluster);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
 
+/*
+ * Data type usage checks. Each check for problematic data type usage is
+ * defined in this array with metadata, SQL query for finding the data type
+ * and a function pointer for determining if the check should be executed
+ * for the current version.
+ */
+static int n_data_types_usage_checks = 7;
+static DataTypesUsageChecks data_types_usage_checks[] = {
+	/*
+	 * Look for composite types that were made during initdb *or* belong to
+	 * information_schema; that's important in case information_schema was
+	 * dropped and reloaded.
+	 *
+	 * The cutoff OID here should match the source cluster's value of
+	 * FirstNormalObjectId.  We hardcode it rather than using that C #define
+	 * because, if that #define is ever changed, our own version's value is
+	 * NOT what to use.  Eventually we may need a test on the source cluster's
+	 * version to select the correct value.
+	 */
+	{.status = "Checking for system-defined composite types in user tables",
+	 .report_filename = "tables_using_composite.txt",
+	 .base_query =
+	 "SELECT t.oid FROM pg_catalog.pg_type t "
+	 "LEFT JOIN pg_catalog.pg_namespace n ON t.typnamespace = n.oid "
+	 " WHERE typtype = 'c' AND (t.oid < 16384 OR nspname = 'information_schema')",
+	 .report_text =
+	 "Your installation contains system-defined composite type(s) in user tables.\n"
+	 "These type OIDs are not stable across PostgreSQL versions,\n"
+	 "so this cluster cannot currently be upgraded.  You can\n"
+	 "drop the problem columns and restart the upgrade.\n"
+	 "A list of the problem columns is in the file:",
+	 .version_hook = NULL},
+
+	/*
+	 * 9.3 -> 9.4
+	 *	Fully implement the 'line' data type in 9.4, which previously returned
+	 *	"not enabled" by default and was only functionally enabled with a
+	 *	compile-time switch; as of 9.4 "line" has a different on-disk
+	 *	representation format.
+	 */
+	{.status = "Checking for incompatible \"line\" data type",
+	 .report_filename = "tables_using_line.txt",
+	 .base_query =
+	 "SELECT 'pg_catalog.line'::pg_catalog.regtype AS oid",
+	 .report_text =
+	 "Your installation contains the \"line\" data type in user tables.\n"
+	 "This data type changed its internal and input/output format\n"
+	 "between your old and new versions so this\n"
+	 "cluster cannot currently be upgraded.  You can\n"
+	 "drop the problem columns and restart the upgrade.\n"
+	 "A list of the problem columns is in the file:",
+	 .version_hook = old_9_3_check_for_line_data_type_usage},
+
+	/*
+	 *	pg_upgrade only preserves these system values:
+	 *		pg_class.oid
+	 *		pg_type.oid
+	 *		pg_enum.oid
+	 *
+	 *	Many of the reg* data types reference system catalog info that is
+	 *	not preserved, and hence these data types cannot be used in user
+	 *	tables upgraded by pg_upgrade.
+	 */
+	{.status = "Checking for reg* data types in user tables",
+	 .report_filename = "tables_using_reg.txt",
+	 /*
+	  * Note: older servers will not have all of these reg* types, so we have
+	  * to write the query like this rather than depending on casts to regtype.
+	  */
+	 .base_query =
+	 "SELECT oid FROM pg_catalog.pg_type t "
+	 "WHERE t.typnamespace = "
+	 "        (SELECT oid FROM pg_catalog.pg_namespace "
+	 "         WHERE nspname = 'pg_catalog') "
+	 "  AND t.typname IN ( "
+	 /* pg_class.oid is preserved, so 'regclass' is OK */
+	 "           'regcollation', "
+	 "           'regconfig', "
+	 "           'regdictionary', "
+	 "           'regnamespace', "
+	 "           'regoper', "
+	 "           'regoperator', "
+	 "           'regproc', "
+	 "           'regprocedure' "
+	 /* pg_authid.oid is preserved, so 'regrole' is OK */
+	 /* pg_type.oid is (mostly) preserved, so 'regtype' is OK */
+	 "         )",
+	 .report_text =
+	 "Your installation contains one of the reg* data types in user tables.\n"
+	 "These data types reference system OIDs that are not preserved by\n"
+	 "pg_upgrade, so this cluster cannot currently be upgraded.  You can\n"
+	 "drop the problem columns and restart the upgrade.\n"
+	 "A list of the problem columns is in the file:",
+	 .version_hook = NULL},
+
+	/*
+	 * PG 16 increased the size of the 'aclitem' type, which breaks the on-disk
+	 * format for existing data.
+	 */
+	{.status = "Checking for incompatible aclitem data type in user tables",
+	 .report_filename = "tables_using_aclitem.txt",
+	 .base_query =
+	 "SELECT 'pg_catalog.aclitem'::pg_catalog.regtype AS oid",
+	 .report_text =
+	 "Your installation contains the \"aclitem\" data type in user tables.\n"
+	 "The internal format of \"aclitem\" changed in PostgreSQL version 16\n"
+	 "so this cluster cannot currently be upgraded.  You can drop the\n"
+	 "problem columns and restart the upgrade.  A list of the problem\n"
+	 "columns is in the file:",
+	 .version_hook = check_for_aclitem_data_type_usage},
+
+	/*
+	 * It's no longer allowed to create tables or views with "unknown"-type
+	 * columns.  We do not complain about views with such columns, because
+	 * they should get silently converted to "text" columns during the DDL
+	 * dump and reload; it seems unlikely to be worth making users do that
+	 * by hand.  However, if there's a table with such a column, the DDL
+	 * reload will fail, so we should pre-detect that rather than failing
+	 * mid-upgrade.  Worse, if there's a matview with such a column, the
+	 * DDL reload will silently change it to "text" which won't match the
+	 * on-disk storage (which is like "cstring").  So we *must* reject that.
+	 */
+	{.status = "Checking for invalid \"unknown\" user columns",
+	 .report_filename = "tables_using_unknown.txt",
+	 .base_query =
+	 "SELECT 'pg_catalog.unknown'::pg_catalog.regtype AS oid",
+	 .report_text =
+	 "Your installation contains the \"unknown\" data type in user tables.\n"
+	 "This data type is no longer allowed in tables, so this\n"
+	 "cluster cannot currently be upgraded.  You can\n"
+	 "drop the problem columns and restart the upgrade.\n"
+	 "A list of the problem columns is in the file:",
+	 .version_hook = old_9_6_check_for_unknown_data_type_usage},
+
+	/*
+	 * PG 12 changed the 'sql_identifier' type storage to be based on name,
+	 * not varchar, which breaks on-disk format for existing data. So we need
+	 * to prevent upgrade when used in user objects (tables, indexes, ...).
+	 * In 12, the sql_identifier data type was switched from varchar to name,
+	 * which does affect the storage (name is by-ref, but not varlena). This
+	 * means user tables using sql_identifier for columns are broken because
+	 * the on-disk format is different.
+	 */
+	{.status = "Checking for invalid \"sql_identifier\" user columns",
+	 .report_filename = "tables_using_sql_identifier.txt",
+	 .base_query =
+	 "SELECT 'information_schema.sql_identifier'::pg_catalog.regtype AS oid",
+	 .report_text =
+	 "Your installation contains the \"sql_identifier\" data type in user tables.\n"
+	 "The on-disk format for this data type has changed, so this\n"
+	 "cluster cannot currently be upgraded.  You can\n"
+	 "drop the problem columns and restart the upgrade.\n"
+	 "A list of the problem columns is in the file:",
+	 .version_hook = old_11_check_for_sql_identifier_data_type_usage},
+
+	/*
+	 * JSONB changed its storage format during 9.4 beta, so check for it.
+	 */
+	{.status = "Checking for incompatible \"jsonb\" data type",
+	 .report_filename = "tables_using_jsonb.txt",
+	 .base_query =
+	 "SELECT 'pg_catalog.jsonb'::pg_catalog.regtype AS oid",
+	 .report_text =
+	 "Your installation contains the \"jsonb\" data type in user tables.\n"
+	 "The internal format of \"jsonb\" changed during 9.4 beta so this\n"
+	 "cluster cannot currently be upgraded.  You can\n"
+	 "drop the problem columns and restart the upgrade.\n"
+	 "A list of the problem columns is in the file:",
+	 .version_hook = check_for_jsonb_9_4_usage},
+};
+
+/*
+ * check_for_data_types_usage()
+ *	Detect whether there are any stored columns depending on given type(s)
+ *
+ * If so, write a report to the given file name and signal a failure to the
+ * user.
+ *
+ * The checks to run are defined in a DataTypesUsageChecks structure where
+ * each check has metadata for explaining errors to the user, a base_query,
+ * a report filename and a function pointer hook for validating if the check
+ * should be executed given the cluster at hand.
+ *
+ * base_query should be a SELECT yielding a single column named "oid",
+ * containing the pg_type OIDs of one or more types that are known to have
+ * inconsistent on-disk representations across server versions.
+ *
+ * We check for the type(s) in tables, matviews, and indexes, but not views;
+ * there's no storage involved in a view.
+ */
+static void
+check_for_data_types_usage(ClusterInfo *cluster, DataTypesUsageChecks *checks)
+{
+	bool	found = false;
+	bool   *results;
+	PQExpBufferData report;
+
+	prep_status("Checking for data type usage");
+
+	/* Prepare an array to store the results of checks in */
+	results = pg_malloc(sizeof(bool) * n_data_types_usage_checks);
+	memset(results, true, sizeof(bool) * n_data_types_usage_checks);
+
+	prep_status_progress("checking all databases");
+
+	/*
+	 * Connect to each database in the cluster and run all defined checks
+	 * against that database before trying the next one.
+	 */
+	for (int dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
+	{
+		DbInfo *active_db = &cluster->dbarr.dbs[dbnum];
+		PGconn *conn = connectToServer(cluster, active_db->db_name);
+
+		for (int checknum = 0; checknum < n_data_types_usage_checks; checknum++)
+		{
+			PGresult *res;
+			int 	ntups;
+			int 	i_nspname;
+			int 	i_relname;
+			int 	i_attname;
+			FILE   *script = NULL;
+			bool 	db_used = false;
+			char	output_path[MAXPGPATH];
+			DataTypesUsageChecks *cur_check = &checks[checknum];
+
+			/*
+			 * Make sure that the check applies to the current cluster version
+			 * and skip if not. If no check hook has been defined we run the
+			 * check for all versions.
+			 */
+			if (cur_check->version_hook && !cur_check->version_hook(cluster))
+			{
+				cur_check++;
+				continue;
+			}
+
+			snprintf(output_path, sizeof(output_path), "%s/%s",
+					 log_opts.basedir,
+					 cur_check->report_filename);
+
+			/*
+			 * The type(s) of interest might be wrapped in a domain, array,
+			 * composite, or range, and these container types can be nested (to
+			 * varying extents depending on server version, but that's not of
+			 * concern here).  To handle all these cases we need a recursive CTE.
+			 */
+			res = executeQueryOrDie(conn,
+							  "WITH RECURSIVE oids AS ( "
+			/* start with the type(s) returned by base_query */
+							  "	%s "
+					  "	UNION ALL "
+					  "	SELECT * FROM ( "
+	/* inner WITH because we can only reference the CTE once */
+					  "		WITH x AS (SELECT oid FROM oids) "
+	/* domains on any type selected so far */
+					  "			SELECT t.oid FROM pg_catalog.pg_type t, x WHERE typbasetype = x.oid AND typtype = 'd' "
+					  "			UNION ALL "
+	/* arrays over any type selected so far */
+					  "			SELECT t.oid FROM pg_catalog.pg_type t, x WHERE typelem = x.oid AND typtype = 'b' "
+					  "			UNION ALL "
+	/* composite types containing any type selected so far */
+					  "			SELECT t.oid FROM pg_catalog.pg_type t, pg_catalog.pg_class c, pg_catalog.pg_attribute a, x "
+					  "			WHERE t.typtype = 'c' AND "
+					  "				  t.oid = c.reltype AND "
+					  "				  c.oid = a.attrelid AND "
+					  "				  NOT a.attisdropped AND "
+					  "				  a.atttypid = x.oid "
+					  "			UNION ALL "
+	/* ranges containing any type selected so far */
+					  "			SELECT t.oid FROM pg_catalog.pg_type t, pg_catalog.pg_range r, x "
+					  "			WHERE t.typtype = 'r' AND r.rngtypid = t.oid AND r.rngsubtype = x.oid"
+					  "	) foo "
+					  ") "
+	/* now look for stored columns of any such type */
+					  "SELECT n.nspname, c.relname, a.attname "
+					  "FROM	pg_catalog.pg_class c, "
+					  "		pg_catalog.pg_namespace n, "
+					  "		pg_catalog.pg_attribute a "
+					  "WHERE	c.oid = a.attrelid AND "
+					  "		NOT a.attisdropped AND "
+					  "		a.atttypid IN (SELECT oid FROM oids) AND "
+					  "		c.relkind IN ("
+					  CppAsString2(RELKIND_RELATION) ", "
+					  CppAsString2(RELKIND_MATVIEW) ", "
+					  CppAsString2(RELKIND_INDEX) ") AND "
+					  "		c.relnamespace = n.oid AND "
+	/* exclude possible orphaned temp tables */
+					  "		n.nspname !~ '^pg_temp_' AND "
+					  "		n.nspname !~ '^pg_toast_temp_' AND "
+	/* exclude system catalogs, too */
+					  "		n.nspname NOT IN ('pg_catalog', 'information_schema')",
+					  cur_check->base_query);
+
+			ntups = PQntuples(res);
+
+			/*
+			 * The datatype was found, so extract the data and log to the
+			 * requested filename. We need to open the file for appending
+			 * since the check might have already found the type in another
+			 * database earlier in the loop.
+			 */
+			if (ntups)
+			{
+				/*
+				 * Make sure we have a buffer to save reports to now that we
+				 * found a first failing check.
+				 */
+				if (!found)
+					initPQExpBuffer(&report);
+				found = true;
+
+				/*
+				 * If this is the first time we see an error for the check in
+				 * question then print a status message of the failure.
+				 */
+				if (results[checknum])
+				{
+					pg_log(PG_REPORT, "    failed check: %s", cur_check->status);
+					appendPQExpBuffer(&report, "\n%s\n    %s\n",
+									  cur_check->report_text, output_path);
+				}
+				results[checknum] = false;
+
+				i_nspname = PQfnumber(res, "nspname");
+				i_relname = PQfnumber(res, "relname");
+				i_attname = PQfnumber(res, "attname");
+
+				for (int rowno = 0; rowno < ntups; rowno++)
+				{
+					found = true;
+					if (script == NULL && (script = fopen_priv(output_path, "a")) == NULL)
+						pg_fatal("could not open file \"%s\": %s",
+								 output_path,
+								 strerror(errno));
+					if (!db_used)
+					{
+						fprintf(script, "In database: %s\n", active_db->db_name);
+						db_used = true;
+					}
+					fprintf(script, " %s.%s.%s\n",
+							PQgetvalue(res, rowno, i_nspname),
+							PQgetvalue(res, rowno, i_relname),
+							PQgetvalue(res, rowno, i_attname));
+				}
+
+				if (script)
+				{
+					fclose(script);
+					script = NULL;
+				}
+			}
+
+			PQclear(res);
+			cur_check++;
+		}
+
+		PQfinish(conn);
+	}
+
+	if (found)
+		pg_fatal("Data type checks failed: %s", report.data);
+
+	check_ok();
+}
 
 /*
  * fix_path_separator
@@ -100,16 +462,9 @@ check_and_dump_old_cluster(bool live_check)
 	check_is_install_user(&old_cluster);
 	check_proper_datallowconn(&old_cluster);
 	check_for_prepared_transactions(&old_cluster);
-	check_for_composite_data_type_usage(&old_cluster);
-	check_for_reg_data_type_usage(&old_cluster);
 	check_for_isn_and_int8_passing_mismatch(&old_cluster);
 
-	/*
-	 * PG 16 increased the size of the 'aclitem' type, which breaks the on-disk
-	 * format for existing data.
-	 */
-	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1500)
-		check_for_aclitem_data_type_usage(&old_cluster);
+	check_for_data_types_usage(&old_cluster, data_types_usage_checks);
 
 	/*
 	 * PG 14 changed the function signature of encoding conversion functions.
@@ -141,21 +496,12 @@ check_and_dump_old_cluster(bool live_check)
 	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1100)
 		check_for_tables_with_oids(&old_cluster);
 
-	/*
-	 * PG 12 changed the 'sql_identifier' type storage to be based on name,
-	 * not varchar, which breaks on-disk format for existing data. So we need
-	 * to prevent upgrade when used in user objects (tables, indexes, ...).
-	 */
-	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1100)
-		old_11_check_for_sql_identifier_data_type_usage(&old_cluster);
-
 	/*
 	 * Pre-PG 10 allowed tables with 'unknown' type columns and non WAL logged
 	 * hash indexes
 	 */
 	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 906)
 	{
-		old_9_6_check_for_unknown_data_type_usage(&old_cluster);
 		if (user_opts.check)
 			old_9_6_invalidate_hash_indexes(&old_cluster, true);
 	}
@@ -164,14 +510,6 @@ check_and_dump_old_cluster(bool live_check)
 	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 905)
 		check_for_pg_role_prefix(&old_cluster);
 
-	if (GET_MAJOR_VERSION(old_cluster.major_version) == 904 &&
-		old_cluster.controldata.cat_ver < JSONB_FORMAT_CHANGE_CAT_VER)
-		check_for_jsonb_9_4_usage(&old_cluster);
-
-	/* Pre-PG 9.4 had a different 'line' data type internal format */
-	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 903)
-		old_9_3_check_for_line_data_type_usage(&old_cluster);
-
 	/*
 	 * While not a check option, we do this now because this is the only time
 	 * the old server is running.
@@ -1084,185 +1422,6 @@ check_for_tables_with_oids(ClusterInfo *cluster)
 		check_ok();
 }
 
-
-/*
- * check_for_composite_data_type_usage()
- *	Check for system-defined composite types used in user tables.
- *
- *	The OIDs of rowtypes of system catalogs and information_schema views
- *	can change across major versions; unlike user-defined types, we have
- *	no mechanism for forcing them to be the same in the new cluster.
- *	Hence, if any user table uses one, that's problematic for pg_upgrade.
- */
-static void
-check_for_composite_data_type_usage(ClusterInfo *cluster)
-{
-	bool		found;
-	Oid			firstUserOid;
-	char		output_path[MAXPGPATH];
-	char	   *base_query;
-
-	prep_status("Checking for system-defined composite types in user tables");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_composite.txt");
-
-	/*
-	 * Look for composite types that were made during initdb *or* belong to
-	 * information_schema; that's important in case information_schema was
-	 * dropped and reloaded.
-	 *
-	 * The cutoff OID here should match the source cluster's value of
-	 * FirstNormalObjectId.  We hardcode it rather than using that C #define
-	 * because, if that #define is ever changed, our own version's value is
-	 * NOT what to use.  Eventually we may need a test on the source cluster's
-	 * version to select the correct value.
-	 */
-	firstUserOid = 16384;
-
-	base_query = psprintf("SELECT t.oid FROM pg_catalog.pg_type t "
-						  "LEFT JOIN pg_catalog.pg_namespace n ON t.typnamespace = n.oid "
-						  " WHERE typtype = 'c' AND (t.oid < %u OR nspname = 'information_schema')",
-						  firstUserOid);
-
-	found = check_for_data_types_usage(cluster, base_query, output_path);
-
-	free(base_query);
-
-	if (found)
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains system-defined composite type(s) in user tables.\n"
-				 "These type OIDs are not stable across PostgreSQL versions,\n"
-				 "so this cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
-}
-
-/*
- * check_for_reg_data_type_usage()
- *	pg_upgrade only preserves these system values:
- *		pg_class.oid
- *		pg_type.oid
- *		pg_enum.oid
- *
- *	Many of the reg* data types reference system catalog info that is
- *	not preserved, and hence these data types cannot be used in user
- *	tables upgraded by pg_upgrade.
- */
-static void
-check_for_reg_data_type_usage(ClusterInfo *cluster)
-{
-	bool		found;
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for reg* data types in user tables");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_reg.txt");
-
-	/*
-	 * Note: older servers will not have all of these reg* types, so we have
-	 * to write the query like this rather than depending on casts to regtype.
-	 */
-	found = check_for_data_types_usage(cluster,
-									   "SELECT oid FROM pg_catalog.pg_type t "
-									   "WHERE t.typnamespace = "
-									   "        (SELECT oid FROM pg_catalog.pg_namespace "
-									   "         WHERE nspname = 'pg_catalog') "
-									   "  AND t.typname IN ( "
-	/* pg_class.oid is preserved, so 'regclass' is OK */
-									   "           'regcollation', "
-									   "           'regconfig', "
-									   "           'regdictionary', "
-									   "           'regnamespace', "
-									   "           'regoper', "
-									   "           'regoperator', "
-									   "           'regproc', "
-									   "           'regprocedure' "
-	/* pg_authid.oid is preserved, so 'regrole' is OK */
-	/* pg_type.oid is (mostly) preserved, so 'regtype' is OK */
-									   "         )",
-									   output_path);
-
-	if (found)
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains one of the reg* data types in user tables.\n"
-				 "These data types reference system OIDs that are not preserved by\n"
-				 "pg_upgrade, so this cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
-}
-
-/*
- * check_for_aclitem_data_type_usage
- *
- *	aclitem changed its storage format in 16, so check for it.
- */
-static void
-check_for_aclitem_data_type_usage(ClusterInfo *cluster)
-{
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for incompatible aclitem data type in user tables");
-
-	snprintf(output_path, sizeof(output_path), "tables_using_aclitem.txt");
-
-	if (check_for_data_type_usage(cluster, "pg_catalog.aclitem", output_path))
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains the \"aclitem\" data type in user tables.\n"
-				 "The internal format of \"aclitem\" changed in PostgreSQL version 16\n"
-				 "so this cluster cannot currently be upgraded.  You can drop the\n"
-				 "problem columns and restart the upgrade.  A list of the problem\n"
-				 "columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
-}
-
-/*
- * check_for_jsonb_9_4_usage()
- *
- *	JSONB changed its storage format during 9.4 beta, so check for it.
- */
-static void
-check_for_jsonb_9_4_usage(ClusterInfo *cluster)
-{
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for incompatible \"jsonb\" data type");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_jsonb.txt");
-
-	if (check_for_data_type_usage(cluster, "pg_catalog.jsonb", output_path))
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains the \"jsonb\" data type in user tables.\n"
-				 "The internal format of \"jsonb\" changed during 9.4 beta so this\n"
-				 "cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
-}
-
 /*
  * check_for_pg_role_prefix()
  *
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 3eea0139c7..208bfbb68e 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -328,6 +328,21 @@ typedef struct
 } OSInfo;
 
 
+/* Function signature for data type check version hook */
+typedef bool (*DataTypesUsageVersionCheck)(ClusterInfo *cluster);
+
+/*
+ * DataTypesUsageChecks
+ */
+typedef struct
+{
+	const char *status;			/* status line to print to the user */
+	const char *report_filename;	/* filename to store report to */
+	const char *base_query;		/* Query to extract the oid of the datatype */
+	const char *report_text;	/* Text to store to report in case of error */
+	DataTypesUsageVersionCheck version_hook;
+} DataTypesUsageChecks;
+
 /*
  * Global variables
  */
@@ -450,19 +465,15 @@ unsigned int str2uint(const char *str);
 
 /* version.c */
 
-bool		check_for_data_types_usage(ClusterInfo *cluster,
-									   const char *base_query,
-									   const char *output_path);
-bool		check_for_data_type_usage(ClusterInfo *cluster,
-									  const char *type_name,
-									  const char *output_path);
-void		old_9_3_check_for_line_data_type_usage(ClusterInfo *cluster);
-void		old_9_6_check_for_unknown_data_type_usage(ClusterInfo *cluster);
+bool		old_9_3_check_for_line_data_type_usage(ClusterInfo *cluster);
+bool		check_for_jsonb_9_4_usage(ClusterInfo *cluster);
+bool		old_9_6_check_for_unknown_data_type_usage(ClusterInfo *cluster);
+bool		old_11_check_for_sql_identifier_data_type_usage(ClusterInfo *cluster);
 void		old_9_6_invalidate_hash_indexes(ClusterInfo *cluster,
 											bool check_mode);
 
-void		old_11_check_for_sql_identifier_data_type_usage(ClusterInfo *cluster);
 void		report_extension_updates(ClusterInfo *cluster);
+bool		check_for_aclitem_data_type_usage(ClusterInfo *cluster);
 
 /* parallel.c */
 void		parallel_exec_prog(const char *log_file, const char *opt_log_file,
diff --git a/src/bin/pg_upgrade/version.c b/src/bin/pg_upgrade/version.c
index 403a6d7cfa..828a975ac0 100644
--- a/src/bin/pg_upgrade/version.c
+++ b/src/bin/pg_upgrade/version.c
@@ -9,236 +9,41 @@
 
 #include "postgres_fe.h"
 
-#include "catalog/pg_class_d.h"
 #include "fe_utils/string_utils.h"
 #include "pg_upgrade.h"
 
-
-/*
- * check_for_data_types_usage()
- *	Detect whether there are any stored columns depending on given type(s)
- *
- * If so, write a report to the given file name, and return true.
- *
- * base_query should be a SELECT yielding a single column named "oid",
- * containing the pg_type OIDs of one or more types that are known to have
- * inconsistent on-disk representations across server versions.
- *
- * We check for the type(s) in tables, matviews, and indexes, but not views;
- * there's no storage involved in a view.
- */
 bool
-check_for_data_types_usage(ClusterInfo *cluster,
-						   const char *base_query,
-						   const char *output_path)
+old_9_3_check_for_line_data_type_usage(ClusterInfo *cluster)
 {
-	bool		found = false;
-	FILE	   *script = NULL;
-	int			dbnum;
-
-	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
-	{
-		DbInfo	   *active_db = &cluster->dbarr.dbs[dbnum];
-		PGconn	   *conn = connectToServer(cluster, active_db->db_name);
-		PQExpBufferData querybuf;
-		PGresult   *res;
-		bool		db_used = false;
-		int			ntups;
-		int			rowno;
-		int			i_nspname,
-					i_relname,
-					i_attname;
-
-		/*
-		 * The type(s) of interest might be wrapped in a domain, array,
-		 * composite, or range, and these container types can be nested (to
-		 * varying extents depending on server version, but that's not of
-		 * concern here).  To handle all these cases we need a recursive CTE.
-		 */
-		initPQExpBuffer(&querybuf);
-		appendPQExpBuffer(&querybuf,
-						  "WITH RECURSIVE oids AS ( "
-		/* start with the type(s) returned by base_query */
-						  "	%s "
-						  "	UNION ALL "
-						  "	SELECT * FROM ( "
-		/* inner WITH because we can only reference the CTE once */
-						  "		WITH x AS (SELECT oid FROM oids) "
-		/* domains on any type selected so far */
-						  "			SELECT t.oid FROM pg_catalog.pg_type t, x WHERE typbasetype = x.oid AND typtype = 'd' "
-						  "			UNION ALL "
-		/* arrays over any type selected so far */
-						  "			SELECT t.oid FROM pg_catalog.pg_type t, x WHERE typelem = x.oid AND typtype = 'b' "
-						  "			UNION ALL "
-		/* composite types containing any type selected so far */
-						  "			SELECT t.oid FROM pg_catalog.pg_type t, pg_catalog.pg_class c, pg_catalog.pg_attribute a, x "
-						  "			WHERE t.typtype = 'c' AND "
-						  "				  t.oid = c.reltype AND "
-						  "				  c.oid = a.attrelid AND "
-						  "				  NOT a.attisdropped AND "
-						  "				  a.atttypid = x.oid "
-						  "			UNION ALL "
-		/* ranges containing any type selected so far */
-						  "			SELECT t.oid FROM pg_catalog.pg_type t, pg_catalog.pg_range r, x "
-						  "			WHERE t.typtype = 'r' AND r.rngtypid = t.oid AND r.rngsubtype = x.oid"
-						  "	) foo "
-						  ") "
-		/* now look for stored columns of any such type */
-						  "SELECT n.nspname, c.relname, a.attname "
-						  "FROM	pg_catalog.pg_class c, "
-						  "		pg_catalog.pg_namespace n, "
-						  "		pg_catalog.pg_attribute a "
-						  "WHERE	c.oid = a.attrelid AND "
-						  "		NOT a.attisdropped AND "
-						  "		a.atttypid IN (SELECT oid FROM oids) AND "
-						  "		c.relkind IN ("
-						  CppAsString2(RELKIND_RELATION) ", "
-						  CppAsString2(RELKIND_MATVIEW) ", "
-						  CppAsString2(RELKIND_INDEX) ") AND "
-						  "		c.relnamespace = n.oid AND "
-		/* exclude possible orphaned temp tables */
-						  "		n.nspname !~ '^pg_temp_' AND "
-						  "		n.nspname !~ '^pg_toast_temp_' AND "
-		/* exclude system catalogs, too */
-						  "		n.nspname NOT IN ('pg_catalog', 'information_schema')",
-						  base_query);
-
-		res = executeQueryOrDie(conn, "%s", querybuf.data);
-
-		ntups = PQntuples(res);
-		i_nspname = PQfnumber(res, "nspname");
-		i_relname = PQfnumber(res, "relname");
-		i_attname = PQfnumber(res, "attname");
-		for (rowno = 0; rowno < ntups; rowno++)
-		{
-			found = true;
-			if (script == NULL && (script = fopen_priv(output_path, "w")) == NULL)
-				pg_fatal("could not open file \"%s\": %s", output_path,
-						 strerror(errno));
-			if (!db_used)
-			{
-				fprintf(script, "In database: %s\n", active_db->db_name);
-				db_used = true;
-			}
-			fprintf(script, "  %s.%s.%s\n",
-					PQgetvalue(res, rowno, i_nspname),
-					PQgetvalue(res, rowno, i_relname),
-					PQgetvalue(res, rowno, i_attname));
-		}
-
-		PQclear(res);
-
-		termPQExpBuffer(&querybuf);
-
-		PQfinish(conn);
-	}
-
-	if (script)
-		fclose(script);
+	/* Pre-PG 9.4 had a different 'line' data type internal format */
+	if (GET_MAJOR_VERSION(cluster->major_version) <= 903)
+		return true;
 
-	return found;
+	return false;
 }
 
 /*
- * check_for_data_type_usage()
- *	Detect whether there are any stored columns depending on the given type
- *
- * If so, write a report to the given file name, and return true.
+ * check_for_jsonb_9_4_usage()
  *
- * type_name should be a fully qualified type name.  This is just a
- * trivial wrapper around check_for_data_types_usage() to convert a
- * type name into a base query.
+ *     JSONB changed its storage format during 9.4 beta, so check for it.
  */
 bool
-check_for_data_type_usage(ClusterInfo *cluster,
-						  const char *type_name,
-						  const char *output_path)
-{
-	bool		found;
-	char	   *base_query;
-
-	base_query = psprintf("SELECT '%s'::pg_catalog.regtype AS oid",
-						  type_name);
-
-	found = check_for_data_types_usage(cluster, base_query, output_path);
-
-	free(base_query);
-
-	return found;
-}
-
-
-/*
- * old_9_3_check_for_line_data_type_usage()
- *	9.3 -> 9.4
- *	Fully implement the 'line' data type in 9.4, which previously returned
- *	"not enabled" by default and was only functionally enabled with a
- *	compile-time switch; as of 9.4 "line" has a different on-disk
- *	representation format.
- */
-void
-old_9_3_check_for_line_data_type_usage(ClusterInfo *cluster)
+check_for_jsonb_9_4_usage(ClusterInfo *cluster)
 {
-	char		output_path[MAXPGPATH];
+	if (GET_MAJOR_VERSION(cluster->major_version) == 904 &&
+		cluster->controldata.cat_ver < JSONB_FORMAT_CHANGE_CAT_VER)
+		return true;
 
-	prep_status("Checking for incompatible \"line\" data type");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_line.txt");
-
-	if (check_for_data_type_usage(cluster, "pg_catalog.line", output_path))
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains the \"line\" data type in user tables.\n"
-				 "This data type changed its internal and input/output format\n"
-				 "between your old and new versions so this\n"
-				 "cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
+	return false;
 }
 
-
-/*
- * old_9_6_check_for_unknown_data_type_usage()
- *	9.6 -> 10
- *	It's no longer allowed to create tables or views with "unknown"-type
- *	columns.  We do not complain about views with such columns, because
- *	they should get silently converted to "text" columns during the DDL
- *	dump and reload; it seems unlikely to be worth making users do that
- *	by hand.  However, if there's a table with such a column, the DDL
- *	reload will fail, so we should pre-detect that rather than failing
- *	mid-upgrade.  Worse, if there's a matview with such a column, the
- *	DDL reload will silently change it to "text" which won't match the
- *	on-disk storage (which is like "cstring").  So we *must* reject that.
- */
-void
+bool
 old_9_6_check_for_unknown_data_type_usage(ClusterInfo *cluster)
 {
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for invalid \"unknown\" user columns");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_unknown.txt");
-
-	if (check_for_data_type_usage(cluster, "pg_catalog.unknown", output_path))
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains the \"unknown\" data type in user tables.\n"
-				 "This data type is no longer allowed in tables, so this\n"
-				 "cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
+	/* Pre-PG 10 allowed tables with 'unknown' type columns */
+	if (GET_MAJOR_VERSION(cluster->major_version) <= 906)
+		return true;
+	return false;
 }
 
 /*
@@ -353,41 +158,20 @@ old_9_6_invalidate_hash_indexes(ClusterInfo *cluster, bool check_mode)
 		check_ok();
 }
 
-/*
- * old_11_check_for_sql_identifier_data_type_usage()
- *	11 -> 12
- *	In 12, the sql_identifier data type was switched from name to varchar,
- *	which does affect the storage (name is by-ref, but not varlena). This
- *	means user tables using sql_identifier for columns are broken because
- *	the on-disk format is different.
- */
-void
+bool
 old_11_check_for_sql_identifier_data_type_usage(ClusterInfo *cluster)
 {
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for invalid \"sql_identifier\" user columns");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_sql_identifier.txt");
-
-	if (check_for_data_type_usage(cluster, "information_schema.sql_identifier",
-								  output_path))
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains the \"sql_identifier\" data type in user tables.\n"
-				 "The on-disk format for this data type has changed, so this\n"
-				 "cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
+	/*
+	 * PG 12 changed the 'sql_identifier' type storage to be based on name,
+	 * not varchar, which breaks on-disk format for existing data. So we need
+	 * to prevent upgrade when used in user objects (tables, indexes, ...).
+	 */
+	if (GET_MAJOR_VERSION(cluster->major_version) <= 1100)
+		return true;
+
+	return false;
 }
 
-
 /*
  * report_extension_updates()
  *	Report extensions that should be updated.
@@ -459,3 +243,22 @@ report_extension_updates(ClusterInfo *cluster)
 	else
 		check_ok();
 }
+
+/*
+ * check_for_aclitem_data_type_usage
+ *
+ *     aclitem changed its storage format in 16, so check for it.
+ */
+bool
+check_for_aclitem_data_type_usage(ClusterInfo *cluster)
+{
+	/*
+	 * PG 16 increased the size of the 'aclitem' type, which breaks the on-disk
+	 * format for existing data.
+	 */
+	if (GET_MAJOR_VERSION(cluster->major_version) <= 1500)
+		return true;
+
+	return false;
+}
+
-- 
2.32.1 (Apple Git-133)

Nathan Bossart

nathandbossart@gmail.com

almost 3 years ago

In reply to: Daniel Gustafsson (#8)

Re: Reducing connection overhead in pg_upgrade compat check phase

On Mon, Mar 13, 2023 at 03:10:58PM +0100, Daniel Gustafsson wrote:

The attached v3 is a rebase to handle conflicts and with the above comments
addressed.

Thanks for the new version of the patch.

I noticed that git-am complained when I applied the patch:

Applying: pg_upgrade: run all data type checks per connection
.git/rebase-apply/patch:1023: new blank line at EOF.
+
warning: 1 line adds whitespace errors.

+				for (int rowno = 0; rowno < ntups; rowno++)
+				{
+					found = true;

It looks like "found" is set unconditionally a few lines above, so I think
this is redundant.

Also, I think it would be worth breaking check_for_data_types_usage() into
a few separate functions (or doing some other similar refactoring) to
improve readability. At this point, the function is quite lengthy, and I
count 6 levels of indentation at some lines.
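
As a rough illustration of that kind of split, here is a standalone sketch. It is
not code from the patch: DemoCluster, DemoCheck, check_applies(),
run_one_check(), run_all_checks() and the demo_* helpers are all hypothetical
names, and the recursive-CTE query is replaced by a stub callback so the
example compiles without libpq. It only shows the per-database loop calling
small helpers instead of nesting everything in one function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for pg_upgrade's ClusterInfo. */
typedef struct
{
	int			major_version;	/* e.g. 1500 for PG 15 */
	int			ndbs;
	const char **db_names;
} DemoCluster;

typedef bool (*DemoVersionHook) (const DemoCluster *cluster);

/* Hypothetical stand-in for the patch's DataTypesUsageChecks entry. */
typedef struct
{
	const char *status;
	DemoVersionHook version_hook;	/* NULL means "run for all versions" */
	/* stub replacing the real query: number of offending columns found */
	int			(*count_hits) (const char *db_name);
} DemoCheck;

/* Helper 1: does this check apply to the cluster at hand? */
static bool
check_applies(const DemoCheck *check, const DemoCluster *cluster)
{
	return check->version_hook == NULL || check->version_hook(cluster);
}

/* Helper 2: run one check against one database; true means it passed. */
static bool
run_one_check(const DemoCheck *check, const char *db_name)
{
	int			hits = check->count_hits(db_name);

	if (hits > 0)
		printf("failed check: %s (database %s, %d column(s))\n",
			   check->status, db_name, hits);
	return hits == 0;
}

/* Driver: visit each database once and run every applicable check there. */
static int
run_all_checks(const DemoCluster *cluster, const DemoCheck *checks, int nchecks)
{
	bool	   *passed;
	int			nfailed = 0;

	assert(nchecks > 0);
	passed = malloc(sizeof(bool) * nchecks);
	if (passed == NULL)
		abort();
	for (int i = 0; i < nchecks; i++)
		passed[i] = true;		/* initialise every slot, not just the first */

	for (int dbnum = 0; dbnum < cluster->ndbs; dbnum++)
	{
		for (int i = 0; i < nchecks; i++)
		{
			if (!check_applies(&checks[i], cluster))
				continue;
			if (!run_one_check(&checks[i], cluster->db_names[dbnum]))
				passed[i] = false;
		}
	}

	for (int i = 0; i < nchecks; i++)
		if (!passed[i])
			nfailed++;
	free(passed);
	return nfailed;				/* number of distinct checks that failed */
}

/* Demo data, purely illustrative. */
static bool demo_pre_16(const DemoCluster *c) { return c->major_version <= 1500; }
static int demo_hits_in_db2(const char *db) { return strcmp(db, "db2") == 0 ? 1 : 0; }
static int demo_no_hits(const char *db) { (void) db; return 0; }

static const DemoCheck demo_checks[] = {
	{"demo version-gated check", demo_pre_16, demo_hits_in_db2},
	{"demo always-on check", NULL, demo_no_hits},
};
```

Calling run_all_checks() with demo_checks and a two-database DemoCluster gives
the number of checks that failed in at least one database. Whether the real
function should be split along these particular lines is a judgment call.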

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#10

Nathan Bossart

nathandbossart@gmail.com

over 2 years ago

In reply to: Nathan Bossart (#9)

1 attachment(s)

Re: Reducing connection overhead in pg_upgrade compat check phase

I put together a rebased version of the patch for cfbot.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachments:

v4-0001-pg_upgrade-run-all-data-type-checks-per-connectio.patch (text/x-diff; charset=us-ascii)

From ee5805dc450f081b77ae3a7df315ceafb6ccc5e1 Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson <daniel@yesql.se>
Date: Mon, 13 Mar 2023 14:46:24 +0100
Subject: [PATCH v4 1/1] pg_upgrade: run all data type checks per connection

The checks for data type usage were each connecting to all databases
in the cluster and running their query. On clusters which have a lot
of databases this can become unnecessarily expensive. This moves the
checks to run in a single connection instead to minimize connection
setup/teardown overhead.

Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Discussion: https://postgr.es/m/BB4C76F-D416-4F9F-949E-DBE950D37787@yesql.se
---
 src/bin/pg_upgrade/check.c      | 575 ++++++++++++++++++++------------
 src/bin/pg_upgrade/pg_upgrade.h |  29 +-
 src/bin/pg_upgrade/version.c    | 289 +++-------------
 3 files changed, 433 insertions(+), 460 deletions(-)

diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 64024e3b9e..c829aed26e 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -10,6 +10,7 @@
 #include "postgres_fe.h"
 
 #include "catalog/pg_authid_d.h"
+#include "catalog/pg_class_d.h"
 #include "catalog/pg_collation.h"
 #include "fe_utils/string_utils.h"
 #include "mb/pg_wchar.h"
@@ -23,14 +24,375 @@ static void check_for_isn_and_int8_passing_mismatch(ClusterInfo *cluster);
 static void check_for_user_defined_postfix_ops(ClusterInfo *cluster);
 static void check_for_incompatible_polymorphics(ClusterInfo *cluster);
 static void check_for_tables_with_oids(ClusterInfo *cluster);
-static void check_for_composite_data_type_usage(ClusterInfo *cluster);
-static void check_for_reg_data_type_usage(ClusterInfo *cluster);
-static void check_for_aclitem_data_type_usage(ClusterInfo *cluster);
-static void check_for_jsonb_9_4_usage(ClusterInfo *cluster);
 static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(ClusterInfo *new_cluster);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
 
+/*
+ * Data type usage checks. Each check for problematic data type usage is
+ * defined in this array with metadata, SQL query for finding the data type
+ * and a function pointer for determining if the check should be executed
+ * for the current version.
+ */
+static int n_data_types_usage_checks = 7;
+static DataTypesUsageChecks data_types_usage_checks[] = {
+	/*
+	 * Look for composite types that were made during initdb *or* belong to
+	 * information_schema; that's important in case information_schema was
+	 * dropped and reloaded.
+	 *
+	 * The cutoff OID here should match the source cluster's value of
+	 * FirstNormalObjectId.  We hardcode it rather than using that C #define
+	 * because, if that #define is ever changed, our own version's value is
+	 * NOT what to use.  Eventually we may need a test on the source cluster's
+	 * version to select the correct value.
+	 */
+	{.status = "Checking for system-defined composite types in user tables",
+	 .report_filename = "tables_using_composite.txt",
+	 .base_query =
+	 "SELECT t.oid FROM pg_catalog.pg_type t "
+	 "LEFT JOIN pg_catalog.pg_namespace n ON t.typnamespace = n.oid "
+	 " WHERE typtype = 'c' AND (t.oid < 16384 OR nspname = 'information_schema')",
+	 .report_text =
+	 "Your installation contains system-defined composite type(s) in user tables.\n"
+	 "These type OIDs are not stable across PostgreSQL versions,\n"
+	 "so this cluster cannot currently be upgraded.  You can\n"
+	 "drop the problem columns and restart the upgrade.\n"
+	 "A list of the problem columns is in the file:",
+	 .version_hook = NULL},
+
+	/*
+	 * 9.3 -> 9.4
+	 *	Fully implement the 'line' data type in 9.4, which previously returned
+	 *	"not enabled" by default and was only functionally enabled with a
+	 *	compile-time switch; as of 9.4 "line" has a different on-disk
+	 *	representation format.
+	 */
+	{.status = "Checking for incompatible \"line\" data type",
+	 .report_filename = "tables_using_line.txt",
+	 .base_query =
+	 "SELECT 'pg_catalog.line'::pg_catalog.regtype AS oid",
+	 .report_text =
+	 "Your installation contains the \"line\" data type in user tables.\n"
+	 "This data type changed its internal and input/output format\n"
+	 "between your old and new versions so this\n"
+	 "cluster cannot currently be upgraded.  You can\n"
+	 "drop the problem columns and restart the upgrade.\n"
+	 "A list of the problem columns is in the file:",
+	 .version_hook = old_9_3_check_for_line_data_type_usage},
+
+	/*
+	 *	pg_upgrade only preserves these system values:
+	 *		pg_class.oid
+	 *		pg_type.oid
+	 *		pg_enum.oid
+	 *
+	 *	Many of the reg* data types reference system catalog info that is
+	 *	not preserved, and hence these data types cannot be used in user
+	 *	tables upgraded by pg_upgrade.
+	 */
+	{.status = "Checking for reg* data types in user tables",
+	 .report_filename = "tables_using_reg.txt",
+	 /*
+	  * Note: older servers will not have all of these reg* types, so we have
+	  * to write the query like this rather than depending on casts to regtype.
+	  */
+	 .base_query =
+	 "SELECT oid FROM pg_catalog.pg_type t "
+	 "WHERE t.typnamespace = "
+	 "        (SELECT oid FROM pg_catalog.pg_namespace "
+	 "         WHERE nspname = 'pg_catalog') "
+	 "  AND t.typname IN ( "
+	 /* pg_class.oid is preserved, so 'regclass' is OK */
+	 "           'regcollation', "
+	 "           'regconfig', "
+	 "           'regdictionary', "
+	 "           'regnamespace', "
+	 "           'regoper', "
+	 "           'regoperator', "
+	 "           'regproc', "
+	 "           'regprocedure' "
+	 /* pg_authid.oid is preserved, so 'regrole' is OK */
+	 /* pg_type.oid is (mostly) preserved, so 'regtype' is OK */
+	 "         )",
+	 .report_text =
+	 "Your installation contains one of the reg* data types in user tables.\n"
+	 "These data types reference system OIDs that are not preserved by\n"
+	 "pg_upgrade, so this cluster cannot currently be upgraded.  You can\n"
+	 "drop the problem columns and restart the upgrade.\n"
+	 "A list of the problem columns is in the file:",
+	 .version_hook = NULL},
+
+	/*
+	 * PG 16 increased the size of the 'aclitem' type, which breaks the on-disk
+	 * format for existing data.
+	 */
+	{.status = "Checking for incompatible \"aclitem\" data type in user tables",
+	 .report_filename = "tables_using_aclitem.txt",
+	 .base_query =
+	 "SELECT 'pg_catalog.aclitem'::pg_catalog.regtype AS oid",
+	 .report_text =
+	 "Your installation contains the \"aclitem\" data type in user tables.\n"
+	 "The internal format of \"aclitem\" changed in PostgreSQL version 16\n"
+	 "so this cluster cannot currently be upgraded.  You can drop the\n"
+	 "problem columns and restart the upgrade.  A list of the problem\n"
+	 "columns is in the file:",
+	 .version_hook = check_for_aclitem_data_type_usage},
+
+	/*
+	 * It's no longer allowed to create tables or views with "unknown"-type
+	 * columns.  We do not complain about views with such columns, because
+	 * they should get silently converted to "text" columns during the DDL
+	 * dump and reload; it seems unlikely to be worth making users do that
+	 * by hand.  However, if there's a table with such a column, the DDL
+	 * reload will fail, so we should pre-detect that rather than failing
+	 * mid-upgrade.  Worse, if there's a matview with such a column, the
+	 * DDL reload will silently change it to "text" which won't match the
+	 * on-disk storage (which is like "cstring").  So we *must* reject that.
+	 */
+	{.status = "Checking for invalid \"unknown\" user columns",
+	 .report_filename = "tables_using_unknown.txt",
+	 .base_query =
+	 "SELECT 'pg_catalog.unknown'::pg_catalog.regtype AS oid",
+	 .report_text =
+	 "Your installation contains the \"unknown\" data type in user tables.\n"
+	 "This data type is no longer allowed in tables, so this\n"
+	 "cluster cannot currently be upgraded.  You can\n"
+	 "drop the problem columns and restart the upgrade.\n"
+	 "A list of the problem columns is in the file:",
+	 .version_hook = old_9_6_check_for_unknown_data_type_usage},
+
+	/*
+	 * PG 12 changed the 'sql_identifier' type storage to be based on name,
+	 * not varchar, which breaks on-disk format for existing data. So we need
+	 * to prevent upgrade when used in user objects (tables, indexes, ...).
+	 * In 12, the sql_identifier data type was switched from varchar to name,
+	 * which does affect the storage (name is by-ref, but not varlena). This
+	 * means user tables using sql_identifier for columns are broken because
+	 * the on-disk format is different.
+	 */
+	{.status = "Checking for invalid \"sql_identifier\" user columns",
+	 .report_filename = "tables_using_sql_identifier.txt",
+	 .base_query =
+	 "SELECT 'information_schema.sql_identifier'::pg_catalog.regtype AS oid",
+	 .report_text =
+	 "Your installation contains the \"sql_identifier\" data type in user tables.\n"
+	 "The on-disk format for this data type has changed, so this\n"
+	 "cluster cannot currently be upgraded.  You can\n"
+	 "drop the problem columns and restart the upgrade.\n"
+	 "A list of the problem columns is in the file:",
+	 .version_hook = old_11_check_for_sql_identifier_data_type_usage},
+
+	/*
+	 * JSONB changed its storage format during 9.4 beta, so check for it.
+	 */
+	{.status = "Checking for incompatible \"jsonb\" data type",
+	 .report_filename = "tables_using_jsonb.txt",
+	 .base_query =
+	 "SELECT 'pg_catalog.jsonb'::pg_catalog.regtype AS oid",
+	 .report_text =
+	 "Your installation contains the \"jsonb\" data type in user tables.\n"
+	 "The internal format of \"jsonb\" changed during 9.4 beta so this\n"
+	 "cluster cannot currently be upgraded.  You can\n"
+	 "drop the problem columns and restart the upgrade.\n"
+	 "A list of the problem columns is in the file:",
+	 .version_hook = check_for_jsonb_9_4_usage},
+};
+
+/*
+ * check_for_data_types_usage()
+ *	Detect whether there are any stored columns depending on given type(s)
+ *
+ * If so, write a report to the given file name and signal a failure to the
+ * user.
+ *
+ * The checks to run are defined in a DataTypesUsageChecks structure where
+ * each check has metadata for explaining errors to the user, a base_query,
+ * a report filename and a function pointer hook for validating if the check
+ * should be executed given the cluster at hand.
+ *
+ * base_query should be a SELECT yielding a single column named "oid",
+ * containing the pg_type OIDs of one or more types that are known to have
+ * inconsistent on-disk representations across server versions.
+ *
+ * We check for the type(s) in tables, matviews, and indexes, but not views;
+ * there's no storage involved in a view.
+ */
+static void
+check_for_data_types_usage(ClusterInfo *cluster, DataTypesUsageChecks *checks)
+{
+	bool	found = false;
+	bool   *results;
+	PQExpBufferData report;
+
+	prep_status("Checking for data type usage");
+
+	/* Prepare an array to store the results of checks in */
+	results = pg_malloc(sizeof(bool) * n_data_types_usage_checks);
+	memset(results, true, sizeof(bool) * n_data_types_usage_checks);
+
+	prep_status_progress("checking all databases");
+
+	/*
+	 * Connect to each database in the cluster and run all defined checks
+	 * against that database before trying the next one.
+	 */
+	for (int dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
+	{
+		DbInfo *active_db = &cluster->dbarr.dbs[dbnum];
+		PGconn *conn = connectToServer(cluster, active_db->db_name);
+
+		for (int checknum = 0; checknum < n_data_types_usage_checks; checknum++)
+		{
+			PGresult *res;
+			int 	ntups;
+			int 	i_nspname;
+			int 	i_relname;
+			int 	i_attname;
+			FILE   *script = NULL;
+			bool 	db_used = false;
+			char	output_path[MAXPGPATH];
+			DataTypesUsageChecks *cur_check = &checks[checknum];
+
+			/*
+			 * Make sure that the check applies to the current cluster version
+			 * and skip if not. If no check hook has been defined we run the
+			 * check for all versions.
+			 */
+			if (cur_check->version_hook && !cur_check->version_hook(cluster))
+			{
+				cur_check++;
+				continue;
+			}
+
+			snprintf(output_path, sizeof(output_path), "%s/%s",
+					 log_opts.basedir,
+					 cur_check->report_filename);
+
+			/*
+			 * The type(s) of interest might be wrapped in a domain, array,
+			 * composite, or range, and these container types can be nested (to
+			 * varying extents depending on server version, but that's not of
+			 * concern here).  To handle all these cases we need a recursive CTE.
+			 */
+			res = executeQueryOrDie(conn,
+							  "WITH RECURSIVE oids AS ( "
+			/* start with the type(s) returned by base_query */
+							  "	%s "
+					  "	UNION ALL "
+					  "	SELECT * FROM ( "
+	/* inner WITH because we can only reference the CTE once */
+					  "		WITH x AS (SELECT oid FROM oids) "
+	/* domains on any type selected so far */
+					  "			SELECT t.oid FROM pg_catalog.pg_type t, x WHERE typbasetype = x.oid AND typtype = 'd' "
+					  "			UNION ALL "
+	/* arrays over any type selected so far */
+					  "			SELECT t.oid FROM pg_catalog.pg_type t, x WHERE typelem = x.oid AND typtype = 'b' "
+					  "			UNION ALL "
+	/* composite types containing any type selected so far */
+					  "			SELECT t.oid FROM pg_catalog.pg_type t, pg_catalog.pg_class c, pg_catalog.pg_attribute a, x "
+					  "			WHERE t.typtype = 'c' AND "
+					  "				  t.oid = c.reltype AND "
+					  "				  c.oid = a.attrelid AND "
+					  "				  NOT a.attisdropped AND "
+					  "				  a.atttypid = x.oid "
+					  "			UNION ALL "
+	/* ranges containing any type selected so far */
+					  "			SELECT t.oid FROM pg_catalog.pg_type t, pg_catalog.pg_range r, x "
+					  "			WHERE t.typtype = 'r' AND r.rngtypid = t.oid AND r.rngsubtype = x.oid"
+					  "	) foo "
+					  ") "
+	/* now look for stored columns of any such type */
+					  "SELECT n.nspname, c.relname, a.attname "
+					  "FROM	pg_catalog.pg_class c, "
+					  "		pg_catalog.pg_namespace n, "
+					  "		pg_catalog.pg_attribute a "
+					  "WHERE	c.oid = a.attrelid AND "
+					  "		NOT a.attisdropped AND "
+					  "		a.atttypid IN (SELECT oid FROM oids) AND "
+					  "		c.relkind IN ("
+					  CppAsString2(RELKIND_RELATION) ", "
+					  CppAsString2(RELKIND_MATVIEW) ", "
+					  CppAsString2(RELKIND_INDEX) ") AND "
+					  "		c.relnamespace = n.oid AND "
+	/* exclude possible orphaned temp tables */
+					  "		n.nspname !~ '^pg_temp_' AND "
+					  "		n.nspname !~ '^pg_toast_temp_' AND "
+	/* exclude system catalogs, too */
+					  "		n.nspname NOT IN ('pg_catalog', 'information_schema')",
+					  cur_check->base_query);
+
+			ntups = PQntuples(res);
+
+			/*
+			 * The datatype was found, so extract the data and log to the
+			 * requested filename. We need to open the file for appending
+			 * since the check might have already found the type in another
+			 * database earlier in the loop.
+			 */
+			if (ntups)
+			{
+				/*
+				 * Make sure we have a buffer to save reports to now that we
+				 * found a first failing check.
+				 */
+				if (!found)
+					initPQExpBuffer(&report);
+				found = true;
+
+				/*
+				 * If this is the first time we see an error for the check in
+				 * question then print a status message of the failure.
+				 */
+				if (results[checknum])
+				{
+					pg_log(PG_REPORT, "    failed check: %s", cur_check->status);
+					appendPQExpBuffer(&report, "\n%s\n    %s\n",
+									  cur_check->report_text, output_path);
+				}
+				results[checknum] = false;
+
+				i_nspname = PQfnumber(res, "nspname");
+				i_relname = PQfnumber(res, "relname");
+				i_attname = PQfnumber(res, "attname");
+
+				for (int rowno = 0; rowno < ntups; rowno++)
+				{
+					found = true;
+					if (script == NULL && (script = fopen_priv(output_path, "a")) == NULL)
+						pg_fatal("could not open file \"%s\": %s",
+								 output_path,
+								 strerror(errno));
+					if (!db_used)
+					{
+						fprintf(script, "In database: %s\n", active_db->db_name);
+						db_used = true;
+					}
+					fprintf(script, " %s.%s.%s\n",
+							PQgetvalue(res, rowno, i_nspname),
+							PQgetvalue(res, rowno, i_relname),
+							PQgetvalue(res, rowno, i_attname));
+				}
+
+				if (script)
+				{
+					fclose(script);
+					script = NULL;
+				}
+			}
+
+			PQclear(res);
+			cur_check++;
+		}
+
+		PQfinish(conn);
+	}
+
+	if (found)
+		pg_fatal("Data type checks failed: %s", report.data);
+
+	check_ok();
+}
 
 /*
  * fix_path_separator
@@ -100,16 +462,9 @@ check_and_dump_old_cluster(bool live_check)
 	check_is_install_user(&old_cluster);
 	check_proper_datallowconn(&old_cluster);
 	check_for_prepared_transactions(&old_cluster);
-	check_for_composite_data_type_usage(&old_cluster);
-	check_for_reg_data_type_usage(&old_cluster);
 	check_for_isn_and_int8_passing_mismatch(&old_cluster);
 
-	/*
-	 * PG 16 increased the size of the 'aclitem' type, which breaks the
-	 * on-disk format for existing data.
-	 */
-	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1500)
-		check_for_aclitem_data_type_usage(&old_cluster);
+	check_for_data_types_usage(&old_cluster, data_types_usage_checks);
 
 	/*
 	 * PG 14 changed the function signature of encoding conversion functions.
@@ -141,21 +496,12 @@ check_and_dump_old_cluster(bool live_check)
 	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1100)
 		check_for_tables_with_oids(&old_cluster);
 
-	/*
-	 * PG 12 changed the 'sql_identifier' type storage to be based on name,
-	 * not varchar, which breaks on-disk format for existing data. So we need
-	 * to prevent upgrade when used in user objects (tables, indexes, ...).
-	 */
-	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1100)
-		old_11_check_for_sql_identifier_data_type_usage(&old_cluster);
-
 	/*
 	 * Pre-PG 10 allowed tables with 'unknown' type columns and non WAL logged
 	 * hash indexes
 	 */
 	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 906)
 	{
-		old_9_6_check_for_unknown_data_type_usage(&old_cluster);
 		if (user_opts.check)
 			old_9_6_invalidate_hash_indexes(&old_cluster, true);
 	}
@@ -164,14 +510,6 @@ check_and_dump_old_cluster(bool live_check)
 	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 905)
 		check_for_pg_role_prefix(&old_cluster);
 
-	if (GET_MAJOR_VERSION(old_cluster.major_version) == 904 &&
-		old_cluster.controldata.cat_ver < JSONB_FORMAT_CHANGE_CAT_VER)
-		check_for_jsonb_9_4_usage(&old_cluster);
-
-	/* Pre-PG 9.4 had a different 'line' data type internal format */
-	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 903)
-		old_9_3_check_for_line_data_type_usage(&old_cluster);
-
 	/*
 	 * While not a check option, we do this now because this is the only time
 	 * the old server is running.
@@ -1084,185 +1422,6 @@ check_for_tables_with_oids(ClusterInfo *cluster)
 		check_ok();
 }
 
-
-/*
- * check_for_composite_data_type_usage()
- *	Check for system-defined composite types used in user tables.
- *
- *	The OIDs of rowtypes of system catalogs and information_schema views
- *	can change across major versions; unlike user-defined types, we have
- *	no mechanism for forcing them to be the same in the new cluster.
- *	Hence, if any user table uses one, that's problematic for pg_upgrade.
- */
-static void
-check_for_composite_data_type_usage(ClusterInfo *cluster)
-{
-	bool		found;
-	Oid			firstUserOid;
-	char		output_path[MAXPGPATH];
-	char	   *base_query;
-
-	prep_status("Checking for system-defined composite types in user tables");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_composite.txt");
-
-	/*
-	 * Look for composite types that were made during initdb *or* belong to
-	 * information_schema; that's important in case information_schema was
-	 * dropped and reloaded.
-	 *
-	 * The cutoff OID here should match the source cluster's value of
-	 * FirstNormalObjectId.  We hardcode it rather than using that C #define
-	 * because, if that #define is ever changed, our own version's value is
-	 * NOT what to use.  Eventually we may need a test on the source cluster's
-	 * version to select the correct value.
-	 */
-	firstUserOid = 16384;
-
-	base_query = psprintf("SELECT t.oid FROM pg_catalog.pg_type t "
-						  "LEFT JOIN pg_catalog.pg_namespace n ON t.typnamespace = n.oid "
-						  " WHERE typtype = 'c' AND (t.oid < %u OR nspname = 'information_schema')",
-						  firstUserOid);
-
-	found = check_for_data_types_usage(cluster, base_query, output_path);
-
-	free(base_query);
-
-	if (found)
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains system-defined composite type(s) in user tables.\n"
-				 "These type OIDs are not stable across PostgreSQL versions,\n"
-				 "so this cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
-}
-
-/*
- * check_for_reg_data_type_usage()
- *	pg_upgrade only preserves these system values:
- *		pg_class.oid
- *		pg_type.oid
- *		pg_enum.oid
- *
- *	Many of the reg* data types reference system catalog info that is
- *	not preserved, and hence these data types cannot be used in user
- *	tables upgraded by pg_upgrade.
- */
-static void
-check_for_reg_data_type_usage(ClusterInfo *cluster)
-{
-	bool		found;
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for reg* data types in user tables");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_reg.txt");
-
-	/*
-	 * Note: older servers will not have all of these reg* types, so we have
-	 * to write the query like this rather than depending on casts to regtype.
-	 */
-	found = check_for_data_types_usage(cluster,
-									   "SELECT oid FROM pg_catalog.pg_type t "
-									   "WHERE t.typnamespace = "
-									   "        (SELECT oid FROM pg_catalog.pg_namespace "
-									   "         WHERE nspname = 'pg_catalog') "
-									   "  AND t.typname IN ( "
-	/* pg_class.oid is preserved, so 'regclass' is OK */
-									   "           'regcollation', "
-									   "           'regconfig', "
-									   "           'regdictionary', "
-									   "           'regnamespace', "
-									   "           'regoper', "
-									   "           'regoperator', "
-									   "           'regproc', "
-									   "           'regprocedure' "
-	/* pg_authid.oid is preserved, so 'regrole' is OK */
-	/* pg_type.oid is (mostly) preserved, so 'regtype' is OK */
-									   "         )",
-									   output_path);
-
-	if (found)
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains one of the reg* data types in user tables.\n"
-				 "These data types reference system OIDs that are not preserved by\n"
-				 "pg_upgrade, so this cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
-}
-
-/*
- * check_for_aclitem_data_type_usage
- *
- *	aclitem changed its storage format in 16, so check for it.
- */
-static void
-check_for_aclitem_data_type_usage(ClusterInfo *cluster)
-{
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for incompatible \"aclitem\" data type in user tables");
-
-	snprintf(output_path, sizeof(output_path), "tables_using_aclitem.txt");
-
-	if (check_for_data_type_usage(cluster, "pg_catalog.aclitem", output_path))
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains the \"aclitem\" data type in user tables.\n"
-				 "The internal format of \"aclitem\" changed in PostgreSQL version 16\n"
-				 "so this cluster cannot currently be upgraded.  You can drop the\n"
-				 "problem columns and restart the upgrade.  A list of the problem\n"
-				 "columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
-}
-
-/*
- * check_for_jsonb_9_4_usage()
- *
- *	JSONB changed its storage format during 9.4 beta, so check for it.
- */
-static void
-check_for_jsonb_9_4_usage(ClusterInfo *cluster)
-{
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for incompatible \"jsonb\" data type");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_jsonb.txt");
-
-	if (check_for_data_type_usage(cluster, "pg_catalog.jsonb", output_path))
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains the \"jsonb\" data type in user tables.\n"
-				 "The internal format of \"jsonb\" changed during 9.4 beta so this\n"
-				 "cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
-}
-
 /*
  * check_for_pg_role_prefix()
  *
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 3eea0139c7..208bfbb68e 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -328,6 +328,21 @@ typedef struct
 } OSInfo;
 
 
+/* Function signature for data type check version hook */
+typedef bool (*DataTypesUsageVersionCheck)(ClusterInfo *cluster);
+
+/*
+ * DataTypesUsageChecks
+ */
+typedef struct
+{
+	const char *status;			/* status line to print to the user */
+	const char *report_filename;	/* filename to store report to */
+	const char *base_query;		/* Query to extract the oid of the datatype */
+	const char *report_text;	/* Text to store to report in case of error */
+	DataTypesUsageVersionCheck version_hook;
+} DataTypesUsageChecks;
+
 /*
  * Global variables
  */
@@ -450,19 +465,15 @@ unsigned int str2uint(const char *str);
 
 /* version.c */
 
-bool		check_for_data_types_usage(ClusterInfo *cluster,
-									   const char *base_query,
-									   const char *output_path);
-bool		check_for_data_type_usage(ClusterInfo *cluster,
-									  const char *type_name,
-									  const char *output_path);
-void		old_9_3_check_for_line_data_type_usage(ClusterInfo *cluster);
-void		old_9_6_check_for_unknown_data_type_usage(ClusterInfo *cluster);
+bool		old_9_3_check_for_line_data_type_usage(ClusterInfo *cluster);
+bool		check_for_jsonb_9_4_usage(ClusterInfo *cluster);
+bool		old_9_6_check_for_unknown_data_type_usage(ClusterInfo *cluster);
+bool		old_11_check_for_sql_identifier_data_type_usage(ClusterInfo *cluster);
 void		old_9_6_invalidate_hash_indexes(ClusterInfo *cluster,
 											bool check_mode);
 
-void		old_11_check_for_sql_identifier_data_type_usage(ClusterInfo *cluster);
 void		report_extension_updates(ClusterInfo *cluster);
+bool		check_for_aclitem_data_type_usage(ClusterInfo *cluster);
 
 /* parallel.c */
 void		parallel_exec_prog(const char *log_file, const char *opt_log_file,
diff --git a/src/bin/pg_upgrade/version.c b/src/bin/pg_upgrade/version.c
index 403a6d7cfa..828a975ac0 100644
--- a/src/bin/pg_upgrade/version.c
+++ b/src/bin/pg_upgrade/version.c
@@ -9,236 +9,41 @@
 
 #include "postgres_fe.h"
 
-#include "catalog/pg_class_d.h"
 #include "fe_utils/string_utils.h"
 #include "pg_upgrade.h"
 
-
-/*
- * check_for_data_types_usage()
- *	Detect whether there are any stored columns depending on given type(s)
- *
- * If so, write a report to the given file name, and return true.
- *
- * base_query should be a SELECT yielding a single column named "oid",
- * containing the pg_type OIDs of one or more types that are known to have
- * inconsistent on-disk representations across server versions.
- *
- * We check for the type(s) in tables, matviews, and indexes, but not views;
- * there's no storage involved in a view.
- */
 bool
-check_for_data_types_usage(ClusterInfo *cluster,
-						   const char *base_query,
-						   const char *output_path)
+old_9_3_check_for_line_data_type_usage(ClusterInfo *cluster)
 {
-	bool		found = false;
-	FILE	   *script = NULL;
-	int			dbnum;
-
-	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
-	{
-		DbInfo	   *active_db = &cluster->dbarr.dbs[dbnum];
-		PGconn	   *conn = connectToServer(cluster, active_db->db_name);
-		PQExpBufferData querybuf;
-		PGresult   *res;
-		bool		db_used = false;
-		int			ntups;
-		int			rowno;
-		int			i_nspname,
-					i_relname,
-					i_attname;
-
-		/*
-		 * The type(s) of interest might be wrapped in a domain, array,
-		 * composite, or range, and these container types can be nested (to
-		 * varying extents depending on server version, but that's not of
-		 * concern here).  To handle all these cases we need a recursive CTE.
-		 */
-		initPQExpBuffer(&querybuf);
-		appendPQExpBuffer(&querybuf,
-						  "WITH RECURSIVE oids AS ( "
-		/* start with the type(s) returned by base_query */
-						  "	%s "
-						  "	UNION ALL "
-						  "	SELECT * FROM ( "
-		/* inner WITH because we can only reference the CTE once */
-						  "		WITH x AS (SELECT oid FROM oids) "
-		/* domains on any type selected so far */
-						  "			SELECT t.oid FROM pg_catalog.pg_type t, x WHERE typbasetype = x.oid AND typtype = 'd' "
-						  "			UNION ALL "
-		/* arrays over any type selected so far */
-						  "			SELECT t.oid FROM pg_catalog.pg_type t, x WHERE typelem = x.oid AND typtype = 'b' "
-						  "			UNION ALL "
-		/* composite types containing any type selected so far */
-						  "			SELECT t.oid FROM pg_catalog.pg_type t, pg_catalog.pg_class c, pg_catalog.pg_attribute a, x "
-						  "			WHERE t.typtype = 'c' AND "
-						  "				  t.oid = c.reltype AND "
-						  "				  c.oid = a.attrelid AND "
-						  "				  NOT a.attisdropped AND "
-						  "				  a.atttypid = x.oid "
-						  "			UNION ALL "
-		/* ranges containing any type selected so far */
-						  "			SELECT t.oid FROM pg_catalog.pg_type t, pg_catalog.pg_range r, x "
-						  "			WHERE t.typtype = 'r' AND r.rngtypid = t.oid AND r.rngsubtype = x.oid"
-						  "	) foo "
-						  ") "
-		/* now look for stored columns of any such type */
-						  "SELECT n.nspname, c.relname, a.attname "
-						  "FROM	pg_catalog.pg_class c, "
-						  "		pg_catalog.pg_namespace n, "
-						  "		pg_catalog.pg_attribute a "
-						  "WHERE	c.oid = a.attrelid AND "
-						  "		NOT a.attisdropped AND "
-						  "		a.atttypid IN (SELECT oid FROM oids) AND "
-						  "		c.relkind IN ("
-						  CppAsString2(RELKIND_RELATION) ", "
-						  CppAsString2(RELKIND_MATVIEW) ", "
-						  CppAsString2(RELKIND_INDEX) ") AND "
-						  "		c.relnamespace = n.oid AND "
-		/* exclude possible orphaned temp tables */
-						  "		n.nspname !~ '^pg_temp_' AND "
-						  "		n.nspname !~ '^pg_toast_temp_' AND "
-		/* exclude system catalogs, too */
-						  "		n.nspname NOT IN ('pg_catalog', 'information_schema')",
-						  base_query);
-
-		res = executeQueryOrDie(conn, "%s", querybuf.data);
-
-		ntups = PQntuples(res);
-		i_nspname = PQfnumber(res, "nspname");
-		i_relname = PQfnumber(res, "relname");
-		i_attname = PQfnumber(res, "attname");
-		for (rowno = 0; rowno < ntups; rowno++)
-		{
-			found = true;
-			if (script == NULL && (script = fopen_priv(output_path, "w")) == NULL)
-				pg_fatal("could not open file \"%s\": %s", output_path,
-						 strerror(errno));
-			if (!db_used)
-			{
-				fprintf(script, "In database: %s\n", active_db->db_name);
-				db_used = true;
-			}
-			fprintf(script, "  %s.%s.%s\n",
-					PQgetvalue(res, rowno, i_nspname),
-					PQgetvalue(res, rowno, i_relname),
-					PQgetvalue(res, rowno, i_attname));
-		}
-
-		PQclear(res);
-
-		termPQExpBuffer(&querybuf);
-
-		PQfinish(conn);
-	}
-
-	if (script)
-		fclose(script);
+	/* Pre-PG 9.4 had a different 'line' data type internal format */
+	if (GET_MAJOR_VERSION(cluster->major_version) <= 903)
+		return true;
 
-	return found;
+	return false;
 }
 
 /*
- * check_for_data_type_usage()
- *	Detect whether there are any stored columns depending on the given type
- *
- * If so, write a report to the given file name, and return true.
+ * check_for_jsonb_9_4_usage()
  *
- * type_name should be a fully qualified type name.  This is just a
- * trivial wrapper around check_for_data_types_usage() to convert a
- * type name into a base query.
+ *     JSONB changed its storage format during 9.4 beta, so check for it.
  */
 bool
-check_for_data_type_usage(ClusterInfo *cluster,
-						  const char *type_name,
-						  const char *output_path)
-{
-	bool		found;
-	char	   *base_query;
-
-	base_query = psprintf("SELECT '%s'::pg_catalog.regtype AS oid",
-						  type_name);
-
-	found = check_for_data_types_usage(cluster, base_query, output_path);
-
-	free(base_query);
-
-	return found;
-}
-
-
-/*
- * old_9_3_check_for_line_data_type_usage()
- *	9.3 -> 9.4
- *	Fully implement the 'line' data type in 9.4, which previously returned
- *	"not enabled" by default and was only functionally enabled with a
- *	compile-time switch; as of 9.4 "line" has a different on-disk
- *	representation format.
- */
-void
-old_9_3_check_for_line_data_type_usage(ClusterInfo *cluster)
+check_for_jsonb_9_4_usage(ClusterInfo *cluster)
 {
-	char		output_path[MAXPGPATH];
+	if (GET_MAJOR_VERSION(cluster->major_version) == 904 &&
+		cluster->controldata.cat_ver < JSONB_FORMAT_CHANGE_CAT_VER)
+		return true;
 
-	prep_status("Checking for incompatible \"line\" data type");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_line.txt");
-
-	if (check_for_data_type_usage(cluster, "pg_catalog.line", output_path))
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains the \"line\" data type in user tables.\n"
-				 "This data type changed its internal and input/output format\n"
-				 "between your old and new versions so this\n"
-				 "cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
+	return false;
 }
 
-
-/*
- * old_9_6_check_for_unknown_data_type_usage()
- *	9.6 -> 10
- *	It's no longer allowed to create tables or views with "unknown"-type
- *	columns.  We do not complain about views with such columns, because
- *	they should get silently converted to "text" columns during the DDL
- *	dump and reload; it seems unlikely to be worth making users do that
- *	by hand.  However, if there's a table with such a column, the DDL
- *	reload will fail, so we should pre-detect that rather than failing
- *	mid-upgrade.  Worse, if there's a matview with such a column, the
- *	DDL reload will silently change it to "text" which won't match the
- *	on-disk storage (which is like "cstring").  So we *must* reject that.
- */
-void
+bool
 old_9_6_check_for_unknown_data_type_usage(ClusterInfo *cluster)
 {
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for invalid \"unknown\" user columns");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_unknown.txt");
-
-	if (check_for_data_type_usage(cluster, "pg_catalog.unknown", output_path))
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains the \"unknown\" data type in user tables.\n"
-				 "This data type is no longer allowed in tables, so this\n"
-				 "cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
+	/* Pre-PG 10 allowed tables with 'unknown' type columns */
+	if (GET_MAJOR_VERSION(cluster->major_version) <= 906)
+		return true;
+	return false;
 }
 
 /*
@@ -353,41 +158,20 @@ old_9_6_invalidate_hash_indexes(ClusterInfo *cluster, bool check_mode)
 		check_ok();
 }
 
-/*
- * old_11_check_for_sql_identifier_data_type_usage()
- *	11 -> 12
- *	In 12, the sql_identifier data type was switched from name to varchar,
- *	which does affect the storage (name is by-ref, but not varlena). This
- *	means user tables using sql_identifier for columns are broken because
- *	the on-disk format is different.
- */
-void
+bool
 old_11_check_for_sql_identifier_data_type_usage(ClusterInfo *cluster)
 {
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for invalid \"sql_identifier\" user columns");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_sql_identifier.txt");
-
-	if (check_for_data_type_usage(cluster, "information_schema.sql_identifier",
-								  output_path))
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains the \"sql_identifier\" data type in user tables.\n"
-				 "The on-disk format for this data type has changed, so this\n"
-				 "cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
+	/*
+	 * PG 12 changed the 'sql_identifier' type storage to be based on name,
+	 * not varchar, which breaks on-disk format for existing data. So we need
+	 * to prevent upgrade when used in user objects (tables, indexes, ...).
+	 */
+	if (GET_MAJOR_VERSION(cluster->major_version) <= 1100)
+		return true;
+
+	return false;
 }
 
-
 /*
  * report_extension_updates()
  *	Report extensions that should be updated.
@@ -459,3 +243,22 @@ report_extension_updates(ClusterInfo *cluster)
 	else
 		check_ok();
 }
+
+/*
+ * check_for_aclitem_data_type_usage
+ *
+ *     aclitem changed its storage format in 16, so check for it.
+ */
+bool
+check_for_aclitem_data_type_usage(ClusterInfo *cluster)
+{
+	/*
+	 * PG 16 increased the size of the 'aclitem' type, which breaks the on-disk
+	 * format for existing data.
+	 */
+	if (GET_MAJOR_VERSION(cluster->major_version) <= 1500)
+		return true;
+
+	return false;
+}
+
-- 
2.25.1

#11

Daniel Gustafsson

daniel@yesql.se

over 2 years ago

In reply to: Nathan Bossart (#10)

1 attachment(s)

Re: Reducing connection overhead in pg_upgrade compat check phase

On 4 Jul 2023, at 21:08, Nathan Bossart <nathandbossart@gmail.com> wrote:

I put together a rebased version of the patch for cfbot.

Thanks for doing that, much appreciated! I was busy looking at other people's
patches and hadn't gotten to my own yet =)

On 13 Mar 2023, at 19:21, Nathan Bossart <nathandbossart@gmail.com> wrote:

I noticed that git-am complained when I applied the patch:

Applying: pg_upgrade: run all data type checks per connection
.git/rebase-apply/patch:1023: new blank line at EOF.
+
warning: 1 line adds whitespace errors.

Fixed.

+				for (int rowno = 0; rowno < ntups; rowno++)
+				{
+					found = true;
It looks like "found" is set unconditionally a few lines above, so I think
this is redundant.

Correct, this must've been a leftover from an earlier version of the code.
Removed.

Also, I think it would be worth breaking check_for_data_types_usage() into
a few separate functions (or doing some other similar refactoring) to
improve readability. At this point, the function is quite lengthy, and I
count 6 levels of indentation at some lines.

It is pretty big for sure, but it's also IMHO not terribly complicated as
it's not really performing any hard-to-follow logic.

I have no issues refactoring it, but when trying my hand at it I was only
making (what I consider) less readable code by having to jump around, so I
consider the attempt a failure. If you have any suggestions, I would be more
than happy to review and incorporate those though.

Attached is a v5 with the above fixes and a pgindent run to fix up a few runaway
comments and indentations.
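
To illustrate where the saving comes from, here is a minimal standalone sketch
of the two loop shapes. It is not code from the patch: Cluster, Check, the
hook functions and the connection counters are hypothetical stand-ins, and a
counter increment takes the place of connectToServer(). It only shows that
with the checks kept in a table, the number of connections depends on the
number of databases rather than on databases times applicable checks.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these names come from pg_upgrade. */
typedef struct
{
	int			major_version;	/* e.g. 1500 for PostgreSQL 15 */
	int			ndbs;			/* number of databases in the cluster */
} Cluster;

typedef bool (*VersionHook) (const Cluster *cluster);

typedef struct
{
	const char *status;			/* what the check looks for */
	VersionHook version_hook;	/* NULL means "run for every version" */
} Check;

static bool
applies_up_to_15(const Cluster *cluster)
{
	return cluster->major_version <= 1500;
}

static bool
applies_up_to_9_6(const Cluster *cluster)
{
	return cluster->major_version <= 906;
}

static const Check checks[] = {
	{"composite types", NULL},
	{"aclitem", applies_up_to_15},
	{"unknown", applies_up_to_9_6},
};

#define N_CHECKS ((int) (sizeof(checks) / sizeof(checks[0])))

static bool
check_applies(const Check *check, const Cluster *cluster)
{
	return check->version_hook == NULL || check->version_hook(cluster);
}

/* Old shape: every applicable check connects to every database itself. */
static int
connections_per_check(const Cluster *cluster)
{
	int			nconn = 0;

	for (int c = 0; c < N_CHECKS; c++)
	{
		if (!check_applies(&checks[c], cluster))
			continue;
		for (int db = 0; db < cluster->ndbs; db++)
			nconn++;			/* stands in for one connection */
	}
	return nconn;
}

/* New shape: one connection per database, shared by all applicable checks. */
static int
connections_per_database(const Cluster *cluster)
{
	int			nconn = 0;

	for (int db = 0; db < cluster->ndbs; db++)
	{
		nconn++;				/* stands in for one connection */
		for (int c = 0; c < N_CHECKS; c++)
		{
			if (!check_applies(&checks[c], cluster))
				continue;
			/* the check's query would run on the shared connection here */
		}
	}
	return nconn;
}
```

Calling both functions with a Cluster of {1500, 100} counts 200 connections
for the first shape (two of the three sample checks apply) and 100 for the
second. The derived N_CHECKS macro is just one way to keep the count in sync
with the array in this sketch; the patch itself uses a separate counter.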

--
Daniel Gustafsson

Attachments:

v5-0001-pg_upgrade-run-all-data-type-checks-per-connectio.patch (application/octet-stream)

From 084f64939eff769d6c9810e2c55dbcd942f79154 Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson <dgustafsson@postgresql.org>
Date: Thu, 6 Jul 2023 17:55:45 +0200
Subject: [PATCH v5] pg_upgrade: run all data type checks per connection

The checks for data type usage were each connecting to all databases
in the cluster and running their query. On cluster which have a lot
of databases this can become unnecessarily expensive. This moves the
checks to run in a single connection instead to minimize connection
setup/teardown overhead.

Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Discussion: https://postgr.es/m/BB4C76F-D416-4F9F-949E-DBE950D37787@yesql.se
---
 src/bin/pg_upgrade/check.c      | 589 +++++++++++++++++++++-----------
 src/bin/pg_upgrade/pg_upgrade.h |  29 +-
 src/bin/pg_upgrade/version.c    | 288 +++-------------
 3 files changed, 446 insertions(+), 460 deletions(-)

diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 64024e3b9e..388555822f 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -10,6 +10,7 @@
 #include "postgres_fe.h"
 
 #include "catalog/pg_authid_d.h"
+#include "catalog/pg_class_d.h"
 #include "catalog/pg_collation.h"
 #include "fe_utils/string_utils.h"
 #include "mb/pg_wchar.h"
@@ -23,14 +24,389 @@ static void check_for_isn_and_int8_passing_mismatch(ClusterInfo *cluster);
 static void check_for_user_defined_postfix_ops(ClusterInfo *cluster);
 static void check_for_incompatible_polymorphics(ClusterInfo *cluster);
 static void check_for_tables_with_oids(ClusterInfo *cluster);
-static void check_for_composite_data_type_usage(ClusterInfo *cluster);
-static void check_for_reg_data_type_usage(ClusterInfo *cluster);
-static void check_for_aclitem_data_type_usage(ClusterInfo *cluster);
-static void check_for_jsonb_9_4_usage(ClusterInfo *cluster);
 static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(ClusterInfo *new_cluster);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
 
+/*
+ * Data type usage checks. Each check for problematic data type usage is
+ * defined in this array with metadata, SQL query for finding the data type
+ * and a function pointer for determining if the check should be executed
+ * for the current version.
+ */
+static int	n_data_types_usage_checks = 7;
+static DataTypesUsageChecks data_types_usage_checks[] =
+{
+	/*
+	 * Look for composite types that were made during initdb *or* belong to
+	 * information_schema; that's important in case information_schema was
+	 * dropped and reloaded.
+	 *
+	 * The cutoff OID here should match the source cluster's value of
+	 * FirstNormalObjectId.  We hardcode it rather than using that C #define
+	 * because, if that #define is ever changed, our own version's value is
+	 * NOT what to use.  Eventually we may need a test on the source cluster's
+	 * version to select the correct value.
+	 */
+	{
+		.status = "Checking for system-defined composite types in user tables",
+			.report_filename = "tables_using_composite.txt",
+			.base_query =
+			"SELECT t.oid FROM pg_catalog.pg_type t "
+			"LEFT JOIN pg_catalog.pg_namespace n ON t.typnamespace = n.oid "
+			" WHERE typtype = 'c' AND (t.oid < 16384 OR nspname = 'information_schema')",
+			.report_text =
+			"Your installation contains system-defined composite type(s) in user tables.\n"
+			"These type OIDs are not stable across PostgreSQL versions,\n"
+			"so this cluster cannot currently be upgraded.  You can\n"
+			"drop the problem columns and restart the upgrade.\n"
+			"A list of the problem columns is in the file:",
+			.version_hook = NULL
+	},
+
+	/*
+	 * 9.3 -> 9.4 Fully implement the 'line' data type in 9.4, which
+	 * previously returned "not enabled" by default and was only functionally
+	 * enabled with a compile-time switch; as of 9.4 "line" has a different
+	 * on-disk representation format.
+	 */
+	{
+		.status = "Checking for incompatible \"line\" data type",
+			.report_filename = "tables_using_line.txt",
+			.base_query =
+			"SELECT 'pg_catalog.line'::pg_catalog.regtype AS oid",
+			.report_text =
+			"Your installation contains the \"line\" data type in user tables.\n"
+			"This data type changed its internal and input/output format\n"
+			"between your old and new versions so this\n"
+			"cluster cannot currently be upgraded.  You can\n"
+			"drop the problem columns and restart the upgrade.\n"
+			"A list of the problem columns is in the file:",
+			.version_hook = old_9_3_check_for_line_data_type_usage
+	},
+
+	/*
+	 * pg_upgrade only preserves these system values: pg_class.oid, pg_type.oid,
+	 * pg_enum.oid
+	 *
+	 * Many of the reg* data types reference system catalog info that is not
+	 * preserved, and hence these data types cannot be used in user tables
+	 * upgraded by pg_upgrade.
+	 */
+	{
+		.status = "Checking for reg* data types in user tables",
+			.report_filename = "tables_using_reg.txt",
+
+		/*
+		 * Note: older servers will not have all of these reg* types, so we
+		 * have to write the query like this rather than depending on casts to
+		 * regtype.
+		 */
+			.base_query =
+			"SELECT oid FROM pg_catalog.pg_type t "
+			"WHERE t.typnamespace = "
+			"        (SELECT oid FROM pg_catalog.pg_namespace "
+			"         WHERE nspname = 'pg_catalog') "
+			"  AND t.typname IN ( "
+		/* pg_class.oid is preserved, so 'regclass' is OK */
+			"           'regcollation', "
+			"           'regconfig', "
+			"           'regdictionary', "
+			"           'regnamespace', "
+			"           'regoper', "
+			"           'regoperator', "
+			"           'regproc', "
+			"           'regprocedure' "
+		/* pg_authid.oid is preserved, so 'regrole' is OK */
+		/* pg_type.oid is (mostly) preserved, so 'regtype' is OK */
+			"         )",
+			.report_text =
+			"Your installation contains one of the reg* data types in user tables.\n"
+			"These data types reference system OIDs that are not preserved by\n"
+			"pg_upgrade, so this cluster cannot currently be upgraded.  You can\n"
+			"drop the problem columns and restart the upgrade.\n"
+			"A list of the problem columns is in the file:",
+			.version_hook = NULL
+	},
+
+	/*
+	 * PG 16 increased the size of the 'aclitem' type, which breaks the
+	 * on-disk format for existing data.
+	 */
+	{
+		.status = "Checking for incompatible \"aclitem\" data type in user tables",
+			.report_filename = "tables_using_aclitem.txt",
+			.base_query =
+			"SELECT 'pg_catalog.aclitem'::pg_catalog.regtype AS oid",
+			.report_text =
+			"Your installation contains the \"aclitem\" data type in user tables.\n"
+			"The internal format of \"aclitem\" changed in PostgreSQL version 16\n"
+			"so this cluster cannot currently be upgraded.  You can drop the\n"
+			"problem columns and restart the upgrade.  A list of the problem\n"
+			"columns is in the file:",
+			.version_hook = check_for_aclitem_data_type_usage
+	},
+
+	/*
+	 * It's no longer allowed to create tables or views with "unknown"-type
+	 * columns.  We do not complain about views with such columns, because
+	 * they should get silently converted to "text" columns during the DDL
+	 * dump and reload; it seems unlikely to be worth making users do that by
+	 * hand.  However, if there's a table with such a column, the DDL reload
+	 * will fail, so we should pre-detect that rather than failing
+	 * mid-upgrade.  Worse, if there's a matview with such a column, the DDL
+	 * reload will silently change it to "text" which won't match the on-disk
+	 * storage (which is like "cstring").  So we *must* reject that.
+	 */
+	{
+		.status = "Checking for invalid \"unknown\" user columns",
+			.report_filename = "tables_using_unknown.txt",
+			.base_query =
+			"SELECT 'pg_catalog.unknown'::pg_catalog.regtype AS oid",
+			.report_text =
+			"Your installation contains the \"unknown\" data type in user tables.\n"
+			"This data type is no longer allowed in tables, so this\n"
+			"cluster cannot currently be upgraded.  You can\n"
+			"drop the problem columns and restart the upgrade.\n"
+			"A list of the problem columns is in the file:",
+			.version_hook = old_9_6_check_for_unknown_data_type_usage
+	},
+
+	/*
+	 * PG 12 changed the 'sql_identifier' type storage to be based on name,
+	 * not varchar, which breaks on-disk format for existing data. So we need
+	 * to prevent upgrade when used in user objects (tables, indexes, ...). In
+	 * 12, the sql_identifier data type was switched from varchar to name,
+	 * which does affect the storage (name is by-ref, but not varlena). This
+	 * means user tables using sql_identifier for columns are broken because
+	 * the on-disk format is different.
+	 */
+	{
+		.status = "Checking for invalid \"sql_identifier\" user columns",
+			.report_filename = "tables_using_sql_identifier.txt",
+			.base_query =
+			"SELECT 'information_schema.sql_identifier'::pg_catalog.regtype AS oid",
+			.report_text =
+			"Your installation contains the \"sql_identifier\" data type in user tables.\n"
+			"The on-disk format for this data type has changed, so this\n"
+			"cluster cannot currently be upgraded.  You can\n"
+			"drop the problem columns and restart the upgrade.\n"
+			"A list of the problem columns is in the file:",
+			.version_hook = old_11_check_for_sql_identifier_data_type_usage
+	},
+
+	/*
+	 * JSONB changed its storage format during 9.4 beta, so check for it.
+	 */
+	{
+		.status = "Checking for incompatible \"jsonb\" data type",
+			.report_filename = "tables_using_jsonb.txt",
+			.base_query =
+			"SELECT 'pg_catalog.jsonb'::pg_catalog.regtype AS oid",
+			.report_text =
+			"Your installation contains the \"jsonb\" data type in user tables.\n"
+			"The internal format of \"jsonb\" changed during 9.4 beta so this\n"
+			"cluster cannot currently be upgraded.  You can\n"
+			"drop the problem columns and restart the upgrade.\n"
+			"A list of the problem columns is in the file:",
+			.version_hook = check_for_jsonb_9_4_usage
+	},
+};
+
+/*
+ * check_for_data_types_usage()
+ *	Detect whether there are any stored columns depending on given type(s)
+ *
+ * If so, write a report to the given file name and signal a failure to the
+ * user.
+ *
+ * The checks to run are defined in a DataTypesUsageChecks structure where
+ * each check has metadata for explaining errors to the user, a base_query,
+ * a report filename and a function pointer hook for validating if the check
+ * should be executed given the cluster at hand.
+ *
+ * base_query should be a SELECT yielding a single column named "oid",
+ * containing the pg_type OIDs of one or more types that are known to have
+ * inconsistent on-disk representations across server versions.
+ *
+ * We check for the type(s) in tables, matviews, and indexes, but not views;
+ * there's no storage involved in a view.
+ */
+static void
+check_for_data_types_usage(ClusterInfo *cluster, DataTypesUsageChecks * checks)
+{
+	bool		found = false;
+	bool	   *results;
+	PQExpBufferData report;
+
+	prep_status("Checking for data type usage");
+
+	/* Prepare an array to store the results of checks in */
+	results = pg_malloc(sizeof(bool) * n_data_types_usage_checks);
+	memset(results, true, sizeof(bool) * n_data_types_usage_checks);
+
+	prep_status_progress("checking all databases");
+
+	/*
+	 * Connect to each database in the cluster and run all defined checks
+	 * against that database before trying the next one.
+	 */
+	for (int dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
+	{
+		DbInfo	   *active_db = &cluster->dbarr.dbs[dbnum];
+		PGconn	   *conn = connectToServer(cluster, active_db->db_name);
+
+		for (int checknum = 0; checknum < n_data_types_usage_checks; checknum++)
+		{
+			PGresult   *res;
+			int			ntups;
+			int			i_nspname;
+			int			i_relname;
+			int			i_attname;
+			FILE	   *script = NULL;
+			bool		db_used = false;
+			char		output_path[MAXPGPATH];
+			DataTypesUsageChecks *cur_check = &checks[checknum];
+
+			/*
+			 * Make sure that the check applies to the current cluster version
+			 * and skip if not. If no check hook has been defined we run the
+			 * check for all versions.
+			 */
+			if (cur_check->version_hook && !cur_check->version_hook(cluster))
+			{
+				cur_check++;
+				continue;
+			}
+
+			snprintf(output_path, sizeof(output_path), "%s/%s",
+					 log_opts.basedir,
+					 cur_check->report_filename);
+
+			/*
+			 * The type(s) of interest might be wrapped in a domain, array,
+			 * composite, or range, and these container types can be nested
+			 * (to varying extents depending on server version, but that's not
+			 * of concern here).  To handle all these cases we need a
+			 * recursive CTE.
+			 */
+			res = executeQueryOrDie(conn,
+									"WITH RECURSIVE oids AS ( "
+			/* start with the type(s) returned by base_query */
+									"	%s "
+									"	UNION ALL "
+									"	SELECT * FROM ( "
+			/* inner WITH because we can only reference the CTE once */
+									"		WITH x AS (SELECT oid FROM oids) "
+			/* domains on any type selected so far */
+									"			SELECT t.oid FROM pg_catalog.pg_type t, x WHERE typbasetype = x.oid AND typtype = 'd' "
+									"			UNION ALL "
+			/* arrays over any type selected so far */
+									"			SELECT t.oid FROM pg_catalog.pg_type t, x WHERE typelem = x.oid AND typtype = 'b' "
+									"			UNION ALL "
+			/* composite types containing any type selected so far */
+									"			SELECT t.oid FROM pg_catalog.pg_type t, pg_catalog.pg_class c, pg_catalog.pg_attribute a, x "
+									"			WHERE t.typtype = 'c' AND "
+									"				  t.oid = c.reltype AND "
+									"				  c.oid = a.attrelid AND "
+									"				  NOT a.attisdropped AND "
+									"				  a.atttypid = x.oid "
+									"			UNION ALL "
+			/* ranges containing any type selected so far */
+									"			SELECT t.oid FROM pg_catalog.pg_type t, pg_catalog.pg_range r, x "
+									"			WHERE t.typtype = 'r' AND r.rngtypid = t.oid AND r.rngsubtype = x.oid"
+									"	) foo "
+									") "
+			/* now look for stored columns of any such type */
+									"SELECT n.nspname, c.relname, a.attname "
+									"FROM	pg_catalog.pg_class c, "
+									"		pg_catalog.pg_namespace n, "
+									"		pg_catalog.pg_attribute a "
+									"WHERE	c.oid = a.attrelid AND "
+									"		NOT a.attisdropped AND "
+									"		a.atttypid IN (SELECT oid FROM oids) AND "
+									"		c.relkind IN ("
+									CppAsString2(RELKIND_RELATION) ", "
+									CppAsString2(RELKIND_MATVIEW) ", "
+									CppAsString2(RELKIND_INDEX) ") AND "
+									"		c.relnamespace = n.oid AND "
+			/* exclude possible orphaned temp tables */
+									"		n.nspname !~ '^pg_temp_' AND "
+									"		n.nspname !~ '^pg_toast_temp_' AND "
+			/* exclude system catalogs, too */
+									"		n.nspname NOT IN ('pg_catalog', 'information_schema')",
+									cur_check->base_query);
+
+			ntups = PQntuples(res);
+
+			/*
+			 * The datatype was found, so extract the data and log to the
+			 * requested filename. We need to open the file for appending
+			 * since the check might have already found the type in another
+			 * database earlier in the loop.
+			 */
+			if (ntups)
+			{
+				/*
+				 * Make sure we have a buffer to save reports to now that we
+				 * found a first failing check.
+				 */
+				if (!found)
+					initPQExpBuffer(&report);
+				found = true;
+
+				/*
+				 * If this is the first time we see an error for the check in
+				 * question then print a status message of the failure.
+				 */
+				if (results[checknum])
+				{
+					pg_log(PG_REPORT, "    failed check: %s", cur_check->status);
+					appendPQExpBuffer(&report, "\n%s\n    %s\n",
+									  cur_check->report_text, output_path);
+				}
+				results[checknum] = false;
+
+				i_nspname = PQfnumber(res, "nspname");
+				i_relname = PQfnumber(res, "relname");
+				i_attname = PQfnumber(res, "attname");
+
+				for (int rowno = 0; rowno < ntups; rowno++)
+				{
+					if (script == NULL && (script = fopen_priv(output_path, "a")) == NULL)
+						pg_fatal("could not open file \"%s\": %s",
+								 output_path,
+								 strerror(errno));
+					if (!db_used)
+					{
+						fprintf(script, "In database: %s\n", active_db->db_name);
+						db_used = true;
+					}
+					fprintf(script, " %s.%s.%s\n",
+							PQgetvalue(res, rowno, i_nspname),
+							PQgetvalue(res, rowno, i_relname),
+							PQgetvalue(res, rowno, i_attname));
+				}
+
+				if (script)
+				{
+					fclose(script);
+					script = NULL;
+				}
+			}
+
+			PQclear(res);
+			cur_check++;
+		}
+
+		PQfinish(conn);
+	}
+
+	if (found)
+		pg_fatal("Data type checks failed: %s", report.data);
+
+	check_ok();
+}
 
 /*
  * fix_path_separator
@@ -100,16 +476,9 @@ check_and_dump_old_cluster(bool live_check)
 	check_is_install_user(&old_cluster);
 	check_proper_datallowconn(&old_cluster);
 	check_for_prepared_transactions(&old_cluster);
-	check_for_composite_data_type_usage(&old_cluster);
-	check_for_reg_data_type_usage(&old_cluster);
 	check_for_isn_and_int8_passing_mismatch(&old_cluster);
 
-	/*
-	 * PG 16 increased the size of the 'aclitem' type, which breaks the
-	 * on-disk format for existing data.
-	 */
-	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1500)
-		check_for_aclitem_data_type_usage(&old_cluster);
+	check_for_data_types_usage(&old_cluster, data_types_usage_checks);
 
 	/*
 	 * PG 14 changed the function signature of encoding conversion functions.
@@ -141,21 +510,12 @@ check_and_dump_old_cluster(bool live_check)
 	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1100)
 		check_for_tables_with_oids(&old_cluster);
 
-	/*
-	 * PG 12 changed the 'sql_identifier' type storage to be based on name,
-	 * not varchar, which breaks on-disk format for existing data. So we need
-	 * to prevent upgrade when used in user objects (tables, indexes, ...).
-	 */
-	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1100)
-		old_11_check_for_sql_identifier_data_type_usage(&old_cluster);
-
 	/*
 	 * Pre-PG 10 allowed tables with 'unknown' type columns and non WAL logged
 	 * hash indexes
 	 */
 	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 906)
 	{
-		old_9_6_check_for_unknown_data_type_usage(&old_cluster);
 		if (user_opts.check)
 			old_9_6_invalidate_hash_indexes(&old_cluster, true);
 	}
@@ -164,14 +524,6 @@ check_and_dump_old_cluster(bool live_check)
 	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 905)
 		check_for_pg_role_prefix(&old_cluster);
 
-	if (GET_MAJOR_VERSION(old_cluster.major_version) == 904 &&
-		old_cluster.controldata.cat_ver < JSONB_FORMAT_CHANGE_CAT_VER)
-		check_for_jsonb_9_4_usage(&old_cluster);
-
-	/* Pre-PG 9.4 had a different 'line' data type internal format */
-	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 903)
-		old_9_3_check_for_line_data_type_usage(&old_cluster);
-
 	/*
 	 * While not a check option, we do this now because this is the only time
 	 * the old server is running.
@@ -1084,185 +1436,6 @@ check_for_tables_with_oids(ClusterInfo *cluster)
 		check_ok();
 }
 
-
-/*
- * check_for_composite_data_type_usage()
- *	Check for system-defined composite types used in user tables.
- *
- *	The OIDs of rowtypes of system catalogs and information_schema views
- *	can change across major versions; unlike user-defined types, we have
- *	no mechanism for forcing them to be the same in the new cluster.
- *	Hence, if any user table uses one, that's problematic for pg_upgrade.
- */
-static void
-check_for_composite_data_type_usage(ClusterInfo *cluster)
-{
-	bool		found;
-	Oid			firstUserOid;
-	char		output_path[MAXPGPATH];
-	char	   *base_query;
-
-	prep_status("Checking for system-defined composite types in user tables");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_composite.txt");
-
-	/*
-	 * Look for composite types that were made during initdb *or* belong to
-	 * information_schema; that's important in case information_schema was
-	 * dropped and reloaded.
-	 *
-	 * The cutoff OID here should match the source cluster's value of
-	 * FirstNormalObjectId.  We hardcode it rather than using that C #define
-	 * because, if that #define is ever changed, our own version's value is
-	 * NOT what to use.  Eventually we may need a test on the source cluster's
-	 * version to select the correct value.
-	 */
-	firstUserOid = 16384;
-
-	base_query = psprintf("SELECT t.oid FROM pg_catalog.pg_type t "
-						  "LEFT JOIN pg_catalog.pg_namespace n ON t.typnamespace = n.oid "
-						  " WHERE typtype = 'c' AND (t.oid < %u OR nspname = 'information_schema')",
-						  firstUserOid);
-
-	found = check_for_data_types_usage(cluster, base_query, output_path);
-
-	free(base_query);
-
-	if (found)
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains system-defined composite type(s) in user tables.\n"
-				 "These type OIDs are not stable across PostgreSQL versions,\n"
-				 "so this cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
-}
-
-/*
- * check_for_reg_data_type_usage()
- *	pg_upgrade only preserves these system values:
- *		pg_class.oid
- *		pg_type.oid
- *		pg_enum.oid
- *
- *	Many of the reg* data types reference system catalog info that is
- *	not preserved, and hence these data types cannot be used in user
- *	tables upgraded by pg_upgrade.
- */
-static void
-check_for_reg_data_type_usage(ClusterInfo *cluster)
-{
-	bool		found;
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for reg* data types in user tables");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_reg.txt");
-
-	/*
-	 * Note: older servers will not have all of these reg* types, so we have
-	 * to write the query like this rather than depending on casts to regtype.
-	 */
-	found = check_for_data_types_usage(cluster,
-									   "SELECT oid FROM pg_catalog.pg_type t "
-									   "WHERE t.typnamespace = "
-									   "        (SELECT oid FROM pg_catalog.pg_namespace "
-									   "         WHERE nspname = 'pg_catalog') "
-									   "  AND t.typname IN ( "
-	/* pg_class.oid is preserved, so 'regclass' is OK */
-									   "           'regcollation', "
-									   "           'regconfig', "
-									   "           'regdictionary', "
-									   "           'regnamespace', "
-									   "           'regoper', "
-									   "           'regoperator', "
-									   "           'regproc', "
-									   "           'regprocedure' "
-	/* pg_authid.oid is preserved, so 'regrole' is OK */
-	/* pg_type.oid is (mostly) preserved, so 'regtype' is OK */
-									   "         )",
-									   output_path);
-
-	if (found)
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains one of the reg* data types in user tables.\n"
-				 "These data types reference system OIDs that are not preserved by\n"
-				 "pg_upgrade, so this cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
-}
-
-/*
- * check_for_aclitem_data_type_usage
- *
- *	aclitem changed its storage format in 16, so check for it.
- */
-static void
-check_for_aclitem_data_type_usage(ClusterInfo *cluster)
-{
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for incompatible \"aclitem\" data type in user tables");
-
-	snprintf(output_path, sizeof(output_path), "tables_using_aclitem.txt");
-
-	if (check_for_data_type_usage(cluster, "pg_catalog.aclitem", output_path))
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains the \"aclitem\" data type in user tables.\n"
-				 "The internal format of \"aclitem\" changed in PostgreSQL version 16\n"
-				 "so this cluster cannot currently be upgraded.  You can drop the\n"
-				 "problem columns and restart the upgrade.  A list of the problem\n"
-				 "columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
-}
-
-/*
- * check_for_jsonb_9_4_usage()
- *
- *	JSONB changed its storage format during 9.4 beta, so check for it.
- */
-static void
-check_for_jsonb_9_4_usage(ClusterInfo *cluster)
-{
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for incompatible \"jsonb\" data type");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_jsonb.txt");
-
-	if (check_for_data_type_usage(cluster, "pg_catalog.jsonb", output_path))
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains the \"jsonb\" data type in user tables.\n"
-				 "The internal format of \"jsonb\" changed during 9.4 beta so this\n"
-				 "cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
-}
-
 /*
  * check_for_pg_role_prefix()
  *
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 3eea0139c7..aeb2c60a89 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -328,6 +328,21 @@ typedef struct
 } OSInfo;
 
 
+/* Function signature for data type check version hook */
+typedef bool (*DataTypesUsageVersionCheck) (ClusterInfo *cluster);
+
+/*
+ * DataTypesUsageChecks
+ */
+typedef struct
+{
+	const char *status;			/* status line to print to the user */
+	const char *report_filename;	/* filename to store report to */
+	const char *base_query;		/* Query to extract the oid of the datatype */
+	const char *report_text;	/* Text to store to report in case of error */
+	DataTypesUsageVersionCheck version_hook;
+}			DataTypesUsageChecks;
+
 /*
  * Global variables
  */
@@ -450,19 +465,15 @@ unsigned int str2uint(const char *str);
 
 /* version.c */
 
-bool		check_for_data_types_usage(ClusterInfo *cluster,
-									   const char *base_query,
-									   const char *output_path);
-bool		check_for_data_type_usage(ClusterInfo *cluster,
-									  const char *type_name,
-									  const char *output_path);
-void		old_9_3_check_for_line_data_type_usage(ClusterInfo *cluster);
-void		old_9_6_check_for_unknown_data_type_usage(ClusterInfo *cluster);
+bool		old_9_3_check_for_line_data_type_usage(ClusterInfo *cluster);
+bool		check_for_jsonb_9_4_usage(ClusterInfo *cluster);
+bool		old_9_6_check_for_unknown_data_type_usage(ClusterInfo *cluster);
+bool		old_11_check_for_sql_identifier_data_type_usage(ClusterInfo *cluster);
 void		old_9_6_invalidate_hash_indexes(ClusterInfo *cluster,
 											bool check_mode);
 
-void		old_11_check_for_sql_identifier_data_type_usage(ClusterInfo *cluster);
 void		report_extension_updates(ClusterInfo *cluster);
+bool		check_for_aclitem_data_type_usage(ClusterInfo *cluster);
 
 /* parallel.c */
 void		parallel_exec_prog(const char *log_file, const char *opt_log_file,
diff --git a/src/bin/pg_upgrade/version.c b/src/bin/pg_upgrade/version.c
index 403a6d7cfa..b1c5628bd7 100644
--- a/src/bin/pg_upgrade/version.c
+++ b/src/bin/pg_upgrade/version.c
@@ -9,236 +9,41 @@
 
 #include "postgres_fe.h"
 
-#include "catalog/pg_class_d.h"
 #include "fe_utils/string_utils.h"
 #include "pg_upgrade.h"
 
-
-/*
- * check_for_data_types_usage()
- *	Detect whether there are any stored columns depending on given type(s)
- *
- * If so, write a report to the given file name, and return true.
- *
- * base_query should be a SELECT yielding a single column named "oid",
- * containing the pg_type OIDs of one or more types that are known to have
- * inconsistent on-disk representations across server versions.
- *
- * We check for the type(s) in tables, matviews, and indexes, but not views;
- * there's no storage involved in a view.
- */
 bool
-check_for_data_types_usage(ClusterInfo *cluster,
-						   const char *base_query,
-						   const char *output_path)
+old_9_3_check_for_line_data_type_usage(ClusterInfo *cluster)
 {
-	bool		found = false;
-	FILE	   *script = NULL;
-	int			dbnum;
+	/* Pre-PG 9.4 had a different 'line' data type internal format */
+	if (GET_MAJOR_VERSION(cluster->major_version) <= 903)
+		return true;
 
-	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
-	{
-		DbInfo	   *active_db = &cluster->dbarr.dbs[dbnum];
-		PGconn	   *conn = connectToServer(cluster, active_db->db_name);
-		PQExpBufferData querybuf;
-		PGresult   *res;
-		bool		db_used = false;
-		int			ntups;
-		int			rowno;
-		int			i_nspname,
-					i_relname,
-					i_attname;
-
-		/*
-		 * The type(s) of interest might be wrapped in a domain, array,
-		 * composite, or range, and these container types can be nested (to
-		 * varying extents depending on server version, but that's not of
-		 * concern here).  To handle all these cases we need a recursive CTE.
-		 */
-		initPQExpBuffer(&querybuf);
-		appendPQExpBuffer(&querybuf,
-						  "WITH RECURSIVE oids AS ( "
-		/* start with the type(s) returned by base_query */
-						  "	%s "
-						  "	UNION ALL "
-						  "	SELECT * FROM ( "
-		/* inner WITH because we can only reference the CTE once */
-						  "		WITH x AS (SELECT oid FROM oids) "
-		/* domains on any type selected so far */
-						  "			SELECT t.oid FROM pg_catalog.pg_type t, x WHERE typbasetype = x.oid AND typtype = 'd' "
-						  "			UNION ALL "
-		/* arrays over any type selected so far */
-						  "			SELECT t.oid FROM pg_catalog.pg_type t, x WHERE typelem = x.oid AND typtype = 'b' "
-						  "			UNION ALL "
-		/* composite types containing any type selected so far */
-						  "			SELECT t.oid FROM pg_catalog.pg_type t, pg_catalog.pg_class c, pg_catalog.pg_attribute a, x "
-						  "			WHERE t.typtype = 'c' AND "
-						  "				  t.oid = c.reltype AND "
-						  "				  c.oid = a.attrelid AND "
-						  "				  NOT a.attisdropped AND "
-						  "				  a.atttypid = x.oid "
-						  "			UNION ALL "
-		/* ranges containing any type selected so far */
-						  "			SELECT t.oid FROM pg_catalog.pg_type t, pg_catalog.pg_range r, x "
-						  "			WHERE t.typtype = 'r' AND r.rngtypid = t.oid AND r.rngsubtype = x.oid"
-						  "	) foo "
-						  ") "
-		/* now look for stored columns of any such type */
-						  "SELECT n.nspname, c.relname, a.attname "
-						  "FROM	pg_catalog.pg_class c, "
-						  "		pg_catalog.pg_namespace n, "
-						  "		pg_catalog.pg_attribute a "
-						  "WHERE	c.oid = a.attrelid AND "
-						  "		NOT a.attisdropped AND "
-						  "		a.atttypid IN (SELECT oid FROM oids) AND "
-						  "		c.relkind IN ("
-						  CppAsString2(RELKIND_RELATION) ", "
-						  CppAsString2(RELKIND_MATVIEW) ", "
-						  CppAsString2(RELKIND_INDEX) ") AND "
-						  "		c.relnamespace = n.oid AND "
-		/* exclude possible orphaned temp tables */
-						  "		n.nspname !~ '^pg_temp_' AND "
-						  "		n.nspname !~ '^pg_toast_temp_' AND "
-		/* exclude system catalogs, too */
-						  "		n.nspname NOT IN ('pg_catalog', 'information_schema')",
-						  base_query);
-
-		res = executeQueryOrDie(conn, "%s", querybuf.data);
-
-		ntups = PQntuples(res);
-		i_nspname = PQfnumber(res, "nspname");
-		i_relname = PQfnumber(res, "relname");
-		i_attname = PQfnumber(res, "attname");
-		for (rowno = 0; rowno < ntups; rowno++)
-		{
-			found = true;
-			if (script == NULL && (script = fopen_priv(output_path, "w")) == NULL)
-				pg_fatal("could not open file \"%s\": %s", output_path,
-						 strerror(errno));
-			if (!db_used)
-			{
-				fprintf(script, "In database: %s\n", active_db->db_name);
-				db_used = true;
-			}
-			fprintf(script, "  %s.%s.%s\n",
-					PQgetvalue(res, rowno, i_nspname),
-					PQgetvalue(res, rowno, i_relname),
-					PQgetvalue(res, rowno, i_attname));
-		}
-
-		PQclear(res);
-
-		termPQExpBuffer(&querybuf);
-
-		PQfinish(conn);
-	}
-
-	if (script)
-		fclose(script);
-
-	return found;
+	return false;
 }
 
 /*
- * check_for_data_type_usage()
- *	Detect whether there are any stored columns depending on the given type
+ * check_for_jsonb_9_4_usage()
  *
- * If so, write a report to the given file name, and return true.
- *
- * type_name should be a fully qualified type name.  This is just a
- * trivial wrapper around check_for_data_types_usage() to convert a
- * type name into a base query.
+ *     JSONB changed its storage format during 9.4 beta, so check for it.
  */
 bool
-check_for_data_type_usage(ClusterInfo *cluster,
-						  const char *type_name,
-						  const char *output_path)
+check_for_jsonb_9_4_usage(ClusterInfo *cluster)
 {
-	bool		found;
-	char	   *base_query;
-
-	base_query = psprintf("SELECT '%s'::pg_catalog.regtype AS oid",
-						  type_name);
-
-	found = check_for_data_types_usage(cluster, base_query, output_path);
+	if (GET_MAJOR_VERSION(cluster->major_version) == 904 &&
+		cluster->controldata.cat_ver < JSONB_FORMAT_CHANGE_CAT_VER)
+		return true;
 
-	free(base_query);
-
-	return found;
-}
-
-
-/*
- * old_9_3_check_for_line_data_type_usage()
- *	9.3 -> 9.4
- *	Fully implement the 'line' data type in 9.4, which previously returned
- *	"not enabled" by default and was only functionally enabled with a
- *	compile-time switch; as of 9.4 "line" has a different on-disk
- *	representation format.
- */
-void
-old_9_3_check_for_line_data_type_usage(ClusterInfo *cluster)
-{
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for incompatible \"line\" data type");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_line.txt");
-
-	if (check_for_data_type_usage(cluster, "pg_catalog.line", output_path))
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains the \"line\" data type in user tables.\n"
-				 "This data type changed its internal and input/output format\n"
-				 "between your old and new versions so this\n"
-				 "cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
+	return false;
 }
 
-
-/*
- * old_9_6_check_for_unknown_data_type_usage()
- *	9.6 -> 10
- *	It's no longer allowed to create tables or views with "unknown"-type
- *	columns.  We do not complain about views with such columns, because
- *	they should get silently converted to "text" columns during the DDL
- *	dump and reload; it seems unlikely to be worth making users do that
- *	by hand.  However, if there's a table with such a column, the DDL
- *	reload will fail, so we should pre-detect that rather than failing
- *	mid-upgrade.  Worse, if there's a matview with such a column, the
- *	DDL reload will silently change it to "text" which won't match the
- *	on-disk storage (which is like "cstring").  So we *must* reject that.
- */
-void
+bool
 old_9_6_check_for_unknown_data_type_usage(ClusterInfo *cluster)
 {
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for invalid \"unknown\" user columns");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_unknown.txt");
-
-	if (check_for_data_type_usage(cluster, "pg_catalog.unknown", output_path))
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains the \"unknown\" data type in user tables.\n"
-				 "This data type is no longer allowed in tables, so this\n"
-				 "cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
+	/* Pre-PG 10 allowed tables with 'unknown' type columns */
+	if (GET_MAJOR_VERSION(cluster->major_version) <= 906)
+		return true;
+	return false;
 }
 
 /*
@@ -353,41 +158,20 @@ old_9_6_invalidate_hash_indexes(ClusterInfo *cluster, bool check_mode)
 		check_ok();
 }
 
-/*
- * old_11_check_for_sql_identifier_data_type_usage()
- *	11 -> 12
- *	In 12, the sql_identifier data type was switched from name to varchar,
- *	which does affect the storage (name is by-ref, but not varlena). This
- *	means user tables using sql_identifier for columns are broken because
- *	the on-disk format is different.
- */
-void
+bool
 old_11_check_for_sql_identifier_data_type_usage(ClusterInfo *cluster)
 {
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for invalid \"sql_identifier\" user columns");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_sql_identifier.txt");
-
-	if (check_for_data_type_usage(cluster, "information_schema.sql_identifier",
-								  output_path))
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains the \"sql_identifier\" data type in user tables.\n"
-				 "The on-disk format for this data type has changed, so this\n"
-				 "cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
+	/*
+	 * PG 12 changed the 'sql_identifier' type storage to be based on name,
+	 * not varchar, which breaks on-disk format for existing data. So we need
+	 * to prevent upgrade when used in user objects (tables, indexes, ...).
+	 */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1100)
+		return true;
+
+	return false;
 }
 
-
 /*
  * report_extension_updates()
  *	Report extensions that should be updated.
@@ -459,3 +243,21 @@ report_extension_updates(ClusterInfo *cluster)
 	else
 		check_ok();
 }
+
+/*
+ * check_for_aclitem_data_type_usage
+ *
+ *     aclitem changed its storage format in 16, so check for it.
+ */
+bool
+check_for_aclitem_data_type_usage(ClusterInfo *cluster)
+{
+	/*
+	 * PG 16 increased the size of the 'aclitem' type, which breaks the
+	 * on-disk format for existing data.
+	 */
+	if (GET_MAJOR_VERSION(cluster->major_version) <= 1500)
+		return true;
+
+	return false;
+}
-- 
2.32.1 (Apple Git-133)

#12

Nathan Bossart

nathandbossart@gmail.com

over 2 years ago

In reply to: Daniel Gustafsson (#11)

Re: Reducing connection overhead in pg_upgrade compat check phase

Thanks for the new patch.

On Thu, Jul 06, 2023 at 05:58:33PM +0200, Daniel Gustafsson wrote:

On 4 Jul 2023, at 21:08, Nathan Bossart <nathandbossart@gmail.com> wrote:
Also, I think it would be worth breaking check_for_data_types_usage() into
a few separate functions (or doing some other similar refactoring) to
improve readability. At this point, the function is quite lengthy, and I
count 6 levels of indentation at some lines.

It is pretty big for sure, but it's also IMHO not terribly complicated as
it's not really performing any hard-to-follow logic.

I have no issues refactoring it, but when trying my hand at it I was only
making (what I consider) less readable code by having to jump around, so I
consider it a failure. If you have any suggestions, I would be more than happy
to review and incorporate those though.

I don't have a strong opinion about this.

+				for (int rowno = 0; rowno < ntups; rowno++)
+				{
+					if (script == NULL && (script = fopen_priv(output_path, "a")) == NULL)
+						pg_fatal("could not open file \"%s\": %s",
+								 output_path,
+								 strerror(errno));
+					if (!db_used)
+					{
+						fprintf(script, "In database: %s\n", active_db->db_name);
+						db_used = true;
+					}

Since "script" will be NULL and "db_used" will be false in the first
iteration of the loop, couldn't we move this stuff to before the loop?

+					fprintf(script, " %s.%s.%s\n",
+							PQgetvalue(res, rowno, i_nspname),
+							PQgetvalue(res, rowno, i_relname),
+							PQgetvalue(res, rowno, i_attname));

nitpick: I think the current code has two spaces at the beginning of this
format string. Did you mean to remove one of them?

+				if (script)
+				{
+					fclose(script);
+					script = NULL;
+				}

Won't "script" always be initialized here? If I'm following this code
correctly, I think everything except the fclose() can be removed.

+ cur_check++;

I think this is unnecessary since we assign "cur_check" at the beginning of
every loop iteration. I see two of these.

+static int n_data_types_usage_checks = 7;

Can we determine this programmatically so that folks don't need to remember
to update it?

+	/* Prepare an array to store the results of checks in */
+	results = pg_malloc(sizeof(bool) * n_data_types_usage_checks);
+	memset(results, true, sizeof(*results));

IMHO it's a little strange that this is initialized to all "true", only
because I think most other Postgres code does the opposite.

+bool
+check_for_aclitem_data_type_usage(ClusterInfo *cluster)

Do you think we should rename these functions to something like
"should_check_for_*"? They don't actually do the check, they just tell you
whether you should based on the version. In fact, I wonder if we could
just add the versions directly to data_types_usage_checks so that we don't
need the separate hook functions.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#13

Daniel Gustafsson

daniel@yesql.se

over 2 years ago

In reply to: Nathan Bossart (#12)

1 attachment(s)

Re: Reducing connection overhead in pg_upgrade compat check phase

On 8 Jul 2023, at 23:43, Nathan Bossart <nathandbossart@gmail.com> wrote:

Thanks for reviewing!

Since "script" will be NULL and "db_used" will be false in the first
iteration of the loop, couldn't we move this stuff to before the loop?

We could. It's done this way for consistency with how all the other checks
perform the inner loop. I think being consistent is better than
micro-optimizing non-hot codepaths, even though it adds some redundancy.
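
As an illustration of the in-loop pattern being discussed, here is a minimal
sketch of a report writer that opens its file lazily on the first result row.
The function name write_report() and its signature are hypothetical and only
stand in for the logic in the patch; it also uses plain fopen() rather than
pg_upgrade's fopen_priv(), and folds the "db_used" flag into the same branch.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: append one database's problem columns to a report.
 * Returns the number of rows written, or -1 if the file cannot be opened.
 */
static int
write_report(const char *path, const char *db_name,
			 const char *const *rows, int nrows)
{
	FILE	   *script = NULL;
	int			written = 0;

	assert(path != NULL && db_name != NULL);

	for (int rowno = 0; rowno < nrows; rowno++)
	{
		/* Open lazily, so a database with no hits creates no file. */
		if (script == NULL)
		{
			script = fopen(path, "a");
			if (script == NULL)
				return -1;
			fprintf(script, "In database: %s\n", db_name);
		}
		fprintf(script, "  %s\n", rows[rowno]);
		written++;
	}

	/* The file is only open if at least one row was seen. */
	if (script != NULL)
		fclose(script);

	return written;
}
```

Hoisting the open and the header out of the loop would need an explicit
"nrows > 0" test instead, which is the redundancy trade-off mentioned above.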

nitpick: I think the current code has two spaces at the beginning of this
format string. Did you mean to remove one of them?

Nice catch, I did not. Fixed.

Won't "script" always be initialized here? If I'm following this code
correctly, I think everything except the fclose() can be removed.

You are right that this check is superfluous. This is again an artifact of
modelling the code around how the other checks work for consistency. At least
I think that's a good characteristic of the code.

I think this is unnecessary since we assign "cur_check" at the beginning of
every loop iteration. I see two of these.

Right, this is a pointless leftover from a previous version which used a
while() loop, and I had missed removing them. Fixed.

+static int n_data_types_usage_checks = 7;

Can we determine this programmatically so that folks don't need to remember
to update it?

Fair point, I've added a counter loop to the beginning of the check function to
calculate it.
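
A counter loop over a sentinel-terminated array could look roughly like the
sketch below. ExampleCheck and count_checks() are hypothetical, trimmed-down
stand-ins for DataTypesUsageChecks and the code in the patch; the point is
only that the cursor has to advance together with the counter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced version of the check descriptor. */
typedef struct
{
	const char *status;			/* NULL marks the end-of-array sentinel */
	const char *report_filename;
} ExampleCheck;

/* Count the entries up to, but not including, the sentinel. */
static int
count_checks(const ExampleCheck *checks)
{
	int			n = 0;

	assert(checks != NULL);

	/* Advance the cursor on every step, or the loop never terminates. */
	for (const ExampleCheck *cur = checks; cur->status != NULL; cur++)
		n++;

	return n;
}

static const ExampleCheck example_checks[] = {
	{"Checking for \"line\"", "tables_using_line.txt"},
	{"Checking for \"jsonb\"", "tables_using_jsonb.txt"},
	{NULL, NULL}				/* end-of-checks marker, must remain last */
};
```

With this layout, adding a check means adding an array entry above the marker
and nothing else.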

IMHO it's a little strange that this is initialized to all "true", only
because I think most other Postgres code does the opposite.

Agreed, but it made for a less contrived codepath in knowing when an error has
been seen already, to avoid duplicate error output, so I think it's worth it.
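
To make the "initialized to true" idea concrete, here is a small sketch under
the assumption that results[i] means "check i has not failed yet".
alloc_results() and record_failure() are hypothetical names, and plain malloc()
stands in for pg_malloc(). Note that the initialization has to cover all n
elements; a memset() sized with sizeof(*results) would only touch the first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Allocate n result flags, all initially true ("no failure seen yet"). */
static bool *
alloc_results(size_t n)
{
	bool	   *results;

	assert(n > 0);

	results = malloc(sizeof(bool) * n);
	if (results == NULL)
		return NULL;

	/* Initialize every element, not just the first one. */
	for (size_t i = 0; i < n; i++)
		results[i] = true;

	return results;
}

/*
 * Record a failure for one check.  Returns true only the first time the
 * check fails, so the caller can emit the error text exactly once.
 */
static bool
record_failure(bool *results, size_t checknum)
{
	bool		first_failure = results[checknum];

	results[checknum] = false;
	return first_failure;
}
```

The return value of record_failure() is what avoids duplicate error output
when the same check fails in several databases.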

Do you think we should rename these functions to something like
"should_check_for_*"? They don't actually do the check, they just tell you
whether you should based on the version.

I've been pondering that too, and did a rename now along with moving them all
to a single place as well as changing the comments to make it clearer.

In fact, I wonder if we could just add the versions directly to
data_types_usage_checks so that we don't need the separate hook functions.

We could, but it would be sort of contrived I think, since some check <= and
some == while some check the catversion as well (and new ones may have other
variants). I think this is the least paint-ourselves-in-a-corner version; if we
feel it's needlessly complicated and no other variants are added we can revisit
this.
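
The variety of conditions can be sketched as follows. Everything here is a
simplified assumption: ExampleCluster, EXAMPLE_MAJOR() and
EXAMPLE_FORMAT_CHANGE_CAT_VER (an arbitrary illustrative value) are
hypothetical stand-ins for ClusterInfo, GET_MAJOR_VERSION() and
JSONB_FORMAT_CHANGE_CAT_VER. It shows why a function pointer is more flexible
than a single version number stored in the array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced cluster descriptor. */
typedef struct
{
	unsigned int major_version; /* e.g. 90400 for 9.4 */
	unsigned int cat_ver;		/* catalog version number */
} ExampleCluster;

#define EXAMPLE_MAJOR(v) ((v) / 100)
#define EXAMPLE_FORMAT_CHANGE_CAT_VER 1000	/* illustrative value only */

typedef bool (*ExampleVersionHook) (const ExampleCluster *cluster);

/* A "<=" style condition: applies to 9.3 and older. */
static bool
line_applicable(const ExampleCluster *cluster)
{
	return EXAMPLE_MAJOR(cluster->major_version) <= 903;
}

/* An "==" plus catversion condition: applies only to early 9.4 betas. */
static bool
jsonb_applicable(const ExampleCluster *cluster)
{
	return EXAMPLE_MAJOR(cluster->major_version) == 904 &&
		cluster->cat_ver < EXAMPLE_FORMAT_CHANGE_CAT_VER;
}

/* A NULL hook means the check runs for all versions. */
static bool
check_applies(ExampleVersionHook hook, const ExampleCluster *cluster)
{
	assert(cluster != NULL);
	return hook == NULL || hook(cluster);
}
```

A plain "max version" field could express line_applicable() but not
jsonb_applicable() without growing extra fields.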

--
Daniel Gustafsson

Attachments:

v6-0001-pg_upgrade-run-all-data-type-checks-per-connectio.patch (application/octet-stream)

From 723fd82ce174426619dadcb9f2d8b81b3ff532ea Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson <dgustafsson@postgresql.org>
Date: Thu, 6 Jul 2023 17:55:45 +0200
Subject: [PATCH v6] pg_upgrade: run all data type checks per connection

The checks for data type usage were each connecting to all databases
in the cluster and running their query. On clusters which have a lot
of databases this can become unnecessarily expensive. This moves the
checks to run in a single connection instead to minimize connection
setup/teardown overhead.

Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Discussion: https://postgr.es/m/BB4C76F-D416-4F9F-949E-DBE950D37787@yesql.se
---
 src/bin/pg_upgrade/check.c      | 595 +++++++++++++++++++++-----------
 src/bin/pg_upgrade/pg_upgrade.h |  29 +-
 src/bin/pg_upgrade/version.c    | 288 +++-------------
 3 files changed, 450 insertions(+), 462 deletions(-)

diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 64024e3b9e..cc9aa3cec2 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -10,6 +10,7 @@
 #include "postgres_fe.h"
 
 #include "catalog/pg_authid_d.h"
+#include "catalog/pg_class_d.h"
 #include "catalog/pg_collation.h"
 #include "fe_utils/string_utils.h"
 #include "mb/pg_wchar.h"
@@ -23,14 +24,395 @@ static void check_for_isn_and_int8_passing_mismatch(ClusterInfo *cluster);
 static void check_for_user_defined_postfix_ops(ClusterInfo *cluster);
 static void check_for_incompatible_polymorphics(ClusterInfo *cluster);
 static void check_for_tables_with_oids(ClusterInfo *cluster);
-static void check_for_composite_data_type_usage(ClusterInfo *cluster);
-static void check_for_reg_data_type_usage(ClusterInfo *cluster);
-static void check_for_aclitem_data_type_usage(ClusterInfo *cluster);
-static void check_for_jsonb_9_4_usage(ClusterInfo *cluster);
 static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(ClusterInfo *new_cluster);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
 
+/*
+ * Data type usage checks. Each check for problematic data type usage is
+ * defined in this array with metadata, SQL query for finding the data type
+ * and a function pointer for determining if the check should be executed
+ * for the current version.
+ */
+static DataTypesUsageChecks data_types_usage_checks[] =
+{
+	/*
+	 * Look for composite types that were made during initdb *or* belong to
+	 * information_schema; that's important in case information_schema was
+	 * dropped and reloaded.
+	 *
+	 * The cutoff OID here should match the source cluster's value of
+	 * FirstNormalObjectId.  We hardcode it rather than using that C #define
+	 * because, if that #define is ever changed, our own version's value is
+	 * NOT what to use.  Eventually we may need a test on the source cluster's
+	 * version to select the correct value.
+	 */
+	{
+		.status = "Checking for system-defined composite types in user tables",
+			.report_filename = "tables_using_composite.txt",
+			.base_query =
+			"SELECT t.oid FROM pg_catalog.pg_type t "
+			"LEFT JOIN pg_catalog.pg_namespace n ON t.typnamespace = n.oid "
+			" WHERE typtype = 'c' AND (t.oid < 16384 OR nspname = 'information_schema')",
+			.report_text =
+			"Your installation contains system-defined composite type(s) in user tables.\n"
+			"These type OIDs are not stable across PostgreSQL versions,\n"
+			"so this cluster cannot currently be upgraded.  You can\n"
+			"drop the problem columns and restart the upgrade.\n"
+			"A list of the problem columns is in the file:",
+			.version_hook = NULL
+	},
+
+	/*
+	 * 9.3 -> 9.4 Fully implement the 'line' data type in 9.4, which
+	 * previously returned "not enabled" by default and was only functionally
+	 * enabled with a compile-time switch; as of 9.4 "line" has a different
+	 * on-disk representation format.
+	 */
+	{
+		.status = "Checking for incompatible \"line\" data type",
+			.report_filename = "tables_using_line.txt",
+			.base_query =
+			"SELECT 'pg_catalog.line'::pg_catalog.regtype AS oid",
+			.report_text =
+			"Your installation contains the \"line\" data type in user tables.\n"
+			"This data type changed its internal and input/output format\n"
+			"between your old and new versions so this\n"
+			"cluster cannot currently be upgraded.  You can\n"
+			"drop the problem columns and restart the upgrade.\n"
+			"A list of the problem columns is in the file:",
+			.version_hook = line_type_check_applicable
+	},
+
+	/*
+	 * pg_upgrade only preserves these system values: pg_class.oid pg_type.oid
+	 * pg_enum.oid
+	 *
+	 * Many of the reg* data types reference system catalog info that is not
+	 * preserved, and hence these data types cannot be used in user tables
+	 * upgraded by pg_upgrade.
+	 */
+	{
+		.status = "Checking for reg* data types in user tables",
+			.report_filename = "tables_using_reg.txt",
+
+		/*
+		 * Note: older servers will not have all of these reg* types, so we
+		 * have to write the query like this rather than depending on casts to
+		 * regtype.
+		 */
+			.base_query =
+			"SELECT oid FROM pg_catalog.pg_type t "
+			"WHERE t.typnamespace = "
+			"        (SELECT oid FROM pg_catalog.pg_namespace "
+			"         WHERE nspname = 'pg_catalog') "
+			"  AND t.typname IN ( "
+		/* pg_class.oid is preserved, so 'regclass' is OK */
+			"           'regcollation', "
+			"           'regconfig', "
+			"           'regdictionary', "
+			"           'regnamespace', "
+			"           'regoper', "
+			"           'regoperator', "
+			"           'regproc', "
+			"           'regprocedure' "
+		/* pg_authid.oid is preserved, so 'regrole' is OK */
+		/* pg_type.oid is (mostly) preserved, so 'regtype' is OK */
+			"         )",
+			.report_text =
+			"Your installation contains one of the reg* data types in user tables.\n"
+			"These data types reference system OIDs that are not preserved by\n"
+			"pg_upgrade, so this cluster cannot currently be upgraded.  You can\n"
+			"drop the problem columns and restart the upgrade.\n"
+			"A list of the problem columns is in the file:",
+			.version_hook = NULL
+	},
+
+	/*
+	 * PG 16 increased the size of the 'aclitem' type, which breaks the
+	 * on-disk format for existing data.
+	 */
+	{
+		.status = "Checking for incompatible \"aclitem\" data type in user tables",
+			.report_filename = "tables_using_aclitem.txt",
+			.base_query =
+			"SELECT 'pg_catalog.aclitem'::pg_catalog.regtype AS oid",
+			.report_text =
+			"Your installation contains the \"aclitem\" data type in user tables.\n"
+			"The internal format of \"aclitem\" changed in PostgreSQL version 16\n"
+			"so this cluster cannot currently be upgraded.  You can drop the\n"
+			"problem columns and restart the upgrade.  A list of the problem\n"
+			"columns is in the file:",
+			.version_hook = aclitem_type_check_applicable
+	},
+
+	/*
+	 * It's no longer allowed to create tables or views with "unknown"-type
+	 * columns.  We do not complain about views with such columns, because
+	 * they should get silently converted to "text" columns during the DDL
+	 * dump and reload; it seems unlikely to be worth making users do that by
+	 * hand.  However, if there's a table with such a column, the DDL reload
+	 * will fail, so we should pre-detect that rather than failing
+	 * mid-upgrade.  Worse, if there's a matview with such a column, the DDL
+	 * reload will silently change it to "text" which won't match the on-disk
+	 * storage (which is like "cstring").  So we *must* reject that.
+	 */
+	{
+		.status = "Checking for invalid \"unknown\" user columns",
+			.report_filename = "tables_using_unknown.txt",
+			.base_query =
+			"SELECT 'pg_catalog.unknown'::pg_catalog.regtype AS oid",
+			.report_text =
+			"Your installation contains the \"unknown\" data type in user tables.\n"
+			"This data type is no longer allowed in tables, so this\n"
+			"cluster cannot currently be upgraded.  You can\n"
+			"drop the problem columns and restart the upgrade.\n"
+			"A list of the problem columns is in the file:",
+			.version_hook = unknown_type_check_applicable
+	},
+
+	/*
+	 * PG 12 changed the 'sql_identifier' type storage to be based on name,
+	 * not varchar, which breaks on-disk format for existing data. So we need
+	 * to prevent upgrade when used in user objects (tables, indexes, ...). In
+	 * 12, the sql_identifier data type was switched from name to varchar,
+	 * which does affect the storage (name is by-ref, but not varlena). This
+	 * means user tables using sql_identifier for columns are broken because
+	 * the on-disk format is different.
+	 */
+	{
+		.status = "Checking for invalid \"sql_identifier\" user columns",
+			.report_filename = "tables_using_sql_identifier.txt",
+			.base_query =
+			"SELECT 'information_schema.sql_identifier'::pg_catalog.regtype AS oid",
+			.report_text =
+			"Your installation contains the \"sql_identifier\" data type in user tables.\n"
+			"The on-disk format for this data type has changed, so this\n"
+			"cluster cannot currently be upgraded.  You can\n"
+			"drop the problem columns and restart the upgrade.\n"
+			"A list of the problem columns is in the file:",
+			.version_hook = sql_identifier_type_check_applicable
+	},
+
+	/*
+	 * JSONB changed its storage format during 9.4 beta, so check for it.
+	 */
+	{
+		.status = "Checking for incompatible \"jsonb\" data type",
+			.report_filename = "tables_using_jsonb.txt",
+			.base_query =
+			"SELECT 'pg_catalog.jsonb'::pg_catalog.regtype AS oid",
+			.report_text =
+			"Your installation contains the \"jsonb\" data type in user tables.\n"
+			"The internal format of \"jsonb\" changed during 9.4 beta so this\n"
+			"cluster cannot currently be upgraded.  You can\n"
+			"drop the problem columns and restart the upgrade.\n"
+			"A list of the problem columns is in the file:",
+			.version_hook = jsonb_9_4_check_applicable
+	},
+
+	/* End of checks marker, must remain last */
+	{
+		NULL, NULL, NULL, NULL, NULL
+	}
+};
+
+/*
+ * check_for_data_types_usage()
+ *	Detect whether there are any stored columns depending on given type(s)
+ *
+ * If so, write a report to the given file name and signal a failure to the
+ * user.
+ *
+ * The checks to run are defined in a DataTypesUsageChecks structure where
+ * each check has a metadata for explaining errors to the user, a base_query,
+ * a report filename and a function pointer hook for validating if the check
+ * should be executed given the cluster at hand.
+ *
+ * base_query should be a SELECT yielding a single column named "oid",
+ * containing the pg_type OIDs of one or more types that are known to have
+ * inconsistent on-disk representations across server versions.
+ *
+ * We check for the type(s) in tables, matviews, and indexes, but not views;
+ * there's no storage involved in a view.
+ */
+static void
+check_for_data_types_usage(ClusterInfo *cluster, DataTypesUsageChecks *checks)
+{
+	bool		found = false;
+	bool	   *results;
+	PQExpBufferData report;
+	DataTypesUsageChecks *tmp = checks;
+	int			n_data_types_usage_checks = 0;
+
+	prep_status("Checking for data type usage");
+
+	/* Gather number of checks to perform */
+	for (; tmp->status != NULL; tmp++)
+		n_data_types_usage_checks++;
+
+	/* Prepare an array to store the results of checks in */
+	results = pg_malloc(sizeof(bool) * n_data_types_usage_checks);
+	memset(results, true, sizeof(bool) * n_data_types_usage_checks);
+
+	prep_status_progress("checking all databases");
+
+	/*
+	 * Connect to each database in the cluster and run all defined checks
+	 * against that database before trying the next one.
+	 */
+	for (int dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
+	{
+		DbInfo	   *active_db = &cluster->dbarr.dbs[dbnum];
+		PGconn	   *conn = connectToServer(cluster, active_db->db_name);
+
+		for (int checknum = 0; checknum < n_data_types_usage_checks; checknum++)
+		{
+			PGresult   *res;
+			int			ntups;
+			int			i_nspname;
+			int			i_relname;
+			int			i_attname;
+			FILE	   *script = NULL;
+			bool		db_used = false;
+			char		output_path[MAXPGPATH];
+			DataTypesUsageChecks *cur_check = &checks[checknum];
+
+			/*
+			 * Make sure that the check applies to the current cluster version
+			 * and skip if not. If no check hook has been defined we run the
+			 * check for all versions.
+			 */
+			if (cur_check->version_hook && !cur_check->version_hook(cluster))
+				continue;
+
+			snprintf(output_path, sizeof(output_path), "%s/%s",
+					 log_opts.basedir,
+					 cur_check->report_filename);
+
+			/*
+			 * The type(s) of interest might be wrapped in a domain, array,
+			 * composite, or range, and these container types can be nested
+			 * (to varying extents depending on server version, but that's not
+			 * of concern here).  To handle all these cases we need a
+			 * recursive CTE.
+			 */
+			res = executeQueryOrDie(conn,
+									"WITH RECURSIVE oids AS ( "
+			/* start with the type(s) returned by base_query */
+									"	%s "
+									"	UNION ALL "
+									"	SELECT * FROM ( "
+			/* inner WITH because we can only reference the CTE once */
+									"		WITH x AS (SELECT oid FROM oids) "
+			/* domains on any type selected so far */
+									"			SELECT t.oid FROM pg_catalog.pg_type t, x WHERE typbasetype = x.oid AND typtype = 'd' "
+									"			UNION ALL "
+			/* arrays over any type selected so far */
+									"			SELECT t.oid FROM pg_catalog.pg_type t, x WHERE typelem = x.oid AND typtype = 'b' "
+									"			UNION ALL "
+			/* composite types containing any type selected so far */
+									"			SELECT t.oid FROM pg_catalog.pg_type t, pg_catalog.pg_class c, pg_catalog.pg_attribute a, x "
+									"			WHERE t.typtype = 'c' AND "
+									"				  t.oid = c.reltype AND "
+									"				  c.oid = a.attrelid AND "
+									"				  NOT a.attisdropped AND "
+									"				  a.atttypid = x.oid "
+									"			UNION ALL "
+			/* ranges containing any type selected so far */
+									"			SELECT t.oid FROM pg_catalog.pg_type t, pg_catalog.pg_range r, x "
+									"			WHERE t.typtype = 'r' AND r.rngtypid = t.oid AND r.rngsubtype = x.oid"
+									"	) foo "
+									") "
+			/* now look for stored columns of any such type */
+									"SELECT n.nspname, c.relname, a.attname "
+									"FROM	pg_catalog.pg_class c, "
+									"		pg_catalog.pg_namespace n, "
+									"		pg_catalog.pg_attribute a "
+									"WHERE	c.oid = a.attrelid AND "
+									"		NOT a.attisdropped AND "
+									"		a.atttypid IN (SELECT oid FROM oids) AND "
+									"		c.relkind IN ("
+									CppAsString2(RELKIND_RELATION) ", "
+									CppAsString2(RELKIND_MATVIEW) ", "
+									CppAsString2(RELKIND_INDEX) ") AND "
+									"		c.relnamespace = n.oid AND "
+			/* exclude possible orphaned temp tables */
+									"		n.nspname !~ '^pg_temp_' AND "
+									"		n.nspname !~ '^pg_toast_temp_' AND "
+			/* exclude system catalogs, too */
+									"		n.nspname NOT IN ('pg_catalog', 'information_schema')",
+									cur_check->base_query);
+
+			ntups = PQntuples(res);
+
+			/*
+			 * The datatype was found, so extract the data and log to the
+			 * requested filename. We need to open the file for appending
+			 * since the check might have already found the type in another
+			 * database earlier in the loop.
+			 */
+			if (ntups)
+			{
+				/*
+				 * Make sure we have a buffer to save reports to now that we
+				 * found a first failing check.
+				 */
+				if (!found)
+					initPQExpBuffer(&report);
+				found = true;
+
+				/*
+				 * If this is the first time we see an error for the check in
+				 * question then print a status message of the failure.
+				 */
+				if (results[checknum])
+				{
+					pg_log(PG_REPORT, "    failed check: %s", cur_check->status);
+					appendPQExpBuffer(&report, "\n%s\n    %s\n",
+									  cur_check->report_text, output_path);
+				}
+				results[checknum] = false;
+
+				i_nspname = PQfnumber(res, "nspname");
+				i_relname = PQfnumber(res, "relname");
+				i_attname = PQfnumber(res, "attname");
+
+				for (int rowno = 0; rowno < ntups; rowno++)
+				{
+					if (script == NULL && (script = fopen_priv(output_path, "a")) == NULL)
+						pg_fatal("could not open file \"%s\": %s",
+								 output_path,
+								 strerror(errno));
+					if (!db_used)
+					{
+						fprintf(script, "In database: %s\n", active_db->db_name);
+						db_used = true;
+					}
+					fprintf(script, "  %s.%s.%s\n",
+							PQgetvalue(res, rowno, i_nspname),
+							PQgetvalue(res, rowno, i_relname),
+							PQgetvalue(res, rowno, i_attname));
+				}
+
+				if (script)
+				{
+					fclose(script);
+					script = NULL;
+				}
+			}
+
+			PQclear(res);
+		}
+
+		PQfinish(conn);
+	}
+
+	if (found)
+		pg_fatal("Data type checks failed: %s", report.data);
+
+	check_ok();
+}
 
 /*
  * fix_path_separator
@@ -100,16 +482,9 @@ check_and_dump_old_cluster(bool live_check)
 	check_is_install_user(&old_cluster);
 	check_proper_datallowconn(&old_cluster);
 	check_for_prepared_transactions(&old_cluster);
-	check_for_composite_data_type_usage(&old_cluster);
-	check_for_reg_data_type_usage(&old_cluster);
 	check_for_isn_and_int8_passing_mismatch(&old_cluster);
 
-	/*
-	 * PG 16 increased the size of the 'aclitem' type, which breaks the
-	 * on-disk format for existing data.
-	 */
-	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1500)
-		check_for_aclitem_data_type_usage(&old_cluster);
+	check_for_data_types_usage(&old_cluster, data_types_usage_checks);
 
 	/*
 	 * PG 14 changed the function signature of encoding conversion functions.
@@ -141,21 +516,12 @@ check_and_dump_old_cluster(bool live_check)
 	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1100)
 		check_for_tables_with_oids(&old_cluster);
 
-	/*
-	 * PG 12 changed the 'sql_identifier' type storage to be based on name,
-	 * not varchar, which breaks on-disk format for existing data. So we need
-	 * to prevent upgrade when used in user objects (tables, indexes, ...).
-	 */
-	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1100)
-		old_11_check_for_sql_identifier_data_type_usage(&old_cluster);
-
 	/*
 	 * Pre-PG 10 allowed tables with 'unknown' type columns and non WAL logged
 	 * hash indexes
 	 */
 	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 906)
 	{
-		old_9_6_check_for_unknown_data_type_usage(&old_cluster);
 		if (user_opts.check)
 			old_9_6_invalidate_hash_indexes(&old_cluster, true);
 	}
@@ -164,14 +530,6 @@ check_and_dump_old_cluster(bool live_check)
 	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 905)
 		check_for_pg_role_prefix(&old_cluster);
 
-	if (GET_MAJOR_VERSION(old_cluster.major_version) == 904 &&
-		old_cluster.controldata.cat_ver < JSONB_FORMAT_CHANGE_CAT_VER)
-		check_for_jsonb_9_4_usage(&old_cluster);
-
-	/* Pre-PG 9.4 had a different 'line' data type internal format */
-	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 903)
-		old_9_3_check_for_line_data_type_usage(&old_cluster);
-
 	/*
 	 * While not a check option, we do this now because this is the only time
 	 * the old server is running.
@@ -1084,185 +1442,6 @@ check_for_tables_with_oids(ClusterInfo *cluster)
 		check_ok();
 }
 
-
-/*
- * check_for_composite_data_type_usage()
- *	Check for system-defined composite types used in user tables.
- *
- *	The OIDs of rowtypes of system catalogs and information_schema views
- *	can change across major versions; unlike user-defined types, we have
- *	no mechanism for forcing them to be the same in the new cluster.
- *	Hence, if any user table uses one, that's problematic for pg_upgrade.
- */
-static void
-check_for_composite_data_type_usage(ClusterInfo *cluster)
-{
-	bool		found;
-	Oid			firstUserOid;
-	char		output_path[MAXPGPATH];
-	char	   *base_query;
-
-	prep_status("Checking for system-defined composite types in user tables");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_composite.txt");
-
-	/*
-	 * Look for composite types that were made during initdb *or* belong to
-	 * information_schema; that's important in case information_schema was
-	 * dropped and reloaded.
-	 *
-	 * The cutoff OID here should match the source cluster's value of
-	 * FirstNormalObjectId.  We hardcode it rather than using that C #define
-	 * because, if that #define is ever changed, our own version's value is
-	 * NOT what to use.  Eventually we may need a test on the source cluster's
-	 * version to select the correct value.
-	 */
-	firstUserOid = 16384;
-
-	base_query = psprintf("SELECT t.oid FROM pg_catalog.pg_type t "
-						  "LEFT JOIN pg_catalog.pg_namespace n ON t.typnamespace = n.oid "
-						  " WHERE typtype = 'c' AND (t.oid < %u OR nspname = 'information_schema')",
-						  firstUserOid);
-
-	found = check_for_data_types_usage(cluster, base_query, output_path);
-
-	free(base_query);
-
-	if (found)
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains system-defined composite type(s) in user tables.\n"
-				 "These type OIDs are not stable across PostgreSQL versions,\n"
-				 "so this cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
-}
-
-/*
- * check_for_reg_data_type_usage()
- *	pg_upgrade only preserves these system values:
- *		pg_class.oid
- *		pg_type.oid
- *		pg_enum.oid
- *
- *	Many of the reg* data types reference system catalog info that is
- *	not preserved, and hence these data types cannot be used in user
- *	tables upgraded by pg_upgrade.
- */
-static void
-check_for_reg_data_type_usage(ClusterInfo *cluster)
-{
-	bool		found;
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for reg* data types in user tables");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_reg.txt");
-
-	/*
-	 * Note: older servers will not have all of these reg* types, so we have
-	 * to write the query like this rather than depending on casts to regtype.
-	 */
-	found = check_for_data_types_usage(cluster,
-									   "SELECT oid FROM pg_catalog.pg_type t "
-									   "WHERE t.typnamespace = "
-									   "        (SELECT oid FROM pg_catalog.pg_namespace "
-									   "         WHERE nspname = 'pg_catalog') "
-									   "  AND t.typname IN ( "
-	/* pg_class.oid is preserved, so 'regclass' is OK */
-									   "           'regcollation', "
-									   "           'regconfig', "
-									   "           'regdictionary', "
-									   "           'regnamespace', "
-									   "           'regoper', "
-									   "           'regoperator', "
-									   "           'regproc', "
-									   "           'regprocedure' "
-	/* pg_authid.oid is preserved, so 'regrole' is OK */
-	/* pg_type.oid is (mostly) preserved, so 'regtype' is OK */
-									   "         )",
-									   output_path);
-
-	if (found)
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains one of the reg* data types in user tables.\n"
-				 "These data types reference system OIDs that are not preserved by\n"
-				 "pg_upgrade, so this cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
-}
-
-/*
- * check_for_aclitem_data_type_usage
- *
- *	aclitem changed its storage format in 16, so check for it.
- */
-static void
-check_for_aclitem_data_type_usage(ClusterInfo *cluster)
-{
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for incompatible \"aclitem\" data type in user tables");
-
-	snprintf(output_path, sizeof(output_path), "tables_using_aclitem.txt");
-
-	if (check_for_data_type_usage(cluster, "pg_catalog.aclitem", output_path))
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains the \"aclitem\" data type in user tables.\n"
-				 "The internal format of \"aclitem\" changed in PostgreSQL version 16\n"
-				 "so this cluster cannot currently be upgraded.  You can drop the\n"
-				 "problem columns and restart the upgrade.  A list of the problem\n"
-				 "columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
-}
-
-/*
- * check_for_jsonb_9_4_usage()
- *
- *	JSONB changed its storage format during 9.4 beta, so check for it.
- */
-static void
-check_for_jsonb_9_4_usage(ClusterInfo *cluster)
-{
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for incompatible \"jsonb\" data type");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_jsonb.txt");
-
-	if (check_for_data_type_usage(cluster, "pg_catalog.jsonb", output_path))
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains the \"jsonb\" data type in user tables.\n"
-				 "The internal format of \"jsonb\" changed during 9.4 beta so this\n"
-				 "cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
-}
-
 /*
  * check_for_pg_role_prefix()
  *
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 3eea0139c7..4d73ed8fb7 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -328,6 +328,21 @@ typedef struct
 } OSInfo;
 
 
+/* Function signature for data type check version hook */
+typedef bool (*DataTypesUsageVersionCheck) (ClusterInfo *cluster);
+
+/*
+ * DataTypesUsageChecks
+ */
+typedef struct
+{
+	const char *status;			/* status line to print to the user */
+	const char *report_filename;	/* filename to store report to */
+	const char *base_query;		/* Query to extract the oid of the datatype */
+	const char *report_text;	/* Text to store to report in case of error */
+	DataTypesUsageVersionCheck version_hook;
+}			DataTypesUsageChecks;
+
 /*
  * Global variables
  */
@@ -450,18 +465,14 @@ unsigned int str2uint(const char *str);
 
 /* version.c */
 
-bool		check_for_data_types_usage(ClusterInfo *cluster,
-									   const char *base_query,
-									   const char *output_path);
-bool		check_for_data_type_usage(ClusterInfo *cluster,
-									  const char *type_name,
-									  const char *output_path);
-void		old_9_3_check_for_line_data_type_usage(ClusterInfo *cluster);
-void		old_9_6_check_for_unknown_data_type_usage(ClusterInfo *cluster);
+bool		line_type_check_applicable(ClusterInfo *cluster);
+bool		jsonb_9_4_check_applicable(ClusterInfo *cluster);
+bool		unknown_type_check_applicable(ClusterInfo *cluster);
+bool		sql_identifier_type_check_applicable(ClusterInfo *cluster);
+bool		aclitem_type_check_applicable(ClusterInfo *cluster);
 void		old_9_6_invalidate_hash_indexes(ClusterInfo *cluster,
 											bool check_mode);
 
-void		old_11_check_for_sql_identifier_data_type_usage(ClusterInfo *cluster);
 void		report_extension_updates(ClusterInfo *cluster);
 
 /* parallel.c */
diff --git a/src/bin/pg_upgrade/version.c b/src/bin/pg_upgrade/version.c
index 403a6d7cfa..a51cb7eafa 100644
--- a/src/bin/pg_upgrade/version.c
+++ b/src/bin/pg_upgrade/version.c
@@ -9,236 +9,69 @@
 
 #include "postgres_fe.h"
 
-#include "catalog/pg_class_d.h"
 #include "fe_utils/string_utils.h"
 #include "pg_upgrade.h"
 
-
 /*
- * check_for_data_types_usage()
- *	Detect whether there are any stored columns depending on given type(s)
- *
- * If so, write a report to the given file name, and return true.
- *
- * base_query should be a SELECT yielding a single column named "oid",
- * containing the pg_type OIDs of one or more types that are known to have
- * inconsistent on-disk representations across server versions.
- *
- * We check for the type(s) in tables, matviews, and indexes, but not views;
- * there's no storage involved in a view.
+ * version_hook functions for check_for_data_types_usage in order to determine
+ * whether a data type check should be executed for the cluster in question or
+ * not.
  */
 bool
-check_for_data_types_usage(ClusterInfo *cluster,
-						   const char *base_query,
-						   const char *output_path)
+line_type_check_applicable(ClusterInfo *cluster)
 {
-	bool		found = false;
-	FILE	   *script = NULL;
-	int			dbnum;
-
-	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
-	{
-		DbInfo	   *active_db = &cluster->dbarr.dbs[dbnum];
-		PGconn	   *conn = connectToServer(cluster, active_db->db_name);
-		PQExpBufferData querybuf;
-		PGresult   *res;
-		bool		db_used = false;
-		int			ntups;
-		int			rowno;
-		int			i_nspname,
-					i_relname,
-					i_attname;
-
-		/*
-		 * The type(s) of interest might be wrapped in a domain, array,
-		 * composite, or range, and these container types can be nested (to
-		 * varying extents depending on server version, but that's not of
-		 * concern here).  To handle all these cases we need a recursive CTE.
-		 */
-		initPQExpBuffer(&querybuf);
-		appendPQExpBuffer(&querybuf,
-						  "WITH RECURSIVE oids AS ( "
-		/* start with the type(s) returned by base_query */
-						  "	%s "
-						  "	UNION ALL "
-						  "	SELECT * FROM ( "
-		/* inner WITH because we can only reference the CTE once */
-						  "		WITH x AS (SELECT oid FROM oids) "
-		/* domains on any type selected so far */
-						  "			SELECT t.oid FROM pg_catalog.pg_type t, x WHERE typbasetype = x.oid AND typtype = 'd' "
-						  "			UNION ALL "
-		/* arrays over any type selected so far */
-						  "			SELECT t.oid FROM pg_catalog.pg_type t, x WHERE typelem = x.oid AND typtype = 'b' "
-						  "			UNION ALL "
-		/* composite types containing any type selected so far */
-						  "			SELECT t.oid FROM pg_catalog.pg_type t, pg_catalog.pg_class c, pg_catalog.pg_attribute a, x "
-						  "			WHERE t.typtype = 'c' AND "
-						  "				  t.oid = c.reltype AND "
-						  "				  c.oid = a.attrelid AND "
-						  "				  NOT a.attisdropped AND "
-						  "				  a.atttypid = x.oid "
-						  "			UNION ALL "
-		/* ranges containing any type selected so far */
-						  "			SELECT t.oid FROM pg_catalog.pg_type t, pg_catalog.pg_range r, x "
-						  "			WHERE t.typtype = 'r' AND r.rngtypid = t.oid AND r.rngsubtype = x.oid"
-						  "	) foo "
-						  ") "
-		/* now look for stored columns of any such type */
-						  "SELECT n.nspname, c.relname, a.attname "
-						  "FROM	pg_catalog.pg_class c, "
-						  "		pg_catalog.pg_namespace n, "
-						  "		pg_catalog.pg_attribute a "
-						  "WHERE	c.oid = a.attrelid AND "
-						  "		NOT a.attisdropped AND "
-						  "		a.atttypid IN (SELECT oid FROM oids) AND "
-						  "		c.relkind IN ("
-						  CppAsString2(RELKIND_RELATION) ", "
-						  CppAsString2(RELKIND_MATVIEW) ", "
-						  CppAsString2(RELKIND_INDEX) ") AND "
-						  "		c.relnamespace = n.oid AND "
-		/* exclude possible orphaned temp tables */
-						  "		n.nspname !~ '^pg_temp_' AND "
-						  "		n.nspname !~ '^pg_toast_temp_' AND "
-		/* exclude system catalogs, too */
-						  "		n.nspname NOT IN ('pg_catalog', 'information_schema')",
-						  base_query);
-
-		res = executeQueryOrDie(conn, "%s", querybuf.data);
-
-		ntups = PQntuples(res);
-		i_nspname = PQfnumber(res, "nspname");
-		i_relname = PQfnumber(res, "relname");
-		i_attname = PQfnumber(res, "attname");
-		for (rowno = 0; rowno < ntups; rowno++)
-		{
-			found = true;
-			if (script == NULL && (script = fopen_priv(output_path, "w")) == NULL)
-				pg_fatal("could not open file \"%s\": %s", output_path,
-						 strerror(errno));
-			if (!db_used)
-			{
-				fprintf(script, "In database: %s\n", active_db->db_name);
-				db_used = true;
-			}
-			fprintf(script, "  %s.%s.%s\n",
-					PQgetvalue(res, rowno, i_nspname),
-					PQgetvalue(res, rowno, i_relname),
-					PQgetvalue(res, rowno, i_attname));
-		}
+	/* Pre-PG 9.4 had a different 'line' data type internal format */
+	if (GET_MAJOR_VERSION(cluster->major_version) <= 903)
+		return true;
 
-		PQclear(res);
-
-		termPQExpBuffer(&querybuf);
-
-		PQfinish(conn);
-	}
-
-	if (script)
-		fclose(script);
-
-	return found;
+	return false;
 }
 
-/*
- * check_for_data_type_usage()
- *	Detect whether there are any stored columns depending on the given type
- *
- * If so, write a report to the given file name, and return true.
- *
- * type_name should be a fully qualified type name.  This is just a
- * trivial wrapper around check_for_data_types_usage() to convert a
- * type name into a base query.
- */
 bool
-check_for_data_type_usage(ClusterInfo *cluster,
-						  const char *type_name,
-						  const char *output_path)
+jsonb_9_4_check_applicable(ClusterInfo *cluster)
 {
-	bool		found;
-	char	   *base_query;
-
-	base_query = psprintf("SELECT '%s'::pg_catalog.regtype AS oid",
-						  type_name);
-
-	found = check_for_data_types_usage(cluster, base_query, output_path);
+	/* JSONB changed its storage format during 9.4 beta */
+	if (GET_MAJOR_VERSION(cluster->major_version) == 904 &&
+		cluster->controldata.cat_ver < JSONB_FORMAT_CHANGE_CAT_VER)
+		return true;
 
-	free(base_query);
-
-	return found;
+	return false;
 }
 
-
-/*
- * old_9_3_check_for_line_data_type_usage()
- *	9.3 -> 9.4
- *	Fully implement the 'line' data type in 9.4, which previously returned
- *	"not enabled" by default and was only functionally enabled with a
- *	compile-time switch; as of 9.4 "line" has a different on-disk
- *	representation format.
- */
-void
-old_9_3_check_for_line_data_type_usage(ClusterInfo *cluster)
+bool
+unknown_type_check_applicable(ClusterInfo *cluster)
 {
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for incompatible \"line\" data type");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_line.txt");
-
-	if (check_for_data_type_usage(cluster, "pg_catalog.line", output_path))
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains the \"line\" data type in user tables.\n"
-				 "This data type changed its internal and input/output format\n"
-				 "between your old and new versions so this\n"
-				 "cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
+	/* Pre-PG 10 allowed tables with 'unknown' type columns */
+	if (GET_MAJOR_VERSION(cluster->major_version) <= 906)
+		return true;
+	return false;
 }
 
-
-/*
- * old_9_6_check_for_unknown_data_type_usage()
- *	9.6 -> 10
- *	It's no longer allowed to create tables or views with "unknown"-type
- *	columns.  We do not complain about views with such columns, because
- *	they should get silently converted to "text" columns during the DDL
- *	dump and reload; it seems unlikely to be worth making users do that
- *	by hand.  However, if there's a table with such a column, the DDL
- *	reload will fail, so we should pre-detect that rather than failing
- *	mid-upgrade.  Worse, if there's a matview with such a column, the
- *	DDL reload will silently change it to "text" which won't match the
- *	on-disk storage (which is like "cstring").  So we *must* reject that.
- */
-void
-old_9_6_check_for_unknown_data_type_usage(ClusterInfo *cluster)
+bool
+sql_identifier_type_check_applicable(ClusterInfo *cluster)
 {
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for invalid \"unknown\" user columns");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_unknown.txt");
+	/*
+	 * PG 12 changed the 'sql_identifier' type storage to be based on name,
+	 * not varchar, which breaks on-disk format for existing data. So we need
+	 * to prevent upgrade when used in user objects (tables, indexes, ...).
+	 */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1100)
+		return true;
+
+	return false;
+}
 
-	if (check_for_data_type_usage(cluster, "pg_catalog.unknown", output_path))
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains the \"unknown\" data type in user tables.\n"
-				 "This data type is no longer allowed in tables, so this\n"
-				 "cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
+bool
+aclitem_type_check_applicable(ClusterInfo *cluster)
+{
+	/*
+	 * PG 16 increased the size of the 'aclitem' type, which breaks the
+	 * on-disk format for existing data.
+	 */
+	if (GET_MAJOR_VERSION(cluster->major_version) <= 1500)
+		return true;
+
+	return false;
 }
 
 /*
@@ -353,41 +186,6 @@ old_9_6_invalidate_hash_indexes(ClusterInfo *cluster, bool check_mode)
 		check_ok();
 }
 
-/*
- * old_11_check_for_sql_identifier_data_type_usage()
- *	11 -> 12
- *	In 12, the sql_identifier data type was switched from name to varchar,
- *	which does affect the storage (name is by-ref, but not varlena). This
- *	means user tables using sql_identifier for columns are broken because
- *	the on-disk format is different.
- */
-void
-old_11_check_for_sql_identifier_data_type_usage(ClusterInfo *cluster)
-{
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for invalid \"sql_identifier\" user columns");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_sql_identifier.txt");
-
-	if (check_for_data_type_usage(cluster, "information_schema.sql_identifier",
-								  output_path))
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains the \"sql_identifier\" data type in user tables.\n"
-				 "The on-disk format for this data type has changed, so this\n"
-				 "cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
-}
-
-
 /*
  * report_extension_updates()
  *	Report extensions that should be updated.
-- 
2.32.1 (Apple Git-133)

#14

Nathan Bossart

nathandbossart@gmail.com

over 2 years ago

In reply to: Daniel Gustafsson (#13)

Re: Reducing connection overhead in pg_upgrade compat check phase

On Mon, Jul 10, 2023 at 04:43:23PM +0200, Daniel Gustafsson wrote:

On 8 Jul 2023, at 23:43, Nathan Bossart <nathandbossart@gmail.com> wrote:
Since "script" will be NULL and "db_used" will be false in the first
iteration of the loop, couldn't we move this stuff to before the loop?

We could. It's done this way to match how all the other checks are performing
the inner loop for consistency. I think being consistent is better than micro
optimizing in non-hot codepaths even though it adds some redundancy.

[ ... ]

Won't "script" always be initialized here? If I'm following this code
correctly, I think everything except the fclose() can be removed.

You are right that this check is superfluous. This is again an artifact of
modelling the code around how the other checks work for consistency. At least
I think that's a good characteristic of the code.

I can't say I agree with this, but I'm not going to hold up the patch over
it. FWIW I was looking at this more from a code simplification/readability
standpoint.
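
For illustration, the simplification suggested above might look roughly like the sketch below. It assumes the caller has already established that at least one row was found, so the file can be opened and the database header written once, before the row loop. The helper name append_report is hypothetical, plain fopen() stands in for pg_upgrade's fopen_priv(), and the rows are pre-formatted strings rather than a PGresult.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Append one database's findings to a report file.  Hypothetical helper,
 * not part of the patch.  Returns 0 on success, -1 if the file could not
 * be opened.
 */
static int
append_report(const char *path, const char *dbname,
			  const char *const *rows, int nrows)
{
	FILE	   *script;

	/* the caller only reports when the query returned rows */
	assert(nrows > 0);

	/* open for appending: earlier databases may already have written here */
	script = fopen(path, "a");
	if (script == NULL)
		return -1;

	/* header once per database, no db_used flag needed */
	fprintf(script, "In database: %s\n", dbname);

	for (int rowno = 0; rowno < nrows; rowno++)
		fprintf(script, "  %s\n", rows[rowno]);

	/* script is always open at this point, so no NULL test is needed */
	fclose(script);
	return 0;
}
```

Whether this reads better than keeping the loop shaped like the other checks is the judgment call discussed above.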

+static int n_data_types_usage_checks = 7;

Can we determine this programmatically so that folks don't need to remember
to update it?

Fair point, I've added a counter loop to the beginning of the check function to
calculate it.

+	/* Gather number of checks to perform */
+	while (tmp->status != NULL)
+		n_data_types_usage_checks++;

I think we need to tmp++ somewhere here.
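
As a minimal sketch of the problem: without advancing the pointer, the loop condition never changes and the loop spins forever on the first entry. The struct below is a hypothetical, trimmed-down stand-in for DataTypesUsageChecks, and count_checks is an illustrative name, not a function in the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in: only the field used as the end marker. */
typedef struct
{
	const char *status;			/* NULL marks the end of the array */
} ExampleCheck;

/* Count the entries that precede the NULL-status end marker. */
static int
count_checks(const ExampleCheck *checks)
{
	const ExampleCheck *tmp = checks;
	int			n = 0;

	assert(checks != NULL);

	while (tmp->status != NULL)
	{
		n++;
		tmp++;					/* advance, otherwise the loop never ends */
	}

	return n;
}
```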

In fact, I wonder if we could just add the versions directly to
data_types_usage_checks so that we don't need the separate hook functions.

We could, but it would be sort of contrived I think, since some check <= and
some == while some check the catversion as well (and new ones may have other
variants). I think this is the least paint-ourselves-in-a-corner version; if we
feel it's needlessly complicated and no other variants are added we can revisit
this.

Makes sense.
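
To make the trade-off concrete, here is a rough sketch of the hook approach. All names are hypothetical stand-ins: ExampleCluster replaces ClusterInfo, its "major" field is assumed to already hold the reduced version number (904, 1500, ...), and EXAMPLE_CAT_VER is a placeholder rather than the real catalog version constant. A hook can express <=, ==, or a combined catversion test, while a NULL hook means the check runs for every version.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define EXAMPLE_CAT_VER 100		/* placeholder, not the real constant */

/* Hypothetical stand-in for the ClusterInfo fields used here. */
typedef struct
{
	int			major;			/* assumed pre-reduced, e.g. 904 or 1500 */
	int			cat_ver;
} ExampleCluster;

typedef bool (*ExampleVersionHook) (const ExampleCluster *cluster);

typedef struct
{
	const char *status;
	ExampleVersionHook version_hook;	/* NULL: run for all versions */
} ExampleCheck;

/* A "<=" style condition. */
static bool
aclitem_hook(const ExampleCluster *cluster)
{
	return cluster->major <= 1500;
}

/* An "==" condition combined with a catalog version test. */
static bool
jsonb_hook(const ExampleCluster *cluster)
{
	return cluster->major == 904 && cluster->cat_ver < EXAMPLE_CAT_VER;
}

/* Decide whether a check should run against the given cluster. */
static bool
check_applies(const ExampleCheck *check, const ExampleCluster *cluster)
{
	assert(check != NULL && cluster != NULL);

	return check->version_hook == NULL || check->version_hook(cluster);
}
```

Storing a bare version number in the array instead would cover the first hook but not the second, which is the reason given above for keeping the function pointers.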

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#15

Daniel Gustafsson

daniel@yesql.se

over 2 years ago

In reply to: Nathan Bossart (#14)

Re: Reducing connection overhead in pg_upgrade compat check phase

On 11 Jul 2023, at 01:09, Nathan Bossart <nathandbossart@gmail.com> wrote:
On Mon, Jul 10, 2023 at 04:43:23PM +0200, Daniel Gustafsson wrote:

+static int n_data_types_usage_checks = 7;

Can we determine this programmatically so that folks don't need to remember
to update it?

Fair point, I've added a counter loop to the beginning of the check function to
calculate it.
+	/* Gather number of checks to perform */
+	while (tmp->status != NULL)
+		n_data_types_usage_checks++;
I think we need to tmp++ somewhere here.

Yuk, yes, will fix when caffeinated. Thanks.

--
Daniel Gustafsson

#16

Daniel Gustafsson

daniel@yesql.se

over 2 years ago

In reply to: Daniel Gustafsson (#15)

1 attachment(s)

Re: Reducing connection overhead in pg_upgrade compat check phase

On 11 Jul 2023, at 01:26, Daniel Gustafsson <daniel@yesql.se> wrote:

On 11 Jul 2023, at 01:09, Nathan Bossart <nathandbossart@gmail.com> wrote:

I think we need to tmp++ somewhere here.

Yuk, yes, will fix when caffeinated. Thanks.

I did have coffee before now, but only found time to actually address this now
so here is a v7 with just that change and a fresh rebase.

--
Daniel Gustafsson

Attachments:

v7-0001-pg_upgrade-run-all-data-type-checks-per-connectio.patch (application/octet-stream)

From 4ec6774b676e0e68d486e5a10e29b148fcfe509c Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson <dgustafsson@postgresql.org>
Date: Thu, 6 Jul 2023 17:55:45 +0200
Subject: [PATCH v7] pg_upgrade: run all data type checks per connection

The checks for data type usage were each connecting to all databases
in the cluster and running their query. On clusters which have a lot
of databases this can become unnecessarily expensive. This moves the
checks to run in a single connection instead to minimize connection
setup/teardown overhead.

Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Discussion: https://postgr.es/m/BB4C76F-D416-4F9F-949E-DBE950D37787@yesql.se
---
 src/bin/pg_upgrade/check.c      | 598 +++++++++++++++++++++-----------
 src/bin/pg_upgrade/pg_upgrade.h |  29 +-
 src/bin/pg_upgrade/version.c    | 288 +++------------
 3 files changed, 453 insertions(+), 462 deletions(-)

diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 64024e3b9e..815ff58540 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -10,6 +10,7 @@
 #include "postgres_fe.h"
 
 #include "catalog/pg_authid_d.h"
+#include "catalog/pg_class_d.h"
 #include "catalog/pg_collation.h"
 #include "fe_utils/string_utils.h"
 #include "mb/pg_wchar.h"
@@ -23,14 +24,398 @@ static void check_for_isn_and_int8_passing_mismatch(ClusterInfo *cluster);
 static void check_for_user_defined_postfix_ops(ClusterInfo *cluster);
 static void check_for_incompatible_polymorphics(ClusterInfo *cluster);
 static void check_for_tables_with_oids(ClusterInfo *cluster);
-static void check_for_composite_data_type_usage(ClusterInfo *cluster);
-static void check_for_reg_data_type_usage(ClusterInfo *cluster);
-static void check_for_aclitem_data_type_usage(ClusterInfo *cluster);
-static void check_for_jsonb_9_4_usage(ClusterInfo *cluster);
 static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(ClusterInfo *new_cluster);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
 
+/*
+ * Data type usage checks. Each check for problematic data type usage is
+ * defined in this array with metadata, SQL query for finding the data type
+ * and a function pointer for determining if the check should be executed
+ * for the current version.
+ */
+static DataTypesUsageChecks data_types_usage_checks[] =
+{
+	/*
+	 * Look for composite types that were made during initdb *or* belong to
+	 * information_schema; that's important in case information_schema was
+	 * dropped and reloaded.
+	 *
+	 * The cutoff OID here should match the source cluster's value of
+	 * FirstNormalObjectId.  We hardcode it rather than using that C #define
+	 * because, if that #define is ever changed, our own version's value is
+	 * NOT what to use.  Eventually we may need a test on the source cluster's
+	 * version to select the correct value.
+	 */
+	{
+		.status = "Checking for system-defined composite types in user tables",
+			.report_filename = "tables_using_composite.txt",
+			.base_query =
+			"SELECT t.oid FROM pg_catalog.pg_type t "
+			"LEFT JOIN pg_catalog.pg_namespace n ON t.typnamespace = n.oid "
+			" WHERE typtype = 'c' AND (t.oid < 16384 OR nspname = 'information_schema')",
+			.report_text =
+			"Your installation contains system-defined composite type(s) in user tables.\n"
+			"These type OIDs are not stable across PostgreSQL versions,\n"
+			"so this cluster cannot currently be upgraded.  You can\n"
+			"drop the problem columns and restart the upgrade.\n"
+			"A list of the problem columns is in the file:",
+			.version_hook = NULL
+	},
+
+	/*
+	 * 9.3 -> 9.4 Fully implement the 'line' data type in 9.4, which
+	 * previously returned "not enabled" by default and was only functionally
+	 * enabled with a compile-time switch; as of 9.4 "line" has a different
+	 * on-disk representation format.
+	 */
+	{
+		.status = "Checking for incompatible \"line\" data type",
+			.report_filename = "tables_using_line.txt",
+			.base_query =
+			"SELECT 'pg_catalog.line'::pg_catalog.regtype AS oid",
+			.report_text =
+			"Your installation contains the \"line\" data type in user tables.\n"
+			"This data type changed its internal and input/output format\n"
+			"between your old and new versions so this\n"
+			"cluster cannot currently be upgraded.  You can\n"
+			"drop the problem columns and restart the upgrade.\n"
+			"A list of the problem columns is in the file:",
+			.version_hook = line_type_check_applicable
+	},
+
+	/*
+	 * pg_upgrade only preserves these system values: pg_class.oid, pg_type.oid,
+	 * pg_enum.oid
+	 *
+	 * Many of the reg* data types reference system catalog info that is not
+	 * preserved, and hence these data types cannot be used in user tables
+	 * upgraded by pg_upgrade.
+	 */
+	{
+		.status = "Checking for reg* data types in user tables",
+			.report_filename = "tables_using_reg.txt",
+
+		/*
+		 * Note: older servers will not have all of these reg* types, so we
+		 * have to write the query like this rather than depending on casts to
+		 * regtype.
+		 */
+			.base_query =
+			"SELECT oid FROM pg_catalog.pg_type t "
+			"WHERE t.typnamespace = "
+			"        (SELECT oid FROM pg_catalog.pg_namespace "
+			"         WHERE nspname = 'pg_catalog') "
+			"  AND t.typname IN ( "
+		/* pg_class.oid is preserved, so 'regclass' is OK */
+			"           'regcollation', "
+			"           'regconfig', "
+			"           'regdictionary', "
+			"           'regnamespace', "
+			"           'regoper', "
+			"           'regoperator', "
+			"           'regproc', "
+			"           'regprocedure' "
+		/* pg_authid.oid is preserved, so 'regrole' is OK */
+		/* pg_type.oid is (mostly) preserved, so 'regtype' is OK */
+			"         )",
+			.report_text =
+			"Your installation contains one of the reg* data types in user tables.\n"
+			"These data types reference system OIDs that are not preserved by\n"
+			"pg_upgrade, so this cluster cannot currently be upgraded.  You can\n"
+			"drop the problem columns and restart the upgrade.\n"
+			"A list of the problem columns is in the file:",
+			.version_hook = NULL
+	},
+
+	/*
+	 * PG 16 increased the size of the 'aclitem' type, which breaks the
+	 * on-disk format for existing data.
+	 */
+	{
+		.status = "Checking for incompatible \"aclitem\" data type in user tables",
+			.report_filename = "tables_using_aclitem.txt",
+			.base_query =
+			"SELECT 'pg_catalog.aclitem'::pg_catalog.regtype AS oid",
+			.report_text =
+			"Your installation contains the \"aclitem\" data type in user tables.\n"
+			"The internal format of \"aclitem\" changed in PostgreSQL version 16\n"
+			"so this cluster cannot currently be upgraded.  You can drop the\n"
+			"problem columns and restart the upgrade.  A list of the problem\n"
+			"columns is in the file:",
+			.version_hook = aclitem_type_check_applicable
+	},
+
+	/*
+	 * It's no longer allowed to create tables or views with "unknown"-type
+	 * columns.  We do not complain about views with such columns, because
+	 * they should get silently converted to "text" columns during the DDL
+	 * dump and reload; it seems unlikely to be worth making users do that by
+	 * hand.  However, if there's a table with such a column, the DDL reload
+	 * will fail, so we should pre-detect that rather than failing
+	 * mid-upgrade.  Worse, if there's a matview with such a column, the DDL
+	 * reload will silently change it to "text" which won't match the on-disk
+	 * storage (which is like "cstring").  So we *must* reject that.
+	 */
+	{
+		.status = "Checking for invalid \"unknown\" user columns",
+			.report_filename = "tables_using_unknown.txt",
+			.base_query =
+			"SELECT 'pg_catalog.unknown'::pg_catalog.regtype AS oid",
+			.report_text =
+			"Your installation contains the \"unknown\" data type in user tables.\n"
+			"This data type is no longer allowed in tables, so this\n"
+			"cluster cannot currently be upgraded.  You can\n"
+			"drop the problem columns and restart the upgrade.\n"
+			"A list of the problem columns is in the file:",
+			.version_hook = unknown_type_check_applicable
+	},
+
+	/*
+	 * PG 12 changed the 'sql_identifier' type storage to be based on name,
+	 * not varchar, which breaks on-disk format for existing data. So we need
+	 * to prevent upgrade when used in user objects (tables, indexes, ...).
+	 *
+	 * The switch from varchar to name affects the storage: name is a
+	 * fixed-length by-reference type, while varchar is a varlena. User tables
+	 * with sql_identifier columns therefore cannot be read by the new version
+	 * because the on-disk format is different.
+	{
+		.status = "Checking for invalid \"sql_identifier\" user columns",
+			.report_filename = "tables_using_sql_identifier.txt",
+			.base_query =
+			"SELECT 'information_schema.sql_identifier'::pg_catalog.regtype AS oid",
+			.report_text =
+			"Your installation contains the \"sql_identifier\" data type in user tables.\n"
+			"The on-disk format for this data type has changed, so this\n"
+			"cluster cannot currently be upgraded.  You can\n"
+			"drop the problem columns and restart the upgrade.\n"
+			"A list of the problem columns is in the file:",
+			.version_hook = sql_identifier_type_check_applicable
+	},
+
+	/*
+	 * JSONB changed its storage format during 9.4 beta, so check for it.
+	 */
+	{
+		.status = "Checking for incompatible \"jsonb\" data type",
+			.report_filename = "tables_using_jsonb.txt",
+			.base_query =
+			"SELECT 'pg_catalog.jsonb'::pg_catalog.regtype AS oid",
+			.report_text =
+			"Your installation contains the \"jsonb\" data type in user tables.\n"
+			"The internal format of \"jsonb\" changed during 9.4 beta so this\n"
+			"cluster cannot currently be upgraded.  You can\n"
+			"drop the problem columns and restart the upgrade.\n"
+			"A list of the problem columns is in the file:",
+			.version_hook = jsonb_9_4_check_applicable
+	},
+
+	/* End of checks marker, must remain last */
+	{
+		NULL, NULL, NULL, NULL, NULL
+	}
+};
+
+/*
+ * check_for_data_types_usage()
+ *	Detect whether there are any stored columns depending on given type(s)
+ *
+ * If so, write a report to the given file name and signal a failure to the
+ * user.
+ *
+ * The checks to run are defined in a DataTypesUsageChecks structure where
+ * each check has metadata for explaining errors to the user, a base_query,
+ * a report filename and a function pointer hook for validating if the check
+ * should be executed given the cluster at hand.
+ *
+ * base_query should be a SELECT yielding a single column named "oid",
+ * containing the pg_type OIDs of one or more types that are known to have
+ * inconsistent on-disk representations across server versions.
+ *
+ * We check for the type(s) in tables, matviews, and indexes, but not views;
+ * there's no storage involved in a view.
+ */
+static void
+check_for_data_types_usage(ClusterInfo *cluster, DataTypesUsageChecks *checks)
+{
+	bool		found = false;
+	bool	   *results;
+	PQExpBufferData report;
+	DataTypesUsageChecks *tmp = checks;
+	int			n_data_types_usage_checks = 0;
+
+	prep_status("Checking for data type usage");
+
+	/* Gather number of checks to perform */
+	while (tmp->status != NULL)
+	{
+		n_data_types_usage_checks++;
+		tmp++;
+	}
+
+	/* Prepare an array to store the results of checks in */
+	results = pg_malloc(sizeof(bool) * n_data_types_usage_checks);
+	memset(results, true, sizeof(bool) * n_data_types_usage_checks);
+
+	prep_status_progress("checking all databases");
+
+	/*
+	 * Connect to each database in the cluster and run all defined checks
+	 * against that database before trying the next one.
+	 */
+	for (int dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
+	{
+		DbInfo	   *active_db = &cluster->dbarr.dbs[dbnum];
+		PGconn	   *conn = connectToServer(cluster, active_db->db_name);
+
+		for (int checknum = 0; checknum < n_data_types_usage_checks; checknum++)
+		{
+			PGresult   *res;
+			int			ntups;
+			int			i_nspname;
+			int			i_relname;
+			int			i_attname;
+			FILE	   *script = NULL;
+			bool		db_used = false;
+			char		output_path[MAXPGPATH];
+			DataTypesUsageChecks *cur_check = &checks[checknum];
+
+			/*
+			 * Make sure that the check applies to the current cluster version
+			 * and skip if not. If no check hook has been defined we run the
+			 * check for all versions.
+			 */
+			if (cur_check->version_hook && !cur_check->version_hook(cluster))
+				continue;
+
+			snprintf(output_path, sizeof(output_path), "%s/%s",
+					 log_opts.basedir,
+					 cur_check->report_filename);
+
+			/*
+			 * The type(s) of interest might be wrapped in a domain, array,
+			 * composite, or range, and these container types can be nested
+			 * (to varying extents depending on server version, but that's not
+			 * of concern here).  To handle all these cases we need a
+			 * recursive CTE.
+			 */
+			res = executeQueryOrDie(conn,
+									"WITH RECURSIVE oids AS ( "
+			/* start with the type(s) returned by base_query */
+									"	%s "
+									"	UNION ALL "
+									"	SELECT * FROM ( "
+			/* inner WITH because we can only reference the CTE once */
+									"		WITH x AS (SELECT oid FROM oids) "
+			/* domains on any type selected so far */
+									"			SELECT t.oid FROM pg_catalog.pg_type t, x WHERE typbasetype = x.oid AND typtype = 'd' "
+									"			UNION ALL "
+			/* arrays over any type selected so far */
+									"			SELECT t.oid FROM pg_catalog.pg_type t, x WHERE typelem = x.oid AND typtype = 'b' "
+									"			UNION ALL "
+			/* composite types containing any type selected so far */
+									"			SELECT t.oid FROM pg_catalog.pg_type t, pg_catalog.pg_class c, pg_catalog.pg_attribute a, x "
+									"			WHERE t.typtype = 'c' AND "
+									"				  t.oid = c.reltype AND "
+									"				  c.oid = a.attrelid AND "
+									"				  NOT a.attisdropped AND "
+									"				  a.atttypid = x.oid "
+									"			UNION ALL "
+			/* ranges containing any type selected so far */
+									"			SELECT t.oid FROM pg_catalog.pg_type t, pg_catalog.pg_range r, x "
+									"			WHERE t.typtype = 'r' AND r.rngtypid = t.oid AND r.rngsubtype = x.oid"
+									"	) foo "
+									") "
+			/* now look for stored columns of any such type */
+									"SELECT n.nspname, c.relname, a.attname "
+									"FROM	pg_catalog.pg_class c, "
+									"		pg_catalog.pg_namespace n, "
+									"		pg_catalog.pg_attribute a "
+									"WHERE	c.oid = a.attrelid AND "
+									"		NOT a.attisdropped AND "
+									"		a.atttypid IN (SELECT oid FROM oids) AND "
+									"		c.relkind IN ("
+									CppAsString2(RELKIND_RELATION) ", "
+									CppAsString2(RELKIND_MATVIEW) ", "
+									CppAsString2(RELKIND_INDEX) ") AND "
+									"		c.relnamespace = n.oid AND "
+			/* exclude possible orphaned temp tables */
+									"		n.nspname !~ '^pg_temp_' AND "
+									"		n.nspname !~ '^pg_toast_temp_' AND "
+			/* exclude system catalogs, too */
+									"		n.nspname NOT IN ('pg_catalog', 'information_schema')",
+									cur_check->base_query);
+
+			ntups = PQntuples(res);
+
+			/*
+			 * The datatype was found, so extract the data and log to the
+			 * requested filename. We need to open the file for appending
+			 * since the check might have already found the type in another
+			 * database earlier in the loop.
+			 */
+			if (ntups)
+			{
+				/*
+				 * Make sure we have a buffer to save reports to now that we
+				 * found a first failing check.
+				 */
+				if (!found)
+					initPQExpBuffer(&report);
+				found = true;
+
+				/*
+				 * If this is the first time we see an error for the check in
+				 * question then print a status message of the failure.
+				 */
+				if (results[checknum])
+				{
+					pg_log(PG_REPORT, "    failed check: %s", cur_check->status);
+					appendPQExpBuffer(&report, "\n%s\n    %s\n",
+									  cur_check->report_text, output_path);
+				}
+				results[checknum] = false;
+
+				i_nspname = PQfnumber(res, "nspname");
+				i_relname = PQfnumber(res, "relname");
+				i_attname = PQfnumber(res, "attname");
+
+				for (int rowno = 0; rowno < ntups; rowno++)
+				{
+					if (script == NULL && (script = fopen_priv(output_path, "a")) == NULL)
+						pg_fatal("could not open file \"%s\": %s",
+								 output_path,
+								 strerror(errno));
+					if (!db_used)
+					{
+						fprintf(script, "In database: %s\n", active_db->db_name);
+						db_used = true;
+					}
+					fprintf(script, "  %s.%s.%s\n",
+							PQgetvalue(res, rowno, i_nspname),
+							PQgetvalue(res, rowno, i_relname),
+							PQgetvalue(res, rowno, i_attname));
+				}
+
+				if (script)
+				{
+					fclose(script);
+					script = NULL;
+				}
+			}
+
+			PQclear(res);
+		}
+
+		PQfinish(conn);
+	}
+
+	if (found)
+		pg_fatal("Data type checks failed: %s", report.data);
+
+	check_ok();
+}
 
 /*
  * fix_path_separator
@@ -100,16 +485,9 @@ check_and_dump_old_cluster(bool live_check)
 	check_is_install_user(&old_cluster);
 	check_proper_datallowconn(&old_cluster);
 	check_for_prepared_transactions(&old_cluster);
-	check_for_composite_data_type_usage(&old_cluster);
-	check_for_reg_data_type_usage(&old_cluster);
 	check_for_isn_and_int8_passing_mismatch(&old_cluster);
 
-	/*
-	 * PG 16 increased the size of the 'aclitem' type, which breaks the
-	 * on-disk format for existing data.
-	 */
-	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1500)
-		check_for_aclitem_data_type_usage(&old_cluster);
+	check_for_data_types_usage(&old_cluster, data_types_usage_checks);
 
 	/*
 	 * PG 14 changed the function signature of encoding conversion functions.
@@ -141,21 +519,12 @@ check_and_dump_old_cluster(bool live_check)
 	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1100)
 		check_for_tables_with_oids(&old_cluster);
 
-	/*
-	 * PG 12 changed the 'sql_identifier' type storage to be based on name,
-	 * not varchar, which breaks on-disk format for existing data. So we need
-	 * to prevent upgrade when used in user objects (tables, indexes, ...).
-	 */
-	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1100)
-		old_11_check_for_sql_identifier_data_type_usage(&old_cluster);
-
 	/*
 	 * Pre-PG 10 allowed tables with 'unknown' type columns and non WAL logged
 	 * hash indexes
 	 */
 	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 906)
 	{
-		old_9_6_check_for_unknown_data_type_usage(&old_cluster);
 		if (user_opts.check)
 			old_9_6_invalidate_hash_indexes(&old_cluster, true);
 	}
@@ -164,14 +533,6 @@ check_and_dump_old_cluster(bool live_check)
 	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 905)
 		check_for_pg_role_prefix(&old_cluster);
 
-	if (GET_MAJOR_VERSION(old_cluster.major_version) == 904 &&
-		old_cluster.controldata.cat_ver < JSONB_FORMAT_CHANGE_CAT_VER)
-		check_for_jsonb_9_4_usage(&old_cluster);
-
-	/* Pre-PG 9.4 had a different 'line' data type internal format */
-	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 903)
-		old_9_3_check_for_line_data_type_usage(&old_cluster);
-
 	/*
 	 * While not a check option, we do this now because this is the only time
 	 * the old server is running.
@@ -1084,185 +1445,6 @@ check_for_tables_with_oids(ClusterInfo *cluster)
 		check_ok();
 }
 
-
-/*
- * check_for_composite_data_type_usage()
- *	Check for system-defined composite types used in user tables.
- *
- *	The OIDs of rowtypes of system catalogs and information_schema views
- *	can change across major versions; unlike user-defined types, we have
- *	no mechanism for forcing them to be the same in the new cluster.
- *	Hence, if any user table uses one, that's problematic for pg_upgrade.
- */
-static void
-check_for_composite_data_type_usage(ClusterInfo *cluster)
-{
-	bool		found;
-	Oid			firstUserOid;
-	char		output_path[MAXPGPATH];
-	char	   *base_query;
-
-	prep_status("Checking for system-defined composite types in user tables");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_composite.txt");
-
-	/*
-	 * Look for composite types that were made during initdb *or* belong to
-	 * information_schema; that's important in case information_schema was
-	 * dropped and reloaded.
-	 *
-	 * The cutoff OID here should match the source cluster's value of
-	 * FirstNormalObjectId.  We hardcode it rather than using that C #define
-	 * because, if that #define is ever changed, our own version's value is
-	 * NOT what to use.  Eventually we may need a test on the source cluster's
-	 * version to select the correct value.
-	 */
-	firstUserOid = 16384;
-
-	base_query = psprintf("SELECT t.oid FROM pg_catalog.pg_type t "
-						  "LEFT JOIN pg_catalog.pg_namespace n ON t.typnamespace = n.oid "
-						  " WHERE typtype = 'c' AND (t.oid < %u OR nspname = 'information_schema')",
-						  firstUserOid);
-
-	found = check_for_data_types_usage(cluster, base_query, output_path);
-
-	free(base_query);
-
-	if (found)
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains system-defined composite type(s) in user tables.\n"
-				 "These type OIDs are not stable across PostgreSQL versions,\n"
-				 "so this cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
-}
-
-/*
- * check_for_reg_data_type_usage()
- *	pg_upgrade only preserves these system values:
- *		pg_class.oid
- *		pg_type.oid
- *		pg_enum.oid
- *
- *	Many of the reg* data types reference system catalog info that is
- *	not preserved, and hence these data types cannot be used in user
- *	tables upgraded by pg_upgrade.
- */
-static void
-check_for_reg_data_type_usage(ClusterInfo *cluster)
-{
-	bool		found;
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for reg* data types in user tables");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_reg.txt");
-
-	/*
-	 * Note: older servers will not have all of these reg* types, so we have
-	 * to write the query like this rather than depending on casts to regtype.
-	 */
-	found = check_for_data_types_usage(cluster,
-									   "SELECT oid FROM pg_catalog.pg_type t "
-									   "WHERE t.typnamespace = "
-									   "        (SELECT oid FROM pg_catalog.pg_namespace "
-									   "         WHERE nspname = 'pg_catalog') "
-									   "  AND t.typname IN ( "
-	/* pg_class.oid is preserved, so 'regclass' is OK */
-									   "           'regcollation', "
-									   "           'regconfig', "
-									   "           'regdictionary', "
-									   "           'regnamespace', "
-									   "           'regoper', "
-									   "           'regoperator', "
-									   "           'regproc', "
-									   "           'regprocedure' "
-	/* pg_authid.oid is preserved, so 'regrole' is OK */
-	/* pg_type.oid is (mostly) preserved, so 'regtype' is OK */
-									   "         )",
-									   output_path);
-
-	if (found)
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains one of the reg* data types in user tables.\n"
-				 "These data types reference system OIDs that are not preserved by\n"
-				 "pg_upgrade, so this cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
-}
-
-/*
- * check_for_aclitem_data_type_usage
- *
- *	aclitem changed its storage format in 16, so check for it.
- */
-static void
-check_for_aclitem_data_type_usage(ClusterInfo *cluster)
-{
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for incompatible \"aclitem\" data type in user tables");
-
-	snprintf(output_path, sizeof(output_path), "tables_using_aclitem.txt");
-
-	if (check_for_data_type_usage(cluster, "pg_catalog.aclitem", output_path))
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains the \"aclitem\" data type in user tables.\n"
-				 "The internal format of \"aclitem\" changed in PostgreSQL version 16\n"
-				 "so this cluster cannot currently be upgraded.  You can drop the\n"
-				 "problem columns and restart the upgrade.  A list of the problem\n"
-				 "columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
-}
-
-/*
- * check_for_jsonb_9_4_usage()
- *
- *	JSONB changed its storage format during 9.4 beta, so check for it.
- */
-static void
-check_for_jsonb_9_4_usage(ClusterInfo *cluster)
-{
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for incompatible \"jsonb\" data type");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_jsonb.txt");
-
-	if (check_for_data_type_usage(cluster, "pg_catalog.jsonb", output_path))
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains the \"jsonb\" data type in user tables.\n"
-				 "The internal format of \"jsonb\" changed during 9.4 beta so this\n"
-				 "cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
-}
-
 /*
  * check_for_pg_role_prefix()
  *
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 3eea0139c7..4d73ed8fb7 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -328,6 +328,21 @@ typedef struct
 } OSInfo;
 
 
+/* Function signature for data type check version hook */
+typedef bool (*DataTypesUsageVersionCheck) (ClusterInfo *cluster);
+
+/*
+ * DataTypesUsageChecks
+ */
+typedef struct
+{
+	const char *status;			/* status line to print to the user */
+	const char *report_filename;	/* filename to store report to */
+	const char *base_query;		/* Query to extract the oid of the datatype */
+	const char *report_text;	/* Text to store to report in case of error */
+	DataTypesUsageVersionCheck version_hook;
+}			DataTypesUsageChecks;
+
 /*
  * Global variables
  */
@@ -450,18 +465,14 @@ unsigned int str2uint(const char *str);
 
 /* version.c */
 
-bool		check_for_data_types_usage(ClusterInfo *cluster,
-									   const char *base_query,
-									   const char *output_path);
-bool		check_for_data_type_usage(ClusterInfo *cluster,
-									  const char *type_name,
-									  const char *output_path);
-void		old_9_3_check_for_line_data_type_usage(ClusterInfo *cluster);
-void		old_9_6_check_for_unknown_data_type_usage(ClusterInfo *cluster);
+bool		line_type_check_applicable(ClusterInfo *cluster);
+bool		jsonb_9_4_check_applicable(ClusterInfo *cluster);
+bool		unknown_type_check_applicable(ClusterInfo *cluster);
+bool		sql_identifier_type_check_applicable(ClusterInfo *cluster);
+bool		aclitem_type_check_applicable(ClusterInfo *cluster);
 void		old_9_6_invalidate_hash_indexes(ClusterInfo *cluster,
 											bool check_mode);
 
-void		old_11_check_for_sql_identifier_data_type_usage(ClusterInfo *cluster);
 void		report_extension_updates(ClusterInfo *cluster);
 
 /* parallel.c */
diff --git a/src/bin/pg_upgrade/version.c b/src/bin/pg_upgrade/version.c
index 403a6d7cfa..a51cb7eafa 100644
--- a/src/bin/pg_upgrade/version.c
+++ b/src/bin/pg_upgrade/version.c
@@ -9,236 +9,69 @@
 
 #include "postgres_fe.h"
 
-#include "catalog/pg_class_d.h"
 #include "fe_utils/string_utils.h"
 #include "pg_upgrade.h"
 
-
 /*
- * check_for_data_types_usage()
- *	Detect whether there are any stored columns depending on given type(s)
- *
- * If so, write a report to the given file name, and return true.
- *
- * base_query should be a SELECT yielding a single column named "oid",
- * containing the pg_type OIDs of one or more types that are known to have
- * inconsistent on-disk representations across server versions.
- *
- * We check for the type(s) in tables, matviews, and indexes, but not views;
- * there's no storage involved in a view.
+ * version_hook functions for check_for_data_types_usage in order to determine
+ * whether a data type check should be executed for the cluster in question or
+ * not.
  */
 bool
-check_for_data_types_usage(ClusterInfo *cluster,
-						   const char *base_query,
-						   const char *output_path)
+line_type_check_applicable(ClusterInfo *cluster)
 {
-	bool		found = false;
-	FILE	   *script = NULL;
-	int			dbnum;
-
-	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
-	{
-		DbInfo	   *active_db = &cluster->dbarr.dbs[dbnum];
-		PGconn	   *conn = connectToServer(cluster, active_db->db_name);
-		PQExpBufferData querybuf;
-		PGresult   *res;
-		bool		db_used = false;
-		int			ntups;
-		int			rowno;
-		int			i_nspname,
-					i_relname,
-					i_attname;
-
-		/*
-		 * The type(s) of interest might be wrapped in a domain, array,
-		 * composite, or range, and these container types can be nested (to
-		 * varying extents depending on server version, but that's not of
-		 * concern here).  To handle all these cases we need a recursive CTE.
-		 */
-		initPQExpBuffer(&querybuf);
-		appendPQExpBuffer(&querybuf,
-						  "WITH RECURSIVE oids AS ( "
-		/* start with the type(s) returned by base_query */
-						  "	%s "
-						  "	UNION ALL "
-						  "	SELECT * FROM ( "
-		/* inner WITH because we can only reference the CTE once */
-						  "		WITH x AS (SELECT oid FROM oids) "
-		/* domains on any type selected so far */
-						  "			SELECT t.oid FROM pg_catalog.pg_type t, x WHERE typbasetype = x.oid AND typtype = 'd' "
-						  "			UNION ALL "
-		/* arrays over any type selected so far */
-						  "			SELECT t.oid FROM pg_catalog.pg_type t, x WHERE typelem = x.oid AND typtype = 'b' "
-						  "			UNION ALL "
-		/* composite types containing any type selected so far */
-						  "			SELECT t.oid FROM pg_catalog.pg_type t, pg_catalog.pg_class c, pg_catalog.pg_attribute a, x "
-						  "			WHERE t.typtype = 'c' AND "
-						  "				  t.oid = c.reltype AND "
-						  "				  c.oid = a.attrelid AND "
-						  "				  NOT a.attisdropped AND "
-						  "				  a.atttypid = x.oid "
-						  "			UNION ALL "
-		/* ranges containing any type selected so far */
-						  "			SELECT t.oid FROM pg_catalog.pg_type t, pg_catalog.pg_range r, x "
-						  "			WHERE t.typtype = 'r' AND r.rngtypid = t.oid AND r.rngsubtype = x.oid"
-						  "	) foo "
-						  ") "
-		/* now look for stored columns of any such type */
-						  "SELECT n.nspname, c.relname, a.attname "
-						  "FROM	pg_catalog.pg_class c, "
-						  "		pg_catalog.pg_namespace n, "
-						  "		pg_catalog.pg_attribute a "
-						  "WHERE	c.oid = a.attrelid AND "
-						  "		NOT a.attisdropped AND "
-						  "		a.atttypid IN (SELECT oid FROM oids) AND "
-						  "		c.relkind IN ("
-						  CppAsString2(RELKIND_RELATION) ", "
-						  CppAsString2(RELKIND_MATVIEW) ", "
-						  CppAsString2(RELKIND_INDEX) ") AND "
-						  "		c.relnamespace = n.oid AND "
-		/* exclude possible orphaned temp tables */
-						  "		n.nspname !~ '^pg_temp_' AND "
-						  "		n.nspname !~ '^pg_toast_temp_' AND "
-		/* exclude system catalogs, too */
-						  "		n.nspname NOT IN ('pg_catalog', 'information_schema')",
-						  base_query);
-
-		res = executeQueryOrDie(conn, "%s", querybuf.data);
-
-		ntups = PQntuples(res);
-		i_nspname = PQfnumber(res, "nspname");
-		i_relname = PQfnumber(res, "relname");
-		i_attname = PQfnumber(res, "attname");
-		for (rowno = 0; rowno < ntups; rowno++)
-		{
-			found = true;
-			if (script == NULL && (script = fopen_priv(output_path, "w")) == NULL)
-				pg_fatal("could not open file \"%s\": %s", output_path,
-						 strerror(errno));
-			if (!db_used)
-			{
-				fprintf(script, "In database: %s\n", active_db->db_name);
-				db_used = true;
-			}
-			fprintf(script, "  %s.%s.%s\n",
-					PQgetvalue(res, rowno, i_nspname),
-					PQgetvalue(res, rowno, i_relname),
-					PQgetvalue(res, rowno, i_attname));
-		}
+	/* Pre-PG 9.4 had a different 'line' data type internal format */
+	if (GET_MAJOR_VERSION(cluster->major_version) <= 903)
+		return true;
 
-		PQclear(res);
-
-		termPQExpBuffer(&querybuf);
-
-		PQfinish(conn);
-	}
-
-	if (script)
-		fclose(script);
-
-	return found;
+	return false;
 }
 
-/*
- * check_for_data_type_usage()
- *	Detect whether there are any stored columns depending on the given type
- *
- * If so, write a report to the given file name, and return true.
- *
- * type_name should be a fully qualified type name.  This is just a
- * trivial wrapper around check_for_data_types_usage() to convert a
- * type name into a base query.
- */
 bool
-check_for_data_type_usage(ClusterInfo *cluster,
-						  const char *type_name,
-						  const char *output_path)
+jsonb_9_4_check_applicable(ClusterInfo *cluster)
 {
-	bool		found;
-	char	   *base_query;
-
-	base_query = psprintf("SELECT '%s'::pg_catalog.regtype AS oid",
-						  type_name);
-
-	found = check_for_data_types_usage(cluster, base_query, output_path);
+	/* JSONB changed its storage format during 9.4 beta */
+	if (GET_MAJOR_VERSION(cluster->major_version) == 904 &&
+		cluster->controldata.cat_ver < JSONB_FORMAT_CHANGE_CAT_VER)
+		return true;
 
-	free(base_query);
-
-	return found;
+	return false;
 }
 
-
-/*
- * old_9_3_check_for_line_data_type_usage()
- *	9.3 -> 9.4
- *	Fully implement the 'line' data type in 9.4, which previously returned
- *	"not enabled" by default and was only functionally enabled with a
- *	compile-time switch; as of 9.4 "line" has a different on-disk
- *	representation format.
- */
-void
-old_9_3_check_for_line_data_type_usage(ClusterInfo *cluster)
+bool
+unknown_type_check_applicable(ClusterInfo *cluster)
 {
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for incompatible \"line\" data type");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_line.txt");
-
-	if (check_for_data_type_usage(cluster, "pg_catalog.line", output_path))
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains the \"line\" data type in user tables.\n"
-				 "This data type changed its internal and input/output format\n"
-				 "between your old and new versions so this\n"
-				 "cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
+	/* Pre-PG 10 allowed tables with 'unknown' type columns */
+	if (GET_MAJOR_VERSION(cluster->major_version) <= 906)
+		return true;
+	return false;
 }
 
-
-/*
- * old_9_6_check_for_unknown_data_type_usage()
- *	9.6 -> 10
- *	It's no longer allowed to create tables or views with "unknown"-type
- *	columns.  We do not complain about views with such columns, because
- *	they should get silently converted to "text" columns during the DDL
- *	dump and reload; it seems unlikely to be worth making users do that
- *	by hand.  However, if there's a table with such a column, the DDL
- *	reload will fail, so we should pre-detect that rather than failing
- *	mid-upgrade.  Worse, if there's a matview with such a column, the
- *	DDL reload will silently change it to "text" which won't match the
- *	on-disk storage (which is like "cstring").  So we *must* reject that.
- */
-void
-old_9_6_check_for_unknown_data_type_usage(ClusterInfo *cluster)
+bool
+sql_identifier_type_check_applicable(ClusterInfo *cluster)
 {
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for invalid \"unknown\" user columns");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_unknown.txt");
+	/*
+	 * PG 12 changed the 'sql_identifier' type storage to be based on name,
+	 * not varchar, which breaks on-disk format for existing data. So we need
+	 * to prevent upgrade when used in user objects (tables, indexes, ...).
+	 */
+	if (GET_MAJOR_VERSION(cluster->major_version) <= 1100)
+		return true;
+
+	return false;
+}
 
-	if (check_for_data_type_usage(cluster, "pg_catalog.unknown", output_path))
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains the \"unknown\" data type in user tables.\n"
-				 "This data type is no longer allowed in tables, so this\n"
-				 "cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
+bool
+aclitem_type_check_applicable(ClusterInfo *cluster)
+{
+	/*
+	 * PG 16 increased the size of the 'aclitem' type, which breaks the
+	 * on-disk format for existing data.
+	 */
+	if (GET_MAJOR_VERSION(cluster->major_version) <= 1500)
+		return true;
+
+	return false;
 }
 
 /*
@@ -353,41 +186,6 @@ old_9_6_invalidate_hash_indexes(ClusterInfo *cluster, bool check_mode)
 		check_ok();
 }
 
-/*
- * old_11_check_for_sql_identifier_data_type_usage()
- *	11 -> 12
- *	In 12, the sql_identifier data type was switched from name to varchar,
- *	which does affect the storage (name is by-ref, but not varlena). This
- *	means user tables using sql_identifier for columns are broken because
- *	the on-disk format is different.
- */
-void
-old_11_check_for_sql_identifier_data_type_usage(ClusterInfo *cluster)
-{
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for invalid \"sql_identifier\" user columns");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_sql_identifier.txt");
-
-	if (check_for_data_type_usage(cluster, "information_schema.sql_identifier",
-								  output_path))
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains the \"sql_identifier\" data type in user tables.\n"
-				 "The on-disk format for this data type has changed, so this\n"
-				 "cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
-}
-
-
 /*
  * report_extension_updates()
  *	Report extensions that should be updated.
-- 
2.32.1 (Apple Git-133)
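
The table-driven layout this patch introduces can be sketched outside pg_upgrade as follows. This is only an illustration under stated assumptions: DemoCluster, DemoCheck, demo_checks and the demo_* functions are hypothetical stand-ins rather than pg_upgrade names, and connections are counted instead of opened. It shows a NULL-terminated check table where a NULL hook means "run for every version", and how the connection count differs between connecting once per check and database and connecting once per database.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for ClusterInfo, reduced to what the sketch needs */
typedef struct
{
	int			major_version;	/* e.g. 906 for 9.6, 1500 for 15 */
	int			ndbs;			/* number of databases in the cluster */
} DemoCluster;

/* Hook deciding whether a check applies to the given cluster */
typedef bool (*DemoVersionHook) (const DemoCluster *cluster);

typedef struct
{
	const char *status;			/* NULL marks the end of the table */
	DemoVersionHook version_hook;	/* NULL means run for all versions */
} DemoCheck;

static bool
demo_pre_10(const DemoCluster *cluster)
{
	return cluster->major_version <= 906;
}

static bool
demo_pre_16(const DemoCluster *cluster)
{
	return cluster->major_version <= 1500;
}

/* Illustrative table; the end-of-checks marker must remain last */
const DemoCheck demo_checks[] = {
	{"composite", NULL},
	{"unknown", demo_pre_10},
	{"aclitem", demo_pre_16},
	{NULL, NULL}
};

/* Count the checks whose hook is absent or accepts this cluster */
int
demo_applicable_checks(const DemoCluster *cluster, const DemoCheck *checks)
{
	int			n = 0;

	assert(cluster != NULL && checks != NULL);

	for (const DemoCheck *c = checks; c->status != NULL; c++)
	{
		if (c->version_hook == NULL || c->version_hook(cluster))
			n++;
	}
	return n;
}

/* One connection per (applicable check, database) pair */
int
demo_connections_per_check(const DemoCluster *cluster, const DemoCheck *checks)
{
	return demo_applicable_checks(cluster, checks) * cluster->ndbs;
}

/* One connection per database, shared by all applicable checks */
int
demo_connections_per_database(const DemoCluster *cluster)
{
	return cluster->ndbs;
}
```

In this sketch the saving grows with the number of applicable checks, which is consistent with the report at the start of the thread that the gain shows up with many databases rather than with a single one.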

#17

Nathan Bossart

nathandbossart@gmail.com

over 2 years ago

In reply to: Daniel Gustafsson (#16)

Re: Reducing connection overhead in pg_upgrade compat check phase

On Wed, Jul 12, 2023 at 12:43:14AM +0200, Daniel Gustafsson wrote:

> I did have coffee before now, but only found time to actually address this now
> so here is a v7 with just that change and a fresh rebase.

Thanks. I think the patch is in decent shape.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#18

Daniel Gustafsson

daniel@yesql.se

over 2 years ago

In reply to: Nathan Bossart (#17)

1 attachment(s)

Re: Reducing connection overhead in pg_upgrade compat check phase

On 12 Jul 2023, at 01:36, Nathan Bossart <nathandbossart@gmail.com> wrote:

> On Wed, Jul 12, 2023 at 12:43:14AM +0200, Daniel Gustafsson wrote:
>
>> I did have coffee before now, but only found time to actually address this now
>> so here is a v7 with just that change and a fresh rebase.
>
> Thanks. I think the patch is in decent shape.

Due to ENOTENOUGHTIME it bitrotted a bit, so here is a v8 rebase which I really
hope to close in this CF.

--
Daniel Gustafsson

Attachments:

v8-0001-pg_upgrade-run-all-data-type-checks-per-connectio.patch (application/octet-stream)

From 37922039d3439f88b115f57d61ced98b32bbd953 Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson <dgustafsson@postgresql.org>
Date: Thu, 31 Aug 2023 23:28:34 +0200
Subject: [PATCH v8] pg_upgrade: run all data type checks per connection

The checks for data type usage were each connecting to all databases
in the cluster and running their query. On clusters which have a lot
of databases this can become unnecessarily expensive. This moves the
checks to run in a single connection instead to minimize connection
setup/teardown overhead.

Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Discussion: https://postgr.es/m/BB4C76F-D416-4F9F-949E-DBE950D37787@yesql.se
---
 src/bin/pg_upgrade/check.c      | 598 +++++++++++++++++++++-----------
 src/bin/pg_upgrade/pg_upgrade.h |  29 +-
 src/bin/pg_upgrade/version.c    | 288 +++------------
 3 files changed, 453 insertions(+), 462 deletions(-)

diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 56e313f562..b548aebae3 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -10,6 +10,7 @@
 #include "postgres_fe.h"
 
 #include "catalog/pg_authid_d.h"
+#include "catalog/pg_class_d.h"
 #include "catalog/pg_collation.h"
 #include "fe_utils/string_utils.h"
 #include "mb/pg_wchar.h"
@@ -23,14 +24,398 @@ static void check_for_isn_and_int8_passing_mismatch(ClusterInfo *cluster);
 static void check_for_user_defined_postfix_ops(ClusterInfo *cluster);
 static void check_for_incompatible_polymorphics(ClusterInfo *cluster);
 static void check_for_tables_with_oids(ClusterInfo *cluster);
-static void check_for_composite_data_type_usage(ClusterInfo *cluster);
-static void check_for_reg_data_type_usage(ClusterInfo *cluster);
-static void check_for_aclitem_data_type_usage(ClusterInfo *cluster);
-static void check_for_jsonb_9_4_usage(ClusterInfo *cluster);
 static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(void);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
 
+/*
+ * Data type usage checks. Each check for problematic data type usage is
+ * defined in this array with metadata, SQL query for finding the data type
+ * and a function pointer for determining if the check should be executed
+ * for the current version.
+ */
+static DataTypesUsageChecks data_types_usage_checks[] =
+{
+	/*
+	 * Look for composite types that were made during initdb *or* belong to
+	 * information_schema; that's important in case information_schema was
+	 * dropped and reloaded.
+	 *
+	 * The cutoff OID here should match the source cluster's value of
+	 * FirstNormalObjectId.  We hardcode it rather than using that C #define
+	 * because, if that #define is ever changed, our own version's value is
+	 * NOT what to use.  Eventually we may need a test on the source cluster's
+	 * version to select the correct value.
+	 */
+	{
+		.status = "Checking for system-defined composite types in user tables",
+			.report_filename = "tables_using_composite.txt",
+			.base_query =
+			"SELECT t.oid FROM pg_catalog.pg_type t "
+			"LEFT JOIN pg_catalog.pg_namespace n ON t.typnamespace = n.oid "
+			" WHERE typtype = 'c' AND (t.oid < 16384 OR nspname = 'information_schema')",
+			.report_text =
+			"Your installation contains system-defined composite type(s) in user tables.\n"
+			"These type OIDs are not stable across PostgreSQL versions,\n"
+			"so this cluster cannot currently be upgraded.  You can\n"
+			"drop the problem columns and restart the upgrade.\n"
+			"A list of the problem columns is in the file:",
+			.version_hook = NULL
+	},
+
+	/*
+	 * 9.3 -> 9.4 Fully implement the 'line' data type in 9.4, which
+	 * previously returned "not enabled" by default and was only functionally
+	 * enabled with a compile-time switch; as of 9.4 "line" has a different
+	 * on-disk representation format.
+	 */
+	{
+		.status = "Checking for incompatible \"line\" data type",
+			.report_filename = "tables_using_line.txt",
+			.base_query =
+			"SELECT 'pg_catalog.line'::pg_catalog.regtype AS oid",
+			.report_text =
+			"Your installation contains the \"line\" data type in user tables.\n"
+			"This data type changed its internal and input/output format\n"
+			"between your old and new versions so this\n"
+			"cluster cannot currently be upgraded.  You can\n"
+			"drop the problem columns and restart the upgrade.\n"
+			"A list of the problem columns is in the file:",
+			.version_hook = line_type_check_applicable
+	},
+
+	/*
+	 * pg_upgrade only preserves these system values: pg_class.oid pg_type.oid
+	 * pg_enum.oid
+	 *
+	 * Many of the reg* data types reference system catalog info that is not
+	 * preserved, and hence these data types cannot be used in user tables
+	 * upgraded by pg_upgrade.
+	 */
+	{
+		.status = "Checking for reg* data types in user tables",
+			.report_filename = "tables_using_reg.txt",
+
+		/*
+		 * Note: older servers will not have all of these reg* types, so we
+		 * have to write the query like this rather than depending on casts to
+		 * regtype.
+		 */
+			.base_query =
+			"SELECT oid FROM pg_catalog.pg_type t "
+			"WHERE t.typnamespace = "
+			"        (SELECT oid FROM pg_catalog.pg_namespace "
+			"         WHERE nspname = 'pg_catalog') "
+			"  AND t.typname IN ( "
+		/* pg_class.oid is preserved, so 'regclass' is OK */
+			"           'regcollation', "
+			"           'regconfig', "
+			"           'regdictionary', "
+			"           'regnamespace', "
+			"           'regoper', "
+			"           'regoperator', "
+			"           'regproc', "
+			"           'regprocedure' "
+		/* pg_authid.oid is preserved, so 'regrole' is OK */
+		/* pg_type.oid is (mostly) preserved, so 'regtype' is OK */
+			"         )",
+			.report_text =
+			"Your installation contains one of the reg* data types in user tables.\n"
+			"These data types reference system OIDs that are not preserved by\n"
+			"pg_upgrade, so this cluster cannot currently be upgraded.  You can\n"
+			"drop the problem columns and restart the upgrade.\n"
+			"A list of the problem columns is in the file:",
+			.version_hook = NULL
+	},
+
+	/*
+	 * PG 16 increased the size of the 'aclitem' type, which breaks the
+	 * on-disk format for existing data.
+	 */
+	{
+		.status = "Checking for incompatible \"aclitem\" data type in user tables",
+			.report_filename = "tables_using_aclitem.txt",
+			.base_query =
+			"SELECT 'pg_catalog.aclitem'::pg_catalog.regtype AS oid",
+			.report_text =
+			"Your installation contains the \"aclitem\" data type in user tables.\n"
+			"The internal format of \"aclitem\" changed in PostgreSQL version 16\n"
+			"so this cluster cannot currently be upgraded.  You can drop the\n"
+			"problem columns and restart the upgrade.  A list of the problem\n"
+			"columns is in the file:",
+			.version_hook = aclitem_type_check_applicable
+	},
+
+	/*
+	 * It's no longer allowed to create tables or views with "unknown"-type
+	 * columns.  We do not complain about views with such columns, because
+	 * they should get silently converted to "text" columns during the DDL
+	 * dump and reload; it seems unlikely to be worth making users do that by
+	 * hand.  However, if there's a table with such a column, the DDL reload
+	 * will fail, so we should pre-detect that rather than failing
+	 * mid-upgrade.  Worse, if there's a matview with such a column, the DDL
+	 * reload will silently change it to "text" which won't match the on-disk
+	 * storage (which is like "cstring").  So we *must* reject that.
+	 */
+	{
+		.status = "Checking for invalid \"unknown\" user columns",
+			.report_filename = "tables_using_unknown.txt",
+			.base_query =
+			"SELECT 'pg_catalog.unknown'::pg_catalog.regtype AS oid",
+			.report_text =
+			"Your installation contains the \"unknown\" data type in user tables.\n"
+			"This data type is no longer allowed in tables, so this\n"
+			"cluster cannot currently be upgraded.  You can\n"
+			"drop the problem columns and restart the upgrade.\n"
+			"A list of the problem columns is in the file:",
+			.version_hook = unknown_type_check_applicable
+	},
+
+	/*
+	 * PG 12 changed the 'sql_identifier' type storage to be based on name,
+	 * not varchar, which breaks on-disk format for existing data. So we need
+	 * to prevent upgrade when used in user objects (tables, indexes, ...).
+	 * In 12, the sql_identifier data type was switched from varchar to
+	 * name, which does affect the storage (name is by-ref, but not
+	 * varlena). This means user tables using sql_identifier for columns
+	 * are broken because the on-disk format is different.
+	 */
+	{
+		.status = "Checking for invalid \"sql_identifier\" user columns",
+			.report_filename = "tables_using_sql_identifier.txt",
+			.base_query =
+			"SELECT 'information_schema.sql_identifier'::pg_catalog.regtype AS oid",
+			.report_text =
+			"Your installation contains the \"sql_identifier\" data type in user tables.\n"
+			"The on-disk format for this data type has changed, so this\n"
+			"cluster cannot currently be upgraded.  You can\n"
+			"drop the problem columns and restart the upgrade.\n"
+			"A list of the problem columns is in the file:",
+			.version_hook = sql_identifier_type_check_applicable
+	},
+
+	/*
+	 * JSONB changed its storage format during 9.4 beta, so check for it.
+	 */
+	{
+		.status = "Checking for incompatible \"jsonb\" data type",
+			.report_filename = "tables_using_jsonb.txt",
+			.base_query =
+			"SELECT 'pg_catalog.jsonb'::pg_catalog.regtype AS oid",
+			.report_text =
+			"Your installation contains the \"jsonb\" data type in user tables.\n"
+			"The internal format of \"jsonb\" changed during 9.4 beta so this\n"
+			"cluster cannot currently be upgraded.  You can\n"
+			"drop the problem columns and restart the upgrade.\n"
+			"A list of the problem columns is in the file:",
+			.version_hook = jsonb_9_4_check_applicable
+	},
+
+	/* End of checks marker, must remain last */
+	{
+		NULL, NULL, NULL, NULL, NULL
+	}
+};
+
+/*
+ * check_for_data_types_usage()
+ *	Detect whether there are any stored columns depending on given type(s)
+ *
+ * If so, write a report to the given file name and signal a failure to the
+ * user.
+ *
+ * The checks to run are defined in a DataTypesUsageChecks structure where
+ * each check has metadata for explaining errors to the user, a base_query,
+ * a report filename and a function pointer hook for validating if the check
+ * should be executed given the cluster at hand.
+ *
+ * base_query should be a SELECT yielding a single column named "oid",
+ * containing the pg_type OIDs of one or more types that are known to have
+ * inconsistent on-disk representations across server versions.
+ *
+ * We check for the type(s) in tables, matviews, and indexes, but not views;
+ * there's no storage involved in a view.
+ */
+static void
+check_for_data_types_usage(ClusterInfo *cluster, DataTypesUsageChecks *checks)
+{
+	bool		found = false;
+	bool	   *results;
+	PQExpBufferData report;
+	DataTypesUsageChecks *tmp = checks;
+	int			n_data_types_usage_checks = 0;
+
+	prep_status("Checking for data type usage");
+
+	/* Gather number of checks to perform */
+	while (tmp->status != NULL)
+	{
+		n_data_types_usage_checks++;
+		tmp++;
+	}
+
+	/* Prepare an array to store the results of checks in */
+	results = pg_malloc(sizeof(bool) * n_data_types_usage_checks);
+	memset(results, true, sizeof(bool) * n_data_types_usage_checks);
+
+	prep_status_progress("checking all databases");
+
+	/*
+	 * Connect to each database in the cluster and run all defined checks
+	 * against that database before trying the next one.
+	 */
+	for (int dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
+	{
+		DbInfo	   *active_db = &cluster->dbarr.dbs[dbnum];
+		PGconn	   *conn = connectToServer(cluster, active_db->db_name);
+
+		for (int checknum = 0; checknum < n_data_types_usage_checks; checknum++)
+		{
+			PGresult   *res;
+			int			ntups;
+			int			i_nspname;
+			int			i_relname;
+			int			i_attname;
+			FILE	   *script = NULL;
+			bool		db_used = false;
+			char		output_path[MAXPGPATH];
+			DataTypesUsageChecks *cur_check = &checks[checknum];
+
+			/*
+			 * Make sure that the check applies to the current cluster version
+			 * and skip if not. If no check hook has been defined we run the
+			 * check for all versions.
+			 */
+			if (cur_check->version_hook && !cur_check->version_hook(cluster))
+				continue;
+
+			snprintf(output_path, sizeof(output_path), "%s/%s",
+					 log_opts.basedir,
+					 cur_check->report_filename);
+
+			/*
+			 * The type(s) of interest might be wrapped in a domain, array,
+			 * composite, or range, and these container types can be nested
+			 * (to varying extents depending on server version, but that's not
+			 * of concern here).  To handle all these cases we need a
+			 * recursive CTE.
+			 */
+			res = executeQueryOrDie(conn,
+									"WITH RECURSIVE oids AS ( "
+			/* start with the type(s) returned by base_query */
+									"	%s "
+									"	UNION ALL "
+									"	SELECT * FROM ( "
+			/* inner WITH because we can only reference the CTE once */
+									"		WITH x AS (SELECT oid FROM oids) "
+			/* domains on any type selected so far */
+									"			SELECT t.oid FROM pg_catalog.pg_type t, x WHERE typbasetype = x.oid AND typtype = 'd' "
+									"			UNION ALL "
+			/* arrays over any type selected so far */
+									"			SELECT t.oid FROM pg_catalog.pg_type t, x WHERE typelem = x.oid AND typtype = 'b' "
+									"			UNION ALL "
+			/* composite types containing any type selected so far */
+									"			SELECT t.oid FROM pg_catalog.pg_type t, pg_catalog.pg_class c, pg_catalog.pg_attribute a, x "
+									"			WHERE t.typtype = 'c' AND "
+									"				  t.oid = c.reltype AND "
+									"				  c.oid = a.attrelid AND "
+									"				  NOT a.attisdropped AND "
+									"				  a.atttypid = x.oid "
+									"			UNION ALL "
+			/* ranges containing any type selected so far */
+									"			SELECT t.oid FROM pg_catalog.pg_type t, pg_catalog.pg_range r, x "
+									"			WHERE t.typtype = 'r' AND r.rngtypid = t.oid AND r.rngsubtype = x.oid"
+									"	) foo "
+									") "
+			/* now look for stored columns of any such type */
+									"SELECT n.nspname, c.relname, a.attname "
+									"FROM	pg_catalog.pg_class c, "
+									"		pg_catalog.pg_namespace n, "
+									"		pg_catalog.pg_attribute a "
+									"WHERE	c.oid = a.attrelid AND "
+									"		NOT a.attisdropped AND "
+									"		a.atttypid IN (SELECT oid FROM oids) AND "
+									"		c.relkind IN ("
+									CppAsString2(RELKIND_RELATION) ", "
+									CppAsString2(RELKIND_MATVIEW) ", "
+									CppAsString2(RELKIND_INDEX) ") AND "
+									"		c.relnamespace = n.oid AND "
+			/* exclude possible orphaned temp tables */
+									"		n.nspname !~ '^pg_temp_' AND "
+									"		n.nspname !~ '^pg_toast_temp_' AND "
+			/* exclude system catalogs, too */
+									"		n.nspname NOT IN ('pg_catalog', 'information_schema')",
+									cur_check->base_query);
+
+			ntups = PQntuples(res);
+
+			/*
+			 * The datatype was found, so extract the data and log to the
+			 * requested filename. We need to open the file for appending
+			 * since the check might have already found the type in another
+			 * database earlier in the loop.
+			 */
+			if (ntups)
+			{
+				/*
+				 * Make sure we have a buffer to save reports to now that we
+				 * found a first failing check.
+				 */
+				if (!found)
+					initPQExpBuffer(&report);
+				found = true;
+
+				/*
+				 * If this is the first time we see an error for the check in
+				 * question then print a status message of the failure.
+				 */
+				if (results[checknum])
+				{
+					pg_log(PG_REPORT, "    failed check: %s", cur_check->status);
+					appendPQExpBuffer(&report, "\n%s\n    %s\n",
+									  cur_check->report_text, output_path);
+				}
+				results[checknum] = false;
+
+				i_nspname = PQfnumber(res, "nspname");
+				i_relname = PQfnumber(res, "relname");
+				i_attname = PQfnumber(res, "attname");
+
+				for (int rowno = 0; rowno < ntups; rowno++)
+				{
+					if (script == NULL && (script = fopen_priv(output_path, "a")) == NULL)
+						pg_fatal("could not open file \"%s\": %s",
+								 output_path,
+								 strerror(errno));
+					if (!db_used)
+					{
+						fprintf(script, "In database: %s\n", active_db->db_name);
+						db_used = true;
+					}
+					fprintf(script, "  %s.%s.%s\n",
+							PQgetvalue(res, rowno, i_nspname),
+							PQgetvalue(res, rowno, i_relname),
+							PQgetvalue(res, rowno, i_attname));
+				}
+
+				if (script)
+				{
+					fclose(script);
+					script = NULL;
+				}
+			}
+
+			PQclear(res);
+		}
+
+		PQfinish(conn);
+	}
+
+	if (found)
+		pg_fatal("Data type checks failed: %s", report.data);
+
+	check_ok();
+}
 
 /*
  * fix_path_separator
@@ -100,16 +485,9 @@ check_and_dump_old_cluster(bool live_check)
 	check_is_install_user(&old_cluster);
 	check_proper_datallowconn(&old_cluster);
 	check_for_prepared_transactions(&old_cluster);
-	check_for_composite_data_type_usage(&old_cluster);
-	check_for_reg_data_type_usage(&old_cluster);
 	check_for_isn_and_int8_passing_mismatch(&old_cluster);
 
-	/*
-	 * PG 16 increased the size of the 'aclitem' type, which breaks the
-	 * on-disk format for existing data.
-	 */
-	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1500)
-		check_for_aclitem_data_type_usage(&old_cluster);
+	check_for_data_types_usage(&old_cluster, data_types_usage_checks);
 
 	/*
 	 * PG 14 changed the function signature of encoding conversion functions.
@@ -141,21 +519,12 @@ check_and_dump_old_cluster(bool live_check)
 	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1100)
 		check_for_tables_with_oids(&old_cluster);
 
-	/*
-	 * PG 12 changed the 'sql_identifier' type storage to be based on name,
-	 * not varchar, which breaks on-disk format for existing data. So we need
-	 * to prevent upgrade when used in user objects (tables, indexes, ...).
-	 */
-	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1100)
-		old_11_check_for_sql_identifier_data_type_usage(&old_cluster);
-
 	/*
 	 * Pre-PG 10 allowed tables with 'unknown' type columns and non WAL logged
 	 * hash indexes
 	 */
 	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 906)
 	{
-		old_9_6_check_for_unknown_data_type_usage(&old_cluster);
 		if (user_opts.check)
 			old_9_6_invalidate_hash_indexes(&old_cluster, true);
 	}
@@ -164,14 +533,6 @@ check_and_dump_old_cluster(bool live_check)
 	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 905)
 		check_for_pg_role_prefix(&old_cluster);
 
-	if (GET_MAJOR_VERSION(old_cluster.major_version) == 904 &&
-		old_cluster.controldata.cat_ver < JSONB_FORMAT_CHANGE_CAT_VER)
-		check_for_jsonb_9_4_usage(&old_cluster);
-
-	/* Pre-PG 9.4 had a different 'line' data type internal format */
-	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 903)
-		old_9_3_check_for_line_data_type_usage(&old_cluster);
-
 	/*
 	 * While not a check option, we do this now because this is the only time
 	 * the old server is running.
@@ -1084,185 +1445,6 @@ check_for_tables_with_oids(ClusterInfo *cluster)
 		check_ok();
 }
 
-
-/*
- * check_for_composite_data_type_usage()
- *	Check for system-defined composite types used in user tables.
- *
- *	The OIDs of rowtypes of system catalogs and information_schema views
- *	can change across major versions; unlike user-defined types, we have
- *	no mechanism for forcing them to be the same in the new cluster.
- *	Hence, if any user table uses one, that's problematic for pg_upgrade.
- */
-static void
-check_for_composite_data_type_usage(ClusterInfo *cluster)
-{
-	bool		found;
-	Oid			firstUserOid;
-	char		output_path[MAXPGPATH];
-	char	   *base_query;
-
-	prep_status("Checking for system-defined composite types in user tables");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_composite.txt");
-
-	/*
-	 * Look for composite types that were made during initdb *or* belong to
-	 * information_schema; that's important in case information_schema was
-	 * dropped and reloaded.
-	 *
-	 * The cutoff OID here should match the source cluster's value of
-	 * FirstNormalObjectId.  We hardcode it rather than using that C #define
-	 * because, if that #define is ever changed, our own version's value is
-	 * NOT what to use.  Eventually we may need a test on the source cluster's
-	 * version to select the correct value.
-	 */
-	firstUserOid = 16384;
-
-	base_query = psprintf("SELECT t.oid FROM pg_catalog.pg_type t "
-						  "LEFT JOIN pg_catalog.pg_namespace n ON t.typnamespace = n.oid "
-						  " WHERE typtype = 'c' AND (t.oid < %u OR nspname = 'information_schema')",
-						  firstUserOid);
-
-	found = check_for_data_types_usage(cluster, base_query, output_path);
-
-	free(base_query);
-
-	if (found)
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains system-defined composite types in user tables.\n"
-				 "These type OIDs are not stable across PostgreSQL versions,\n"
-				 "so this cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
-}
-
-/*
- * check_for_reg_data_type_usage()
- *	pg_upgrade only preserves these system values:
- *		pg_class.oid
- *		pg_type.oid
- *		pg_enum.oid
- *
- *	Many of the reg* data types reference system catalog info that is
- *	not preserved, and hence these data types cannot be used in user
- *	tables upgraded by pg_upgrade.
- */
-static void
-check_for_reg_data_type_usage(ClusterInfo *cluster)
-{
-	bool		found;
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for reg* data types in user tables");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_reg.txt");
-
-	/*
-	 * Note: older servers will not have all of these reg* types, so we have
-	 * to write the query like this rather than depending on casts to regtype.
-	 */
-	found = check_for_data_types_usage(cluster,
-									   "SELECT oid FROM pg_catalog.pg_type t "
-									   "WHERE t.typnamespace = "
-									   "        (SELECT oid FROM pg_catalog.pg_namespace "
-									   "         WHERE nspname = 'pg_catalog') "
-									   "  AND t.typname IN ( "
-	/* pg_class.oid is preserved, so 'regclass' is OK */
-									   "           'regcollation', "
-									   "           'regconfig', "
-									   "           'regdictionary', "
-									   "           'regnamespace', "
-									   "           'regoper', "
-									   "           'regoperator', "
-									   "           'regproc', "
-									   "           'regprocedure' "
-	/* pg_authid.oid is preserved, so 'regrole' is OK */
-	/* pg_type.oid is (mostly) preserved, so 'regtype' is OK */
-									   "         )",
-									   output_path);
-
-	if (found)
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains one of the reg* data types in user tables.\n"
-				 "These data types reference system OIDs that are not preserved by\n"
-				 "pg_upgrade, so this cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
-}
-
-/*
- * check_for_aclitem_data_type_usage
- *
- *	aclitem changed its storage format in 16, so check for it.
- */
-static void
-check_for_aclitem_data_type_usage(ClusterInfo *cluster)
-{
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for incompatible \"aclitem\" data type in user tables");
-
-	snprintf(output_path, sizeof(output_path), "tables_using_aclitem.txt");
-
-	if (check_for_data_type_usage(cluster, "pg_catalog.aclitem", output_path))
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains the \"aclitem\" data type in user tables.\n"
-				 "The internal format of \"aclitem\" changed in PostgreSQL version 16\n"
-				 "so this cluster cannot currently be upgraded.  You can drop the\n"
-				 "problem columns and restart the upgrade.  A list of the problem\n"
-				 "columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
-}
-
-/*
- * check_for_jsonb_9_4_usage()
- *
- *	JSONB changed its storage format during 9.4 beta, so check for it.
- */
-static void
-check_for_jsonb_9_4_usage(ClusterInfo *cluster)
-{
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for incompatible \"jsonb\" data type");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_jsonb.txt");
-
-	if (check_for_data_type_usage(cluster, "pg_catalog.jsonb", output_path))
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains the \"jsonb\" data type in user tables.\n"
-				 "The internal format of \"jsonb\" changed during 9.4 beta so this\n"
-				 "cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
-}
-
 /*
  * check_for_pg_role_prefix()
  *
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 7afa96716e..83fe15efca 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -328,6 +328,21 @@ typedef struct
 } OSInfo;
 
 
+/* Function signature for data type check version hook */
+typedef bool (*DataTypesUsageVersionCheck) (ClusterInfo *cluster);
+
+/*
+ * DataTypesUsageChecks
+ */
+typedef struct
+{
+	const char *status;			/* status line to print to the user */
+	const char *report_filename;	/* filename to store report to */
+	const char *base_query;		/* Query to extract the oid of the datatype */
+	const char *report_text;	/* Text to store to report in case of error */
+	DataTypesUsageVersionCheck version_hook;
+}			DataTypesUsageChecks;
+
 /*
  * Global variables
  */
@@ -450,18 +465,14 @@ unsigned int str2uint(const char *str);
 
 /* version.c */
 
-bool		check_for_data_types_usage(ClusterInfo *cluster,
-									   const char *base_query,
-									   const char *output_path);
-bool		check_for_data_type_usage(ClusterInfo *cluster,
-									  const char *type_name,
-									  const char *output_path);
-void		old_9_3_check_for_line_data_type_usage(ClusterInfo *cluster);
-void		old_9_6_check_for_unknown_data_type_usage(ClusterInfo *cluster);
+bool		line_type_check_applicable(ClusterInfo *cluster);
+bool		jsonb_9_4_check_applicable(ClusterInfo *cluster);
+bool		unknown_type_check_applicable(ClusterInfo *cluster);
+bool		sql_identifier_type_check_applicable(ClusterInfo *cluster);
+bool		aclitem_type_check_applicable(ClusterInfo *cluster);
 void		old_9_6_invalidate_hash_indexes(ClusterInfo *cluster,
 											bool check_mode);
 
-void		old_11_check_for_sql_identifier_data_type_usage(ClusterInfo *cluster);
 void		report_extension_updates(ClusterInfo *cluster);
 
 /* parallel.c */
diff --git a/src/bin/pg_upgrade/version.c b/src/bin/pg_upgrade/version.c
index 403a6d7cfa..a51cb7eafa 100644
--- a/src/bin/pg_upgrade/version.c
+++ b/src/bin/pg_upgrade/version.c
@@ -9,236 +9,69 @@
 
 #include "postgres_fe.h"
 
-#include "catalog/pg_class_d.h"
 #include "fe_utils/string_utils.h"
 #include "pg_upgrade.h"
 
-
 /*
- * check_for_data_types_usage()
- *	Detect whether there are any stored columns depending on given type(s)
- *
- * If so, write a report to the given file name, and return true.
- *
- * base_query should be a SELECT yielding a single column named "oid",
- * containing the pg_type OIDs of one or more types that are known to have
- * inconsistent on-disk representations across server versions.
- *
- * We check for the type(s) in tables, matviews, and indexes, but not views;
- * there's no storage involved in a view.
+ * version_hook functions for check_for_data_types_usage in order to determine
+ * whether a data type check should be executed for the cluster in question or
+ * not.
  */
 bool
-check_for_data_types_usage(ClusterInfo *cluster,
-						   const char *base_query,
-						   const char *output_path)
+line_type_check_applicable(ClusterInfo *cluster)
 {
-	bool		found = false;
-	FILE	   *script = NULL;
-	int			dbnum;
-
-	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
-	{
-		DbInfo	   *active_db = &cluster->dbarr.dbs[dbnum];
-		PGconn	   *conn = connectToServer(cluster, active_db->db_name);
-		PQExpBufferData querybuf;
-		PGresult   *res;
-		bool		db_used = false;
-		int			ntups;
-		int			rowno;
-		int			i_nspname,
-					i_relname,
-					i_attname;
-
-		/*
-		 * The type(s) of interest might be wrapped in a domain, array,
-		 * composite, or range, and these container types can be nested (to
-		 * varying extents depending on server version, but that's not of
-		 * concern here).  To handle all these cases we need a recursive CTE.
-		 */
-		initPQExpBuffer(&querybuf);
-		appendPQExpBuffer(&querybuf,
-						  "WITH RECURSIVE oids AS ( "
-		/* start with the type(s) returned by base_query */
-						  "	%s "
-						  "	UNION ALL "
-						  "	SELECT * FROM ( "
-		/* inner WITH because we can only reference the CTE once */
-						  "		WITH x AS (SELECT oid FROM oids) "
-		/* domains on any type selected so far */
-						  "			SELECT t.oid FROM pg_catalog.pg_type t, x WHERE typbasetype = x.oid AND typtype = 'd' "
-						  "			UNION ALL "
-		/* arrays over any type selected so far */
-						  "			SELECT t.oid FROM pg_catalog.pg_type t, x WHERE typelem = x.oid AND typtype = 'b' "
-						  "			UNION ALL "
-		/* composite types containing any type selected so far */
-						  "			SELECT t.oid FROM pg_catalog.pg_type t, pg_catalog.pg_class c, pg_catalog.pg_attribute a, x "
-						  "			WHERE t.typtype = 'c' AND "
-						  "				  t.oid = c.reltype AND "
-						  "				  c.oid = a.attrelid AND "
-						  "				  NOT a.attisdropped AND "
-						  "				  a.atttypid = x.oid "
-						  "			UNION ALL "
-		/* ranges containing any type selected so far */
-						  "			SELECT t.oid FROM pg_catalog.pg_type t, pg_catalog.pg_range r, x "
-						  "			WHERE t.typtype = 'r' AND r.rngtypid = t.oid AND r.rngsubtype = x.oid"
-						  "	) foo "
-						  ") "
-		/* now look for stored columns of any such type */
-						  "SELECT n.nspname, c.relname, a.attname "
-						  "FROM	pg_catalog.pg_class c, "
-						  "		pg_catalog.pg_namespace n, "
-						  "		pg_catalog.pg_attribute a "
-						  "WHERE	c.oid = a.attrelid AND "
-						  "		NOT a.attisdropped AND "
-						  "		a.atttypid IN (SELECT oid FROM oids) AND "
-						  "		c.relkind IN ("
-						  CppAsString2(RELKIND_RELATION) ", "
-						  CppAsString2(RELKIND_MATVIEW) ", "
-						  CppAsString2(RELKIND_INDEX) ") AND "
-						  "		c.relnamespace = n.oid AND "
-		/* exclude possible orphaned temp tables */
-						  "		n.nspname !~ '^pg_temp_' AND "
-						  "		n.nspname !~ '^pg_toast_temp_' AND "
-		/* exclude system catalogs, too */
-						  "		n.nspname NOT IN ('pg_catalog', 'information_schema')",
-						  base_query);
-
-		res = executeQueryOrDie(conn, "%s", querybuf.data);
-
-		ntups = PQntuples(res);
-		i_nspname = PQfnumber(res, "nspname");
-		i_relname = PQfnumber(res, "relname");
-		i_attname = PQfnumber(res, "attname");
-		for (rowno = 0; rowno < ntups; rowno++)
-		{
-			found = true;
-			if (script == NULL && (script = fopen_priv(output_path, "w")) == NULL)
-				pg_fatal("could not open file \"%s\": %s", output_path,
-						 strerror(errno));
-			if (!db_used)
-			{
-				fprintf(script, "In database: %s\n", active_db->db_name);
-				db_used = true;
-			}
-			fprintf(script, "  %s.%s.%s\n",
-					PQgetvalue(res, rowno, i_nspname),
-					PQgetvalue(res, rowno, i_relname),
-					PQgetvalue(res, rowno, i_attname));
-		}
+	/* Pre-PG 9.4 had a different 'line' data type internal format */
+	if (GET_MAJOR_VERSION(cluster->major_version) <= 903)
+		return true;
 
-		PQclear(res);
-
-		termPQExpBuffer(&querybuf);
-
-		PQfinish(conn);
-	}
-
-	if (script)
-		fclose(script);
-
-	return found;
+	return false;
 }
 
-/*
- * check_for_data_type_usage()
- *	Detect whether there are any stored columns depending on the given type
- *
- * If so, write a report to the given file name, and return true.
- *
- * type_name should be a fully qualified type name.  This is just a
- * trivial wrapper around check_for_data_types_usage() to convert a
- * type name into a base query.
- */
 bool
-check_for_data_type_usage(ClusterInfo *cluster,
-						  const char *type_name,
-						  const char *output_path)
+jsonb_9_4_check_applicable(ClusterInfo *cluster)
 {
-	bool		found;
-	char	   *base_query;
-
-	base_query = psprintf("SELECT '%s'::pg_catalog.regtype AS oid",
-						  type_name);
-
-	found = check_for_data_types_usage(cluster, base_query, output_path);
+	/* JSONB changed its storage format during 9.4 beta */
+	if (GET_MAJOR_VERSION(cluster->major_version) == 904 &&
+		cluster->controldata.cat_ver < JSONB_FORMAT_CHANGE_CAT_VER)
+		return true;
 
-	free(base_query);
-
-	return found;
+	return false;
 }
 
-
-/*
- * old_9_3_check_for_line_data_type_usage()
- *	9.3 -> 9.4
- *	Fully implement the 'line' data type in 9.4, which previously returned
- *	"not enabled" by default and was only functionally enabled with a
- *	compile-time switch; as of 9.4 "line" has a different on-disk
- *	representation format.
- */
-void
-old_9_3_check_for_line_data_type_usage(ClusterInfo *cluster)
+bool
+unknown_type_check_applicable(ClusterInfo *cluster)
 {
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for incompatible \"line\" data type");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_line.txt");
-
-	if (check_for_data_type_usage(cluster, "pg_catalog.line", output_path))
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains the \"line\" data type in user tables.\n"
-				 "This data type changed its internal and input/output format\n"
-				 "between your old and new versions so this\n"
-				 "cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
+	/* Pre-PG 10 allowed tables with 'unknown' type columns */
+	if (GET_MAJOR_VERSION(cluster->major_version) <= 906)
+		return true;
+	return false;
 }
 
-
-/*
- * old_9_6_check_for_unknown_data_type_usage()
- *	9.6 -> 10
- *	It's no longer allowed to create tables or views with "unknown"-type
- *	columns.  We do not complain about views with such columns, because
- *	they should get silently converted to "text" columns during the DDL
- *	dump and reload; it seems unlikely to be worth making users do that
- *	by hand.  However, if there's a table with such a column, the DDL
- *	reload will fail, so we should pre-detect that rather than failing
- *	mid-upgrade.  Worse, if there's a matview with such a column, the
- *	DDL reload will silently change it to "text" which won't match the
- *	on-disk storage (which is like "cstring").  So we *must* reject that.
- */
-void
-old_9_6_check_for_unknown_data_type_usage(ClusterInfo *cluster)
+bool
+sql_identifier_type_check_applicable(ClusterInfo *cluster)
 {
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for invalid \"unknown\" user columns");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_unknown.txt");
+	/*
+	 * PG 12 changed the 'sql_identifier' type storage to be based on name,
+	 * not varchar, which breaks on-disk format for existing data. So we need
+	 * to prevent upgrade when used in user objects (tables, indexes, ...).
+	 */
+	if (GET_MAJOR_VERSION(cluster->major_version) <= 1100)
+		return true;
+
+	return false;
+}
 
-	if (check_for_data_type_usage(cluster, "pg_catalog.unknown", output_path))
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains the \"unknown\" data type in user tables.\n"
-				 "This data type is no longer allowed in tables, so this\n"
-				 "cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
+bool
+aclitem_type_check_applicable(ClusterInfo *cluster)
+{
+	/*
+	 * PG 16 increased the size of the 'aclitem' type, which breaks the
+	 * on-disk format for existing data.
+	 */
+	if (GET_MAJOR_VERSION(cluster->major_version) <= 1500)
+		return true;
+
+	return false;
 }
 
 /*
@@ -353,41 +186,6 @@ old_9_6_invalidate_hash_indexes(ClusterInfo *cluster, bool check_mode)
 		check_ok();
 }
 
-/*
- * old_11_check_for_sql_identifier_data_type_usage()
- *	11 -> 12
- *	In 12, the sql_identifier data type was switched from name to varchar,
- *	which does affect the storage (name is by-ref, but not varlena). This
- *	means user tables using sql_identifier for columns are broken because
- *	the on-disk format is different.
- */
-void
-old_11_check_for_sql_identifier_data_type_usage(ClusterInfo *cluster)
-{
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for invalid \"sql_identifier\" user columns");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_sql_identifier.txt");
-
-	if (check_for_data_type_usage(cluster, "information_schema.sql_identifier",
-								  output_path))
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains the \"sql_identifier\" data type in user tables.\n"
-				 "The on-disk format for this data type has changed, so this\n"
-				 "cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
-}
-
-
 /*
  * report_extension_updates()
  *	Report extensions that should be updated.
-- 
2.32.1 (Apple Git-133)
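
For readers skimming the patch, here is a rough, standalone sketch of the idea (not code from the patch): a sentinel-terminated table of checks with optional version hooks, and a comparison of how many connections the "loop per check" and "loop per database" shapes need. All Demo* names and the two functions are hypothetical stand-ins; real pg_upgrade would call connectToServer() and run a query where the counters are bumped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for ClusterInfo; only the fields used here. */
typedef struct
{
	int			major_version;	/* e.g. 903 for 9.3, 1500 for 15 */
	int			ndbs;			/* number of databases in the cluster */
} DemoCluster;

typedef bool (*DemoVersionHook) (const DemoCluster *cluster);

typedef struct
{
	const char *status;			/* NULL marks the end of the table */
	DemoVersionHook version_hook;	/* NULL means "run for all versions" */
} DemoCheck;

static bool
demo_line_applicable(const DemoCluster *cluster)
{
	return cluster->major_version <= 903;
}

static bool
demo_aclitem_applicable(const DemoCluster *cluster)
{
	return cluster->major_version <= 1500;
}

static const DemoCheck demo_checks[] = {
	{"composite types", NULL},
	{"line type", demo_line_applicable},
	{"aclitem type", demo_aclitem_applicable},
	{NULL, NULL}				/* end marker, must remain last */
};

/* Old shape: each applicable check connects to every database itself. */
static int
connections_per_check(const DemoCluster *cluster, const DemoCheck *checks)
{
	int			nconn = 0;

	assert(cluster != NULL && checks != NULL);
	for (const DemoCheck *c = checks; c->status != NULL; c++)
	{
		if (c->version_hook && !c->version_hook(cluster))
			continue;
		nconn += cluster->ndbs;	/* one connection per database */
	}
	return nconn;
}

/* New shape: one connection per database, shared by all applicable checks. */
static int
connections_per_database(const DemoCluster *cluster, const DemoCheck *checks,
						 int *nqueries)
{
	int			nconn = 0;

	assert(cluster != NULL && checks != NULL && nqueries != NULL);
	*nqueries = 0;
	for (int dbnum = 0; dbnum < cluster->ndbs; dbnum++)
	{
		nconn++;				/* connect once for this database */
		for (const DemoCheck *c = checks; c->status != NULL; c++)
		{
			if (c->version_hook && !c->version_hook(cluster))
				continue;
			(*nqueries)++;		/* run the check's query on that connection */
		}
	}
	return nconn;
}
```

With a DemoCluster of {1500, 100}, two of the three demo checks apply, so connections_per_check() counts 200 connections while connections_per_database() counts 100 connections for the same 200 queries.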

#19

Peter Eisentraut

peter@eisentraut.org

over 2 years ago

In reply to: Daniel Gustafsson (#18)

Re: Reducing connection overhead in pg_upgrade compat check phase

On 31.08.23 23:34, Daniel Gustafsson wrote:

On 12 Jul 2023, at 01:36, Nathan Bossart <nathandbossart@gmail.com> wrote:

On Wed, Jul 12, 2023 at 12:43:14AM +0200, Daniel Gustafsson wrote:

I did have coffee before now, but only found time to actually address this now
so here is a v7 with just that change and a fresh rebase.

Thanks. I think the patch is in decent shape.

Due to ENOTENOUGHTIME it bitrotted a bit, so here is a v8 rebase which I really
hope to close in this CF.

The alignment of this output looks a bit funny:

...
Checking for prepared transactions ok
Checking for contrib/isn with bigint-passing mismatch ok
Checking for data type usage checking all databases
ok
Checking for presence of required libraries ok
Checking database user is the install user ok
...

Also, you should put gettext_noop() calls into the .status = "Checking ..."
assignments and arrange to call gettext() where they are used, to maintain
the translatability.
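
A minimal sketch of that pattern, for illustration only: gettext_noop() marks the literal so it can be extracted for translation, and the actual lookup happens where the string is printed. The macro definition matches what PostgreSQL's c.h uses; demo_gettext(), its made-up "[de]" catalog entry, DemoCheck and demo_status() are hypothetical stand-ins so the example builds without libintl.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Marks a string for message extraction without translating it. */
#define gettext_noop(x) (x)

/* Hypothetical stand-in for gettext(); the catalog entry is made up. */
static const char *
demo_gettext(const char *msgid)
{
	if (strcmp(msgid, "Checking for incompatible \"line\" data type") == 0)
		return "[de] Checking for incompatible \"line\" data type";
	return msgid;				/* untranslated messages pass through */
}

#define _(x) demo_gettext(x)

typedef struct
{
	const char *status;			/* stored untranslated in the table */
} DemoCheck;

static const DemoCheck demo_checks[] = {
	{gettext_noop("Checking for incompatible \"line\" data type")},
	{gettext_noop("Checking for reg* data types in user tables")},
};

/* Translate at the point of use, not at the point of definition. */
static const char *
demo_status(size_t i)
{
	assert(i < sizeof(demo_checks) / sizeof(demo_checks[0]));
	return _(demo_checks[i].status);
}
```

The table has to keep the untranslated msgid because a static initializer cannot call gettext(); only the call site knows the locale in effect at run time.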

#20

Daniel Gustafsson

daniel@yesql.se

over 2 years ago

In reply to: Peter Eisentraut (#19)

1 attachment(s)

Re: Reducing connection overhead in pg_upgrade compat check phase

On 13 Sep 2023, at 16:12, Peter Eisentraut <peter@eisentraut.org> wrote:

The alignment of this output looks a bit funny:

...
Checking for prepared transactions ok
Checking for contrib/isn with bigint-passing mismatch ok
Checking for data type usage checking all databases
ok
Checking for presence of required libraries ok
Checking database user is the install user ok
...

I was using the progress reporting to indicate that it hadn't stalled for slow
systems, but it's probably not all that important really. Removed such
that "ok" aligns.
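
To illustrate why the extra progress text matters, here is a small sketch of fixed-width status formatting. It is an assumption about the general technique, not the pg_upgrade source: DEMO_MESSAGE_WIDTH and demo_status_line() are hypothetical names, and the width of 60 is picked only for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Assumed column width for this sketch; pg_upgrade has its own constant. */
#define DEMO_MESSAGE_WIDTH 60

/*
 * Format a status line with the status left-justified and padded to a
 * fixed width, so that the result always starts in the same column.
 */
static int
demo_status_line(char *buf, size_t buflen,
				 const char *status, const char *result)
{
	assert(buf != NULL && buflen > 0);
	return snprintf(buf, buflen, "%-*s%s",
					DEMO_MESSAGE_WIDTH, status, result);
}
```

Anything printed between the padded status and the result pushes the result out of that column, which is the effect seen in the output quoted above.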

Also, you should put gettext_noop() calls into the .status = "Checking ..."
assignments and arrange to call gettext() where they are used, to maintain
the translatability.

Ah, yes of course. Fixed.

--
Daniel Gustafsson

Attachments:

v9-0001-pg_upgrade-run-all-data-type-checks-per-connectio.patch (application/octet-stream)

From 05791b7f3ebee5706940dba146fc2733d8b71669 Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson <dgustafsson@postgresql.org>
Date: Thu, 14 Sep 2023 10:30:47 +0200
Subject: [PATCH v9] pg_upgrade: run all data type checks per connection

The checks for data type usage were each connecting to all databases
in the cluster and running their query. On clusters which have a lot
of databases this can become unnecessarily expensive. This moves the
checks to run in a single connection instead to minimize connection
setup/teardown overhead.

Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/BB4C76F-D416-4F9F-949E-DBE950D37787@yesql.se
---
 src/bin/pg_upgrade/check.c      | 596 +++++++++++++++++++++-----------
 src/bin/pg_upgrade/pg_upgrade.h |  29 +-
 src/bin/pg_upgrade/version.c    | 288 +++------------
 3 files changed, 451 insertions(+), 462 deletions(-)

diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 56e313f562..5422d424a4 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -10,6 +10,7 @@
 #include "postgres_fe.h"
 
 #include "catalog/pg_authid_d.h"
+#include "catalog/pg_class_d.h"
 #include "catalog/pg_collation.h"
 #include "fe_utils/string_utils.h"
 #include "mb/pg_wchar.h"
@@ -23,14 +24,396 @@ static void check_for_isn_and_int8_passing_mismatch(ClusterInfo *cluster);
 static void check_for_user_defined_postfix_ops(ClusterInfo *cluster);
 static void check_for_incompatible_polymorphics(ClusterInfo *cluster);
 static void check_for_tables_with_oids(ClusterInfo *cluster);
-static void check_for_composite_data_type_usage(ClusterInfo *cluster);
-static void check_for_reg_data_type_usage(ClusterInfo *cluster);
-static void check_for_aclitem_data_type_usage(ClusterInfo *cluster);
-static void check_for_jsonb_9_4_usage(ClusterInfo *cluster);
 static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(void);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
 
+/*
+ * Data type usage checks. Each check for problematic data type usage is
+ * defined in this array with metadata, SQL query for finding the data type
+ * and a function pointer for determining if the check should be executed
+ * for the current version.
+ */
+static DataTypesUsageChecks data_types_usage_checks[] =
+{
+	/*
+	 * Look for composite types that were made during initdb *or* belong to
+	 * information_schema; that's important in case information_schema was
+	 * dropped and reloaded.
+	 *
+	 * The cutoff OID here should match the source cluster's value of
+	 * FirstNormalObjectId.  We hardcode it rather than using that C #define
+	 * because, if that #define is ever changed, our own version's value is
+	 * NOT what to use.  Eventually we may need a test on the source cluster's
+	 * version to select the correct value.
+	 */
+	{
+		.status = gettext_noop("Checking for system-defined composite types in user tables"),
+			.report_filename = "tables_using_composite.txt",
+			.base_query =
+			"SELECT t.oid FROM pg_catalog.pg_type t "
+			"LEFT JOIN pg_catalog.pg_namespace n ON t.typnamespace = n.oid "
+			" WHERE typtype = 'c' AND (t.oid < 16384 OR nspname = 'information_schema')",
+			.report_text =
+			"Your installation contains system-defined composite type(s) in user tables.\n"
+			"These type OIDs are not stable across PostgreSQL versions,\n"
+			"so this cluster cannot currently be upgraded.  You can\n"
+			"drop the problem columns and restart the upgrade.\n"
+			"A list of the problem columns is in the file:",
+			.version_hook = NULL
+	},
+
+	/*
+	 * 9.3 -> 9.4 Fully implement the 'line' data type in 9.4, which
+	 * previously returned "not enabled" by default and was only functionally
+	 * enabled with a compile-time switch; as of 9.4 "line" has a different
+	 * on-disk representation format.
+	 */
+	{
+		.status = gettext_noop("Checking for incompatible \"line\" data type"),
+			.report_filename = "tables_using_line.txt",
+			.base_query =
+			"SELECT 'pg_catalog.line'::pg_catalog.regtype AS oid",
+			.report_text =
+			"Your installation contains the \"line\" data type in user tables.\n"
+			"This data type changed its internal and input/output format\n"
+			"between your old and new versions so this\n"
+			"cluster cannot currently be upgraded.  You can\n"
+			"drop the problem columns and restart the upgrade.\n"
+			"A list of the problem columns is in the file:",
+			.version_hook = line_type_check_applicable
+	},
+
+	/*
+	 * pg_upgrade only preserves these system values: pg_class.oid pg_type.oid
+	 * pg_enum.oid
+	 *
+	 * Many of the reg* data types reference system catalog info that is not
+	 * preserved, and hence these data types cannot be used in user tables
+	 * upgraded by pg_upgrade.
+	 */
+	{
+		.status = gettext_noop("Checking for reg* data types in user tables"),
+			.report_filename = "tables_using_reg.txt",
+
+		/*
+		 * Note: older servers will not have all of these reg* types, so we
+		 * have to write the query like this rather than depending on casts to
+		 * regtype.
+		 */
+			.base_query =
+			"SELECT oid FROM pg_catalog.pg_type t "
+			"WHERE t.typnamespace = "
+			"        (SELECT oid FROM pg_catalog.pg_namespace "
+			"         WHERE nspname = 'pg_catalog') "
+			"  AND t.typname IN ( "
+		/* pg_class.oid is preserved, so 'regclass' is OK */
+			"           'regcollation', "
+			"           'regconfig', "
+			"           'regdictionary', "
+			"           'regnamespace', "
+			"           'regoper', "
+			"           'regoperator', "
+			"           'regproc', "
+			"           'regprocedure' "
+		/* pg_authid.oid is preserved, so 'regrole' is OK */
+		/* pg_type.oid is (mostly) preserved, so 'regtype' is OK */
+			"         )",
+			.report_text =
+			"Your installation contains one of the reg* data types in user tables.\n"
+			"These data types reference system OIDs that are not preserved by\n"
+			"pg_upgrade, so this cluster cannot currently be upgraded.  You can\n"
+			"drop the problem columns and restart the upgrade.\n"
+			"A list of the problem columns is in the file:",
+			.version_hook = NULL
+	},
+
+	/*
+	 * PG 16 increased the size of the 'aclitem' type, which breaks the
+	 * on-disk format for existing data.
+	 */
+	{
+		.status = gettext_noop("Checking for incompatible aclitem data type in user tables"),
+			.report_filename = "tables_using_aclitem.txt",
+			.base_query =
+			"SELECT 'pg_catalog.aclitem'::pg_catalog.regtype AS oid",
+			.report_text =
+			"Your installation contains the \"aclitem\" data type in user tables.\n"
+			"The internal format of \"aclitem\" changed in PostgreSQL version 16\n"
+			"so this cluster cannot currently be upgraded.  You can drop the\n"
+			"problem columns and restart the upgrade.  A list of the problem\n"
+			"columns is in the file:",
+			.version_hook = aclitem_type_check_applicable
+	},
+
+	/*
+	 * It's no longer allowed to create tables or views with "unknown"-type
+	 * columns.  We do not complain about views with such columns, because
+	 * they should get silently converted to "text" columns during the DDL
+	 * dump and reload; it seems unlikely to be worth making users do that by
+	 * hand.  However, if there's a table with such a column, the DDL reload
+	 * will fail, so we should pre-detect that rather than failing
+	 * mid-upgrade.  Worse, if there's a matview with such a column, the DDL
+	 * reload will silently change it to "text" which won't match the on-disk
+	 * storage (which is like "cstring").  So we *must* reject that.
+	 */
+	{
+		.status = gettext_noop("Checking for invalid \"unknown\" user columns"),
+			.report_filename = "tables_using_unknown.txt",
+			.base_query =
+			"SELECT 'pg_catalog.unknown'::pg_catalog.regtype AS oid",
+			.report_text =
+			"Your installation contains the \"unknown\" data type in user tables.\n"
+			"This data type is no longer allowed in tables, so this\n"
+			"cluster cannot currently be upgraded.  You can\n"
+			"drop the problem columns and restart the upgrade.\n"
+			"A list of the problem columns is in the file:",
+			.version_hook = unknown_type_check_applicable
+	},
+
+	/*
+	 * PG 12 changed the 'sql_identifier' type storage to be based on name,
+	 * not varchar, which breaks on-disk format for existing data. So we need
+	 * to prevent upgrade when used in user objects (tables, indexes, ...). In
+	 * 12, the sql_identifier data type was switched from varchar to name,
+	 * which does affect the storage (name is by-ref, but not varlena). This
+	 * means user tables using sql_identifier for columns are broken because
+	 * the on-disk format is different.
+	 */
+	{
+		.status = gettext_noop("Checking for invalid \"sql_identifier\" user columns"),
+			.report_filename = "tables_using_sql_identifier.txt",
+			.base_query =
+			"SELECT 'information_schema.sql_identifier'::pg_catalog.regtype AS oid",
+			.report_text =
+			"Your installation contains the \"sql_identifier\" data type in user tables.\n"
+			"The on-disk format for this data type has changed, so this\n"
+			"cluster cannot currently be upgraded.  You can\n"
+			"drop the problem columns and restart the upgrade.\n"
+			"A list of the problem columns is in the file:",
+			.version_hook = sql_identifier_type_check_applicable
+	},
+
+	/*
+	 * JSONB changed its storage format during 9.4 beta, so check for it.
+	 */
+	{
+		.status = gettext_noop("Checking for incompatible \"jsonb\" data type"),
+			.report_filename = "tables_using_jsonb.txt",
+			.base_query =
+			"SELECT 'pg_catalog.jsonb'::pg_catalog.regtype AS oid",
+			.report_text =
+			"Your installation contains the \"jsonb\" data type in user tables.\n"
+			"The internal format of \"jsonb\" changed during 9.4 beta so this\n"
+			"cluster cannot currently be upgraded.  You can\n"
+			"drop the problem columns and restart the upgrade.\n"
+			"A list of the problem columns is in the file:",
+			.version_hook = jsonb_9_4_check_applicable
+	},
+
+	/* End of checks marker, must remain last */
+	{
+		NULL, NULL, NULL, NULL, NULL
+	}
+};
+
+/*
+ * check_for_data_types_usage()
+ *	Detect whether there are any stored columns depending on given type(s)
+ *
+ * If so, write a report to the given file name and signal a failure to the
+ * user.
+ *
+ * The checks to run are defined in a DataTypesUsageChecks structure where
+ * each check has metadata for explaining errors to the user, a base_query,
+ * a report filename and a function pointer hook for validating if the check
+ * should be executed given the cluster at hand.
+ *
+ * base_query should be a SELECT yielding a single column named "oid",
+ * containing the pg_type OIDs of one or more types that are known to have
+ * inconsistent on-disk representations across server versions.
+ *
+ * We check for the type(s) in tables, matviews, and indexes, but not views;
+ * there's no storage involved in a view.
+ */
+static void
+check_for_data_types_usage(ClusterInfo *cluster, DataTypesUsageChecks *checks)
+{
+	bool		found = false;
+	bool	   *results;
+	PQExpBufferData report;
+	DataTypesUsageChecks *tmp = checks;
+	int			n_data_types_usage_checks = 0;
+
+	prep_status("Checking for data type usage");
+
+	/* Gather number of checks to perform */
+	while (tmp->status != NULL)
+	{
+		n_data_types_usage_checks++;
+		tmp++;
+	}
+
+	/* Prepare an array to store the results of checks in */
+	results = pg_malloc(sizeof(bool) * n_data_types_usage_checks);
+	memset(results, true, sizeof(bool) * n_data_types_usage_checks);
+
+	/*
+	 * Connect to each database in the cluster and run all defined checks
+	 * against that database before trying the next one.
+	 */
+	for (int dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
+	{
+		DbInfo	   *active_db = &cluster->dbarr.dbs[dbnum];
+		PGconn	   *conn = connectToServer(cluster, active_db->db_name);
+
+		for (int checknum = 0; checknum < n_data_types_usage_checks; checknum++)
+		{
+			PGresult   *res;
+			int			ntups;
+			int			i_nspname;
+			int			i_relname;
+			int			i_attname;
+			FILE	   *script = NULL;
+			bool		db_used = false;
+			char		output_path[MAXPGPATH];
+			DataTypesUsageChecks *cur_check = &checks[checknum];
+
+			/*
+			 * Make sure that the check applies to the current cluster version
+			 * and skip if not. If no check hook has been defined we run the
+			 * check for all versions.
+			 */
+			if (cur_check->version_hook && !cur_check->version_hook(cluster))
+				continue;
+
+			snprintf(output_path, sizeof(output_path), "%s/%s",
+					 log_opts.basedir,
+					 cur_check->report_filename);
+
+			/*
+			 * The type(s) of interest might be wrapped in a domain, array,
+			 * composite, or range, and these container types can be nested
+			 * (to varying extents depending on server version, but that's not
+			 * of concern here).  To handle all these cases we need a
+			 * recursive CTE.
+			 */
+			res = executeQueryOrDie(conn,
+									"WITH RECURSIVE oids AS ( "
+			/* start with the type(s) returned by base_query */
+									"	%s "
+									"	UNION ALL "
+									"	SELECT * FROM ( "
+			/* inner WITH because we can only reference the CTE once */
+									"		WITH x AS (SELECT oid FROM oids) "
+			/* domains on any type selected so far */
+									"			SELECT t.oid FROM pg_catalog.pg_type t, x WHERE typbasetype = x.oid AND typtype = 'd' "
+									"			UNION ALL "
+			/* arrays over any type selected so far */
+									"			SELECT t.oid FROM pg_catalog.pg_type t, x WHERE typelem = x.oid AND typtype = 'b' "
+									"			UNION ALL "
+			/* composite types containing any type selected so far */
+									"			SELECT t.oid FROM pg_catalog.pg_type t, pg_catalog.pg_class c, pg_catalog.pg_attribute a, x "
+									"			WHERE t.typtype = 'c' AND "
+									"				  t.oid = c.reltype AND "
+									"				  c.oid = a.attrelid AND "
+									"				  NOT a.attisdropped AND "
+									"				  a.atttypid = x.oid "
+									"			UNION ALL "
+			/* ranges containing any type selected so far */
+									"			SELECT t.oid FROM pg_catalog.pg_type t, pg_catalog.pg_range r, x "
+									"			WHERE t.typtype = 'r' AND r.rngtypid = t.oid AND r.rngsubtype = x.oid"
+									"	) foo "
+									") "
+			/* now look for stored columns of any such type */
+									"SELECT n.nspname, c.relname, a.attname "
+									"FROM	pg_catalog.pg_class c, "
+									"		pg_catalog.pg_namespace n, "
+									"		pg_catalog.pg_attribute a "
+									"WHERE	c.oid = a.attrelid AND "
+									"		NOT a.attisdropped AND "
+									"		a.atttypid IN (SELECT oid FROM oids) AND "
+									"		c.relkind IN ("
+									CppAsString2(RELKIND_RELATION) ", "
+									CppAsString2(RELKIND_MATVIEW) ", "
+									CppAsString2(RELKIND_INDEX) ") AND "
+									"		c.relnamespace = n.oid AND "
+			/* exclude possible orphaned temp tables */
+									"		n.nspname !~ '^pg_temp_' AND "
+									"		n.nspname !~ '^pg_toast_temp_' AND "
+			/* exclude system catalogs, too */
+									"		n.nspname NOT IN ('pg_catalog', 'information_schema')",
+									cur_check->base_query);
+
+			ntups = PQntuples(res);
+
+			/*
+			 * The datatype was found, so extract the data and log to the
+			 * requested filename. We need to open the file for appending
+			 * since the check might have already found the type in another
+			 * database earlier in the loop.
+			 */
+			if (ntups)
+			{
+				/*
+				 * Make sure we have a buffer to save reports to now that we
+				 * found a first failing check.
+				 */
+				if (!found)
+					initPQExpBuffer(&report);
+				found = true;
+
+				/*
+				 * If this is the first time we see an error for the check in
+				 * question then print a status message of the failure.
+				 */
+				if (results[checknum])
+				{
+					pg_log(PG_REPORT, "    failed check: %s", _(cur_check->status));
+					appendPQExpBuffer(&report, "\n%s\n    %s\n",
+									  cur_check->report_text, output_path);
+				}
+				results[checknum] = false;
+
+				i_nspname = PQfnumber(res, "nspname");
+				i_relname = PQfnumber(res, "relname");
+				i_attname = PQfnumber(res, "attname");
+
+				for (int rowno = 0; rowno < ntups; rowno++)
+				{
+					if (script == NULL && (script = fopen_priv(output_path, "a")) == NULL)
+						pg_fatal("could not open file \"%s\": %s",
+								 output_path,
+								 strerror(errno));
+					if (!db_used)
+					{
+						fprintf(script, "In database: %s\n", active_db->db_name);
+						db_used = true;
+					}
+					fprintf(script, "  %s.%s.%s\n",
+							PQgetvalue(res, rowno, i_nspname),
+							PQgetvalue(res, rowno, i_relname),
+							PQgetvalue(res, rowno, i_attname));
+				}
+
+				if (script)
+				{
+					fclose(script);
+					script = NULL;
+				}
+			}
+
+			PQclear(res);
+		}
+
+		PQfinish(conn);
+	}
+
+	if (found)
+		pg_fatal("Data type checks failed: %s", report.data);
+
+	check_ok();
+}
 
 /*
  * fix_path_separator
@@ -100,16 +483,9 @@ check_and_dump_old_cluster(bool live_check)
 	check_is_install_user(&old_cluster);
 	check_proper_datallowconn(&old_cluster);
 	check_for_prepared_transactions(&old_cluster);
-	check_for_composite_data_type_usage(&old_cluster);
-	check_for_reg_data_type_usage(&old_cluster);
 	check_for_isn_and_int8_passing_mismatch(&old_cluster);
 
-	/*
-	 * PG 16 increased the size of the 'aclitem' type, which breaks the
-	 * on-disk format for existing data.
-	 */
-	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1500)
-		check_for_aclitem_data_type_usage(&old_cluster);
+	check_for_data_types_usage(&old_cluster, data_types_usage_checks);
 
 	/*
 	 * PG 14 changed the function signature of encoding conversion functions.
@@ -141,21 +517,12 @@ check_and_dump_old_cluster(bool live_check)
 	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1100)
 		check_for_tables_with_oids(&old_cluster);
 
-	/*
-	 * PG 12 changed the 'sql_identifier' type storage to be based on name,
-	 * not varchar, which breaks on-disk format for existing data. So we need
-	 * to prevent upgrade when used in user objects (tables, indexes, ...).
-	 */
-	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1100)
-		old_11_check_for_sql_identifier_data_type_usage(&old_cluster);
-
 	/*
 	 * Pre-PG 10 allowed tables with 'unknown' type columns and non WAL logged
 	 * hash indexes
 	 */
 	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 906)
 	{
-		old_9_6_check_for_unknown_data_type_usage(&old_cluster);
 		if (user_opts.check)
 			old_9_6_invalidate_hash_indexes(&old_cluster, true);
 	}
@@ -164,14 +531,6 @@ check_and_dump_old_cluster(bool live_check)
 	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 905)
 		check_for_pg_role_prefix(&old_cluster);
 
-	if (GET_MAJOR_VERSION(old_cluster.major_version) == 904 &&
-		old_cluster.controldata.cat_ver < JSONB_FORMAT_CHANGE_CAT_VER)
-		check_for_jsonb_9_4_usage(&old_cluster);
-
-	/* Pre-PG 9.4 had a different 'line' data type internal format */
-	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 903)
-		old_9_3_check_for_line_data_type_usage(&old_cluster);
-
 	/*
 	 * While not a check option, we do this now because this is the only time
 	 * the old server is running.
@@ -1084,185 +1443,6 @@ check_for_tables_with_oids(ClusterInfo *cluster)
 		check_ok();
 }
 
-
-/*
- * check_for_composite_data_type_usage()
- *	Check for system-defined composite types used in user tables.
- *
- *	The OIDs of rowtypes of system catalogs and information_schema views
- *	can change across major versions; unlike user-defined types, we have
- *	no mechanism for forcing them to be the same in the new cluster.
- *	Hence, if any user table uses one, that's problematic for pg_upgrade.
- */
-static void
-check_for_composite_data_type_usage(ClusterInfo *cluster)
-{
-	bool		found;
-	Oid			firstUserOid;
-	char		output_path[MAXPGPATH];
-	char	   *base_query;
-
-	prep_status("Checking for system-defined composite types in user tables");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_composite.txt");
-
-	/*
-	 * Look for composite types that were made during initdb *or* belong to
-	 * information_schema; that's important in case information_schema was
-	 * dropped and reloaded.
-	 *
-	 * The cutoff OID here should match the source cluster's value of
-	 * FirstNormalObjectId.  We hardcode it rather than using that C #define
-	 * because, if that #define is ever changed, our own version's value is
-	 * NOT what to use.  Eventually we may need a test on the source cluster's
-	 * version to select the correct value.
-	 */
-	firstUserOid = 16384;
-
-	base_query = psprintf("SELECT t.oid FROM pg_catalog.pg_type t "
-						  "LEFT JOIN pg_catalog.pg_namespace n ON t.typnamespace = n.oid "
-						  " WHERE typtype = 'c' AND (t.oid < %u OR nspname = 'information_schema')",
-						  firstUserOid);
-
-	found = check_for_data_types_usage(cluster, base_query, output_path);
-
-	free(base_query);
-
-	if (found)
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains system-defined composite types in user tables.\n"
-				 "These type OIDs are not stable across PostgreSQL versions,\n"
-				 "so this cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
-}
-
-/*
- * check_for_reg_data_type_usage()
- *	pg_upgrade only preserves these system values:
- *		pg_class.oid
- *		pg_type.oid
- *		pg_enum.oid
- *
- *	Many of the reg* data types reference system catalog info that is
- *	not preserved, and hence these data types cannot be used in user
- *	tables upgraded by pg_upgrade.
- */
-static void
-check_for_reg_data_type_usage(ClusterInfo *cluster)
-{
-	bool		found;
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for reg* data types in user tables");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_reg.txt");
-
-	/*
-	 * Note: older servers will not have all of these reg* types, so we have
-	 * to write the query like this rather than depending on casts to regtype.
-	 */
-	found = check_for_data_types_usage(cluster,
-									   "SELECT oid FROM pg_catalog.pg_type t "
-									   "WHERE t.typnamespace = "
-									   "        (SELECT oid FROM pg_catalog.pg_namespace "
-									   "         WHERE nspname = 'pg_catalog') "
-									   "  AND t.typname IN ( "
-	/* pg_class.oid is preserved, so 'regclass' is OK */
-									   "           'regcollation', "
-									   "           'regconfig', "
-									   "           'regdictionary', "
-									   "           'regnamespace', "
-									   "           'regoper', "
-									   "           'regoperator', "
-									   "           'regproc', "
-									   "           'regprocedure' "
-	/* pg_authid.oid is preserved, so 'regrole' is OK */
-	/* pg_type.oid is (mostly) preserved, so 'regtype' is OK */
-									   "         )",
-									   output_path);
-
-	if (found)
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains one of the reg* data types in user tables.\n"
-				 "These data types reference system OIDs that are not preserved by\n"
-				 "pg_upgrade, so this cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
-}
-
-/*
- * check_for_aclitem_data_type_usage
- *
- *	aclitem changed its storage format in 16, so check for it.
- */
-static void
-check_for_aclitem_data_type_usage(ClusterInfo *cluster)
-{
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for incompatible \"aclitem\" data type in user tables");
-
-	snprintf(output_path, sizeof(output_path), "tables_using_aclitem.txt");
-
-	if (check_for_data_type_usage(cluster, "pg_catalog.aclitem", output_path))
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains the \"aclitem\" data type in user tables.\n"
-				 "The internal format of \"aclitem\" changed in PostgreSQL version 16\n"
-				 "so this cluster cannot currently be upgraded.  You can drop the\n"
-				 "problem columns and restart the upgrade.  A list of the problem\n"
-				 "columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
-}
-
-/*
- * check_for_jsonb_9_4_usage()
- *
- *	JSONB changed its storage format during 9.4 beta, so check for it.
- */
-static void
-check_for_jsonb_9_4_usage(ClusterInfo *cluster)
-{
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for incompatible \"jsonb\" data type");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_jsonb.txt");
-
-	if (check_for_data_type_usage(cluster, "pg_catalog.jsonb", output_path))
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains the \"jsonb\" data type in user tables.\n"
-				 "The internal format of \"jsonb\" changed during 9.4 beta so this\n"
-				 "cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
-}
-
 /*
  * check_for_pg_role_prefix()
  *
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 842f3b6cd3..313d635198 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -329,6 +329,21 @@ typedef struct
 } OSInfo;
 
 
+/* Function signature for data type check version hook */
+typedef bool (*DataTypesUsageVersionCheck) (ClusterInfo *cluster);
+
+/*
+ * DataTypesUsageChecks
+ */
+typedef struct
+{
+	const char *status;			/* status line to print to the user */
+	const char *report_filename;	/* filename to store report to */
+	const char *base_query;		/* Query to extract the oid of the datatype */
+	const char *report_text;	/* Text to store to report in case of error */
+	DataTypesUsageVersionCheck version_hook;
+}			DataTypesUsageChecks;
+
 /*
  * Global variables
  */
@@ -451,18 +466,14 @@ unsigned int str2uint(const char *str);
 
 /* version.c */
 
-bool		check_for_data_types_usage(ClusterInfo *cluster,
-									   const char *base_query,
-									   const char *output_path);
-bool		check_for_data_type_usage(ClusterInfo *cluster,
-									  const char *type_name,
-									  const char *output_path);
-void		old_9_3_check_for_line_data_type_usage(ClusterInfo *cluster);
-void		old_9_6_check_for_unknown_data_type_usage(ClusterInfo *cluster);
+bool		line_type_check_applicable(ClusterInfo *cluster);
+bool		jsonb_9_4_check_applicable(ClusterInfo *cluster);
+bool		unknown_type_check_applicable(ClusterInfo *cluster);
+bool		sql_identifier_type_check_applicable(ClusterInfo *cluster);
+bool		aclitem_type_check_applicable(ClusterInfo *cluster);
 void		old_9_6_invalidate_hash_indexes(ClusterInfo *cluster,
 											bool check_mode);
 
-void		old_11_check_for_sql_identifier_data_type_usage(ClusterInfo *cluster);
 void		report_extension_updates(ClusterInfo *cluster);
 
 /* parallel.c */
diff --git a/src/bin/pg_upgrade/version.c b/src/bin/pg_upgrade/version.c
index 403a6d7cfa..a51cb7eafa 100644
--- a/src/bin/pg_upgrade/version.c
+++ b/src/bin/pg_upgrade/version.c
@@ -9,236 +9,69 @@
 
 #include "postgres_fe.h"
 
-#include "catalog/pg_class_d.h"
 #include "fe_utils/string_utils.h"
 #include "pg_upgrade.h"
 
-
 /*
- * check_for_data_types_usage()
- *	Detect whether there are any stored columns depending on given type(s)
- *
- * If so, write a report to the given file name, and return true.
- *
- * base_query should be a SELECT yielding a single column named "oid",
- * containing the pg_type OIDs of one or more types that are known to have
- * inconsistent on-disk representations across server versions.
- *
- * We check for the type(s) in tables, matviews, and indexes, but not views;
- * there's no storage involved in a view.
+ * version_hook functions for check_for_data_types_usage in order to determine
+ * whether a data type check should be executed for the cluster in question or
+ * not.
  */
 bool
-check_for_data_types_usage(ClusterInfo *cluster,
-						   const char *base_query,
-						   const char *output_path)
+line_type_check_applicable(ClusterInfo *cluster)
 {
-	bool		found = false;
-	FILE	   *script = NULL;
-	int			dbnum;
-
-	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
-	{
-		DbInfo	   *active_db = &cluster->dbarr.dbs[dbnum];
-		PGconn	   *conn = connectToServer(cluster, active_db->db_name);
-		PQExpBufferData querybuf;
-		PGresult   *res;
-		bool		db_used = false;
-		int			ntups;
-		int			rowno;
-		int			i_nspname,
-					i_relname,
-					i_attname;
-
-		/*
-		 * The type(s) of interest might be wrapped in a domain, array,
-		 * composite, or range, and these container types can be nested (to
-		 * varying extents depending on server version, but that's not of
-		 * concern here).  To handle all these cases we need a recursive CTE.
-		 */
-		initPQExpBuffer(&querybuf);
-		appendPQExpBuffer(&querybuf,
-						  "WITH RECURSIVE oids AS ( "
-		/* start with the type(s) returned by base_query */
-						  "	%s "
-						  "	UNION ALL "
-						  "	SELECT * FROM ( "
-		/* inner WITH because we can only reference the CTE once */
-						  "		WITH x AS (SELECT oid FROM oids) "
-		/* domains on any type selected so far */
-						  "			SELECT t.oid FROM pg_catalog.pg_type t, x WHERE typbasetype = x.oid AND typtype = 'd' "
-						  "			UNION ALL "
-		/* arrays over any type selected so far */
-						  "			SELECT t.oid FROM pg_catalog.pg_type t, x WHERE typelem = x.oid AND typtype = 'b' "
-						  "			UNION ALL "
-		/* composite types containing any type selected so far */
-						  "			SELECT t.oid FROM pg_catalog.pg_type t, pg_catalog.pg_class c, pg_catalog.pg_attribute a, x "
-						  "			WHERE t.typtype = 'c' AND "
-						  "				  t.oid = c.reltype AND "
-						  "				  c.oid = a.attrelid AND "
-						  "				  NOT a.attisdropped AND "
-						  "				  a.atttypid = x.oid "
-						  "			UNION ALL "
-		/* ranges containing any type selected so far */
-						  "			SELECT t.oid FROM pg_catalog.pg_type t, pg_catalog.pg_range r, x "
-						  "			WHERE t.typtype = 'r' AND r.rngtypid = t.oid AND r.rngsubtype = x.oid"
-						  "	) foo "
-						  ") "
-		/* now look for stored columns of any such type */
-						  "SELECT n.nspname, c.relname, a.attname "
-						  "FROM	pg_catalog.pg_class c, "
-						  "		pg_catalog.pg_namespace n, "
-						  "		pg_catalog.pg_attribute a "
-						  "WHERE	c.oid = a.attrelid AND "
-						  "		NOT a.attisdropped AND "
-						  "		a.atttypid IN (SELECT oid FROM oids) AND "
-						  "		c.relkind IN ("
-						  CppAsString2(RELKIND_RELATION) ", "
-						  CppAsString2(RELKIND_MATVIEW) ", "
-						  CppAsString2(RELKIND_INDEX) ") AND "
-						  "		c.relnamespace = n.oid AND "
-		/* exclude possible orphaned temp tables */
-						  "		n.nspname !~ '^pg_temp_' AND "
-						  "		n.nspname !~ '^pg_toast_temp_' AND "
-		/* exclude system catalogs, too */
-						  "		n.nspname NOT IN ('pg_catalog', 'information_schema')",
-						  base_query);
-
-		res = executeQueryOrDie(conn, "%s", querybuf.data);
-
-		ntups = PQntuples(res);
-		i_nspname = PQfnumber(res, "nspname");
-		i_relname = PQfnumber(res, "relname");
-		i_attname = PQfnumber(res, "attname");
-		for (rowno = 0; rowno < ntups; rowno++)
-		{
-			found = true;
-			if (script == NULL && (script = fopen_priv(output_path, "w")) == NULL)
-				pg_fatal("could not open file \"%s\": %s", output_path,
-						 strerror(errno));
-			if (!db_used)
-			{
-				fprintf(script, "In database: %s\n", active_db->db_name);
-				db_used = true;
-			}
-			fprintf(script, "  %s.%s.%s\n",
-					PQgetvalue(res, rowno, i_nspname),
-					PQgetvalue(res, rowno, i_relname),
-					PQgetvalue(res, rowno, i_attname));
-		}
+	/* Pre-PG 9.4 had a different 'line' data type internal format */
+	if (GET_MAJOR_VERSION(cluster->major_version) <= 903)
+		return true;
 
-		PQclear(res);
-
-		termPQExpBuffer(&querybuf);
-
-		PQfinish(conn);
-	}
-
-	if (script)
-		fclose(script);
-
-	return found;
+	return false;
 }
 
-/*
- * check_for_data_type_usage()
- *	Detect whether there are any stored columns depending on the given type
- *
- * If so, write a report to the given file name, and return true.
- *
- * type_name should be a fully qualified type name.  This is just a
- * trivial wrapper around check_for_data_types_usage() to convert a
- * type name into a base query.
- */
 bool
-check_for_data_type_usage(ClusterInfo *cluster,
-						  const char *type_name,
-						  const char *output_path)
+jsonb_9_4_check_applicable(ClusterInfo *cluster)
 {
-	bool		found;
-	char	   *base_query;
-
-	base_query = psprintf("SELECT '%s'::pg_catalog.regtype AS oid",
-						  type_name);
-
-	found = check_for_data_types_usage(cluster, base_query, output_path);
+	/* JSONB changed its storage format during 9.4 beta */
+	if (GET_MAJOR_VERSION(cluster->major_version) == 904 &&
+		cluster->controldata.cat_ver < JSONB_FORMAT_CHANGE_CAT_VER)
+		return true;
 
-	free(base_query);
-
-	return found;
+	return false;
 }
 
-
-/*
- * old_9_3_check_for_line_data_type_usage()
- *	9.3 -> 9.4
- *	Fully implement the 'line' data type in 9.4, which previously returned
- *	"not enabled" by default and was only functionally enabled with a
- *	compile-time switch; as of 9.4 "line" has a different on-disk
- *	representation format.
- */
-void
-old_9_3_check_for_line_data_type_usage(ClusterInfo *cluster)
+bool
+unknown_type_check_applicable(ClusterInfo *cluster)
 {
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for incompatible \"line\" data type");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_line.txt");
-
-	if (check_for_data_type_usage(cluster, "pg_catalog.line", output_path))
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains the \"line\" data type in user tables.\n"
-				 "This data type changed its internal and input/output format\n"
-				 "between your old and new versions so this\n"
-				 "cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
+	/* Pre-PG 10 allowed tables with 'unknown' type columns */
+	if (GET_MAJOR_VERSION(cluster->major_version) <= 906)
+		return true;
+	return false;
 }
 
-
-/*
- * old_9_6_check_for_unknown_data_type_usage()
- *	9.6 -> 10
- *	It's no longer allowed to create tables or views with "unknown"-type
- *	columns.  We do not complain about views with such columns, because
- *	they should get silently converted to "text" columns during the DDL
- *	dump and reload; it seems unlikely to be worth making users do that
- *	by hand.  However, if there's a table with such a column, the DDL
- *	reload will fail, so we should pre-detect that rather than failing
- *	mid-upgrade.  Worse, if there's a matview with such a column, the
- *	DDL reload will silently change it to "text" which won't match the
- *	on-disk storage (which is like "cstring").  So we *must* reject that.
- */
-void
-old_9_6_check_for_unknown_data_type_usage(ClusterInfo *cluster)
+bool
+sql_identifier_type_check_applicable(ClusterInfo *cluster)
 {
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for invalid \"unknown\" user columns");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_unknown.txt");
+	/*
+	 * PG 12 changed the 'sql_identifier' type storage to be based on name,
+	 * not varchar, which breaks on-disk format for existing data. So we need
+	 * to prevent upgrade when used in user objects (tables, indexes, ...).
+	 */
+	if (GET_MAJOR_VERSION(cluster->major_version) <= 1100)
+		return true;
+
+	return false;
+}
 
-	if (check_for_data_type_usage(cluster, "pg_catalog.unknown", output_path))
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains the \"unknown\" data type in user tables.\n"
-				 "This data type is no longer allowed in tables, so this\n"
-				 "cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
+bool
+aclitem_type_check_applicable(ClusterInfo *cluster)
+{
+	/*
+	 * PG 16 increased the size of the 'aclitem' type, which breaks the
+	 * on-disk format for existing data.
+	 */
+	if (GET_MAJOR_VERSION(cluster->major_version) <= 1500)
+		return true;
+
+	return false;
 }
 
 /*
@@ -353,41 +186,6 @@ old_9_6_invalidate_hash_indexes(ClusterInfo *cluster, bool check_mode)
 		check_ok();
 }
 
-/*
- * old_11_check_for_sql_identifier_data_type_usage()
- *	11 -> 12
- *	In 12, the sql_identifier data type was switched from name to varchar,
- *	which does affect the storage (name is by-ref, but not varlena). This
- *	means user tables using sql_identifier for columns are broken because
- *	the on-disk format is different.
- */
-void
-old_11_check_for_sql_identifier_data_type_usage(ClusterInfo *cluster)
-{
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for invalid \"sql_identifier\" user columns");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_sql_identifier.txt");
-
-	if (check_for_data_type_usage(cluster, "information_schema.sql_identifier",
-								  output_path))
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains the \"sql_identifier\" data type in user tables.\n"
-				 "The on-disk format for this data type has changed, so this\n"
-				 "cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
-}
-
-
 /*
  * report_extension_updates()
  *	Report extensions that should be updated.
-- 
2.32.1 (Apple Git-133)

#21

Daniel Gustafsson

daniel@yesql.se

about 2 years ago

In reply to: Daniel Gustafsson (#20)

1 attachment(s)

Re: Reducing connection overhead in pg_upgrade compat check phase

Attached is a v10 rebase of this patch, which had undergone significant bitrot
due to recent changes in the pg_upgrade check phase. The rebase brings those
changes into the proposed structure without altering any queries, and makes no
additional changes to the proposed functionality.

Testing with a completely empty v11 cluster fresh from initdb as the old
cluster shows a significant speedup, roughly 2.3x (averaged over multiple
runs, adjusted for outliers):

patched: 53.59ms (52.78ms, 52.49ms, 55.49ms)
master : 125.87ms (125.23ms, 125.67ms, 126.67ms)

Using a similarly empty cluster from master as the old cluster shows a smaller
speedup, which is expected since many checks only run for older versions:

patched: 33.36ms (32.82ms, 33.78ms, 33.47ms)
master : 44.87ms (44.73ms, 44.90ms, 44.99ms)

The latter case is still pretty interesting IMO since it can speed up testing
where every millisecond gained matters.

--
Daniel Gustafsson

Attachments:

v10-0001-pg_upgrade-run-all-data-type-checks-per-connecti.patch (application/octet-stream)

From 08a69b7220279db98462e0b96dadeba4d227eba4 Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson <dgustafsson@postgresql.org>
Date: Fri, 27 Oct 2023 13:44:28 +0200
Subject: [PATCH v10] pg_upgrade: run all data type checks per connection

The checks for data type usage were each connecting to all databases
in the cluster and running their query. On clusters which have a lot
of databases this can become unnecessarily expensive. This moves the
checks to run in a single connection instead to minimize connection
setup/teardown overhead.

Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/BB4C76F-D416-4F9F-949E-DBE950D37787@yesql.se
---
 src/bin/pg_upgrade/check.c      | 687 ++++++++++++++++++++------------
 src/bin/pg_upgrade/pg_upgrade.h |  30 +-
 src/bin/pg_upgrade/version.c    | 295 +++-----------
 3 files changed, 504 insertions(+), 508 deletions(-)

diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index fa52aa2c22..d94e2fe401 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -10,6 +10,7 @@
 #include "postgres_fe.h"
 
 #include "catalog/pg_authid_d.h"
+#include "catalog/pg_class_d.h"
 #include "catalog/pg_collation.h"
 #include "fe_utils/string_utils.h"
 #include "mb/pg_wchar.h"
@@ -23,19 +24,441 @@ static void check_for_isn_and_int8_passing_mismatch(ClusterInfo *cluster);
 static void check_for_user_defined_postfix_ops(ClusterInfo *cluster);
 static void check_for_incompatible_polymorphics(ClusterInfo *cluster);
 static void check_for_tables_with_oids(ClusterInfo *cluster);
-static void check_for_composite_data_type_usage(ClusterInfo *cluster);
-static void check_for_reg_data_type_usage(ClusterInfo *cluster);
-static void check_for_aclitem_data_type_usage(ClusterInfo *cluster);
-static void check_for_removed_data_type_usage(ClusterInfo *cluster,
-											  const char *version,
-											  const char *datatype);
-static void check_for_jsonb_9_4_usage(ClusterInfo *cluster);
 static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(void);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
 static void check_new_cluster_logical_replication_slots(void);
 static void check_old_cluster_for_valid_slots(bool live_check);
 
+/*
+ * Data type usage checks. Each check for problematic data type usage is
+ * defined in this array with metadata, SQL query for finding the data type
+ * and a function pointer for determining if the check should be executed
+ * for the current version.
+ */
+static DataTypesUsageChecks data_types_usage_checks[] =
+{
+	/*
+	 * Look for composite types that were made during initdb *or* belong to
+	 * information_schema; that's important in case information_schema was
+	 * dropped and reloaded.
+	 *
+	 * The cutoff OID here should match the source cluster's value of
+	 * FirstNormalObjectId.  We hardcode it rather than using that C #define
+	 * because, if that #define is ever changed, our own version's value is
+	 * NOT what to use.  Eventually we may need a test on the source cluster's
+	 * version to select the correct value.
+	 */
+	{
+		.status = gettext_noop("Checking for system-defined composite types in user tables"),
+			.report_filename = "tables_using_composite.txt",
+			.base_query =
+			"SELECT t.oid FROM pg_catalog.pg_type t "
+			"LEFT JOIN pg_catalog.pg_namespace n ON t.typnamespace = n.oid "
+			" WHERE typtype = 'c' AND (t.oid < 16384 OR nspname = 'information_schema')",
+			.report_text =
+			"Your installation contains system-defined composite types in user tables.\n"
+			"These type OIDs are not stable across PostgreSQL versions,\n"
+			"so this cluster cannot currently be upgraded.  You can\n"
+			"drop the problem columns and restart the upgrade.\n"
+			"A list of the problem columns is in the file:",
+			.version_hook = NULL
+	},
+
+	/*
+	 * 9.3 -> 9.4 Fully implement the 'line' data type in 9.4, which
+	 * previously returned "not enabled" by default and was only functionally
+	 * enabled with a compile-time switch; as of 9.4 "line" has a different
+	 * on-disk representation format.
+	 */
+	{
+		.status = gettext_noop("Checking for incompatible \"line\" data type"),
+			.report_filename = "tables_using_line.txt",
+			.base_query =
+			"SELECT 'pg_catalog.line'::pg_catalog.regtype AS oid",
+			.report_text =
+			"Your installation contains the \"line\" data type in user tables.\n"
+			"This data type changed its internal and input/output format\n"
+			"between your old and new versions so this\n"
+			"cluster cannot currently be upgraded.  You can\n"
+			"drop the problem columns and restart the upgrade.\n"
+			"A list of the problem columns is in the file:",
+			.version_hook = line_type_check_applicable
+	},
+
+	/*
+	 * pg_upgrade only preserves these system values: pg_class.oid,
+	 * pg_type.oid, pg_enum.oid
+	 *
+	 * Many of the reg* data types reference system catalog info that is not
+	 * preserved, and hence these data types cannot be used in user tables
+	 * upgraded by pg_upgrade.
+	 */
+	{
+		.status = gettext_noop("Checking for reg* data types in user tables"),
+			.report_filename = "tables_using_reg.txt",
+
+		/*
+		 * Note: older servers will not have all of these reg* types, so we
+		 * have to write the query like this rather than depending on casts to
+		 * regtype.
+		 */
+			.base_query =
+			"SELECT oid FROM pg_catalog.pg_type t "
+			"WHERE t.typnamespace = "
+			"        (SELECT oid FROM pg_catalog.pg_namespace "
+			"         WHERE nspname = 'pg_catalog') "
+			"  AND t.typname IN ( "
+		/* pg_class.oid is preserved, so 'regclass' is OK */
+			"           'regcollation', "
+			"           'regconfig', "
+			"           'regdictionary', "
+			"           'regnamespace', "
+			"           'regoper', "
+			"           'regoperator', "
+			"           'regproc', "
+			"           'regprocedure' "
+		/* pg_authid.oid is preserved, so 'regrole' is OK */
+		/* pg_type.oid is (mostly) preserved, so 'regtype' is OK */
+			"         )",
+			.report_text =
+			"Your installation contains one of the reg* data types in user tables.\n"
+			"These data types reference system OIDs that are not preserved by\n"
+			"pg_upgrade, so this cluster cannot currently be upgraded.  You can\n"
+			"drop the problem columns and restart the upgrade.\n"
+			"A list of the problem columns is in the file:",
+			.version_hook = NULL
+	},
+
+	/*
+	 * PG 16 increased the size of the 'aclitem' type, which breaks the
+	 * on-disk format for existing data.
+	 */
+	{
+		.status = gettext_noop("Checking for incompatible \"aclitem\" data type"),
+			.report_filename = "tables_using_aclitem.txt",
+			.base_query =
+			"SELECT 'pg_catalog.aclitem'::pg_catalog.regtype AS oid",
+			.report_text =
+			"Your installation contains the \"aclitem\" data type in user tables.\n"
+			"The internal format of \"aclitem\" changed in PostgreSQL version 16\n"
+			"so this cluster cannot currently be upgraded.  You can drop the\n"
+			"problem columns and restart the upgrade.  A list of the problem\n"
+			"columns is in the file:",
+			.version_hook = aclitem_type_check_applicable
+	},
+
+	/*
+	 * It's no longer allowed to create tables or views with "unknown"-type
+	 * columns.  We do not complain about views with such columns, because
+	 * they should get silently converted to "text" columns during the DDL
+	 * dump and reload; it seems unlikely to be worth making users do that by
+	 * hand.  However, if there's a table with such a column, the DDL reload
+	 * will fail, so we should pre-detect that rather than failing
+	 * mid-upgrade.  Worse, if there's a matview with such a column, the DDL
+	 * reload will silently change it to "text" which won't match the on-disk
+	 * storage (which is like "cstring").  So we *must* reject that.
+	 */
+	{
+		.status = gettext_noop("Checking for invalid \"unknown\" user columns"),
+			.report_filename = "tables_using_unknown.txt",
+			.base_query =
+			"SELECT 'pg_catalog.unknown'::pg_catalog.regtype AS oid",
+			.report_text =
+			"Your installation contains the \"unknown\" data type in user tables.\n"
+			"This data type is no longer allowed in tables, so this\n"
+			"cluster cannot currently be upgraded.  You can\n"
+			"drop the problem columns and restart the upgrade.\n"
+			"A list of the problem columns is in the file:",
+			.version_hook = unknown_type_check_applicable
+	},
+
+	/*
+	 * PG 12 changed the 'sql_identifier' type storage to be based on name,
+	 * not varchar, which breaks on-disk format for existing data. So we need
+	 * to prevent upgrade when used in user objects (tables, indexes, ...). In
+	 * 12, the sql_identifier data type was switched from varchar to name,
+	 * which does affect the storage (name is by-ref, but not varlena). This
+	 * means user tables using sql_identifier for columns are broken because
+	 * the on-disk format is different.
+	 */
+	{
+		.status = gettext_noop("Checking for invalid \"sql_identifier\" user columns"),
+			.report_filename = "tables_using_sql_identifier.txt",
+			.base_query =
+			"SELECT 'information_schema.sql_identifier'::pg_catalog.regtype AS oid",
+			.report_text =
+			"Your installation contains the \"sql_identifier\" data type in user tables.\n"
+			"The on-disk format for this data type has changed, so this\n"
+			"cluster cannot currently be upgraded.  You can\n"
+			"drop the problem columns and restart the upgrade.\n"
+			"A list of the problem columns is in the file:",
+			.version_hook = sql_identifier_type_check_applicable
+	},
+
+	/*
+	 * JSONB changed its storage format during 9.4 beta, so check for it.
+	 */
+	{
+		.status = gettext_noop("Checking for incompatible \"jsonb\" data type in user tables"),
+			.report_filename = "tables_using_jsonb.txt",
+			.base_query =
+			"SELECT 'pg_catalog.jsonb'::pg_catalog.regtype AS oid",
+			.report_text =
+			"Your installation contains the \"jsonb\" data type in user tables.\n"
+			"The internal format of \"jsonb\" changed during 9.4 beta so this\n"
+			"cluster cannot currently be upgraded.  You can\n"
+			"drop the problem columns and restart the upgrade.\n"
+			"A list of the problem columns is in the file:",
+			.version_hook = jsonb_9_4_check_applicable
+	},
+
+	/*
+	 * PG 12 removed types abstime, reltime, tinterval.
+	 */
+	{
+		.status = gettext_noop("Checking for removed \"abstime\" data type in user tables"),
+		.report_filename = "tables_using_abstime.txt",
+		.base_query =
+		"SELECT 'pg_catalog.abstime'::pg_catalog.regtype AS oid",
+		.report_text =
+		"Your installation contains the \"abstime\" data type in user tables.\n"
+		"The \"abstime\" type has been removed in PostgreSQL version 12,\n"
+		"so this cluster cannot currently be upgraded.  You can drop the\n"
+		"problem columns, or change them to another data type, and restart\n"
+		"the upgrade.  A list of the problem columns is in the file:\n",
+		.version_hook = removed_data_types_check_applicable
+	},
+	{
+		.status = gettext_noop("Checking for removed \"reltime\" data type in user tables"),
+		.report_filename = "tables_using_reltime.txt",
+		.base_query =
+		"SELECT 'pg_catalog.reltime'::pg_catalog.regtype AS oid",
+		.report_text =
+		"Your installation contains the \"reltime\" data type in user tables.\n"
+		"The \"reltime\" type has been removed in PostgreSQL version 12,\n"
+		"so this cluster cannot currently be upgraded.  You can drop the\n"
+		"problem columns, or change them to another data type, and restart\n"
+		"the upgrade.  A list of the problem columns is in the file:\n",
+		.version_hook = removed_data_types_check_applicable
+	},
+	{
+		.status = gettext_noop("Checking for removed \"tinterval\" data type in user tables"),
+		.report_filename = "tables_using_tinterval.txt",
+		.base_query =
+		"SELECT 'pg_catalog.tinterval'::pg_catalog.regtype AS oid",
+		.report_text =
+		"Your installation contains the \"tinterval\" data type in user tables.\n"
+		"The \"tinterval\" type has been removed in PostgreSQL version 12,\n"
+		"so this cluster cannot currently be upgraded.  You can drop the\n"
+		"problem columns, or change them to another data type, and restart\n"
+		"the upgrade.  A list of the problem columns is in the file:\n",
+		.version_hook = removed_data_types_check_applicable
+	},
+
+	/* End of checks marker, must remain last */
+	{
+		NULL, NULL, NULL, NULL, NULL
+	}
+};
+
+/*
+ * check_for_data_types_usage()
+ *	Detect whether there are any stored columns depending on given type(s)
+ *
+ * If so, write a report to the given file name and signal a failure to the
+ * user.
+ *
+ * The checks to run are defined in a DataTypesUsageChecks structure where
+ * each check has metadata for explaining errors to the user, a base_query,
+ * a report filename and a function pointer hook for validating if the check
+ * should be executed given the cluster at hand.
+ *
+ * base_query should be a SELECT yielding a single column named "oid",
+ * containing the pg_type OIDs of one or more types that are known to have
+ * inconsistent on-disk representations across server versions.
+ *
+ * We check for the type(s) in tables, matviews, and indexes, but not views;
+ * there's no storage involved in a view.
+ */
+static void
+check_for_data_types_usage(ClusterInfo *cluster, DataTypesUsageChecks *checks)
+{
+	bool		found = false;
+	bool	   *results;
+	PQExpBufferData report;
+	DataTypesUsageChecks *tmp = checks;
+	int			n_data_types_usage_checks = 0;
+
+	prep_status("Checking for data type usage");
+
+	/* Gather number of checks to perform */
+	while (tmp->status != NULL)
+	{
+		n_data_types_usage_checks++;
+		tmp++;
+	}
+
+	/* Prepare an array to store the results of checks in */
+	results = pg_malloc(sizeof(bool) * n_data_types_usage_checks);
+	memset(results, true, sizeof(bool) * n_data_types_usage_checks);
+
+	/*
+	 * Connect to each database in the cluster and run all defined checks
+	 * against that database before trying the next one.
+	 */
+	for (int dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
+	{
+		DbInfo	   *active_db = &cluster->dbarr.dbs[dbnum];
+		PGconn	   *conn = connectToServer(cluster, active_db->db_name);
+
+		for (int checknum = 0; checknum < n_data_types_usage_checks; checknum++)
+		{
+			PGresult   *res;
+			int			ntups;
+			int			i_nspname;
+			int			i_relname;
+			int			i_attname;
+			FILE	   *script = NULL;
+			bool		db_used = false;
+			char		output_path[MAXPGPATH];
+			DataTypesUsageChecks *cur_check = &checks[checknum];
+
+			/*
+			 * Make sure that the check applies to the current cluster version
+			 * and skip if not. If no check hook has been defined we run the
+			 * check for all versions.
+			 */
+			if (cur_check->version_hook && !cur_check->version_hook(cluster))
+				continue;
+
+			snprintf(output_path, sizeof(output_path), "%s/%s",
+					 log_opts.basedir,
+					 cur_check->report_filename);
+
+			/*
+			 * The type(s) of interest might be wrapped in a domain, array,
+			 * composite, or range, and these container types can be nested
+			 * (to varying extents depending on server version, but that's not
+			 * of concern here).  To handle all these cases we need a
+			 * recursive CTE.
+			 */
+			res = executeQueryOrDie(conn,
+									"WITH RECURSIVE oids AS ( "
+			/* start with the type(s) returned by base_query */
+									"	%s "
+									"	UNION ALL "
+									"	SELECT * FROM ( "
+			/* inner WITH because we can only reference the CTE once */
+									"		WITH x AS (SELECT oid FROM oids) "
+			/* domains on any type selected so far */
+									"			SELECT t.oid FROM pg_catalog.pg_type t, x WHERE typbasetype = x.oid AND typtype = 'd' "
+									"			UNION ALL "
+			/* arrays over any type selected so far */
+									"			SELECT t.oid FROM pg_catalog.pg_type t, x WHERE typelem = x.oid AND typtype = 'b' "
+									"			UNION ALL "
+			/* composite types containing any type selected so far */
+									"			SELECT t.oid FROM pg_catalog.pg_type t, pg_catalog.pg_class c, pg_catalog.pg_attribute a, x "
+									"			WHERE t.typtype = 'c' AND "
+									"				  t.oid = c.reltype AND "
+									"				  c.oid = a.attrelid AND "
+									"				  NOT a.attisdropped AND "
+									"				  a.atttypid = x.oid "
+									"			UNION ALL "
+			/* ranges containing any type selected so far */
+									"			SELECT t.oid FROM pg_catalog.pg_type t, pg_catalog.pg_range r, x "
+									"			WHERE t.typtype = 'r' AND r.rngtypid = t.oid AND r.rngsubtype = x.oid"
+									"	) foo "
+									") "
+			/* now look for stored columns of any such type */
+									"SELECT n.nspname, c.relname, a.attname "
+									"FROM	pg_catalog.pg_class c, "
+									"		pg_catalog.pg_namespace n, "
+									"		pg_catalog.pg_attribute a "
+									"WHERE	c.oid = a.attrelid AND "
+									"		NOT a.attisdropped AND "
+									"		a.atttypid IN (SELECT oid FROM oids) AND "
+									"		c.relkind IN ("
+									CppAsString2(RELKIND_RELATION) ", "
+									CppAsString2(RELKIND_MATVIEW) ", "
+									CppAsString2(RELKIND_INDEX) ") AND "
+									"		c.relnamespace = n.oid AND "
+			/* exclude possible orphaned temp tables */
+									"		n.nspname !~ '^pg_temp_' AND "
+									"		n.nspname !~ '^pg_toast_temp_' AND "
+			/* exclude system catalogs, too */
+									"		n.nspname NOT IN ('pg_catalog', 'information_schema')",
+									cur_check->base_query);
+
+			ntups = PQntuples(res);
+
+			/*
+			 * The datatype was found, so extract the data and log to the
+			 * requested filename. We need to open the file for appending
+			 * since the check might have already found the type in another
+			 * database earlier in the loop.
+			 */
+			if (ntups)
+			{
+				/*
+				 * Make sure we have a buffer to save reports to now that we
+				 * found a first failing check.
+				 */
+				if (!found)
+					initPQExpBuffer(&report);
+				found = true;
+
+				/*
+				 * If this is the first time we see an error for the check in
+				 * question then print a status message of the failure.
+				 */
+				if (results[checknum])
+				{
+					pg_log(PG_REPORT, "    failed check: %s", _(cur_check->status));
+					appendPQExpBuffer(&report, "\n%s\n    %s\n",
+									  cur_check->report_text, output_path);
+				}
+				results[checknum] = false;
+
+				i_nspname = PQfnumber(res, "nspname");
+				i_relname = PQfnumber(res, "relname");
+				i_attname = PQfnumber(res, "attname");
+
+				for (int rowno = 0; rowno < ntups; rowno++)
+				{
+					if (script == NULL && (script = fopen_priv(output_path, "a")) == NULL)
+						pg_fatal("could not open file \"%s\": %s",
+								 output_path,
+								 strerror(errno));
+					if (!db_used)
+					{
+						fprintf(script, "In database: %s\n", active_db->db_name);
+						db_used = true;
+					}
+					fprintf(script, "  %s.%s.%s\n",
+							PQgetvalue(res, rowno, i_nspname),
+							PQgetvalue(res, rowno, i_relname),
+							PQgetvalue(res, rowno, i_attname));
+				}
+
+				if (script)
+				{
+					fclose(script);
+					script = NULL;
+				}
+			}
+
+			PQclear(res);
+		}
+
+		PQfinish(conn);
+	}
+
+	if (found)
+		pg_fatal("Data type checks failed: %s", report.data);
+
+	check_ok();
+}
 
 /*
  * fix_path_separator
@@ -108,8 +531,6 @@ check_and_dump_old_cluster(bool live_check)
 	check_is_install_user(&old_cluster);
 	check_proper_datallowconn(&old_cluster);
 	check_for_prepared_transactions(&old_cluster);
-	check_for_composite_data_type_usage(&old_cluster);
-	check_for_reg_data_type_usage(&old_cluster);
 	check_for_isn_and_int8_passing_mismatch(&old_cluster);
 
 	/*
@@ -119,22 +540,7 @@ check_and_dump_old_cluster(bool live_check)
 	if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
 		check_old_cluster_for_valid_slots(live_check);
 
-	/*
-	 * PG 16 increased the size of the 'aclitem' type, which breaks the
-	 * on-disk format for existing data.
-	 */
-	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1500)
-		check_for_aclitem_data_type_usage(&old_cluster);
-
-	/*
-	 * PG 12 removed types abstime, reltime, tinterval.
-	 */
-	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1100)
-	{
-		check_for_removed_data_type_usage(&old_cluster, "12", "abstime");
-		check_for_removed_data_type_usage(&old_cluster, "12", "reltime");
-		check_for_removed_data_type_usage(&old_cluster, "12", "tinterval");
-	}
+	check_for_data_types_usage(&old_cluster, data_types_usage_checks);
 
 	/*
 	 * PG 14 changed the function signature of encoding conversion functions.
@@ -166,21 +572,12 @@ check_and_dump_old_cluster(bool live_check)
 	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1100)
 		check_for_tables_with_oids(&old_cluster);
 
-	/*
-	 * PG 12 changed the 'sql_identifier' type storage to be based on name,
-	 * not varchar, which breaks on-disk format for existing data. So we need
-	 * to prevent upgrade when used in user objects (tables, indexes, ...).
-	 */
-	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1100)
-		old_11_check_for_sql_identifier_data_type_usage(&old_cluster);
-
 	/*
 	 * Pre-PG 10 allowed tables with 'unknown' type columns and non WAL logged
 	 * hash indexes
 	 */
 	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 906)
 	{
-		old_9_6_check_for_unknown_data_type_usage(&old_cluster);
 		if (user_opts.check)
 			old_9_6_invalidate_hash_indexes(&old_cluster, true);
 	}
@@ -189,14 +586,6 @@ check_and_dump_old_cluster(bool live_check)
 	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 905)
 		check_for_pg_role_prefix(&old_cluster);
 
-	if (GET_MAJOR_VERSION(old_cluster.major_version) == 904 &&
-		old_cluster.controldata.cat_ver < JSONB_FORMAT_CHANGE_CAT_VER)
-		check_for_jsonb_9_4_usage(&old_cluster);
-
-	/* Pre-PG 9.4 had a different 'line' data type internal format */
-	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 903)
-		old_9_3_check_for_line_data_type_usage(&old_cluster);
-
 	/*
 	 * While not a check option, we do this now because this is the only time
 	 * the old server is running.
@@ -1112,220 +1501,6 @@ check_for_tables_with_oids(ClusterInfo *cluster)
 }
 
 
-/*
- * check_for_composite_data_type_usage()
- *	Check for system-defined composite types used in user tables.
- *
- *	The OIDs of rowtypes of system catalogs and information_schema views
- *	can change across major versions; unlike user-defined types, we have
- *	no mechanism for forcing them to be the same in the new cluster.
- *	Hence, if any user table uses one, that's problematic for pg_upgrade.
- */
-static void
-check_for_composite_data_type_usage(ClusterInfo *cluster)
-{
-	bool		found;
-	Oid			firstUserOid;
-	char		output_path[MAXPGPATH];
-	char	   *base_query;
-
-	prep_status("Checking for system-defined composite types in user tables");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_composite.txt");
-
-	/*
-	 * Look for composite types that were made during initdb *or* belong to
-	 * information_schema; that's important in case information_schema was
-	 * dropped and reloaded.
-	 *
-	 * The cutoff OID here should match the source cluster's value of
-	 * FirstNormalObjectId.  We hardcode it rather than using that C #define
-	 * because, if that #define is ever changed, our own version's value is
-	 * NOT what to use.  Eventually we may need a test on the source cluster's
-	 * version to select the correct value.
-	 */
-	firstUserOid = 16384;
-
-	base_query = psprintf("SELECT t.oid FROM pg_catalog.pg_type t "
-						  "LEFT JOIN pg_catalog.pg_namespace n ON t.typnamespace = n.oid "
-						  " WHERE typtype = 'c' AND (t.oid < %u OR nspname = 'information_schema')",
-						  firstUserOid);
-
-	found = check_for_data_types_usage(cluster, base_query, output_path);
-
-	free(base_query);
-
-	if (found)
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains system-defined composite types in user tables.\n"
-				 "These type OIDs are not stable across PostgreSQL versions,\n"
-				 "so this cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
-}
-
-/*
- * check_for_reg_data_type_usage()
- *	pg_upgrade only preserves these system values:
- *		pg_class.oid
- *		pg_type.oid
- *		pg_enum.oid
- *
- *	Many of the reg* data types reference system catalog info that is
- *	not preserved, and hence these data types cannot be used in user
- *	tables upgraded by pg_upgrade.
- */
-static void
-check_for_reg_data_type_usage(ClusterInfo *cluster)
-{
-	bool		found;
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for reg* data types in user tables");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_reg.txt");
-
-	/*
-	 * Note: older servers will not have all of these reg* types, so we have
-	 * to write the query like this rather than depending on casts to regtype.
-	 */
-	found = check_for_data_types_usage(cluster,
-									   "SELECT oid FROM pg_catalog.pg_type t "
-									   "WHERE t.typnamespace = "
-									   "        (SELECT oid FROM pg_catalog.pg_namespace "
-									   "         WHERE nspname = 'pg_catalog') "
-									   "  AND t.typname IN ( "
-	/* pg_class.oid is preserved, so 'regclass' is OK */
-									   "           'regcollation', "
-									   "           'regconfig', "
-									   "           'regdictionary', "
-									   "           'regnamespace', "
-									   "           'regoper', "
-									   "           'regoperator', "
-									   "           'regproc', "
-									   "           'regprocedure' "
-	/* pg_authid.oid is preserved, so 'regrole' is OK */
-	/* pg_type.oid is (mostly) preserved, so 'regtype' is OK */
-									   "         )",
-									   output_path);
-
-	if (found)
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains one of the reg* data types in user tables.\n"
-				 "These data types reference system OIDs that are not preserved by\n"
-				 "pg_upgrade, so this cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
-}
-
-/*
- * check_for_aclitem_data_type_usage
- *
- *	aclitem changed its storage format in 16, so check for it.
- */
-static void
-check_for_aclitem_data_type_usage(ClusterInfo *cluster)
-{
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for incompatible \"%s\" data type in user tables",
-				"aclitem");
-
-	snprintf(output_path, sizeof(output_path), "tables_using_aclitem.txt");
-
-	if (check_for_data_type_usage(cluster, "pg_catalog.aclitem", output_path))
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains the \"aclitem\" data type in user tables.\n"
-				 "The internal format of \"aclitem\" changed in PostgreSQL version 16\n"
-				 "so this cluster cannot currently be upgraded.  You can drop the\n"
-				 "problem columns and restart the upgrade.  A list of the problem\n"
-				 "columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
-}
-
-/*
- * check_for_removed_data_type_usage
- *
- *	Check for in-core data types that have been removed.  Callers know
- *	the exact list.
- */
-static void
-check_for_removed_data_type_usage(ClusterInfo *cluster, const char *version,
-								  const char *datatype)
-{
-	char		output_path[MAXPGPATH];
-	char		typename[NAMEDATALEN];
-
-	prep_status("Checking for removed \"%s\" data type in user tables",
-				datatype);
-
-	snprintf(output_path, sizeof(output_path), "tables_using_%s.txt",
-			 datatype);
-	snprintf(typename, sizeof(typename), "pg_catalog.%s", datatype);
-
-	if (check_for_data_type_usage(cluster, typename, output_path))
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains the \"%s\" data type in user tables.\n"
-				 "The \"%s\" type has been removed in PostgreSQL version %s,\n"
-				 "so this cluster cannot currently be upgraded.  You can drop the\n"
-				 "problem columns, or change them to another data type, and restart\n"
-				 "the upgrade.  A list of the problem columns is in the file:\n"
-				 "    %s", datatype, datatype, version, output_path);
-	}
-	else
-		check_ok();
-}
-
-
-/*
- * check_for_jsonb_9_4_usage()
- *
- *	JSONB changed its storage format during 9.4 beta, so check for it.
- */
-static void
-check_for_jsonb_9_4_usage(ClusterInfo *cluster)
-{
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for incompatible \"jsonb\" data type");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_jsonb.txt");
-
-	if (check_for_data_type_usage(cluster, "pg_catalog.jsonb", output_path))
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains the \"jsonb\" data type in user tables.\n"
-				 "The internal format of \"jsonb\" changed during 9.4 beta so this\n"
-				 "cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
-}
-
 /*
  * check_for_pg_role_prefix()
  *
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index a710f325de..f949d5de9b 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -348,6 +348,21 @@ typedef struct
 } OSInfo;
 
 
+/* Function signature for data type check version hook */
+typedef bool (*DataTypesUsageVersionCheck) (ClusterInfo *cluster);
+
+/*
+ * DataTypesUsageChecks
+ */
+typedef struct
+{
+	const char *status;			/* status line to print to the user */
+	const char *report_filename;	/* filename to store report to */
+	const char *base_query;		/* Query to extract the oid of the datatype */
+	const char *report_text;	/* Text to store to report in case of error */
+	DataTypesUsageVersionCheck version_hook;
+}			DataTypesUsageChecks;
+
 /*
  * Global variables
  */
@@ -471,18 +486,15 @@ unsigned int str2uint(const char *str);
 
 /* version.c */
 
-bool		check_for_data_types_usage(ClusterInfo *cluster,
-									   const char *base_query,
-									   const char *output_path);
-bool		check_for_data_type_usage(ClusterInfo *cluster,
-									  const char *type_name,
-									  const char *output_path);
-void		old_9_3_check_for_line_data_type_usage(ClusterInfo *cluster);
-void		old_9_6_check_for_unknown_data_type_usage(ClusterInfo *cluster);
+bool		line_type_check_applicable(ClusterInfo *cluster);
+bool		jsonb_9_4_check_applicable(ClusterInfo *cluster);
+bool		unknown_type_check_applicable(ClusterInfo *cluster);
+bool		sql_identifier_type_check_applicable(ClusterInfo *cluster);
+bool		aclitem_type_check_applicable(ClusterInfo *cluster);
+bool		removed_data_types_check_applicable(ClusterInfo *cluster);
 void		old_9_6_invalidate_hash_indexes(ClusterInfo *cluster,
 											bool check_mode);
 
-void		old_11_check_for_sql_identifier_data_type_usage(ClusterInfo *cluster);
 void		report_extension_updates(ClusterInfo *cluster);
 
 /* parallel.c */
diff --git a/src/bin/pg_upgrade/version.c b/src/bin/pg_upgrade/version.c
index 403a6d7cfa..4382957b59 100644
--- a/src/bin/pg_upgrade/version.c
+++ b/src/bin/pg_upgrade/version.c
@@ -9,236 +9,80 @@
 
 #include "postgres_fe.h"
 
-#include "catalog/pg_class_d.h"
 #include "fe_utils/string_utils.h"
 #include "pg_upgrade.h"
 
-
 /*
- * check_for_data_types_usage()
- *	Detect whether there are any stored columns depending on given type(s)
- *
- * If so, write a report to the given file name, and return true.
- *
- * base_query should be a SELECT yielding a single column named "oid",
- * containing the pg_type OIDs of one or more types that are known to have
- * inconsistent on-disk representations across server versions.
- *
- * We check for the type(s) in tables, matviews, and indexes, but not views;
- * there's no storage involved in a view.
+ * version_hook functions for check_for_data_types_usage in order to determine
+ * whether a data type check should be executed for the cluster in question or
+ * not.
  */
 bool
-check_for_data_types_usage(ClusterInfo *cluster,
-						   const char *base_query,
-						   const char *output_path)
+line_type_check_applicable(ClusterInfo *cluster)
 {
-	bool		found = false;
-	FILE	   *script = NULL;
-	int			dbnum;
+	/* Pre-PG 9.4 had a different 'line' data type internal format */
+	if (GET_MAJOR_VERSION(cluster->major_version) <= 903)
+		return true;
 
-	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
-	{
-		DbInfo	   *active_db = &cluster->dbarr.dbs[dbnum];
-		PGconn	   *conn = connectToServer(cluster, active_db->db_name);
-		PQExpBufferData querybuf;
-		PGresult   *res;
-		bool		db_used = false;
-		int			ntups;
-		int			rowno;
-		int			i_nspname,
-					i_relname,
-					i_attname;
-
-		/*
-		 * The type(s) of interest might be wrapped in a domain, array,
-		 * composite, or range, and these container types can be nested (to
-		 * varying extents depending on server version, but that's not of
-		 * concern here).  To handle all these cases we need a recursive CTE.
-		 */
-		initPQExpBuffer(&querybuf);
-		appendPQExpBuffer(&querybuf,
-						  "WITH RECURSIVE oids AS ( "
-		/* start with the type(s) returned by base_query */
-						  "	%s "
-						  "	UNION ALL "
-						  "	SELECT * FROM ( "
-		/* inner WITH because we can only reference the CTE once */
-						  "		WITH x AS (SELECT oid FROM oids) "
-		/* domains on any type selected so far */
-						  "			SELECT t.oid FROM pg_catalog.pg_type t, x WHERE typbasetype = x.oid AND typtype = 'd' "
-						  "			UNION ALL "
-		/* arrays over any type selected so far */
-						  "			SELECT t.oid FROM pg_catalog.pg_type t, x WHERE typelem = x.oid AND typtype = 'b' "
-						  "			UNION ALL "
-		/* composite types containing any type selected so far */
-						  "			SELECT t.oid FROM pg_catalog.pg_type t, pg_catalog.pg_class c, pg_catalog.pg_attribute a, x "
-						  "			WHERE t.typtype = 'c' AND "
-						  "				  t.oid = c.reltype AND "
-						  "				  c.oid = a.attrelid AND "
-						  "				  NOT a.attisdropped AND "
-						  "				  a.atttypid = x.oid "
-						  "			UNION ALL "
-		/* ranges containing any type selected so far */
-						  "			SELECT t.oid FROM pg_catalog.pg_type t, pg_catalog.pg_range r, x "
-						  "			WHERE t.typtype = 'r' AND r.rngtypid = t.oid AND r.rngsubtype = x.oid"
-						  "	) foo "
-						  ") "
-		/* now look for stored columns of any such type */
-						  "SELECT n.nspname, c.relname, a.attname "
-						  "FROM	pg_catalog.pg_class c, "
-						  "		pg_catalog.pg_namespace n, "
-						  "		pg_catalog.pg_attribute a "
-						  "WHERE	c.oid = a.attrelid AND "
-						  "		NOT a.attisdropped AND "
-						  "		a.atttypid IN (SELECT oid FROM oids) AND "
-						  "		c.relkind IN ("
-						  CppAsString2(RELKIND_RELATION) ", "
-						  CppAsString2(RELKIND_MATVIEW) ", "
-						  CppAsString2(RELKIND_INDEX) ") AND "
-						  "		c.relnamespace = n.oid AND "
-		/* exclude possible orphaned temp tables */
-						  "		n.nspname !~ '^pg_temp_' AND "
-						  "		n.nspname !~ '^pg_toast_temp_' AND "
-		/* exclude system catalogs, too */
-						  "		n.nspname NOT IN ('pg_catalog', 'information_schema')",
-						  base_query);
-
-		res = executeQueryOrDie(conn, "%s", querybuf.data);
-
-		ntups = PQntuples(res);
-		i_nspname = PQfnumber(res, "nspname");
-		i_relname = PQfnumber(res, "relname");
-		i_attname = PQfnumber(res, "attname");
-		for (rowno = 0; rowno < ntups; rowno++)
-		{
-			found = true;
-			if (script == NULL && (script = fopen_priv(output_path, "w")) == NULL)
-				pg_fatal("could not open file \"%s\": %s", output_path,
-						 strerror(errno));
-			if (!db_used)
-			{
-				fprintf(script, "In database: %s\n", active_db->db_name);
-				db_used = true;
-			}
-			fprintf(script, "  %s.%s.%s\n",
-					PQgetvalue(res, rowno, i_nspname),
-					PQgetvalue(res, rowno, i_relname),
-					PQgetvalue(res, rowno, i_attname));
-		}
-
-		PQclear(res);
-
-		termPQExpBuffer(&querybuf);
-
-		PQfinish(conn);
-	}
-
-	if (script)
-		fclose(script);
-
-	return found;
+	return false;
 }
 
-/*
- * check_for_data_type_usage()
- *	Detect whether there are any stored columns depending on the given type
- *
- * If so, write a report to the given file name, and return true.
- *
- * type_name should be a fully qualified type name.  This is just a
- * trivial wrapper around check_for_data_types_usage() to convert a
- * type name into a base query.
- */
 bool
-check_for_data_type_usage(ClusterInfo *cluster,
-						  const char *type_name,
-						  const char *output_path)
+jsonb_9_4_check_applicable(ClusterInfo *cluster)
 {
-	bool		found;
-	char	   *base_query;
+	/* JSONB changed its storage format during 9.4 beta */
+	if (GET_MAJOR_VERSION(cluster->major_version) == 904 &&
+		cluster->controldata.cat_ver < JSONB_FORMAT_CHANGE_CAT_VER)
+		return true;
 
-	base_query = psprintf("SELECT '%s'::pg_catalog.regtype AS oid",
-						  type_name);
-
-	found = check_for_data_types_usage(cluster, base_query, output_path);
-
-	free(base_query);
-
-	return found;
+	return false;
 }
 
-
-/*
- * old_9_3_check_for_line_data_type_usage()
- *	9.3 -> 9.4
- *	Fully implement the 'line' data type in 9.4, which previously returned
- *	"not enabled" by default and was only functionally enabled with a
- *	compile-time switch; as of 9.4 "line" has a different on-disk
- *	representation format.
- */
-void
-old_9_3_check_for_line_data_type_usage(ClusterInfo *cluster)
+bool
+unknown_type_check_applicable(ClusterInfo *cluster)
 {
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for incompatible \"line\" data type");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_line.txt");
-
-	if (check_for_data_type_usage(cluster, "pg_catalog.line", output_path))
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains the \"line\" data type in user tables.\n"
-				 "This data type changed its internal and input/output format\n"
-				 "between your old and new versions so this\n"
-				 "cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
+	/* Pre-PG 10 allowed tables with 'unknown' type columns */
+	if (GET_MAJOR_VERSION(cluster->major_version) <= 906)
+		return true;
+	return false;
 }
 
-
-/*
- * old_9_6_check_for_unknown_data_type_usage()
- *	9.6 -> 10
- *	It's no longer allowed to create tables or views with "unknown"-type
- *	columns.  We do not complain about views with such columns, because
- *	they should get silently converted to "text" columns during the DDL
- *	dump and reload; it seems unlikely to be worth making users do that
- *	by hand.  However, if there's a table with such a column, the DDL
- *	reload will fail, so we should pre-detect that rather than failing
- *	mid-upgrade.  Worse, if there's a matview with such a column, the
- *	DDL reload will silently change it to "text" which won't match the
- *	on-disk storage (which is like "cstring").  So we *must* reject that.
- */
-void
-old_9_6_check_for_unknown_data_type_usage(ClusterInfo *cluster)
+bool
+sql_identifier_type_check_applicable(ClusterInfo *cluster)
 {
-	char		output_path[MAXPGPATH];
+	/*
+	 * PG 12 changed the 'sql_identifier' type storage to be based on name,
+	 * not varchar, which breaks on-disk format for existing data. So we need
+	 * to prevent upgrade when used in user objects (tables, indexes, ...).
+	 */
+	if (GET_MAJOR_VERSION(cluster->major_version) <= 1100)
+		return true;
+
+	return false;
+}
 
-	prep_status("Checking for invalid \"unknown\" user columns");
+bool
+aclitem_type_check_applicable(ClusterInfo *cluster)
+{
+	/*
+	 * PG 16 increased the size of the 'aclitem' type, which breaks the
+	 * on-disk format for existing data.
+	 */
+	if (GET_MAJOR_VERSION(cluster->major_version) <= 1500)
+		return true;
+
+	return false;
+}
 
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_unknown.txt");
+bool
+removed_data_types_check_applicable(ClusterInfo *cluster)
+{
+	/*
+	 * PG 12 removed abstime, reltime and tinterval */
+	if (GET_MAJOR_VERSION(cluster->major_version) <= 1100)
+		return true;
 
-	if (check_for_data_type_usage(cluster, "pg_catalog.unknown", output_path))
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains the \"unknown\" data type in user tables.\n"
-				 "This data type is no longer allowed in tables, so this\n"
-				 "cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
+	return false;
 }
 
 /*
@@ -353,41 +197,6 @@ old_9_6_invalidate_hash_indexes(ClusterInfo *cluster, bool check_mode)
 		check_ok();
 }
 
-/*
- * old_11_check_for_sql_identifier_data_type_usage()
- *	11 -> 12
- *	In 12, the sql_identifier data type was switched from name to varchar,
- *	which does affect the storage (name is by-ref, but not varlena). This
- *	means user tables using sql_identifier for columns are broken because
- *	the on-disk format is different.
- */
-void
-old_11_check_for_sql_identifier_data_type_usage(ClusterInfo *cluster)
-{
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for invalid \"sql_identifier\" user columns");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_sql_identifier.txt");
-
-	if (check_for_data_type_usage(cluster, "information_schema.sql_identifier",
-								  output_path))
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains the \"sql_identifier\" data type in user tables.\n"
-				 "The on-disk format for this data type has changed, so this\n"
-				 "cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
-}
-
-
 /*
  * report_extension_updates()
  *	Report extensions that should be updated.
-- 
2.32.1 (Apple Git-133)
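
The table-driven layout used by the patch above can be illustrated outside of
pg_upgrade. What follows is only a minimal sketch under stated assumptions, not
the patch's code: DemoCluster, DemoCheck and all helper names are hypothetical
stand-ins for ClusterInfo and DataTypesUsageChecks. It shows a
sentinel-terminated check table, optional version hooks (a NULL hook meaning
"run for all versions"), and a results array in which every element is
initialized to true.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for ClusterInfo; only the version matters here. */
typedef struct
{
    int         major_version;  /* e.g. 1100 for PG 11, 1600 for PG 16 */
} DemoCluster;

typedef bool (*DemoVersionHook) (const DemoCluster *cluster);

/* Hypothetical, trimmed-down counterpart of DataTypesUsageChecks. */
typedef struct
{
    const char *status;             /* NULL marks the end of the table */
    DemoVersionHook version_hook;   /* NULL means "run for all versions" */
} DemoCheck;

static bool
removed_types_applicable(const DemoCluster *cluster)
{
    return cluster->major_version <= 1100;
}

static bool
aclitem_applicable(const DemoCluster *cluster)
{
    return cluster->major_version <= 1500;
}

static const DemoCheck demo_checks[] = {
    {"reg* data types", NULL},
    {"aclitem", aclitem_applicable},
    {"abstime", removed_types_applicable},
    {NULL, NULL}                    /* end-of-checks marker, must remain last */
};

/* Count entries up to, but not including, the sentinel. */
static int
count_checks(const DemoCheck *checks)
{
    int         n = 0;

    while (checks[n].status != NULL)
        n++;
    return n;
}

/* Allocate one flag per check and set every flag to true ("no failure yet"). */
static bool *
alloc_results(int nchecks)
{
    bool       *results = malloc(sizeof(bool) * nchecks);

    if (results == NULL)
        return NULL;
    for (int i = 0; i < nchecks; i++)
        results[i] = true;
    return results;
}

/* Count the checks that would actually run for the given cluster. */
static int
count_applicable(const DemoCheck *checks, const DemoCluster *cluster)
{
    int         applicable = 0;

    for (int i = 0; checks[i].status != NULL; i++)
    {
        /* Skip only when a hook exists and rejects this cluster version. */
        if (checks[i].version_hook && !checks[i].version_hook(cluster))
            continue;
        applicable++;
    }
    return applicable;
}
```

In this sketch a cluster with major_version 1100 would run all three checks,
while one at 1600 would run only the hook-less entry. Adding a check means
adding one table row, and the loop over databases does not need to change.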

#22

vignesh C

vignesh21@gmail.com

almost 2 years ago

In reply to: Daniel Gustafsson (#21)

Re: Reducing connection overhead in pg_upgrade compat check phase

On Fri, 27 Oct 2023 at 18:50, Daniel Gustafsson <daniel@yesql.se> wrote:

Attached is a v10 rebase of this patch, which had undergone significant bitrot
due to recent changes in the pg_upgrade check phase. This folds those changes
into the proposed structure without altering the queries or the proposed
functionality.

Testing with a completely empty v11 cluster fresh from initdb as the old
cluster shows a significant speedup (averaged over multiple runs, adjusted for
outliers):

patched: 53.59ms (52.78ms, 52.49ms, 55.49ms)
master : 125.87ms (125.23ms, 125.67ms, 126.67ms)

Using a similarly empty cluster from master as the old cluster shows a smaller
speedup, which is expected since many checks only run for older versions:

patched: 33.36ms (32.82ms, 33.78ms, 33.47ms)
master : 44.87ms (44.73ms, 44.90ms, 44.99ms)

The latter case is still pretty interesting IMO since it can speed up testing
where every millisecond gained matters.
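
For reference, the relative savings implied by the averages quoted above can be worked out with a small helper. This is only an illustrative sketch: reduction_pct and print_reduction are names made up here and are not part of pg_upgrade. Fed the quoted averages, it gives roughly a 57% reduction for the v11 cluster and roughly 26% for the cluster from master.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Percentage by which after_ms is lower than before_ms.
 * Hypothetical helper for illustration only; not pg_upgrade code.
 */
double
reduction_pct(double before_ms, double after_ms)
{
    assert(before_ms > 0.0);
    return 100.0 * (before_ms - after_ms) / before_ms;
}

/* Print one comparison line using the averages from the message. */
void
print_reduction(const char *label, double master_ms, double patched_ms)
{
    printf("%-12s master %.2fms -> patched %.2fms: %.1f%% less time\n",
           label, master_ms, patched_ms,
           reduction_pct(master_ms, patched_ms));
}
```

Calling print_reduction("v11 old", 125.87, 53.59) and print_reduction("master old", 44.87, 33.36) from a main() prints both comparisons.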

CFBot shows that the patch does not apply anymore as in [1]:
=== Applying patches on top of PostgreSQL commit ID
55627ba2d334ce98e1f5916354c46472d414bda6 ===
=== applying patch
./v10-0001-pg_upgrade-run-all-data-type-checks-per-connecti.patch
patching file src/bin/pg_upgrade/check.c
Hunk #2 FAILED at 24.
...
1 out of 7 hunks FAILED -- saving rejects to file src/bin/pg_upgrade/check.c.rej

Please post an updated version for the same.

[1]: http://cfbot.cputube.org/patch_46_4200.log

Regards,
Vignesh

#23

vignesh C

vignesh21@gmail.com

almost 2 years ago

In reply to: vignesh C (#22)

Re: Reducing connection overhead in pg_upgrade compat check phase

On Sat, 27 Jan 2024 at 09:10, vignesh C <vignesh21@gmail.com> wrote:

On Fri, 27 Oct 2023 at 18:50, Daniel Gustafsson <daniel@yesql.se> wrote:

Attached is a v10 rebase of this patch which had undergone significant bitrot
due to recent changes in the pg_upgrade check phase. This brings in the
changes into the proposed structure without changes to queries, with no
additional changes to the proposed functionality.

Testing with a completely empty v11 cluster fresh from initdb as the old
cluster shows a significant speedup (averaged over multiple runs, adjusted for
outliers):

patched: 53.59ms (52.78ms, 52.49ms, 55.49ms)
master : 125.87ms (125.23ms, 125.67ms, 126.67ms)

Using a similarly empty cluster from master as the old cluster shows a smaller
speedup, which is expected since many checks only run for older versions:

patched: 33.36ms (32.82ms, 33.78ms, 33.47ms)
master : 44.87ms (44.73ms, 44.90ms, 44.99ms)

The latter case is still pretty interesting IMO since it can speed up testing
where every millisecond gained matters.

CFBot shows that the patch does not apply anymore as in [1]:
=== Applying patches on top of PostgreSQL commit ID
55627ba2d334ce98e1f5916354c46472d414bda6 ===
=== applying patch
./v10-0001-pg_upgrade-run-all-data-type-checks-per-connecti.patch
patching file src/bin/pg_upgrade/check.c
Hunk #2 FAILED at 24.
...
1 out of 7 hunks FAILED -- saving rejects to file src/bin/pg_upgrade/check.c.rej

Please post an updated version for the same.

With no update to the thread and the patch still not applying I'm
marking this as returned with feedback. Please feel free to resubmit
to the next CF when there is a new version of the patch.

Regards,
Vignesh

#24

Nathan Bossart

nathandbossart@gmail.com

almost 2 years ago

In reply to: vignesh C (#23)

Re: Reducing connection overhead in pg_upgrade compat check phase

On Fri, Feb 02, 2024 at 12:18:25AM +0530, vignesh C wrote:

With no update to the thread and the patch still not applying I'm
marking this as returned with feedback. Please feel free to resubmit
to the next CF when there is a new version of the patch.

IMHO this patch is worth trying to get into v17. I'd be happy to take it
forward if Daniel does not intend to work on it.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#25

Daniel Gustafsson

daniel@yesql.se

almost 2 years ago

In reply to: Nathan Bossart (#24)

Re: Reducing connection overhead in pg_upgrade compat check phase

On 6 Feb 2024, at 17:32, Nathan Bossart <nathandbossart@gmail.com> wrote:

On Fri, Feb 02, 2024 at 12:18:25AM +0530, vignesh C wrote:

With no update to the thread and the patch still not applying I'm
marking this as returned with feedback. Please feel free to resubmit
to the next CF when there is a new version of the patch.

IMHO this patch is worth trying to get into v17. I'd be happy to take it
forward if Daniel does not intend to work on it.

I actually had the same thought yesterday and spent some time polishing and
rebasing it. I'll post an updated rebase shortly with the hopes of getting it
committed this week.

--
Daniel Gustafsson

#26

Nathan Bossart

nathandbossart@gmail.com

almost 2 years ago

In reply to: Daniel Gustafsson (#25)

Re: Reducing connection overhead in pg_upgrade compat check phase

On Tue, Feb 06, 2024 at 05:47:56PM +0100, Daniel Gustafsson wrote:

On 6 Feb 2024, at 17:32, Nathan Bossart <nathandbossart@gmail.com> wrote:
IMHO this patch is worth trying to get into v17. I'd be happy to take it
forward if Daniel does not intend to work on it.

I actually had the same thought yesterday and spent some time polishing and
rebasing it. I'll post an updated rebase shortly with the hopes of getting it
committed this week.

Oh, awesome. Thanks!

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#27

Daniel Gustafsson

daniel@yesql.se

almost 2 years ago

In reply to: Daniel Gustafsson (#25)

1 attachment(s)

Re: Reducing connection overhead in pg_upgrade compat check phase

On 6 Feb 2024, at 17:47, Daniel Gustafsson <daniel@yesql.se> wrote:

On 6 Feb 2024, at 17:32, Nathan Bossart <nathandbossart@gmail.com> wrote:

On Fri, Feb 02, 2024 at 12:18:25AM +0530, vignesh C wrote:

With no update to the thread and the patch still not applying I'm
marking this as returned with feedback. Please feel free to resubmit
to the next CF when there is a new version of the patch.

IMHO this patch is worth trying to get into v17. I'd be happy to take it
forward if Daniel does not intend to work on it.

I actually had the same thought yesterday and spent some time polishing and
rebasing it. I'll post an updated rebase shortly with the hopes of getting it
committed this week.

Attached is a v11 rebased over HEAD with some very minor tweaks. Unless there
are objections I plan to go ahead with this version this week.

--
Daniel Gustafsson

Attachments:

v11-0001-pg_upgrade-run-all-data-type-checks-per-connecti.patch (application/octet-stream)

From 2909537daddd43231a13f73f13adec91974a4f90 Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson <dgustafsson@postgresql.org>
Date: Wed, 7 Feb 2024 13:36:46 +0100
Subject: [PATCH v11] pg_upgrade: run all data type checks per connection

The checks for data type usage were each connecting to all databases
in the cluster and running their query. On clusters which have a lot
of databases this can become unnecessarily expensive. This moves the
checks to run in a single connection instead to minimize connection
setup/teardown overhead.

Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/BB4C76F-D416-4F9F-949E-DBE950D37787@yesql.se
---
 src/bin/pg_upgrade/check.c      | 686 ++++++++++++++++++++------------
 src/bin/pg_upgrade/pg_upgrade.h |  30 +-
 src/bin/pg_upgrade/version.c    | 296 +++-----------
 3 files changed, 504 insertions(+), 508 deletions(-)
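
The restructuring described in the commit message can be modelled without libpq. The sketch below is a simplified illustration, not the patch itself: Cluster, Check, removed_in_12, demo_checks and the two run_* functions are hypothetical stand-ins, and a counter takes the place of connectToServer(). It assumes, as the patch does, a NULL-terminated array of checks, each with an optional version hook.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not the real pg_upgrade types. */
typedef struct Cluster
{
    int         major_version;  /* e.g. 1100 for v11 */
    int         ndbs;           /* number of databases in the cluster */
} Cluster;

typedef bool (*VersionHook) (const Cluster *cluster);

typedef struct Check
{
    const char *status;         /* NULL marks the end of the array */
    VersionHook version_hook;   /* NULL means "run for every version" */
} Check;

int         connections_opened = 0;
int         queries_run = 0;

/* Applies only to old clusters of v11 or earlier. */
bool
removed_in_12(const Cluster *cluster)
{
    return cluster->major_version <= 1100;
}

Check       demo_checks[] = {
    {"composite types", NULL},
    {"abstime", removed_in_12},
    {"reltime", removed_in_12},
    {NULL, NULL}                /* end marker, must remain last */
};

/* Old shape: each applicable check connects to every database. */
void
run_per_check(const Cluster *cluster, const Check *checks)
{
    assert(cluster != NULL && checks != NULL);
    for (const Check *c = checks; c->status != NULL; c++)
    {
        if (c->version_hook && !c->version_hook(cluster))
            continue;
        for (int db = 0; db < cluster->ndbs; db++)
        {
            connections_opened++;   /* stands in for connectToServer() */
            queries_run++;
        }
    }
}

/* New shape: one connection per database, all checks run on it. */
void
run_per_database(const Cluster *cluster, const Check *checks)
{
    assert(cluster != NULL && checks != NULL);
    for (int db = 0; db < cluster->ndbs; db++)
    {
        connections_opened++;       /* stands in for connectToServer() */
        for (const Check *c = checks; c->status != NULL; c++)
        {
            if (c->version_hook && !c->version_hook(cluster))
                continue;
            queries_run++;
        }
    }
}
```

In this model, a v11 cluster with 100 databases costs 300 connections with run_per_check and 100 with run_per_database, while the number of queries stays at 300 in both cases.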

diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index e36a7328bf..566b49dbba 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -10,6 +10,7 @@
 #include "postgres_fe.h"
 
 #include "catalog/pg_authid_d.h"
+#include "catalog/pg_class_d.h"
 #include "catalog/pg_collation.h"
 #include "fe_utils/string_utils.h"
 #include "mb/pg_wchar.h"
@@ -23,13 +24,6 @@ static void check_for_isn_and_int8_passing_mismatch(ClusterInfo *cluster);
 static void check_for_user_defined_postfix_ops(ClusterInfo *cluster);
 static void check_for_incompatible_polymorphics(ClusterInfo *cluster);
 static void check_for_tables_with_oids(ClusterInfo *cluster);
-static void check_for_composite_data_type_usage(ClusterInfo *cluster);
-static void check_for_reg_data_type_usage(ClusterInfo *cluster);
-static void check_for_aclitem_data_type_usage(ClusterInfo *cluster);
-static void check_for_removed_data_type_usage(ClusterInfo *cluster,
-											  const char *version,
-											  const char *datatype);
-static void check_for_jsonb_9_4_usage(ClusterInfo *cluster);
 static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(void);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
@@ -38,6 +32,434 @@ static void check_new_cluster_subscription_configuration(void);
 static void check_old_cluster_for_valid_slots(bool live_check);
 static void check_old_cluster_subscription_state(void);
 
+/*
+ * Data type usage checks. Each check for problematic data type usage is
+ * defined in this array with metadata, SQL query for finding the data type
+ * and a function pointer for determining if the check should be executed
+ * for the current version.
+ */
+static DataTypesUsageChecks data_types_usage_checks[] =
+{
+	/*
+	 * Look for composite types that were made during initdb *or* belong to
+	 * information_schema; that's important in case information_schema was
+	 * dropped and reloaded.
+	 *
+	 * The cutoff OID here should match the source cluster's value of
+	 * FirstNormalObjectId.  We hardcode it rather than using that C #define
+	 * because, if that #define is ever changed, our own version's value is
+	 * NOT what to use.  Eventually we may need a test on the source cluster's
+	 * version to select the correct value.
+	 */
+	{
+		.status = gettext_noop("Checking for system-defined composite types in user tables"),
+			.report_filename = "tables_using_composite.txt",
+			.base_query =
+			"SELECT t.oid FROM pg_catalog.pg_type t "
+			"LEFT JOIN pg_catalog.pg_namespace n ON t.typnamespace = n.oid "
+			" WHERE typtype = 'c' AND (t.oid < 16384 OR nspname = 'information_schema')",
+			.report_text =
+			"Your installation contains system-defined composite types in user tables.\n"
+			"These type OIDs are not stable across PostgreSQL versions,\n"
+			"so this cluster cannot currently be upgraded.  You can\n"
+			"drop the problem columns and restart the upgrade.\n"
+			"A list of the problem columns is in the file:",
+			.version_hook = NULL
+	},
+
+	/*
+	 * 9.3 -> 9.4 Fully implement the 'line' data type in 9.4, which
+	 * previously returned "not enabled" by default and was only functionally
+	 * enabled with a compile-time switch; as of 9.4 "line" has a different
+	 * on-disk representation format.
+	 */
+	{
+		.status = gettext_noop("Checking for incompatible \"line\" data type"),
+			.report_filename = "tables_using_line.txt",
+			.base_query =
+			"SELECT 'pg_catalog.line'::pg_catalog.regtype AS oid",
+			.report_text =
+			"Your installation contains the \"line\" data type in user tables.\n"
+			"This data type changed its internal and input/output format\n"
+			"between your old and new versions so this\n"
+			"cluster cannot currently be upgraded.  You can\n"
+			"drop the problem columns and restart the upgrade.\n"
+			"A list of the problem columns is in the file:",
+			.version_hook = line_type_check_applicable
+	},
+
+	/*
+	 * pg_upgrade only preserves these system values: pg_class.oid pg_type.oid
+	 * pg_enum.oid
+	 *
+	 * Many of the reg* data types reference system catalog info that is not
+	 * preserved, and hence these data types cannot be used in user tables
+	 * upgraded by pg_upgrade.
+	 */
+	{
+		.status = gettext_noop("Checking for reg* data types in user tables"),
+			.report_filename = "tables_using_reg.txt",
+
+		/*
+		 * Note: older servers will not have all of these reg* types, so we
+		 * have to write the query like this rather than depending on casts to
+		 * regtype.
+		 */
+			.base_query =
+			"SELECT oid FROM pg_catalog.pg_type t "
+			"WHERE t.typnamespace = "
+			"        (SELECT oid FROM pg_catalog.pg_namespace "
+			"         WHERE nspname = 'pg_catalog') "
+			"  AND t.typname IN ( "
+		/* pg_class.oid is preserved, so 'regclass' is OK */
+			"           'regcollation', "
+			"           'regconfig', "
+			"           'regdictionary', "
+			"           'regnamespace', "
+			"           'regoper', "
+			"           'regoperator', "
+			"           'regproc', "
+			"           'regprocedure' "
+		/* pg_authid.oid is preserved, so 'regrole' is OK */
+		/* pg_type.oid is (mostly) preserved, so 'regtype' is OK */
+			"         )",
+			.report_text =
+			"Your installation contains one of the reg* data types in user tables.\n"
+			"These data types reference system OIDs that are not preserved by\n"
+			"pg_upgrade, so this cluster cannot currently be upgraded.  You can\n"
+			"drop the problem columns and restart the upgrade.\n"
+			"A list of the problem columns is in the file:",
+			.version_hook = NULL
+	},
+
+	/*
+	 * PG 16 increased the size of the 'aclitem' type, which breaks the
+	 * on-disk format for existing data.
+	 */
+	{
+		.status = gettext_noop("Checking for incompatible \"aclitem\" data type"),
+			.report_filename = "tables_using_aclitem.txt",
+			.base_query =
+			"SELECT 'pg_catalog.aclitem'::pg_catalog.regtype AS oid",
+			.report_text =
+			"Your installation contains the \"aclitem\" data type in user tables.\n"
+			"The internal format of \"aclitem\" changed in PostgreSQL version 16\n"
+			"so this cluster cannot currently be upgraded.  You can drop the\n"
+			"problem columns and restart the upgrade.  A list of the problem\n"
+			"columns is in the file:",
+			.version_hook = aclitem_type_check_applicable
+	},
+
+	/*
+	 * It's no longer allowed to create tables or views with "unknown"-type
+	 * columns.  We do not complain about views with such columns, because
+	 * they should get silently converted to "text" columns during the DDL
+	 * dump and reload; it seems unlikely to be worth making users do that by
+	 * hand.  However, if there's a table with such a column, the DDL reload
+	 * will fail, so we should pre-detect that rather than failing
+	 * mid-upgrade.  Worse, if there's a matview with such a column, the DDL
+	 * reload will silently change it to "text" which won't match the on-disk
+	 * storage (which is like "cstring").  So we *must* reject that.
+	 */
+	{
+		.status = gettext_noop("Checking for invalid \"unknown\" user columns"),
+			.report_filename = "tables_using_unknown.txt",
+			.base_query =
+			"SELECT 'pg_catalog.unknown'::pg_catalog.regtype AS oid",
+			.report_text =
+			"Your installation contains the \"unknown\" data type in user tables.\n"
+			"This data type is no longer allowed in tables, so this\n"
+			"cluster cannot currently be upgraded.  You can\n"
+			"drop the problem columns and restart the upgrade.\n"
+			"A list of the problem columns is in the file:",
+			.version_hook = unknown_type_check_applicable
+	},
+
+	/*
+	 * PG 12 changed the 'sql_identifier' type storage to be based on name,
+	 * not varchar, which breaks on-disk format for existing data. So we need
+	 * to prevent upgrade when used in user objects (tables, indexes, ...). In
+	 * 12, the sql_identifier data type was switched from varchar to name,
+	 * which does affect the storage (name is by-ref, but not varlena). This
+	 * means user tables using sql_identifier for columns are broken because
+	 * the on-disk format is different.
+	 */
+	{
+		.status = gettext_noop("Checking for invalid \"sql_identifier\" user columns"),
+			.report_filename = "tables_using_sql_identifier.txt",
+			.base_query =
+			"SELECT 'information_schema.sql_identifier'::pg_catalog.regtype AS oid",
+			.report_text =
+			"Your installation contains the \"sql_identifier\" data type in user tables.\n"
+			"The on-disk format for this data type has changed, so this\n"
+			"cluster cannot currently be upgraded.  You can\n"
+			"drop the problem columns and restart the upgrade.\n"
+			"A list of the problem columns is in the file:",
+			.version_hook = sql_identifier_type_check_applicable
+	},
+
+	/*
+	 * JSONB changed its storage format during 9.4 beta, so check for it.
+	 */
+	{
+		.status = gettext_noop("Checking for incompatible \"jsonb\" data type in user tables"),
+			.report_filename = "tables_using_jsonb.txt",
+			.base_query =
+			"SELECT 'pg_catalog.jsonb'::pg_catalog.regtype AS oid",
+			.report_text =
+			"Your installation contains the \"jsonb\" data type in user tables.\n"
+			"The internal format of \"jsonb\" changed during 9.4 beta so this\n"
+			"cluster cannot currently be upgraded.  You can\n"
+			"drop the problem columns and restart the upgrade.\n"
+			"A list of the problem columns is in the file:",
+			.version_hook = jsonb_9_4_check_applicable
+	},
+
+	/*
+	 * PG 12 removed types abstime, reltime, tinterval.
+	 */
+	{
+		.status = gettext_noop("Checking for removed \"abstime\" data type in user tables"),
+			.report_filename = "tables_using_abstime.txt",
+			.base_query =
+			"SELECT 'pg_catalog.abstime'::pg_catalog.regtype AS oid",
+			.report_text =
+			"Your installation contains the \"abstime\" data type in user tables.\n"
+			"The \"abstime\" type has been removed in PostgreSQL version 12,\n"
+			"so this cluster cannot currently be upgraded.  You can drop the\n"
+			"problem columns, or change them to another data type, and restart\n"
+			"the upgrade.  A list of the problem columns is in the file:",
+			.version_hook = removed_data_types_check_applicable
+	},
+	{
+		.status = gettext_noop("Checking for removed \"reltime\" data type in user tables"),
+			.report_filename = "tables_using_reltime.txt",
+			.base_query =
+			"SELECT 'pg_catalog.reltime'::pg_catalog.regtype AS oid",
+			.report_text =
+			"Your installation contains the \"reltime\" data type in user tables.\n"
+			"The \"reltime\" type has been removed in PostgreSQL version 12,\n"
+			"so this cluster cannot currently be upgraded.  You can drop the\n"
+			"problem columns, or change them to another data type, and restart\n"
+			"the upgrade.  A list of the problem columns is in the file:",
+			.version_hook = removed_data_types_check_applicable
+	},
+	{
+		.status = gettext_noop("Checking for removed \"tinterval\" data type in user tables"),
+			.report_filename = "tables_using_tinterval.txt",
+			.base_query =
+			"SELECT 'pg_catalog.tinterval'::pg_catalog.regtype AS oid",
+			.report_text =
+			"Your installation contains the \"tinterval\" data type in user tables.\n"
+			"The \"tinterval\" type has been removed in PostgreSQL version 12,\n"
+			"so this cluster cannot currently be upgraded.  You can drop the\n"
+			"problem columns, or change them to another data type, and restart\n"
+			"the upgrade.  A list of the problem columns is in the file:",
+			.version_hook = removed_data_types_check_applicable
+	},
+
+	/* End of checks marker, must remain last */
+	{
+		NULL, NULL, NULL, NULL, NULL
+	}
+};
+
+/*
+ * check_for_data_types_usage()
+ *	Detect whether there are any stored columns depending on given type(s)
+ *
+ * If so, write a report to the given file name and signal a failure to the
+ * user.
+ *
+ * The checks to run are defined in a DataTypesUsageChecks structure where
+ * each check has metadata for explaining errors to the user, a base_query,
+ * a report filename and a function pointer hook for validating if the check
+ * should be executed given the cluster at hand.
+ *
+ * base_query should be a SELECT yielding a single column named "oid",
+ * containing the pg_type OIDs of one or more types that are known to have
+ * inconsistent on-disk representations across server versions.
+ *
+ * We check for the type(s) in tables, matviews, and indexes, but not views;
+ * there's no storage involved in a view.
+ */
+static void
+check_for_data_types_usage(ClusterInfo *cluster, DataTypesUsageChecks * checks)
+{
+	bool		found = false;
+	bool	   *results;
+	PQExpBufferData report;
+	DataTypesUsageChecks *tmp = checks;
+	int			n_data_types_usage_checks = 0;
+
+	prep_status("Checking for data type usage");
+
+	/* Gather number of checks to perform */
+	while (tmp->status != NULL)
+	{
+		n_data_types_usage_checks++;
+		tmp++;
+	}
+
+	/* Prepare an array to store the results of checks in */
+	results = pg_malloc0(sizeof(bool) * n_data_types_usage_checks);
+
+	/*
+	 * Connect to each database in the cluster and run all defined checks
+	 * against that database before trying the next one.
+	 */
+	for (int dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
+	{
+		DbInfo	   *active_db = &cluster->dbarr.dbs[dbnum];
+		PGconn	   *conn = connectToServer(cluster, active_db->db_name);
+
+		for (int checknum = 0; checknum < n_data_types_usage_checks; checknum++)
+		{
+			PGresult   *res;
+			int			ntups;
+			int			i_nspname;
+			int			i_relname;
+			int			i_attname;
+			FILE	   *script = NULL;
+			bool		db_used = false;
+			char		output_path[MAXPGPATH];
+			DataTypesUsageChecks *cur_check = &checks[checknum];
+
+			/*
+			 * Make sure that the check applies to the current cluster version
+			 * and skip if not. If no check hook has been defined we run the
+			 * check for all versions.
+			 */
+			if (cur_check->version_hook && !cur_check->version_hook(cluster))
+				continue;
+
+			snprintf(output_path, sizeof(output_path), "%s/%s",
+					 log_opts.basedir,
+					 cur_check->report_filename);
+
+			/*
+			 * The type(s) of interest might be wrapped in a domain, array,
+			 * composite, or range, and these container types can be nested
+			 * (to varying extents depending on server version, but that's not
+			 * of concern here).  To handle all these cases we need a
+			 * recursive CTE.
+			 */
+			res = executeQueryOrDie(conn,
+									"WITH RECURSIVE oids AS ( "
+			/* start with the type(s) returned by base_query */
+									"	%s "
+									"	UNION ALL "
+									"	SELECT * FROM ( "
+			/* inner WITH because we can only reference the CTE once */
+									"		WITH x AS (SELECT oid FROM oids) "
+			/* domains on any type selected so far */
+									"			SELECT t.oid FROM pg_catalog.pg_type t, x WHERE typbasetype = x.oid AND typtype = 'd' "
+									"			UNION ALL "
+			/* arrays over any type selected so far */
+									"			SELECT t.oid FROM pg_catalog.pg_type t, x WHERE typelem = x.oid AND typtype = 'b' "
+									"			UNION ALL "
+			/* composite types containing any type selected so far */
+									"			SELECT t.oid FROM pg_catalog.pg_type t, pg_catalog.pg_class c, pg_catalog.pg_attribute a, x "
+									"			WHERE t.typtype = 'c' AND "
+									"				  t.oid = c.reltype AND "
+									"				  c.oid = a.attrelid AND "
+									"				  NOT a.attisdropped AND "
+									"				  a.atttypid = x.oid "
+									"			UNION ALL "
+			/* ranges containing any type selected so far */
+									"			SELECT t.oid FROM pg_catalog.pg_type t, pg_catalog.pg_range r, x "
+									"			WHERE t.typtype = 'r' AND r.rngtypid = t.oid AND r.rngsubtype = x.oid"
+									"	) foo "
+									") "
+			/* now look for stored columns of any such type */
+									"SELECT n.nspname, c.relname, a.attname "
+									"FROM	pg_catalog.pg_class c, "
+									"		pg_catalog.pg_namespace n, "
+									"		pg_catalog.pg_attribute a "
+									"WHERE	c.oid = a.attrelid AND "
+									"		NOT a.attisdropped AND "
+									"		a.atttypid IN (SELECT oid FROM oids) AND "
+									"		c.relkind IN ("
+									CppAsString2(RELKIND_RELATION) ", "
+									CppAsString2(RELKIND_MATVIEW) ", "
+									CppAsString2(RELKIND_INDEX) ") AND "
+									"		c.relnamespace = n.oid AND "
+			/* exclude possible orphaned temp tables */
+									"		n.nspname !~ '^pg_temp_' AND "
+									"		n.nspname !~ '^pg_toast_temp_' AND "
+			/* exclude system catalogs, too */
+									"		n.nspname NOT IN ('pg_catalog', 'information_schema')",
+									cur_check->base_query);
+
+			ntups = PQntuples(res);
+
+			/*
+			 * The datatype was found, so extract the data and log to the
+			 * requested filename. We need to open the file for appending
+			 * since the check might have already found the type in another
+			 * database earlier in the loop.
+			 */
+			if (ntups)
+			{
+				/*
+				 * Make sure we have a buffer to save reports to now that we
+				 * found a first failing check.
+				 */
+				if (!found)
+					initPQExpBuffer(&report);
+				found = true;
+
+				/*
+				 * If this is the first time we see an error for the check in
+				 * question then print a status message of the failure.
+				 */
+				if (!results[checknum])
+				{
+					pg_log(PG_REPORT, "    failed check: %s", _(cur_check->status));
+					appendPQExpBuffer(&report, "\n%s\n    %s\n",
+									  cur_check->report_text, output_path);
+				}
+				results[checknum] = true;
+
+				i_nspname = PQfnumber(res, "nspname");
+				i_relname = PQfnumber(res, "relname");
+				i_attname = PQfnumber(res, "attname");
+
+				for (int rowno = 0; rowno < ntups; rowno++)
+				{
+					if (script == NULL && (script = fopen_priv(output_path, "a")) == NULL)
+						pg_fatal("could not open file \"%s\": %s",
+								 output_path,
+								 strerror(errno));
+					if (!db_used)
+					{
+						fprintf(script, "In database: %s\n", active_db->db_name);
+						db_used = true;
+					}
+					fprintf(script, "  %s.%s.%s\n",
+							PQgetvalue(res, rowno, i_nspname),
+							PQgetvalue(res, rowno, i_relname),
+							PQgetvalue(res, rowno, i_attname));
+				}
+
+				if (script)
+				{
+					fclose(script);
+					script = NULL;
+				}
+			}
+
+			PQclear(res);
+		}
+
+		PQfinish(conn);
+	}
+
+	if (found)
+		pg_fatal("Data type checks failed: %s", report.data);
+
+	check_ok();
+}
 
 /*
  * fix_path_separator
@@ -110,8 +532,6 @@ check_and_dump_old_cluster(bool live_check)
 	check_is_install_user(&old_cluster);
 	check_proper_datallowconn(&old_cluster);
 	check_for_prepared_transactions(&old_cluster);
-	check_for_composite_data_type_usage(&old_cluster);
-	check_for_reg_data_type_usage(&old_cluster);
 	check_for_isn_and_int8_passing_mismatch(&old_cluster);
 
 	if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
@@ -129,22 +549,7 @@ check_and_dump_old_cluster(bool live_check)
 		check_old_cluster_subscription_state();
 	}
 
-	/*
-	 * PG 16 increased the size of the 'aclitem' type, which breaks the
-	 * on-disk format for existing data.
-	 */
-	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1500)
-		check_for_aclitem_data_type_usage(&old_cluster);
-
-	/*
-	 * PG 12 removed types abstime, reltime, tinterval.
-	 */
-	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1100)
-	{
-		check_for_removed_data_type_usage(&old_cluster, "12", "abstime");
-		check_for_removed_data_type_usage(&old_cluster, "12", "reltime");
-		check_for_removed_data_type_usage(&old_cluster, "12", "tinterval");
-	}
+	check_for_data_types_usage(&old_cluster, data_types_usage_checks);
 
 	/*
 	 * PG 14 changed the function signature of encoding conversion functions.
@@ -176,21 +581,12 @@ check_and_dump_old_cluster(bool live_check)
 	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1100)
 		check_for_tables_with_oids(&old_cluster);
 
-	/*
-	 * PG 12 changed the 'sql_identifier' type storage to be based on name,
-	 * not varchar, which breaks on-disk format for existing data. So we need
-	 * to prevent upgrade when used in user objects (tables, indexes, ...).
-	 */
-	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1100)
-		old_11_check_for_sql_identifier_data_type_usage(&old_cluster);
-
 	/*
 	 * Pre-PG 10 allowed tables with 'unknown' type columns and non WAL logged
 	 * hash indexes
 	 */
 	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 906)
 	{
-		old_9_6_check_for_unknown_data_type_usage(&old_cluster);
 		if (user_opts.check)
 			old_9_6_invalidate_hash_indexes(&old_cluster, true);
 	}
@@ -199,14 +595,6 @@ check_and_dump_old_cluster(bool live_check)
 	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 905)
 		check_for_pg_role_prefix(&old_cluster);
 
-	if (GET_MAJOR_VERSION(old_cluster.major_version) == 904 &&
-		old_cluster.controldata.cat_ver < JSONB_FORMAT_CHANGE_CAT_VER)
-		check_for_jsonb_9_4_usage(&old_cluster);
-
-	/* Pre-PG 9.4 had a different 'line' data type internal format */
-	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 903)
-		old_9_3_check_for_line_data_type_usage(&old_cluster);
-
 	/*
 	 * While not a check option, we do this now because this is the only time
 	 * the old server is running.
@@ -1124,220 +1512,6 @@ check_for_tables_with_oids(ClusterInfo *cluster)
 }
 
 
-/*
- * check_for_composite_data_type_usage()
- *	Check for system-defined composite types used in user tables.
- *
- *	The OIDs of rowtypes of system catalogs and information_schema views
- *	can change across major versions; unlike user-defined types, we have
- *	no mechanism for forcing them to be the same in the new cluster.
- *	Hence, if any user table uses one, that's problematic for pg_upgrade.
- */
-static void
-check_for_composite_data_type_usage(ClusterInfo *cluster)
-{
-	bool		found;
-	Oid			firstUserOid;
-	char		output_path[MAXPGPATH];
-	char	   *base_query;
-
-	prep_status("Checking for system-defined composite types in user tables");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_composite.txt");
-
-	/*
-	 * Look for composite types that were made during initdb *or* belong to
-	 * information_schema; that's important in case information_schema was
-	 * dropped and reloaded.
-	 *
-	 * The cutoff OID here should match the source cluster's value of
-	 * FirstNormalObjectId.  We hardcode it rather than using that C #define
-	 * because, if that #define is ever changed, our own version's value is
-	 * NOT what to use.  Eventually we may need a test on the source cluster's
-	 * version to select the correct value.
-	 */
-	firstUserOid = 16384;
-
-	base_query = psprintf("SELECT t.oid FROM pg_catalog.pg_type t "
-						  "LEFT JOIN pg_catalog.pg_namespace n ON t.typnamespace = n.oid "
-						  " WHERE typtype = 'c' AND (t.oid < %u OR nspname = 'information_schema')",
-						  firstUserOid);
-
-	found = check_for_data_types_usage(cluster, base_query, output_path);
-
-	free(base_query);
-
-	if (found)
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains system-defined composite types in user tables.\n"
-				 "These type OIDs are not stable across PostgreSQL versions,\n"
-				 "so this cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
-}
-
-/*
- * check_for_reg_data_type_usage()
- *	pg_upgrade only preserves these system values:
- *		pg_class.oid
- *		pg_type.oid
- *		pg_enum.oid
- *
- *	Many of the reg* data types reference system catalog info that is
- *	not preserved, and hence these data types cannot be used in user
- *	tables upgraded by pg_upgrade.
- */
-static void
-check_for_reg_data_type_usage(ClusterInfo *cluster)
-{
-	bool		found;
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for reg* data types in user tables");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_reg.txt");
-
-	/*
-	 * Note: older servers will not have all of these reg* types, so we have
-	 * to write the query like this rather than depending on casts to regtype.
-	 */
-	found = check_for_data_types_usage(cluster,
-									   "SELECT oid FROM pg_catalog.pg_type t "
-									   "WHERE t.typnamespace = "
-									   "        (SELECT oid FROM pg_catalog.pg_namespace "
-									   "         WHERE nspname = 'pg_catalog') "
-									   "  AND t.typname IN ( "
-	/* pg_class.oid is preserved, so 'regclass' is OK */
-									   "           'regcollation', "
-									   "           'regconfig', "
-									   "           'regdictionary', "
-									   "           'regnamespace', "
-									   "           'regoper', "
-									   "           'regoperator', "
-									   "           'regproc', "
-									   "           'regprocedure' "
-	/* pg_authid.oid is preserved, so 'regrole' is OK */
-	/* pg_type.oid is (mostly) preserved, so 'regtype' is OK */
-									   "         )",
-									   output_path);
-
-	if (found)
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains one of the reg* data types in user tables.\n"
-				 "These data types reference system OIDs that are not preserved by\n"
-				 "pg_upgrade, so this cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
-}
-
-/*
- * check_for_aclitem_data_type_usage
- *
- *	aclitem changed its storage format in 16, so check for it.
- */
-static void
-check_for_aclitem_data_type_usage(ClusterInfo *cluster)
-{
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for incompatible \"%s\" data type in user tables",
-				"aclitem");
-
-	snprintf(output_path, sizeof(output_path), "tables_using_aclitem.txt");
-
-	if (check_for_data_type_usage(cluster, "pg_catalog.aclitem", output_path))
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains the \"aclitem\" data type in user tables.\n"
-				 "The internal format of \"aclitem\" changed in PostgreSQL version 16\n"
-				 "so this cluster cannot currently be upgraded.  You can drop the\n"
-				 "problem columns and restart the upgrade.  A list of the problem\n"
-				 "columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
-}
-
-/*
- * check_for_removed_data_type_usage
- *
- *	Check for in-core data types that have been removed.  Callers know
- *	the exact list.
- */
-static void
-check_for_removed_data_type_usage(ClusterInfo *cluster, const char *version,
-								  const char *datatype)
-{
-	char		output_path[MAXPGPATH];
-	char		typename[NAMEDATALEN];
-
-	prep_status("Checking for removed \"%s\" data type in user tables",
-				datatype);
-
-	snprintf(output_path, sizeof(output_path), "tables_using_%s.txt",
-			 datatype);
-	snprintf(typename, sizeof(typename), "pg_catalog.%s", datatype);
-
-	if (check_for_data_type_usage(cluster, typename, output_path))
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains the \"%s\" data type in user tables.\n"
-				 "The \"%s\" type has been removed in PostgreSQL version %s,\n"
-				 "so this cluster cannot currently be upgraded.  You can drop the\n"
-				 "problem columns, or change them to another data type, and restart\n"
-				 "the upgrade.  A list of the problem columns is in the file:\n"
-				 "    %s", datatype, datatype, version, output_path);
-	}
-	else
-		check_ok();
-}
-
-
-/*
- * check_for_jsonb_9_4_usage()
- *
- *	JSONB changed its storage format during 9.4 beta, so check for it.
- */
-static void
-check_for_jsonb_9_4_usage(ClusterInfo *cluster)
-{
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for incompatible \"jsonb\" data type");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_jsonb.txt");
-
-	if (check_for_data_type_usage(cluster, "pg_catalog.jsonb", output_path))
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains the \"jsonb\" data type in user tables.\n"
-				 "The internal format of \"jsonb\" changed during 9.4 beta so this\n"
-				 "cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
-}
-
 /*
  * check_for_pg_role_prefix()
  *
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index d9a848cbfd..23b665527f 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -351,6 +351,21 @@ typedef struct
 } OSInfo;
 
 
+/* Function signature for data type check version hook */
+typedef bool (*DataTypesUsageVersionCheck) (ClusterInfo *cluster);
+
+/*
+ * DataTypesUsageChecks
+ */
+typedef struct
+{
+	const char *status;			/* status line to print to the user */
+	const char *report_filename;	/* filename to store report to */
+	const char *base_query;		/* Query to extract the oid of the datatype */
+	const char *report_text;	/* Text to store to report in case of error */
+	DataTypesUsageVersionCheck version_hook;
+}			DataTypesUsageChecks;
+
 /*
  * Global variables
  */
@@ -475,18 +490,15 @@ unsigned int str2uint(const char *str);
 
 /* version.c */
 
-bool		check_for_data_types_usage(ClusterInfo *cluster,
-									   const char *base_query,
-									   const char *output_path);
-bool		check_for_data_type_usage(ClusterInfo *cluster,
-									  const char *type_name,
-									  const char *output_path);
-void		old_9_3_check_for_line_data_type_usage(ClusterInfo *cluster);
-void		old_9_6_check_for_unknown_data_type_usage(ClusterInfo *cluster);
+bool		line_type_check_applicable(ClusterInfo *cluster);
+bool		jsonb_9_4_check_applicable(ClusterInfo *cluster);
+bool		unknown_type_check_applicable(ClusterInfo *cluster);
+bool		sql_identifier_type_check_applicable(ClusterInfo *cluster);
+bool		aclitem_type_check_applicable(ClusterInfo *cluster);
+bool		removed_data_types_check_applicable(ClusterInfo *cluster);
 void		old_9_6_invalidate_hash_indexes(ClusterInfo *cluster,
 											bool check_mode);
 
-void		old_11_check_for_sql_identifier_data_type_usage(ClusterInfo *cluster);
 void		report_extension_updates(ClusterInfo *cluster);
 
 /* parallel.c */
diff --git a/src/bin/pg_upgrade/version.c b/src/bin/pg_upgrade/version.c
index 13b2c0f012..2e060782ec 100644
--- a/src/bin/pg_upgrade/version.c
+++ b/src/bin/pg_upgrade/version.c
@@ -9,236 +9,81 @@
 
 #include "postgres_fe.h"
 
-#include "catalog/pg_class_d.h"
 #include "fe_utils/string_utils.h"
 #include "pg_upgrade.h"
 
-
 /*
- * check_for_data_types_usage()
- *	Detect whether there are any stored columns depending on given type(s)
- *
- * If so, write a report to the given file name, and return true.
- *
- * base_query should be a SELECT yielding a single column named "oid",
- * containing the pg_type OIDs of one or more types that are known to have
- * inconsistent on-disk representations across server versions.
- *
- * We check for the type(s) in tables, matviews, and indexes, but not views;
- * there's no storage involved in a view.
+ * version_hook functions for check_for_data_types_usage in order to determine
+ * whether a data type check should be executed for the cluster in question or
+ * not.
  */
 bool
-check_for_data_types_usage(ClusterInfo *cluster,
-						   const char *base_query,
-						   const char *output_path)
+line_type_check_applicable(ClusterInfo *cluster)
 {
-	bool		found = false;
-	FILE	   *script = NULL;
-	int			dbnum;
+	/* Pre-PG 9.4 had a different 'line' data type internal format */
+	if (GET_MAJOR_VERSION(cluster->major_version) <= 903)
+		return true;
 
-	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
-	{
-		DbInfo	   *active_db = &cluster->dbarr.dbs[dbnum];
-		PGconn	   *conn = connectToServer(cluster, active_db->db_name);
-		PQExpBufferData querybuf;
-		PGresult   *res;
-		bool		db_used = false;
-		int			ntups;
-		int			rowno;
-		int			i_nspname,
-					i_relname,
-					i_attname;
-
-		/*
-		 * The type(s) of interest might be wrapped in a domain, array,
-		 * composite, or range, and these container types can be nested (to
-		 * varying extents depending on server version, but that's not of
-		 * concern here).  To handle all these cases we need a recursive CTE.
-		 */
-		initPQExpBuffer(&querybuf);
-		appendPQExpBuffer(&querybuf,
-						  "WITH RECURSIVE oids AS ( "
-		/* start with the type(s) returned by base_query */
-						  "	%s "
-						  "	UNION ALL "
-						  "	SELECT * FROM ( "
-		/* inner WITH because we can only reference the CTE once */
-						  "		WITH x AS (SELECT oid FROM oids) "
-		/* domains on any type selected so far */
-						  "			SELECT t.oid FROM pg_catalog.pg_type t, x WHERE typbasetype = x.oid AND typtype = 'd' "
-						  "			UNION ALL "
-		/* arrays over any type selected so far */
-						  "			SELECT t.oid FROM pg_catalog.pg_type t, x WHERE typelem = x.oid AND typtype = 'b' "
-						  "			UNION ALL "
-		/* composite types containing any type selected so far */
-						  "			SELECT t.oid FROM pg_catalog.pg_type t, pg_catalog.pg_class c, pg_catalog.pg_attribute a, x "
-						  "			WHERE t.typtype = 'c' AND "
-						  "				  t.oid = c.reltype AND "
-						  "				  c.oid = a.attrelid AND "
-						  "				  NOT a.attisdropped AND "
-						  "				  a.atttypid = x.oid "
-						  "			UNION ALL "
-		/* ranges containing any type selected so far */
-						  "			SELECT t.oid FROM pg_catalog.pg_type t, pg_catalog.pg_range r, x "
-						  "			WHERE t.typtype = 'r' AND r.rngtypid = t.oid AND r.rngsubtype = x.oid"
-						  "	) foo "
-						  ") "
-		/* now look for stored columns of any such type */
-						  "SELECT n.nspname, c.relname, a.attname "
-						  "FROM	pg_catalog.pg_class c, "
-						  "		pg_catalog.pg_namespace n, "
-						  "		pg_catalog.pg_attribute a "
-						  "WHERE	c.oid = a.attrelid AND "
-						  "		NOT a.attisdropped AND "
-						  "		a.atttypid IN (SELECT oid FROM oids) AND "
-						  "		c.relkind IN ("
-						  CppAsString2(RELKIND_RELATION) ", "
-						  CppAsString2(RELKIND_MATVIEW) ", "
-						  CppAsString2(RELKIND_INDEX) ") AND "
-						  "		c.relnamespace = n.oid AND "
-		/* exclude possible orphaned temp tables */
-						  "		n.nspname !~ '^pg_temp_' AND "
-						  "		n.nspname !~ '^pg_toast_temp_' AND "
-		/* exclude system catalogs, too */
-						  "		n.nspname NOT IN ('pg_catalog', 'information_schema')",
-						  base_query);
-
-		res = executeQueryOrDie(conn, "%s", querybuf.data);
-
-		ntups = PQntuples(res);
-		i_nspname = PQfnumber(res, "nspname");
-		i_relname = PQfnumber(res, "relname");
-		i_attname = PQfnumber(res, "attname");
-		for (rowno = 0; rowno < ntups; rowno++)
-		{
-			found = true;
-			if (script == NULL && (script = fopen_priv(output_path, "w")) == NULL)
-				pg_fatal("could not open file \"%s\": %s", output_path,
-						 strerror(errno));
-			if (!db_used)
-			{
-				fprintf(script, "In database: %s\n", active_db->db_name);
-				db_used = true;
-			}
-			fprintf(script, "  %s.%s.%s\n",
-					PQgetvalue(res, rowno, i_nspname),
-					PQgetvalue(res, rowno, i_relname),
-					PQgetvalue(res, rowno, i_attname));
-		}
-
-		PQclear(res);
-
-		termPQExpBuffer(&querybuf);
-
-		PQfinish(conn);
-	}
-
-	if (script)
-		fclose(script);
-
-	return found;
+	return false;
 }
 
-/*
- * check_for_data_type_usage()
- *	Detect whether there are any stored columns depending on the given type
- *
- * If so, write a report to the given file name, and return true.
- *
- * type_name should be a fully qualified type name.  This is just a
- * trivial wrapper around check_for_data_types_usage() to convert a
- * type name into a base query.
- */
 bool
-check_for_data_type_usage(ClusterInfo *cluster,
-						  const char *type_name,
-						  const char *output_path)
+jsonb_9_4_check_applicable(ClusterInfo *cluster)
 {
-	bool		found;
-	char	   *base_query;
+	/* JSONB changed its storage format during 9.4 beta */
+	if (GET_MAJOR_VERSION(cluster->major_version) == 904 &&
+		cluster->controldata.cat_ver < JSONB_FORMAT_CHANGE_CAT_VER)
+		return true;
 
-	base_query = psprintf("SELECT '%s'::pg_catalog.regtype AS oid",
-						  type_name);
-
-	found = check_for_data_types_usage(cluster, base_query, output_path);
-
-	free(base_query);
-
-	return found;
+	return false;
 }
 
-
-/*
- * old_9_3_check_for_line_data_type_usage()
- *	9.3 -> 9.4
- *	Fully implement the 'line' data type in 9.4, which previously returned
- *	"not enabled" by default and was only functionally enabled with a
- *	compile-time switch; as of 9.4 "line" has a different on-disk
- *	representation format.
- */
-void
-old_9_3_check_for_line_data_type_usage(ClusterInfo *cluster)
+bool
+unknown_type_check_applicable(ClusterInfo *cluster)
 {
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for incompatible \"line\" data type");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_line.txt");
-
-	if (check_for_data_type_usage(cluster, "pg_catalog.line", output_path))
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains the \"line\" data type in user tables.\n"
-				 "This data type changed its internal and input/output format\n"
-				 "between your old and new versions so this\n"
-				 "cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
+	/* Pre-PG 10 allowed tables with 'unknown' type columns */
+	if (GET_MAJOR_VERSION(cluster->major_version) <= 906)
+		return true;
+	return false;
 }
 
-
-/*
- * old_9_6_check_for_unknown_data_type_usage()
- *	9.6 -> 10
- *	It's no longer allowed to create tables or views with "unknown"-type
- *	columns.  We do not complain about views with such columns, because
- *	they should get silently converted to "text" columns during the DDL
- *	dump and reload; it seems unlikely to be worth making users do that
- *	by hand.  However, if there's a table with such a column, the DDL
- *	reload will fail, so we should pre-detect that rather than failing
- *	mid-upgrade.  Worse, if there's a matview with such a column, the
- *	DDL reload will silently change it to "text" which won't match the
- *	on-disk storage (which is like "cstring").  So we *must* reject that.
- */
-void
-old_9_6_check_for_unknown_data_type_usage(ClusterInfo *cluster)
+bool
+sql_identifier_type_check_applicable(ClusterInfo *cluster)
 {
-	char		output_path[MAXPGPATH];
+	/*
+	 * PG 12 changed the 'sql_identifier' type storage to be based on name,
+	 * not varchar, which breaks on-disk format for existing data. So we need
+	 * to prevent upgrade when used in user objects (tables, indexes, ...).
+	 */
+	if (GET_MAJOR_VERSION(cluster->major_version) <= 1100)
+		return true;
+
+	return false;
+}
 
-	prep_status("Checking for invalid \"unknown\" user columns");
+bool
+aclitem_type_check_applicable(ClusterInfo *cluster)
+{
+	/*
+	 * PG 16 increased the size of the 'aclitem' type, which breaks the
+	 * on-disk format for existing data.
+	 */
+	if (GET_MAJOR_VERSION(cluster->major_version) <= 1500)
+		return true;
+
+	return false;
+}
 
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_unknown.txt");
+bool
+removed_data_types_check_applicable(ClusterInfo *cluster)
+{
+	/*
+	 * PG 12 removed abstime, reltime and tinterval
+	 */
+	if (GET_MAJOR_VERSION(cluster->major_version) <= 1100)
+		return true;
 
-	if (check_for_data_type_usage(cluster, "pg_catalog.unknown", output_path))
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains the \"unknown\" data type in user tables.\n"
-				 "This data type is no longer allowed in tables, so this\n"
-				 "cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
+	return false;
 }
 
 /*
@@ -353,41 +198,6 @@ old_9_6_invalidate_hash_indexes(ClusterInfo *cluster, bool check_mode)
 		check_ok();
 }
 
-/*
- * old_11_check_for_sql_identifier_data_type_usage()
- *	11 -> 12
- *	In 12, the sql_identifier data type was switched from name to varchar,
- *	which does affect the storage (name is by-ref, but not varlena). This
- *	means user tables using sql_identifier for columns are broken because
- *	the on-disk format is different.
- */
-void
-old_11_check_for_sql_identifier_data_type_usage(ClusterInfo *cluster)
-{
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for invalid \"sql_identifier\" user columns");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_sql_identifier.txt");
-
-	if (check_for_data_type_usage(cluster, "information_schema.sql_identifier",
-								  output_path))
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains the \"sql_identifier\" data type in user tables.\n"
-				 "The on-disk format for this data type has changed, so this\n"
-				 "cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
-}
-
-
 /*
  * report_extension_updates()
  *	Report extensions that should be updated.
-- 
2.32.1 (Apple Git-133)

#28

Peter Eisentraut

peter@eisentraut.org

almost 2 years ago

In reply to: Daniel Gustafsson (#27)

Re: Reducing connection overhead in pg_upgrade compat check phase

On 07.02.24 14:25, Daniel Gustafsson wrote:

> On 6 Feb 2024, at 17:47, Daniel Gustafsson <daniel@yesql.se> wrote:
>
>> On 6 Feb 2024, at 17:32, Nathan Bossart <nathandbossart@gmail.com> wrote:
>>
>>> On Fri, Feb 02, 2024 at 12:18:25AM +0530, vignesh C wrote:
>>>
>>>> With no update to the thread and the patch still not applying I'm
>>>> marking this as returned with feedback. Please feel free to resubmit
>>>> to the next CF when there is a new version of the patch.
>>>
>>> IMHO this patch is worth trying to get into v17. I'd be happy to take it
>>> forward if Daniel does not intend to work on it.
>>
>> I actually had the same thought yesterday and spent some time polishing and
>> rebasing it. I'll post an updated rebase shortly with the hopes of getting it
>> committed this week.
>
> Attached is a v11 rebased over HEAD with some very minor tweaks. Unless there
> are objections I plan to go ahead with this version this week.

A few more quick comments:

I think the .report_text assignments also need a gettext_noop(), like
the .status assignments.
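
To illustrate the pattern being asked for: a static initializer cannot call a translation function, so the literal is only marked with gettext_noop() for extraction and is translated later, at the point of use. The following is a minimal sketch, not code from the patch. The `translate()` function is a hypothetical stand-in for the real gettext lookup, and `ExampleCheck` is a cut-down, made-up version of the struct.

```c
#include <stdio.h>
#include <string.h>

/* Marks a literal for message extraction; it does not translate anything. */
#define gettext_noop(x) (x)

/*
 * Hypothetical stand-in for the runtime translation lookup.  A real build
 * would go through libintl; here one message gets a fake "[xx]" translation.
 */
static const char *
translate(const char *msgid)
{
	if (strcmp(msgid, "Checking for reg* data types in user tables") == 0)
		return "[xx] Checking for reg* data types in user tables";
	return msgid;				/* untranslated messages pass through */
}

/* Cut-down, illustrative version of a check definition. */
typedef struct
{
	const char *status;
	const char *report_text;
} ExampleCheck;

/* Both strings are marked, so both end up in the message catalog. */
static const ExampleCheck example_check = {
	.status = gettext_noop("Checking for reg* data types in user tables"),
	.report_text = gettext_noop("A list of the problem columns is in the file:"),
};

/* Translation happens only when the string is actually shown. */
static void
print_status(const ExampleCheck *check)
{
	printf("%s\n", translate(check->status));
}
```

Calling `print_status(&example_check)` prints the translated status line. Without the gettext_noop() marker on .report_text, the string would still compile and print, but it would never be extracted for translators.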

The type DataTypesUsageChecks is only used in check.c, so doesn't need
to be in pg_upgrade.h.

Idea for further improvement: Might be nice if the
DataTypesUsageVersionCheck struct also included the applicable version
information, so the additional checks in version.c would no longer be
necessary.

#29

Daniel Gustafsson

daniel@yesql.se

almost 2 years ago

In reply to: Peter Eisentraut (#28)

1 attachment(s)

Re: Reducing connection overhead in pg_upgrade compat check phase

On 8 Feb 2024, at 11:55, Peter Eisentraut <peter@eisentraut.org> wrote:

> A few more quick comments:

Thanks for reviewing!

> I think the .report_text assignments also need a gettext_noop(), like the .status assignments.

Done in the attached.

> The type DataTypesUsageChecks is only used in check.c, so doesn't need to be in pg_upgrade.h.

Fixed.

> Idea for further improvement: Might be nice if the DataTypesUsageVersionCheck struct also included the applicable version information, so the additional checks in version.c would no longer be necessary.

I tried various variants of this when writing it, but since the checks aren't
just checking version but also include catalog version checks it became messy.
One option could perhaps be to include a version number for <= comparison, and
if set to zero a function pointer to a version check function must be provided?
That would handle the simple cases in a single place without messy logic, and
leave the more convoluted checks with a special case function.
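
The option described above could look roughly like the following. This is only a sketch under stated assumptions, not part of the attached patch. `ExampleCluster`, `ExampleCheck`, `threshold_version`, `example_check_applicable()` and the catalog version constant are all hypothetical names and values chosen for illustration, and the major version is assumed to be stored already reduced (1100 for PG 11, 904 for 9.4).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for pg_upgrade's ClusterInfo. */
typedef struct
{
	unsigned int major_version; /* already reduced, e.g. 1100 or 904 */
	unsigned int cat_ver;		/* catalog version of the old cluster */
} ExampleCluster;

typedef bool (*ExampleVersionHook) (const ExampleCluster *cluster);

typedef struct
{
	const char *status;
	unsigned int threshold_version; /* run if old version <= this; 0 = use hook */
	ExampleVersionHook version_hook;	/* required when threshold is 0 */
} ExampleCheck;

/* Placeholder value, assumed here purely for illustration. */
#define EXAMPLE_JSONB_FORMAT_CHANGE_CAT_VER 201409291

/* A convoluted case: needs both the major and the catalog version. */
static bool
example_jsonb_hook(const ExampleCluster *cluster)
{
	return cluster->major_version == 904 &&
		cluster->cat_ver < EXAMPLE_JSONB_FORMAT_CHANGE_CAT_VER;
}

static const ExampleCheck example_checks[] = {
	{"sql_identifier", 1100, NULL}, /* simple <= comparison */
	{"jsonb", 0, example_jsonb_hook},	/* special-case function */
};

/* Single place deciding whether a check runs for the given cluster. */
static bool
example_check_applicable(const ExampleCheck *check,
						 const ExampleCluster *cluster)
{
	if (check->threshold_version != 0)
		return cluster->major_version <= check->threshold_version;

	/* Threshold of zero: a hook must have been provided. */
	return check->version_hook != NULL && check->version_hook(cluster);
}
```

The simple cases then need no function in version.c at all, while a check like jsonb keeps its own hook. A check that should run for every version would need an additional convention (for example a sentinel threshold), which this sketch leaves out.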

--
Daniel Gustafsson

Attachments:

v12-0001-pg_upgrade-run-all-data-type-checks-per-connecti.patch (application/octet-stream)

From c02fdf0e6dc29f53fef42dcead85c2079be597f6 Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson <dgustafsson@postgresql.org>
Date: Wed, 7 Feb 2024 13:36:46 +0100
Subject: [PATCH v12] pg_upgrade: run all data type checks per connection

The checks for data type usage were each connecting to all databases
in the cluster and running their query. On clusters which have a lot
of databases this can become unnecessarily expensive. This moves the
checks to run in a single connection instead to minimize connection
setup/teardown overhead.

Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/BB4C76F-D416-4F9F-949E-DBE950D37787@yesql.se
---
 src/bin/pg_upgrade/check.c      | 698 ++++++++++++++++++++------------
 src/bin/pg_upgrade/pg_upgrade.h |  18 +-
 src/bin/pg_upgrade/version.c    | 296 +++-----------
 3 files changed, 504 insertions(+), 508 deletions(-)

diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index e36a7328bf..1ac8587059 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -10,6 +10,7 @@
 #include "postgres_fe.h"
 
 #include "catalog/pg_authid_d.h"
+#include "catalog/pg_class_d.h"
 #include "catalog/pg_collation.h"
 #include "fe_utils/string_utils.h"
 #include "mb/pg_wchar.h"
@@ -23,13 +24,6 @@ static void check_for_isn_and_int8_passing_mismatch(ClusterInfo *cluster);
 static void check_for_user_defined_postfix_ops(ClusterInfo *cluster);
 static void check_for_incompatible_polymorphics(ClusterInfo *cluster);
 static void check_for_tables_with_oids(ClusterInfo *cluster);
-static void check_for_composite_data_type_usage(ClusterInfo *cluster);
-static void check_for_reg_data_type_usage(ClusterInfo *cluster);
-static void check_for_aclitem_data_type_usage(ClusterInfo *cluster);
-static void check_for_removed_data_type_usage(ClusterInfo *cluster,
-											  const char *version,
-											  const char *datatype);
-static void check_for_jsonb_9_4_usage(ClusterInfo *cluster);
 static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(void);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
@@ -38,6 +32,446 @@ static void check_new_cluster_subscription_configuration(void);
 static void check_old_cluster_for_valid_slots(bool live_check);
 static void check_old_cluster_subscription_state(void);
 
+/*
+ * DataTypesUsageChecks
+ */
+typedef struct
+{
+	const char *status;			/* status line to print to the user */
+	const char *report_filename;	/* filename to store report to */
+	const char *base_query;		/* Query to extract the oid of the datatype */
+	const char *report_text;	/* Text to store to report in case of error */
+	DataTypesUsageVersionCheck version_hook;
+}			DataTypesUsageChecks;
+
+/*
+ * Data type usage checks. Each check for problematic data type usage is
+ * defined in this array with metadata, SQL query for finding the data type
+ * and a function pointer for determining if the check should be executed
+ * for the current version.
+ */
+static DataTypesUsageChecks data_types_usage_checks[] =
+{
+	/*
+	 * Look for composite types that were made during initdb *or* belong to
+	 * information_schema; that's important in case information_schema was
+	 * dropped and reloaded.
+	 *
+	 * The cutoff OID here should match the source cluster's value of
+	 * FirstNormalObjectId.  We hardcode it rather than using that C #define
+	 * because, if that #define is ever changed, our own version's value is
+	 * NOT what to use.  Eventually we may need a test on the source cluster's
+	 * version to select the correct value.
+	 */
+	{
+		.status = gettext_noop("Checking for system-defined composite types in user tables"),
+			.report_filename = "tables_using_composite.txt",
+			.base_query =
+			"SELECT t.oid FROM pg_catalog.pg_type t "
+			"LEFT JOIN pg_catalog.pg_namespace n ON t.typnamespace = n.oid "
+			" WHERE typtype = 'c' AND (t.oid < 16384 OR nspname = 'information_schema')",
+			.report_text =
+			gettext_noop("Your installation contains system-defined composite types in user tables.\n"
+			"These type OIDs are not stable across PostgreSQL versions,\n"
+			"so this cluster cannot currently be upgraded.  You can\n"
+			"drop the problem columns and restart the upgrade.\n"
+			"A list of the problem columns is in the file:"),
+			.version_hook = NULL
+	},
+
+	/*
+	 * 9.3 -> 9.4 Fully implement the 'line' data type in 9.4, which
+	 * previously returned "not enabled" by default and was only functionally
+	 * enabled with a compile-time switch; as of 9.4 "line" has a different
+	 * on-disk representation format.
+	 */
+	{
+		.status = gettext_noop("Checking for incompatible \"line\" data type"),
+			.report_filename = "tables_using_line.txt",
+			.base_query =
+			"SELECT 'pg_catalog.line'::pg_catalog.regtype AS oid",
+			.report_text =
+			gettext_noop("Your installation contains the \"line\" data type in user tables.\n"
+			"This data type changed its internal and input/output format\n"
+			"between your old and new versions so this\n"
+			"cluster cannot currently be upgraded.  You can\n"
+			"drop the problem columns and restart the upgrade.\n"
+			"A list of the problem columns is in the file:"),
+			.version_hook = line_type_check_applicable
+	},
+
+	/*
+	 * pg_upgrade only preserves these system values: pg_class.oid, pg_type.oid,
+	 * pg_enum.oid
+	 *
+	 * Many of the reg* data types reference system catalog info that is not
+	 * preserved, and hence these data types cannot be used in user tables
+	 * upgraded by pg_upgrade.
+	 */
+	{
+		.status = gettext_noop("Checking for reg* data types in user tables"),
+			.report_filename = "tables_using_reg.txt",
+
+		/*
+		 * Note: older servers will not have all of these reg* types, so we
+		 * have to write the query like this rather than depending on casts to
+		 * regtype.
+		 */
+			.base_query =
+			"SELECT oid FROM pg_catalog.pg_type t "
+			"WHERE t.typnamespace = "
+			"        (SELECT oid FROM pg_catalog.pg_namespace "
+			"         WHERE nspname = 'pg_catalog') "
+			"  AND t.typname IN ( "
+		/* pg_class.oid is preserved, so 'regclass' is OK */
+			"           'regcollation', "
+			"           'regconfig', "
+			"           'regdictionary', "
+			"           'regnamespace', "
+			"           'regoper', "
+			"           'regoperator', "
+			"           'regproc', "
+			"           'regprocedure' "
+		/* pg_authid.oid is preserved, so 'regrole' is OK */
+		/* pg_type.oid is (mostly) preserved, so 'regtype' is OK */
+			"         )",
+			.report_text =
+			gettext_noop("Your installation contains one of the reg* data types in user tables.\n"
+			"These data types reference system OIDs that are not preserved by\n"
+			"pg_upgrade, so this cluster cannot currently be upgraded.  You can\n"
+			"drop the problem columns and restart the upgrade.\n"
+			"A list of the problem columns is in the file:"),
+			.version_hook = NULL
+	},
+
+	/*
+	 * PG 16 increased the size of the 'aclitem' type, which breaks the
+	 * on-disk format for existing data.
+	 */
+	{
+		.status = gettext_noop("Checking for incompatible \"aclitem\" data type"),
+			.report_filename = "tables_using_aclitem.txt",
+			.base_query =
+			"SELECT 'pg_catalog.aclitem'::pg_catalog.regtype AS oid",
+			.report_text =
+			gettext_noop("Your installation contains the \"aclitem\" data type in user tables.\n"
+			"The internal format of \"aclitem\" changed in PostgreSQL version 16\n"
+			"so this cluster cannot currently be upgraded.  You can drop the\n"
+			"problem columns and restart the upgrade.  A list of the problem\n"
+			"columns is in the file:"),
+			.version_hook = aclitem_type_check_applicable
+	},
+
+	/*
+	 * It's no longer allowed to create tables or views with "unknown"-type
+	 * columns.  We do not complain about views with such columns, because
+	 * they should get silently converted to "text" columns during the DDL
+	 * dump and reload; it seems unlikely to be worth making users do that by
+	 * hand.  However, if there's a table with such a column, the DDL reload
+	 * will fail, so we should pre-detect that rather than failing
+	 * mid-upgrade.  Worse, if there's a matview with such a column, the DDL
+	 * reload will silently change it to "text" which won't match the on-disk
+	 * storage (which is like "cstring").  So we *must* reject that.
+	 */
+	{
+		.status = gettext_noop("Checking for invalid \"unknown\" user columns"),
+			.report_filename = "tables_using_unknown.txt",
+			.base_query =
+			"SELECT 'pg_catalog.unknown'::pg_catalog.regtype AS oid",
+			.report_text =
+			gettext_noop("Your installation contains the \"unknown\" data type in user tables.\n"
+			"This data type is no longer allowed in tables, so this\n"
+			"cluster cannot currently be upgraded.  You can\n"
+			"drop the problem columns and restart the upgrade.\n"
+			"A list of the problem columns is in the file:"),
+			.version_hook = unknown_type_check_applicable
+	},
+
+	/*
+	 * PG 12 changed the 'sql_identifier' type storage to be based on name,
+	 * not varchar, which breaks on-disk format for existing data. So we need
+	 * to prevent upgrade when used in user objects (tables, indexes, ...).
+	 * The switch from varchar to name does affect the storage (name is
+	 * by-ref, but not varlena), which means user tables using
+	 * sql_identifier for columns are broken because the on-disk format is
+	 * different.
+	 */
+	{
+		.status = gettext_noop("Checking for invalid \"sql_identifier\" user columns"),
+			.report_filename = "tables_using_sql_identifier.txt",
+			.base_query =
+			"SELECT 'information_schema.sql_identifier'::pg_catalog.regtype AS oid",
+			.report_text =
+			gettext_noop("Your installation contains the \"sql_identifier\" data type in user tables.\n"
+			"The on-disk format for this data type has changed, so this\n"
+			"cluster cannot currently be upgraded.  You can\n"
+			"drop the problem columns and restart the upgrade.\n"
+			"A list of the problem columns is in the file:"),
+			.version_hook = sql_identifier_type_check_applicable
+	},
+
+	/*
+	 * JSONB changed its storage format during 9.4 beta, so check for it.
+	 */
+	{
+		.status = gettext_noop("Checking for incompatible \"jsonb\" data type in user tables"),
+			.report_filename = "tables_using_jsonb.txt",
+			.base_query =
+			"SELECT 'pg_catalog.jsonb'::pg_catalog.regtype AS oid",
+			.report_text =
+			gettext_noop("Your installation contains the \"jsonb\" data type in user tables.\n"
+			"The internal format of \"jsonb\" changed during 9.4 beta so this\n"
+			"cluster cannot currently be upgraded.  You can\n"
+			"drop the problem columns and restart the upgrade.\n"
+			"A list of the problem columns is in the file:"),
+			.version_hook = jsonb_9_4_check_applicable
+	},
+
+	/*
+	 * PG 12 removed types abstime, reltime, tinterval.
+	 */
+	{
+		.status = gettext_noop("Checking for removed \"abstime\" data type in user tables"),
+			.report_filename = "tables_using_abstime.txt",
+			.base_query =
+			"SELECT 'pg_catalog.abstime'::pg_catalog.regtype AS oid",
+			.report_text =
+			gettext_noop("Your installation contains the \"abstime\" data type in user tables.\n"
+			"The \"abstime\" type has been removed in PostgreSQL version 12,\n"
+			"so this cluster cannot currently be upgraded.  You can drop the\n"
+			"problem columns, or change them to another data type, and restart\n"
+			"the upgrade.  A list of the problem columns is in the file:"),
+			.version_hook = removed_data_types_check_applicable
+	},
+	{
+		.status = gettext_noop("Checking for removed \"reltime\" data type in user tables"),
+			.report_filename = "tables_using_reltime.txt",
+			.base_query =
+			"SELECT 'pg_catalog.reltime'::pg_catalog.regtype AS oid",
+			.report_text =
+			gettext_noop("Your installation contains the \"reltime\" data type in user tables.\n"
+			"The \"reltime\" type has been removed in PostgreSQL version 12,\n"
+			"so this cluster cannot currently be upgraded.  You can drop the\n"
+			"problem columns, or change them to another data type, and restart\n"
+			"the upgrade.  A list of the problem columns is in the file:"),
+			.version_hook = removed_data_types_check_applicable
+	},
+	{
+		.status = gettext_noop("Checking for removed \"tinterval\" data type in user tables"),
+			.report_filename = "tables_using_tinterval.txt",
+			.base_query =
+			"SELECT 'pg_catalog.tinterval'::pg_catalog.regtype AS oid",
+			.report_text =
+			gettext_noop("Your installation contains the \"tinterval\" data type in user tables.\n"
+			"The \"tinterval\" type has been removed in PostgreSQL version 12,\n"
+			"so this cluster cannot currently be upgraded.  You can drop the\n"
+			"problem columns, or change them to another data type, and restart\n"
+			"the upgrade.  A list of the problem columns is in the file:"),
+			.version_hook = removed_data_types_check_applicable
+	},
+
+	/* End of checks marker, must remain last */
+	{
+		NULL, NULL, NULL, NULL, NULL
+	}
+};
+
+/*
+ * check_for_data_types_usage()
+ *	Detect whether there are any stored columns depending on given type(s)
+ *
+ * If so, write a report to the given file name and signal a failure to the
+ * user.
+ *
+ * The checks to run are defined in a DataTypesUsageChecks structure where
+ * each check has metadata for explaining errors to the user, a base_query,
+ * a report filename and a function pointer hook for validating if the check
+ * should be executed given the cluster at hand.
+ *
+ * base_query should be a SELECT yielding a single column named "oid",
+ * containing the pg_type OIDs of one or more types that are known to have
+ * inconsistent on-disk representations across server versions.
+ *
+ * We check for the type(s) in tables, matviews, and indexes, but not views;
+ * there's no storage involved in a view.
+ */
+static void
+check_for_data_types_usage(ClusterInfo *cluster, DataTypesUsageChecks * checks)
+{
+	bool		found = false;
+	bool	   *results;
+	PQExpBufferData report;
+	DataTypesUsageChecks *tmp = checks;
+	int			n_data_types_usage_checks = 0;
+
+	prep_status("Checking for data type usage");
+
+	/* Gather number of checks to perform */
+	while (tmp->status != NULL)
+	{
+		n_data_types_usage_checks++;
+		tmp++;
+	}
+
+	/* Prepare an array to store the results of checks in */
+	results = pg_malloc0(sizeof(bool) * n_data_types_usage_checks);
+
+	/*
+	 * Connect to each database in the cluster and run all defined checks
+	 * against that database before trying the next one.
+	 */
+	for (int dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
+	{
+		DbInfo	   *active_db = &cluster->dbarr.dbs[dbnum];
+		PGconn	   *conn = connectToServer(cluster, active_db->db_name);
+
+		for (int checknum = 0; checknum < n_data_types_usage_checks; checknum++)
+		{
+			PGresult   *res;
+			int			ntups;
+			int			i_nspname;
+			int			i_relname;
+			int			i_attname;
+			FILE	   *script = NULL;
+			bool		db_used = false;
+			char		output_path[MAXPGPATH];
+			DataTypesUsageChecks *cur_check = &checks[checknum];
+
+			/*
+			 * Make sure that the check applies to the current cluster version
+			 * and skip if not. If no check hook has been defined we run the
+			 * check for all versions.
+			 */
+			if (cur_check->version_hook && !cur_check->version_hook(cluster))
+				continue;
+
+			snprintf(output_path, sizeof(output_path), "%s/%s",
+					 log_opts.basedir,
+					 cur_check->report_filename);
+
+			/*
+			 * The type(s) of interest might be wrapped in a domain, array,
+			 * composite, or range, and these container types can be nested
+			 * (to varying extents depending on server version, but that's not
+			 * of concern here).  To handle all these cases we need a
+			 * recursive CTE.
+			 */
+			res = executeQueryOrDie(conn,
+									"WITH RECURSIVE oids AS ( "
+			/* start with the type(s) returned by base_query */
+									"	%s "
+									"	UNION ALL "
+									"	SELECT * FROM ( "
+			/* inner WITH because we can only reference the CTE once */
+									"		WITH x AS (SELECT oid FROM oids) "
+			/* domains on any type selected so far */
+									"			SELECT t.oid FROM pg_catalog.pg_type t, x WHERE typbasetype = x.oid AND typtype = 'd' "
+									"			UNION ALL "
+			/* arrays over any type selected so far */
+									"			SELECT t.oid FROM pg_catalog.pg_type t, x WHERE typelem = x.oid AND typtype = 'b' "
+									"			UNION ALL "
+			/* composite types containing any type selected so far */
+									"			SELECT t.oid FROM pg_catalog.pg_type t, pg_catalog.pg_class c, pg_catalog.pg_attribute a, x "
+									"			WHERE t.typtype = 'c' AND "
+									"				  t.oid = c.reltype AND "
+									"				  c.oid = a.attrelid AND "
+									"				  NOT a.attisdropped AND "
+									"				  a.atttypid = x.oid "
+									"			UNION ALL "
+			/* ranges containing any type selected so far */
+									"			SELECT t.oid FROM pg_catalog.pg_type t, pg_catalog.pg_range r, x "
+									"			WHERE t.typtype = 'r' AND r.rngtypid = t.oid AND r.rngsubtype = x.oid"
+									"	) foo "
+									") "
+			/* now look for stored columns of any such type */
+									"SELECT n.nspname, c.relname, a.attname "
+									"FROM	pg_catalog.pg_class c, "
+									"		pg_catalog.pg_namespace n, "
+									"		pg_catalog.pg_attribute a "
+									"WHERE	c.oid = a.attrelid AND "
+									"		NOT a.attisdropped AND "
+									"		a.atttypid IN (SELECT oid FROM oids) AND "
+									"		c.relkind IN ("
+									CppAsString2(RELKIND_RELATION) ", "
+									CppAsString2(RELKIND_MATVIEW) ", "
+									CppAsString2(RELKIND_INDEX) ") AND "
+									"		c.relnamespace = n.oid AND "
+			/* exclude possible orphaned temp tables */
+									"		n.nspname !~ '^pg_temp_' AND "
+									"		n.nspname !~ '^pg_toast_temp_' AND "
+			/* exclude system catalogs, too */
+									"		n.nspname NOT IN ('pg_catalog', 'information_schema')",
+									cur_check->base_query);
+
+			ntups = PQntuples(res);
+
+			/*
+			 * The datatype was found, so extract the data and log to the
+			 * requested filename. We need to open the file for appending
+			 * since the check might have already found the type in another
+			 * database earlier in the loop.
+			 */
+			if (ntups)
+			{
+				/*
+				 * Make sure we have a buffer to save reports to now that we
+				 * found a first failing check.
+				 */
+				if (!found)
+					initPQExpBuffer(&report);
+				found = true;
+
+				/*
+				 * If this is the first time we see an error for the check in
+				 * question then print a status message of the failure.
+				 */
+				if (!results[checknum])
+				{
+					pg_log(PG_REPORT, "    failed check: %s", _(cur_check->status));
+					appendPQExpBuffer(&report, "\n%s\n    %s\n",
+									  _(cur_check->report_text), output_path);
+				}
+				results[checknum] = true;
+
+				i_nspname = PQfnumber(res, "nspname");
+				i_relname = PQfnumber(res, "relname");
+				i_attname = PQfnumber(res, "attname");
+
+				for (int rowno = 0; rowno < ntups; rowno++)
+				{
+					if (script == NULL && (script = fopen_priv(output_path, "a")) == NULL)
+						pg_fatal("could not open file \"%s\": %s",
+								 output_path,
+								 strerror(errno));
+					if (!db_used)
+					{
+						fprintf(script, "In database: %s\n", active_db->db_name);
+						db_used = true;
+					}
+					fprintf(script, "  %s.%s.%s\n",
+							PQgetvalue(res, rowno, i_nspname),
+							PQgetvalue(res, rowno, i_relname),
+							PQgetvalue(res, rowno, i_attname));
+				}
+
+				if (script)
+				{
+					fclose(script);
+					script = NULL;
+				}
+			}
+
+			PQclear(res);
+		}
+
+		PQfinish(conn);
+	}
+
+	if (found)
+		pg_fatal("Data type checks failed: %s", report.data);
+
+	check_ok();
+}
 
 /*
  * fix_path_separator
@@ -110,8 +544,6 @@ check_and_dump_old_cluster(bool live_check)
 	check_is_install_user(&old_cluster);
 	check_proper_datallowconn(&old_cluster);
 	check_for_prepared_transactions(&old_cluster);
-	check_for_composite_data_type_usage(&old_cluster);
-	check_for_reg_data_type_usage(&old_cluster);
 	check_for_isn_and_int8_passing_mismatch(&old_cluster);
 
 	if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
@@ -129,22 +561,7 @@ check_and_dump_old_cluster(bool live_check)
 		check_old_cluster_subscription_state();
 	}
 
-	/*
-	 * PG 16 increased the size of the 'aclitem' type, which breaks the
-	 * on-disk format for existing data.
-	 */
-	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1500)
-		check_for_aclitem_data_type_usage(&old_cluster);
-
-	/*
-	 * PG 12 removed types abstime, reltime, tinterval.
-	 */
-	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1100)
-	{
-		check_for_removed_data_type_usage(&old_cluster, "12", "abstime");
-		check_for_removed_data_type_usage(&old_cluster, "12", "reltime");
-		check_for_removed_data_type_usage(&old_cluster, "12", "tinterval");
-	}
+	check_for_data_types_usage(&old_cluster, data_types_usage_checks);
 
 	/*
 	 * PG 14 changed the function signature of encoding conversion functions.
@@ -176,21 +593,12 @@ check_and_dump_old_cluster(bool live_check)
 	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1100)
 		check_for_tables_with_oids(&old_cluster);
 
-	/*
-	 * PG 12 changed the 'sql_identifier' type storage to be based on name,
-	 * not varchar, which breaks on-disk format for existing data. So we need
-	 * to prevent upgrade when used in user objects (tables, indexes, ...).
-	 */
-	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1100)
-		old_11_check_for_sql_identifier_data_type_usage(&old_cluster);
-
 	/*
 	 * Pre-PG 10 allowed tables with 'unknown' type columns and non WAL logged
 	 * hash indexes
 	 */
 	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 906)
 	{
-		old_9_6_check_for_unknown_data_type_usage(&old_cluster);
 		if (user_opts.check)
 			old_9_6_invalidate_hash_indexes(&old_cluster, true);
 	}
@@ -199,14 +607,6 @@ check_and_dump_old_cluster(bool live_check)
 	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 905)
 		check_for_pg_role_prefix(&old_cluster);
 
-	if (GET_MAJOR_VERSION(old_cluster.major_version) == 904 &&
-		old_cluster.controldata.cat_ver < JSONB_FORMAT_CHANGE_CAT_VER)
-		check_for_jsonb_9_4_usage(&old_cluster);
-
-	/* Pre-PG 9.4 had a different 'line' data type internal format */
-	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 903)
-		old_9_3_check_for_line_data_type_usage(&old_cluster);
-
 	/*
 	 * While not a check option, we do this now because this is the only time
 	 * the old server is running.
@@ -1124,220 +1524,6 @@ check_for_tables_with_oids(ClusterInfo *cluster)
 }
 
 
-/*
- * check_for_composite_data_type_usage()
- *	Check for system-defined composite types used in user tables.
- *
- *	The OIDs of rowtypes of system catalogs and information_schema views
- *	can change across major versions; unlike user-defined types, we have
- *	no mechanism for forcing them to be the same in the new cluster.
- *	Hence, if any user table uses one, that's problematic for pg_upgrade.
- */
-static void
-check_for_composite_data_type_usage(ClusterInfo *cluster)
-{
-	bool		found;
-	Oid			firstUserOid;
-	char		output_path[MAXPGPATH];
-	char	   *base_query;
-
-	prep_status("Checking for system-defined composite types in user tables");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_composite.txt");
-
-	/*
-	 * Look for composite types that were made during initdb *or* belong to
-	 * information_schema; that's important in case information_schema was
-	 * dropped and reloaded.
-	 *
-	 * The cutoff OID here should match the source cluster's value of
-	 * FirstNormalObjectId.  We hardcode it rather than using that C #define
-	 * because, if that #define is ever changed, our own version's value is
-	 * NOT what to use.  Eventually we may need a test on the source cluster's
-	 * version to select the correct value.
-	 */
-	firstUserOid = 16384;
-
-	base_query = psprintf("SELECT t.oid FROM pg_catalog.pg_type t "
-						  "LEFT JOIN pg_catalog.pg_namespace n ON t.typnamespace = n.oid "
-						  " WHERE typtype = 'c' AND (t.oid < %u OR nspname = 'information_schema')",
-						  firstUserOid);
-
-	found = check_for_data_types_usage(cluster, base_query, output_path);
-
-	free(base_query);
-
-	if (found)
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains system-defined composite types in user tables.\n"
-				 "These type OIDs are not stable across PostgreSQL versions,\n"
-				 "so this cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
-}
-
-/*
- * check_for_reg_data_type_usage()
- *	pg_upgrade only preserves these system values:
- *		pg_class.oid
- *		pg_type.oid
- *		pg_enum.oid
- *
- *	Many of the reg* data types reference system catalog info that is
- *	not preserved, and hence these data types cannot be used in user
- *	tables upgraded by pg_upgrade.
- */
-static void
-check_for_reg_data_type_usage(ClusterInfo *cluster)
-{
-	bool		found;
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for reg* data types in user tables");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_reg.txt");
-
-	/*
-	 * Note: older servers will not have all of these reg* types, so we have
-	 * to write the query like this rather than depending on casts to regtype.
-	 */
-	found = check_for_data_types_usage(cluster,
-									   "SELECT oid FROM pg_catalog.pg_type t "
-									   "WHERE t.typnamespace = "
-									   "        (SELECT oid FROM pg_catalog.pg_namespace "
-									   "         WHERE nspname = 'pg_catalog') "
-									   "  AND t.typname IN ( "
-	/* pg_class.oid is preserved, so 'regclass' is OK */
-									   "           'regcollation', "
-									   "           'regconfig', "
-									   "           'regdictionary', "
-									   "           'regnamespace', "
-									   "           'regoper', "
-									   "           'regoperator', "
-									   "           'regproc', "
-									   "           'regprocedure' "
-	/* pg_authid.oid is preserved, so 'regrole' is OK */
-	/* pg_type.oid is (mostly) preserved, so 'regtype' is OK */
-									   "         )",
-									   output_path);
-
-	if (found)
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains one of the reg* data types in user tables.\n"
-				 "These data types reference system OIDs that are not preserved by\n"
-				 "pg_upgrade, so this cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
-}
-
-/*
- * check_for_aclitem_data_type_usage
- *
- *	aclitem changed its storage format in 16, so check for it.
- */
-static void
-check_for_aclitem_data_type_usage(ClusterInfo *cluster)
-{
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for incompatible \"%s\" data type in user tables",
-				"aclitem");
-
-	snprintf(output_path, sizeof(output_path), "tables_using_aclitem.txt");
-
-	if (check_for_data_type_usage(cluster, "pg_catalog.aclitem", output_path))
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains the \"aclitem\" data type in user tables.\n"
-				 "The internal format of \"aclitem\" changed in PostgreSQL version 16\n"
-				 "so this cluster cannot currently be upgraded.  You can drop the\n"
-				 "problem columns and restart the upgrade.  A list of the problem\n"
-				 "columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
-}
-
-/*
- * check_for_removed_data_type_usage
- *
- *	Check for in-core data types that have been removed.  Callers know
- *	the exact list.
- */
-static void
-check_for_removed_data_type_usage(ClusterInfo *cluster, const char *version,
-								  const char *datatype)
-{
-	char		output_path[MAXPGPATH];
-	char		typename[NAMEDATALEN];
-
-	prep_status("Checking for removed \"%s\" data type in user tables",
-				datatype);
-
-	snprintf(output_path, sizeof(output_path), "tables_using_%s.txt",
-			 datatype);
-	snprintf(typename, sizeof(typename), "pg_catalog.%s", datatype);
-
-	if (check_for_data_type_usage(cluster, typename, output_path))
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains the \"%s\" data type in user tables.\n"
-				 "The \"%s\" type has been removed in PostgreSQL version %s,\n"
-				 "so this cluster cannot currently be upgraded.  You can drop the\n"
-				 "problem columns, or change them to another data type, and restart\n"
-				 "the upgrade.  A list of the problem columns is in the file:\n"
-				 "    %s", datatype, datatype, version, output_path);
-	}
-	else
-		check_ok();
-}
-
-
-/*
- * check_for_jsonb_9_4_usage()
- *
- *	JSONB changed its storage format during 9.4 beta, so check for it.
- */
-static void
-check_for_jsonb_9_4_usage(ClusterInfo *cluster)
-{
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for incompatible \"jsonb\" data type");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_jsonb.txt");
-
-	if (check_for_data_type_usage(cluster, "pg_catalog.jsonb", output_path))
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains the \"jsonb\" data type in user tables.\n"
-				 "The internal format of \"jsonb\" changed during 9.4 beta so this\n"
-				 "cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
-}
-
 /*
  * check_for_pg_role_prefix()
  *
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index d9a848cbfd..494335ea93 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -351,6 +351,9 @@ typedef struct
 } OSInfo;
 
 
+/* Function signature for data type check version hook */
+typedef bool (*DataTypesUsageVersionCheck) (ClusterInfo *cluster);
+
 /*
  * Global variables
  */
@@ -475,18 +478,15 @@ unsigned int str2uint(const char *str);
 
 /* version.c */
 
-bool		check_for_data_types_usage(ClusterInfo *cluster,
-									   const char *base_query,
-									   const char *output_path);
-bool		check_for_data_type_usage(ClusterInfo *cluster,
-									  const char *type_name,
-									  const char *output_path);
-void		old_9_3_check_for_line_data_type_usage(ClusterInfo *cluster);
-void		old_9_6_check_for_unknown_data_type_usage(ClusterInfo *cluster);
+bool		line_type_check_applicable(ClusterInfo *cluster);
+bool		jsonb_9_4_check_applicable(ClusterInfo *cluster);
+bool		unknown_type_check_applicable(ClusterInfo *cluster);
+bool		sql_identifier_type_check_applicable(ClusterInfo *cluster);
+bool		aclitem_type_check_applicable(ClusterInfo *cluster);
+bool		removed_data_types_check_applicable(ClusterInfo *cluster);
 void		old_9_6_invalidate_hash_indexes(ClusterInfo *cluster,
 											bool check_mode);
 
-void		old_11_check_for_sql_identifier_data_type_usage(ClusterInfo *cluster);
 void		report_extension_updates(ClusterInfo *cluster);
 
 /* parallel.c */
diff --git a/src/bin/pg_upgrade/version.c b/src/bin/pg_upgrade/version.c
index 13b2c0f012..2e060782ec 100644
--- a/src/bin/pg_upgrade/version.c
+++ b/src/bin/pg_upgrade/version.c
@@ -9,236 +9,81 @@
 
 #include "postgres_fe.h"
 
-#include "catalog/pg_class_d.h"
 #include "fe_utils/string_utils.h"
 #include "pg_upgrade.h"
 
-
 /*
- * check_for_data_types_usage()
- *	Detect whether there are any stored columns depending on given type(s)
- *
- * If so, write a report to the given file name, and return true.
- *
- * base_query should be a SELECT yielding a single column named "oid",
- * containing the pg_type OIDs of one or more types that are known to have
- * inconsistent on-disk representations across server versions.
- *
- * We check for the type(s) in tables, matviews, and indexes, but not views;
- * there's no storage involved in a view.
+ * version_hook functions for check_for_data_types_usage in order to determine
+ * whether a data type check should be executed for the cluster in question or
+ * not.
  */
 bool
-check_for_data_types_usage(ClusterInfo *cluster,
-						   const char *base_query,
-						   const char *output_path)
+line_type_check_applicable(ClusterInfo *cluster)
 {
-	bool		found = false;
-	FILE	   *script = NULL;
-	int			dbnum;
+	/* Pre-PG 9.4 had a different 'line' data type internal format */
+	if (GET_MAJOR_VERSION(cluster->major_version) <= 903)
+		return true;
 
-	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
-	{
-		DbInfo	   *active_db = &cluster->dbarr.dbs[dbnum];
-		PGconn	   *conn = connectToServer(cluster, active_db->db_name);
-		PQExpBufferData querybuf;
-		PGresult   *res;
-		bool		db_used = false;
-		int			ntups;
-		int			rowno;
-		int			i_nspname,
-					i_relname,
-					i_attname;
-
-		/*
-		 * The type(s) of interest might be wrapped in a domain, array,
-		 * composite, or range, and these container types can be nested (to
-		 * varying extents depending on server version, but that's not of
-		 * concern here).  To handle all these cases we need a recursive CTE.
-		 */
-		initPQExpBuffer(&querybuf);
-		appendPQExpBuffer(&querybuf,
-						  "WITH RECURSIVE oids AS ( "
-		/* start with the type(s) returned by base_query */
-						  "	%s "
-						  "	UNION ALL "
-						  "	SELECT * FROM ( "
-		/* inner WITH because we can only reference the CTE once */
-						  "		WITH x AS (SELECT oid FROM oids) "
-		/* domains on any type selected so far */
-						  "			SELECT t.oid FROM pg_catalog.pg_type t, x WHERE typbasetype = x.oid AND typtype = 'd' "
-						  "			UNION ALL "
-		/* arrays over any type selected so far */
-						  "			SELECT t.oid FROM pg_catalog.pg_type t, x WHERE typelem = x.oid AND typtype = 'b' "
-						  "			UNION ALL "
-		/* composite types containing any type selected so far */
-						  "			SELECT t.oid FROM pg_catalog.pg_type t, pg_catalog.pg_class c, pg_catalog.pg_attribute a, x "
-						  "			WHERE t.typtype = 'c' AND "
-						  "				  t.oid = c.reltype AND "
-						  "				  c.oid = a.attrelid AND "
-						  "				  NOT a.attisdropped AND "
-						  "				  a.atttypid = x.oid "
-						  "			UNION ALL "
-		/* ranges containing any type selected so far */
-						  "			SELECT t.oid FROM pg_catalog.pg_type t, pg_catalog.pg_range r, x "
-						  "			WHERE t.typtype = 'r' AND r.rngtypid = t.oid AND r.rngsubtype = x.oid"
-						  "	) foo "
-						  ") "
-		/* now look for stored columns of any such type */
-						  "SELECT n.nspname, c.relname, a.attname "
-						  "FROM	pg_catalog.pg_class c, "
-						  "		pg_catalog.pg_namespace n, "
-						  "		pg_catalog.pg_attribute a "
-						  "WHERE	c.oid = a.attrelid AND "
-						  "		NOT a.attisdropped AND "
-						  "		a.atttypid IN (SELECT oid FROM oids) AND "
-						  "		c.relkind IN ("
-						  CppAsString2(RELKIND_RELATION) ", "
-						  CppAsString2(RELKIND_MATVIEW) ", "
-						  CppAsString2(RELKIND_INDEX) ") AND "
-						  "		c.relnamespace = n.oid AND "
-		/* exclude possible orphaned temp tables */
-						  "		n.nspname !~ '^pg_temp_' AND "
-						  "		n.nspname !~ '^pg_toast_temp_' AND "
-		/* exclude system catalogs, too */
-						  "		n.nspname NOT IN ('pg_catalog', 'information_schema')",
-						  base_query);
-
-		res = executeQueryOrDie(conn, "%s", querybuf.data);
-
-		ntups = PQntuples(res);
-		i_nspname = PQfnumber(res, "nspname");
-		i_relname = PQfnumber(res, "relname");
-		i_attname = PQfnumber(res, "attname");
-		for (rowno = 0; rowno < ntups; rowno++)
-		{
-			found = true;
-			if (script == NULL && (script = fopen_priv(output_path, "w")) == NULL)
-				pg_fatal("could not open file \"%s\": %s", output_path,
-						 strerror(errno));
-			if (!db_used)
-			{
-				fprintf(script, "In database: %s\n", active_db->db_name);
-				db_used = true;
-			}
-			fprintf(script, "  %s.%s.%s\n",
-					PQgetvalue(res, rowno, i_nspname),
-					PQgetvalue(res, rowno, i_relname),
-					PQgetvalue(res, rowno, i_attname));
-		}
-
-		PQclear(res);
-
-		termPQExpBuffer(&querybuf);
-
-		PQfinish(conn);
-	}
-
-	if (script)
-		fclose(script);
-
-	return found;
+	return false;
 }
 
-/*
- * check_for_data_type_usage()
- *	Detect whether there are any stored columns depending on the given type
- *
- * If so, write a report to the given file name, and return true.
- *
- * type_name should be a fully qualified type name.  This is just a
- * trivial wrapper around check_for_data_types_usage() to convert a
- * type name into a base query.
- */
 bool
-check_for_data_type_usage(ClusterInfo *cluster,
-						  const char *type_name,
-						  const char *output_path)
+jsonb_9_4_check_applicable(ClusterInfo *cluster)
 {
-	bool		found;
-	char	   *base_query;
+	/* JSONB changed its storage format during 9.4 beta */
+	if (GET_MAJOR_VERSION(cluster->major_version) == 904 &&
+		cluster->controldata.cat_ver < JSONB_FORMAT_CHANGE_CAT_VER)
+		return true;
 
-	base_query = psprintf("SELECT '%s'::pg_catalog.regtype AS oid",
-						  type_name);
-
-	found = check_for_data_types_usage(cluster, base_query, output_path);
-
-	free(base_query);
-
-	return found;
+	return false;
 }
 
-
-/*
- * old_9_3_check_for_line_data_type_usage()
- *	9.3 -> 9.4
- *	Fully implement the 'line' data type in 9.4, which previously returned
- *	"not enabled" by default and was only functionally enabled with a
- *	compile-time switch; as of 9.4 "line" has a different on-disk
- *	representation format.
- */
-void
-old_9_3_check_for_line_data_type_usage(ClusterInfo *cluster)
+bool
+unknown_type_check_applicable(ClusterInfo *cluster)
 {
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for incompatible \"line\" data type");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_line.txt");
-
-	if (check_for_data_type_usage(cluster, "pg_catalog.line", output_path))
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains the \"line\" data type in user tables.\n"
-				 "This data type changed its internal and input/output format\n"
-				 "between your old and new versions so this\n"
-				 "cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
+	/* Pre-PG 10 allowed tables with 'unknown' type columns */
+	if (GET_MAJOR_VERSION(cluster->major_version) <= 906)
+		return true;
+	return false;
 }
 
-
-/*
- * old_9_6_check_for_unknown_data_type_usage()
- *	9.6 -> 10
- *	It's no longer allowed to create tables or views with "unknown"-type
- *	columns.  We do not complain about views with such columns, because
- *	they should get silently converted to "text" columns during the DDL
- *	dump and reload; it seems unlikely to be worth making users do that
- *	by hand.  However, if there's a table with such a column, the DDL
- *	reload will fail, so we should pre-detect that rather than failing
- *	mid-upgrade.  Worse, if there's a matview with such a column, the
- *	DDL reload will silently change it to "text" which won't match the
- *	on-disk storage (which is like "cstring").  So we *must* reject that.
- */
-void
-old_9_6_check_for_unknown_data_type_usage(ClusterInfo *cluster)
+bool
+sql_identifier_type_check_applicable(ClusterInfo *cluster)
 {
-	char		output_path[MAXPGPATH];
+	/*
+	 * PG 12 changed the 'sql_identifier' type storage to be based on name,
+	 * not varchar, which breaks on-disk format for existing data. So we need
+	 * to prevent upgrade when used in user objects (tables, indexes, ...).
+	 */
+	if (GET_MAJOR_VERSION(cluster->major_version) <= 1100)
+		return true;
+
+	return false;
+}
 
-	prep_status("Checking for invalid \"unknown\" user columns");
+bool
+aclitem_type_check_applicable(ClusterInfo *cluster)
+{
+	/*
+	 * PG 16 increased the size of the 'aclitem' type, which breaks the
+	 * on-disk format for existing data.
+	 */
+	if (GET_MAJOR_VERSION(cluster->major_version) <= 1500)
+		return true;
+
+	return false;
+}
 
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_unknown.txt");
+bool
+removed_data_types_check_applicable(ClusterInfo *cluster)
+{
+	/*
+	 * PG 12 removed abstime, reltime and tinterval
+	 */
+	if (GET_MAJOR_VERSION(cluster->major_version) <= 1100)
+		return true;
 
-	if (check_for_data_type_usage(cluster, "pg_catalog.unknown", output_path))
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains the \"unknown\" data type in user tables.\n"
-				 "This data type is no longer allowed in tables, so this\n"
-				 "cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
+	return false;
 }
 
 /*
@@ -353,41 +198,6 @@ old_9_6_invalidate_hash_indexes(ClusterInfo *cluster, bool check_mode)
 		check_ok();
 }
 
-/*
- * old_11_check_for_sql_identifier_data_type_usage()
- *	11 -> 12
- *	In 12, the sql_identifier data type was switched from name to varchar,
- *	which does affect the storage (name is by-ref, but not varlena). This
- *	means user tables using sql_identifier for columns are broken because
- *	the on-disk format is different.
- */
-void
-old_11_check_for_sql_identifier_data_type_usage(ClusterInfo *cluster)
-{
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for invalid \"sql_identifier\" user columns");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_sql_identifier.txt");
-
-	if (check_for_data_type_usage(cluster, "information_schema.sql_identifier",
-								  output_path))
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains the \"sql_identifier\" data type in user tables.\n"
-				 "The on-disk format for this data type has changed, so this\n"
-				 "cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
-}
-
-
 /*
  * report_extension_updates()
  *	Report extensions that should be updated.
-- 
2.32.1 (Apple Git-133)
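
The connection accounting behind the patch above can be sketched in a few lines of C. This is only an illustration, not code from the patch: `CheckSketch`, `pre_12_only`, `connections_per_check` and `connections_single_pass` are hypothetical names, and the version hook takes a plain integer instead of a `ClusterInfo *` so the sketch stays self-contained. It assumes, as in the patch, that the check table is terminated by an entry whose `status` is NULL and that a NULL hook means "run for all versions".

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the patch's DataTypesUsageChecks. */
typedef struct
{
	const char *status;			/* NULL marks the end of the table */
	bool		(*version_hook) (int major_version);	/* NULL: always run */
} CheckSketch;

/* Mirrors the "<= 1100" test used for the removed PG 12 types. */
static bool
pre_12_only(int major_version)
{
	return major_version <= 1100;
}

static const CheckSketch sketch_checks[] = {
	{"composite types", NULL},
	{"abstime", pre_12_only},
	{"reltime", pre_12_only},
	{"tinterval", pre_12_only},
	{NULL, NULL}				/* end of checks marker, must remain last */
};

/* Walk the table up to the sentinel entry, as the patch does. */
static int
count_checks(const CheckSketch *checks)
{
	int			n = 0;

	for (const CheckSketch *c = checks; c->status != NULL; c++)
		n++;
	return n;
}

/* Old scheme: every applicable check connects to every database itself. */
static int
connections_per_check(const CheckSketch *checks, int major_version, int ndbs)
{
	int			connections = 0;

	for (const CheckSketch *c = checks; c->status != NULL; c++)
	{
		if (c->version_hook && !c->version_hook(major_version))
			continue;
		connections += ndbs;
	}
	return connections;
}

/* New scheme: one connection per database, shared by all checks. */
static int
connections_single_pass(int ndbs)
{
	return ndbs;
}
```

With this toy table and 100 databases, the old scheme would open one connection per applicable check per database, while the single-pass scheme opens one per database regardless of how many checks apply.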

#30

Daniel Gustafsson

daniel@yesql.se

almost 2 years ago

In reply to: Daniel Gustafsson (#29)

1 attachment(s)

Re: Reducing connection overhead in pg_upgrade compat check phase

On 8 Feb 2024, at 15:16, Daniel Gustafsson <daniel@yesql.se> wrote:

> One option could perhaps be to include a version number for <= comparison, and
> if set to zero a function pointer to a version check function must be provided?
> That would handle the simple cases in a single place without messy logic, and
> leave the more convoluted checks with a special case function.

The attached is a draft version of this approach: each check can be defined to
run for all versions, can set a threshold version at or below which it runs, or
can provide a callback which implements a more complicated check.
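
A minimal sketch of how such a three-way dispatch could look, assuming the `MANUAL_CHECK` and `ALL_VERSIONS` values from the attached patch. Everything else here is hypothetical and only for illustration: `VersionedCheck`, `check_applies` and `jsonb_beta_only` are made-up names, and the hook takes a plain major version number rather than a `ClusterInfo *`. The threshold comparison follows the "<=" convention described in the quoted proposal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MANUAL_CHECK 0			/* defer to the version_hook callback */
#define ALL_VERSIONS -1			/* run regardless of the old cluster version */

/* Hypothetical simplification: the real hook receives a ClusterInfo pointer. */
typedef bool (*VersionHook) (int major_version);

typedef struct
{
	int			threshold_version;	/* MANUAL_CHECK, ALL_VERSIONS or e.g. 1100 */
	VersionHook version_hook;	/* only consulted for MANUAL_CHECK */
} VersionedCheck;

/* Decide whether a check should run against a cluster of the given version. */
static bool
check_applies(const VersionedCheck *check, int major_version)
{
	if (check->threshold_version == ALL_VERSIONS)
		return true;

	if (check->threshold_version == MANUAL_CHECK)
	{
		/* A manual check without a callback is a programming error. */
		assert(check->version_hook != NULL);
		return check->version_hook(major_version);
	}

	/* Simple case: run for clusters at or below the threshold version. */
	return major_version <= check->threshold_version;
}

/* Example callback for a check which cannot be expressed as a threshold. */
static bool
jsonb_beta_only(int major_version)
{
	return major_version == 904;
}
```

Under these assumptions a check with `threshold_version = 1100` runs for 11 and older, one with `ALL_VERSIONS` always runs, and one with `MANUAL_CHECK` runs only when its callback says so.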

--
Daniel Gustafsson

Attachments:

v13-0001-pg_upgrade-run-all-data-type-checks-per-connecti.patch (application/octet-stream; x-unix-mode=0644)

From 7e8ec96cf4ff291595482ae2226e82cbdf16662e Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson <dgustafsson@postgresql.org>
Date: Wed, 7 Feb 2024 13:36:46 +0100
Subject: [PATCH v13] pg_upgrade: run all data type checks per connection

The checks for data type usage were each connecting to all databases
in the cluster and running their query. On clusters which have a lot
of databases this can become unnecessarily expensive. This moves the
checks to run in a single connection instead to minimize connection
setup/teardown overhead.

Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/BB4C76F-D416-4F9F-949E-DBE950D37787@yesql.se
---
 src/bin/pg_upgrade/check.c      | 715 ++++++++++++++++++++------------
 src/bin/pg_upgrade/pg_upgrade.h |  13 +-
 src/bin/pg_upgrade/version.c    | 266 +-----------
 3 files changed, 472 insertions(+), 522 deletions(-)

diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index e36a7328bf..663e270f30 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -10,6 +10,7 @@
 #include "postgres_fe.h"
 
 #include "catalog/pg_authid_d.h"
+#include "catalog/pg_class_d.h"
 #include "catalog/pg_collation.h"
 #include "fe_utils/string_utils.h"
 #include "mb/pg_wchar.h"
@@ -23,13 +24,6 @@ static void check_for_isn_and_int8_passing_mismatch(ClusterInfo *cluster);
 static void check_for_user_defined_postfix_ops(ClusterInfo *cluster);
 static void check_for_incompatible_polymorphics(ClusterInfo *cluster);
 static void check_for_tables_with_oids(ClusterInfo *cluster);
-static void check_for_composite_data_type_usage(ClusterInfo *cluster);
-static void check_for_reg_data_type_usage(ClusterInfo *cluster);
-static void check_for_aclitem_data_type_usage(ClusterInfo *cluster);
-static void check_for_removed_data_type_usage(ClusterInfo *cluster,
-											  const char *version,
-											  const char *datatype);
-static void check_for_jsonb_9_4_usage(ClusterInfo *cluster);
 static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(void);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
@@ -38,6 +32,463 @@ static void check_new_cluster_subscription_configuration(void);
 static void check_old_cluster_for_valid_slots(bool live_check);
 static void check_old_cluster_subscription_state(void);
 
+#define MANUAL_CHECK 0
+#define ALL_VERSIONS -1
+
+/*
+ * DataTypesUsageChecks
+ */
+typedef struct
+{
+	const char *status;			/* status line to print to the user */
+	const char *report_filename;	/* filename to store report to */
+	const char *base_query;		/* Query to extract the oid of the datatype */
+	const char *report_text;	/* Text to store to report in case of error */
+	int threshold_version;
+	DataTypesUsageVersionCheck version_hook;
+}			DataTypesUsageChecks;
+
+/*
+ * Data type usage checks. Each check for problematic data type usage is
+ * defined in this array with metadata, SQL query for finding the data type
+ * and a threshold version or function pointer for determining if the check
+ * should be executed for the current version.
+ */
+static DataTypesUsageChecks data_types_usage_checks[] =
+{
+	/*
+	 * Look for composite types that were made during initdb *or* belong to
+	 * information_schema; that's important in case information_schema was
+	 * dropped and reloaded.
+	 *
+	 * The cutoff OID here should match the source cluster's value of
+	 * FirstNormalObjectId.  We hardcode it rather than using that C #define
+	 * because, if that #define is ever changed, our own version's value is
+	 * NOT what to use.  Eventually we may need a test on the source cluster's
+	 * version to select the correct value.
+	 */
+	{
+		.status = gettext_noop("Checking for system-defined composite types in user tables"),
+			.report_filename = "tables_using_composite.txt",
+			.base_query =
+			"SELECT t.oid FROM pg_catalog.pg_type t "
+			"LEFT JOIN pg_catalog.pg_namespace n ON t.typnamespace = n.oid "
+			" WHERE typtype = 'c' AND (t.oid < 16384 OR nspname = 'information_schema')",
+			.report_text =
+			gettext_noop("Your installation contains system-defined composite types in user tables.\n"
+			"These type OIDs are not stable across PostgreSQL versions,\n"
+			"so this cluster cannot currently be upgraded.  You can\n"
+			"drop the problem columns and restart the upgrade.\n"
+			"A list of the problem columns is in the file:"),
+			.threshold_version = ALL_VERSIONS
+	},
+
+	/*
+	 * 9.3 -> 9.4 Fully implement the 'line' data type in 9.4, which
+	 * previously returned "not enabled" by default and was only functionally
+	 * enabled with a compile-time switch; as of 9.4 "line" has a different
+	 * on-disk representation format.
+	 */
+	{
+		.status = gettext_noop("Checking for incompatible \"line\" data type"),
+			.report_filename = "tables_using_line.txt",
+			.base_query =
+			"SELECT 'pg_catalog.line'::pg_catalog.regtype AS oid",
+			.report_text =
+			gettext_noop("Your installation contains the \"line\" data type in user tables.\n"
+			"This data type changed its internal and input/output format\n"
+			"between your old and new versions so this\n"
+			"cluster cannot currently be upgraded.  You can\n"
+			"drop the problem columns and restart the upgrade.\n"
+			"A list of the problem columns is in the file:"),
+			.threshold_version = 903
+	},
+
+	/*
+	 * pg_upgrade only preserves these system values: pg_class.oid pg_type.oid
+	 * pg_enum.oid
+	 *
+	 * Many of the reg* data types reference system catalog info that is not
+	 * preserved, and hence these data types cannot be used in user tables
+	 * upgraded by pg_upgrade.
+	 */
+	{
+		.status = gettext_noop("Checking for reg* data types in user tables"),
+			.report_filename = "tables_using_reg.txt",
+
+		/*
+		 * Note: older servers will not have all of these reg* types, so we
+		 * have to write the query like this rather than depending on casts to
+		 * regtype.
+		 */
+			.base_query =
+			"SELECT oid FROM pg_catalog.pg_type t "
+			"WHERE t.typnamespace = "
+			"        (SELECT oid FROM pg_catalog.pg_namespace "
+			"         WHERE nspname = 'pg_catalog') "
+			"  AND t.typname IN ( "
+		/* pg_class.oid is preserved, so 'regclass' is OK */
+			"           'regcollation', "
+			"           'regconfig', "
+			"           'regdictionary', "
+			"           'regnamespace', "
+			"           'regoper', "
+			"           'regoperator', "
+			"           'regproc', "
+			"           'regprocedure' "
+		/* pg_authid.oid is preserved, so 'regrole' is OK */
+		/* pg_type.oid is (mostly) preserved, so 'regtype' is OK */
+			"         )",
+			.report_text =
+			gettext_noop("Your installation contains one of the reg* data types in user tables.\n"
+			"These data types reference system OIDs that are not preserved by\n"
+			"pg_upgrade, so this cluster cannot currently be upgraded.  You can\n"
+			"drop the problem columns and restart the upgrade.\n"
+			"A list of the problem columns is in the file:"),
+			.threshold_version = ALL_VERSIONS
+	},
+
+	/*
+	 * PG 16 increased the size of the 'aclitem' type, which breaks the
+	 * on-disk format for existing data.
+	 */
+	{
+		.status = gettext_noop("Checking for incompatible \"aclitem\" data type"),
+			.report_filename = "tables_using_aclitem.txt",
+			.base_query =
+			"SELECT 'pg_catalog.aclitem'::pg_catalog.regtype AS oid",
+			.report_text =
+			gettext_noop("Your installation contains the \"aclitem\" data type in user tables.\n"
+			"The internal format of \"aclitem\" changed in PostgreSQL version 16\n"
+			"so this cluster cannot currently be upgraded.  You can drop the\n"
+			"problem columns and restart the upgrade.  A list of the problem\n"
+			"columns is in the file:"),
+			.threshold_version = 1500
+	},
+
+	/*
+	 * It's no longer allowed to create tables or views with "unknown"-type
+	 * columns.  We do not complain about views with such columns, because
+	 * they should get silently converted to "text" columns during the DDL
+	 * dump and reload; it seems unlikely to be worth making users do that by
+	 * hand.  However, if there's a table with such a column, the DDL reload
+	 * will fail, so we should pre-detect that rather than failing
+	 * mid-upgrade.  Worse, if there's a matview with such a column, the DDL
+	 * reload will silently change it to "text" which won't match the on-disk
+	 * storage (which is like "cstring").  So we *must* reject that.
+	 */
+	{
+		.status = gettext_noop("Checking for invalid \"unknown\" user columns"),
+			.report_filename = "tables_using_unknown.txt",
+			.base_query =
+			"SELECT 'pg_catalog.unknown'::pg_catalog.regtype AS oid",
+			.report_text =
+			gettext_noop("Your installation contains the \"unknown\" data type in user tables.\n"
+			"This data type is no longer allowed in tables, so this\n"
+			"cluster cannot currently be upgraded.  You can\n"
+			"drop the problem columns and restart the upgrade.\n"
+			"A list of the problem columns is in the file:"),
+			.threshold_version = 906
+	},
+
+	/*
+	 * PG 12 changed the 'sql_identifier' type storage to be based on name,
+	 * not varchar, which breaks on-disk format for existing data. So we need
+	 * to prevent upgrade when used in user objects (tables, indexes, ...). In
+	 * 12, the sql_identifier data type was switched from varchar to name,
+	 * which does affect the storage (name is by-ref, but not varlena). This
+	 * means user tables using sql_identifier for columns are broken because
+	 * the on-disk format is different.
+	 */
+	{
+		.status = gettext_noop("Checking for invalid \"sql_identifier\" user columns"),
+			.report_filename = "tables_using_sql_identifier.txt",
+			.base_query =
+			"SELECT 'information_schema.sql_identifier'::pg_catalog.regtype AS oid",
+			.report_text =
+			gettext_noop("Your installation contains the \"sql_identifier\" data type in user tables.\n"
+			"The on-disk format for this data type has changed, so this\n"
+			"cluster cannot currently be upgraded.  You can\n"
+			"drop the problem columns and restart the upgrade.\n"
+			"A list of the problem columns is in the file:"),
+			.threshold_version = 1100
+	},
+
+	/*
+	 * JSONB changed its storage format during 9.4 beta, so check for it.
+	 */
+	{
+		.status = gettext_noop("Checking for incompatible \"jsonb\" data type in user tables"),
+			.report_filename = "tables_using_jsonb.txt",
+			.base_query =
+			"SELECT 'pg_catalog.jsonb'::pg_catalog.regtype AS oid",
+			.report_text =
+			gettext_noop("Your installation contains the \"jsonb\" data type in user tables.\n"
+			"The internal format of \"jsonb\" changed during 9.4 beta so this\n"
+			"cluster cannot currently be upgraded.  You can\n"
+			"drop the problem columns and restart the upgrade.\n"
+			"A list of the problem columns is in the file:"),
+			.threshold_version = MANUAL_CHECK,
+			.version_hook = jsonb_9_4_check_applicable
+	},
+
+	/*
+	 * PG 12 removed types abstime, reltime, tinterval.
+	 */
+	{
+		.status = gettext_noop("Checking for removed \"abstime\" data type in user tables"),
+			.report_filename = "tables_using_abstime.txt",
+			.base_query =
+			"SELECT 'pg_catalog.abstime'::pg_catalog.regtype AS oid",
+			.report_text =
+			gettext_noop("Your installation contains the \"abstime\" data type in user tables.\n"
+			"The \"abstime\" type has been removed in PostgreSQL version 12,\n"
+			"so this cluster cannot currently be upgraded.  You can drop the\n"
+			"problem columns, or change them to another data type, and restart\n"
+			"the upgrade.  A list of the problem columns is in the file:"),
+			.threshold_version = 1100
+	},
+	{
+		.status = gettext_noop("Checking for removed \"reltime\" data type in user tables"),
+			.report_filename = "tables_using_reltime.txt",
+			.base_query =
+			"SELECT 'pg_catalog.reltime'::pg_catalog.regtype AS oid",
+			.report_text =
+			gettext_noop("Your installation contains the \"reltime\" data type in user tables.\n"
+			"The \"reltime\" type has been removed in PostgreSQL version 12,\n"
+			"so this cluster cannot currently be upgraded.  You can drop the\n"
+			"problem columns, or change them to another data type, and restart\n"
+			"the upgrade.  A list of the problem columns is in the file:"),
+			.threshold_version = 1100
+	},
+	{
+		.status = gettext_noop("Checking for removed \"tinterval\" data type in user tables"),
+			.report_filename = "tables_using_tinterval.txt",
+			.base_query =
+			"SELECT 'pg_catalog.tinterval'::pg_catalog.regtype AS oid",
+			.report_text =
+			gettext_noop("Your installation contains the \"tinterval\" data type in user tables.\n"
+			"The \"tinterval\" type has been removed in PostgreSQL version 12,\n"
+			"so this cluster cannot currently be upgraded.  You can drop the\n"
+			"problem columns, or change them to another data type, and restart\n"
+			"the upgrade.  A list of the problem columns is in the file:"),
+			.threshold_version = 1100
+	},
+
+	/* End of checks marker, must remain last */
+	{
+		NULL, NULL, NULL, NULL, 0, NULL
+	}
+};
+
+/*
+ * check_for_data_types_usage()
+ *	Detect whether there are any stored columns depending on given type(s)
+ *
+ * If so, write a report to the given file name and signal a failure to the
+ * user.
+ *
+ * The checks to run are defined in a DataTypesUsageChecks structure where
+ * each check has metadata for explaining errors to the user, a base_query,
+ * a report filename and a function pointer hook for validating if the check
+ * should be executed given the cluster at hand.
+ *
+ * base_query should be a SELECT yielding a single column named "oid",
+ * containing the pg_type OIDs of one or more types that are known to have
+ * inconsistent on-disk representations across server versions.
+ *
+ * We check for the type(s) in tables, matviews, and indexes, but not views;
+ * there's no storage involved in a view.
+ */
+static void
+check_for_data_types_usage(ClusterInfo *cluster, DataTypesUsageChecks * checks)
+{
+	bool		found = false;
+	bool	   *results;
+	PQExpBufferData report;
+	DataTypesUsageChecks *tmp = checks;
+	int			n_data_types_usage_checks = 0;
+
+	prep_status("Checking for data type usage");
+
+	/* Gather number of checks to perform */
+	while (tmp->status != NULL)
+	{
+		n_data_types_usage_checks++;
+		tmp++;
+	}
+
+	/* Prepare an array to store the results of checks in */
+	results = pg_malloc0(sizeof(bool) * n_data_types_usage_checks);
+
+	/*
+	 * Connect to each database in the cluster and run all defined checks
+	 * against that database before trying the next one.
+	 */
+	for (int dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
+	{
+		DbInfo	   *active_db = &cluster->dbarr.dbs[dbnum];
+		PGconn	   *conn = connectToServer(cluster, active_db->db_name);
+
+		for (int checknum = 0; checknum < n_data_types_usage_checks; checknum++)
+		{
+			PGresult   *res;
+			int			ntups;
+			int			i_nspname;
+			int			i_relname;
+			int			i_attname;
+			FILE	   *script = NULL;
+			bool		db_used = false;
+			char		output_path[MAXPGPATH];
+			DataTypesUsageChecks *cur_check = &checks[checknum];
+
+			if (cur_check->threshold_version == MANUAL_CHECK)
+			{
+				Assert(cur_check->version_hook);
+
+				/*
+				 * Make sure that the check applies to the current cluster version
+				 * and skip if not. A check marked MANUAL_CHECK must define a
+				 * version hook, which the assertion above enforces.
+				 */
+				if (!cur_check->version_hook(cluster))
+					continue;
+			}
+			else if (cur_check->threshold_version != ALL_VERSIONS)
+			{
+				if (GET_MAJOR_VERSION(cluster->major_version) > cur_check->threshold_version)
+					continue;
+			}
+			else
+				Assert(cur_check->threshold_version == ALL_VERSIONS);
+
+			snprintf(output_path, sizeof(output_path), "%s/%s",
+					 log_opts.basedir,
+					 cur_check->report_filename);
+
+			/*
+			 * The type(s) of interest might be wrapped in a domain, array,
+			 * composite, or range, and these container types can be nested
+			 * (to varying extents depending on server version, but that's not
+			 * of concern here).  To handle all these cases we need a
+			 * recursive CTE.
+			 */
+			res = executeQueryOrDie(conn,
+									"WITH RECURSIVE oids AS ( "
+			/* start with the type(s) returned by base_query */
+									"	%s "
+									"	UNION ALL "
+									"	SELECT * FROM ( "
+			/* inner WITH because we can only reference the CTE once */
+									"		WITH x AS (SELECT oid FROM oids) "
+			/* domains on any type selected so far */
+									"			SELECT t.oid FROM pg_catalog.pg_type t, x WHERE typbasetype = x.oid AND typtype = 'd' "
+									"			UNION ALL "
+			/* arrays over any type selected so far */
+									"			SELECT t.oid FROM pg_catalog.pg_type t, x WHERE typelem = x.oid AND typtype = 'b' "
+									"			UNION ALL "
+			/* composite types containing any type selected so far */
+									"			SELECT t.oid FROM pg_catalog.pg_type t, pg_catalog.pg_class c, pg_catalog.pg_attribute a, x "
+									"			WHERE t.typtype = 'c' AND "
+									"				  t.oid = c.reltype AND "
+									"				  c.oid = a.attrelid AND "
+									"				  NOT a.attisdropped AND "
+									"				  a.atttypid = x.oid "
+									"			UNION ALL "
+			/* ranges containing any type selected so far */
+									"			SELECT t.oid FROM pg_catalog.pg_type t, pg_catalog.pg_range r, x "
+									"			WHERE t.typtype = 'r' AND r.rngtypid = t.oid AND r.rngsubtype = x.oid"
+									"	) foo "
+									") "
+			/* now look for stored columns of any such type */
+									"SELECT n.nspname, c.relname, a.attname "
+									"FROM	pg_catalog.pg_class c, "
+									"		pg_catalog.pg_namespace n, "
+									"		pg_catalog.pg_attribute a "
+									"WHERE	c.oid = a.attrelid AND "
+									"		NOT a.attisdropped AND "
+									"		a.atttypid IN (SELECT oid FROM oids) AND "
+									"		c.relkind IN ("
+									CppAsString2(RELKIND_RELATION) ", "
+									CppAsString2(RELKIND_MATVIEW) ", "
+									CppAsString2(RELKIND_INDEX) ") AND "
+									"		c.relnamespace = n.oid AND "
+			/* exclude possible orphaned temp tables */
+									"		n.nspname !~ '^pg_temp_' AND "
+									"		n.nspname !~ '^pg_toast_temp_' AND "
+			/* exclude system catalogs, too */
+									"		n.nspname NOT IN ('pg_catalog', 'information_schema')",
+									cur_check->base_query);
+
+			ntups = PQntuples(res);
+
+			/*
+			 * The datatype was found, so extract the data and log to the
+			 * requested filename. We need to open the file for appending
+			 * since the check might have already found the type in another
+			 * database earlier in the loop.
+			 */
+			if (ntups)
+			{
+				/*
+				 * Make sure we have a buffer to save reports to now that we
+				 * found a first failing check.
+				 */
+				if (!found)
+					initPQExpBuffer(&report);
+				found = true;
+
+				/*
+				 * If this is the first time we see an error for the check in
+				 * question then print a status message of the failure.
+				 */
+				if (!results[checknum])
+				{
+					pg_log(PG_REPORT, "    failed check: %s", _(cur_check->status));
+					appendPQExpBuffer(&report, "\n%s\n    %s\n",
+									  _(cur_check->report_text), output_path);
+				}
+				results[checknum] = true;
+
+				i_nspname = PQfnumber(res, "nspname");
+				i_relname = PQfnumber(res, "relname");
+				i_attname = PQfnumber(res, "attname");
+
+				for (int rowno = 0; rowno < ntups; rowno++)
+				{
+					if (script == NULL && (script = fopen_priv(output_path, "a")) == NULL)
+						pg_fatal("could not open file \"%s\": %s",
+								 output_path,
+								 strerror(errno));
+					if (!db_used)
+					{
+						fprintf(script, "In database: %s\n", active_db->db_name);
+						db_used = true;
+					}
+					fprintf(script, "  %s.%s.%s\n",
+							PQgetvalue(res, rowno, i_nspname),
+							PQgetvalue(res, rowno, i_relname),
+							PQgetvalue(res, rowno, i_attname));
+				}
+
+				if (script)
+				{
+					fclose(script);
+					script = NULL;
+				}
+			}
+
+			PQclear(res);
+		}
+
+		PQfinish(conn);
+	}
+
+	if (found)
+		pg_fatal("Data type checks failed: %s", report.data);
+
+	check_ok();
+}
 
 /*
  * fix_path_separator
@@ -110,8 +561,6 @@ check_and_dump_old_cluster(bool live_check)
 	check_is_install_user(&old_cluster);
 	check_proper_datallowconn(&old_cluster);
 	check_for_prepared_transactions(&old_cluster);
-	check_for_composite_data_type_usage(&old_cluster);
-	check_for_reg_data_type_usage(&old_cluster);
 	check_for_isn_and_int8_passing_mismatch(&old_cluster);
 
 	if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
@@ -129,22 +578,7 @@ check_and_dump_old_cluster(bool live_check)
 		check_old_cluster_subscription_state();
 	}
 
-	/*
-	 * PG 16 increased the size of the 'aclitem' type, which breaks the
-	 * on-disk format for existing data.
-	 */
-	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1500)
-		check_for_aclitem_data_type_usage(&old_cluster);
-
-	/*
-	 * PG 12 removed types abstime, reltime, tinterval.
-	 */
-	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1100)
-	{
-		check_for_removed_data_type_usage(&old_cluster, "12", "abstime");
-		check_for_removed_data_type_usage(&old_cluster, "12", "reltime");
-		check_for_removed_data_type_usage(&old_cluster, "12", "tinterval");
-	}
+	check_for_data_types_usage(&old_cluster, data_types_usage_checks);
 
 	/*
 	 * PG 14 changed the function signature of encoding conversion functions.
@@ -176,21 +610,12 @@ check_and_dump_old_cluster(bool live_check)
 	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1100)
 		check_for_tables_with_oids(&old_cluster);
 
-	/*
-	 * PG 12 changed the 'sql_identifier' type storage to be based on name,
-	 * not varchar, which breaks on-disk format for existing data. So we need
-	 * to prevent upgrade when used in user objects (tables, indexes, ...).
-	 */
-	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1100)
-		old_11_check_for_sql_identifier_data_type_usage(&old_cluster);
-
 	/*
 	 * Pre-PG 10 allowed tables with 'unknown' type columns and non WAL logged
 	 * hash indexes
 	 */
 	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 906)
 	{
-		old_9_6_check_for_unknown_data_type_usage(&old_cluster);
 		if (user_opts.check)
 			old_9_6_invalidate_hash_indexes(&old_cluster, true);
 	}
@@ -199,14 +624,6 @@ check_and_dump_old_cluster(bool live_check)
 	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 905)
 		check_for_pg_role_prefix(&old_cluster);
 
-	if (GET_MAJOR_VERSION(old_cluster.major_version) == 904 &&
-		old_cluster.controldata.cat_ver < JSONB_FORMAT_CHANGE_CAT_VER)
-		check_for_jsonb_9_4_usage(&old_cluster);
-
-	/* Pre-PG 9.4 had a different 'line' data type internal format */
-	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 903)
-		old_9_3_check_for_line_data_type_usage(&old_cluster);
-
 	/*
 	 * While not a check option, we do this now because this is the only time
 	 * the old server is running.
@@ -1124,220 +1541,6 @@ check_for_tables_with_oids(ClusterInfo *cluster)
 }
 
 
-/*
- * check_for_composite_data_type_usage()
- *	Check for system-defined composite types used in user tables.
- *
- *	The OIDs of rowtypes of system catalogs and information_schema views
- *	can change across major versions; unlike user-defined types, we have
- *	no mechanism for forcing them to be the same in the new cluster.
- *	Hence, if any user table uses one, that's problematic for pg_upgrade.
- */
-static void
-check_for_composite_data_type_usage(ClusterInfo *cluster)
-{
-	bool		found;
-	Oid			firstUserOid;
-	char		output_path[MAXPGPATH];
-	char	   *base_query;
-
-	prep_status("Checking for system-defined composite types in user tables");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_composite.txt");
-
-	/*
-	 * Look for composite types that were made during initdb *or* belong to
-	 * information_schema; that's important in case information_schema was
-	 * dropped and reloaded.
-	 *
-	 * The cutoff OID here should match the source cluster's value of
-	 * FirstNormalObjectId.  We hardcode it rather than using that C #define
-	 * because, if that #define is ever changed, our own version's value is
-	 * NOT what to use.  Eventually we may need a test on the source cluster's
-	 * version to select the correct value.
-	 */
-	firstUserOid = 16384;
-
-	base_query = psprintf("SELECT t.oid FROM pg_catalog.pg_type t "
-						  "LEFT JOIN pg_catalog.pg_namespace n ON t.typnamespace = n.oid "
-						  " WHERE typtype = 'c' AND (t.oid < %u OR nspname = 'information_schema')",
-						  firstUserOid);
-
-	found = check_for_data_types_usage(cluster, base_query, output_path);
-
-	free(base_query);
-
-	if (found)
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains system-defined composite types in user tables.\n"
-				 "These type OIDs are not stable across PostgreSQL versions,\n"
-				 "so this cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
-}
-
-/*
- * check_for_reg_data_type_usage()
- *	pg_upgrade only preserves these system values:
- *		pg_class.oid
- *		pg_type.oid
- *		pg_enum.oid
- *
- *	Many of the reg* data types reference system catalog info that is
- *	not preserved, and hence these data types cannot be used in user
- *	tables upgraded by pg_upgrade.
- */
-static void
-check_for_reg_data_type_usage(ClusterInfo *cluster)
-{
-	bool		found;
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for reg* data types in user tables");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_reg.txt");
-
-	/*
-	 * Note: older servers will not have all of these reg* types, so we have
-	 * to write the query like this rather than depending on casts to regtype.
-	 */
-	found = check_for_data_types_usage(cluster,
-									   "SELECT oid FROM pg_catalog.pg_type t "
-									   "WHERE t.typnamespace = "
-									   "        (SELECT oid FROM pg_catalog.pg_namespace "
-									   "         WHERE nspname = 'pg_catalog') "
-									   "  AND t.typname IN ( "
-	/* pg_class.oid is preserved, so 'regclass' is OK */
-									   "           'regcollation', "
-									   "           'regconfig', "
-									   "           'regdictionary', "
-									   "           'regnamespace', "
-									   "           'regoper', "
-									   "           'regoperator', "
-									   "           'regproc', "
-									   "           'regprocedure' "
-	/* pg_authid.oid is preserved, so 'regrole' is OK */
-	/* pg_type.oid is (mostly) preserved, so 'regtype' is OK */
-									   "         )",
-									   output_path);
-
-	if (found)
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains one of the reg* data types in user tables.\n"
-				 "These data types reference system OIDs that are not preserved by\n"
-				 "pg_upgrade, so this cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
-}
-
-/*
- * check_for_aclitem_data_type_usage
- *
- *	aclitem changed its storage format in 16, so check for it.
- */
-static void
-check_for_aclitem_data_type_usage(ClusterInfo *cluster)
-{
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for incompatible \"%s\" data type in user tables",
-				"aclitem");
-
-	snprintf(output_path, sizeof(output_path), "tables_using_aclitem.txt");
-
-	if (check_for_data_type_usage(cluster, "pg_catalog.aclitem", output_path))
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains the \"aclitem\" data type in user tables.\n"
-				 "The internal format of \"aclitem\" changed in PostgreSQL version 16\n"
-				 "so this cluster cannot currently be upgraded.  You can drop the\n"
-				 "problem columns and restart the upgrade.  A list of the problem\n"
-				 "columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
-}
-
-/*
- * check_for_removed_data_type_usage
- *
- *	Check for in-core data types that have been removed.  Callers know
- *	the exact list.
- */
-static void
-check_for_removed_data_type_usage(ClusterInfo *cluster, const char *version,
-								  const char *datatype)
-{
-	char		output_path[MAXPGPATH];
-	char		typename[NAMEDATALEN];
-
-	prep_status("Checking for removed \"%s\" data type in user tables",
-				datatype);
-
-	snprintf(output_path, sizeof(output_path), "tables_using_%s.txt",
-			 datatype);
-	snprintf(typename, sizeof(typename), "pg_catalog.%s", datatype);
-
-	if (check_for_data_type_usage(cluster, typename, output_path))
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains the \"%s\" data type in user tables.\n"
-				 "The \"%s\" type has been removed in PostgreSQL version %s,\n"
-				 "so this cluster cannot currently be upgraded.  You can drop the\n"
-				 "problem columns, or change them to another data type, and restart\n"
-				 "the upgrade.  A list of the problem columns is in the file:\n"
-				 "    %s", datatype, datatype, version, output_path);
-	}
-	else
-		check_ok();
-}
-
-
-/*
- * check_for_jsonb_9_4_usage()
- *
- *	JSONB changed its storage format during 9.4 beta, so check for it.
- */
-static void
-check_for_jsonb_9_4_usage(ClusterInfo *cluster)
-{
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for incompatible \"jsonb\" data type");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_jsonb.txt");
-
-	if (check_for_data_type_usage(cluster, "pg_catalog.jsonb", output_path))
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains the \"jsonb\" data type in user tables.\n"
-				 "The internal format of \"jsonb\" changed during 9.4 beta so this\n"
-				 "cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
-}
-
 /*
  * check_for_pg_role_prefix()
  *
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index d9a848cbfd..7daa67f809 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -351,6 +351,9 @@ typedef struct
 } OSInfo;
 
 
+/* Function signature for data type check version hook */
+typedef bool (*DataTypesUsageVersionCheck) (ClusterInfo *cluster);
+
 /*
  * Global variables
  */
@@ -475,18 +478,10 @@ unsigned int str2uint(const char *str);
 
 /* version.c */
 
-bool		check_for_data_types_usage(ClusterInfo *cluster,
-									   const char *base_query,
-									   const char *output_path);
-bool		check_for_data_type_usage(ClusterInfo *cluster,
-									  const char *type_name,
-									  const char *output_path);
-void		old_9_3_check_for_line_data_type_usage(ClusterInfo *cluster);
-void		old_9_6_check_for_unknown_data_type_usage(ClusterInfo *cluster);
+bool		jsonb_9_4_check_applicable(ClusterInfo *cluster);
 void		old_9_6_invalidate_hash_indexes(ClusterInfo *cluster,
 											bool check_mode);
 
-void		old_11_check_for_sql_identifier_data_type_usage(ClusterInfo *cluster);
 void		report_extension_updates(ClusterInfo *cluster);
 
 /* parallel.c */
diff --git a/src/bin/pg_upgrade/version.c b/src/bin/pg_upgrade/version.c
index 13b2c0f012..e3c7b4109c 100644
--- a/src/bin/pg_upgrade/version.c
+++ b/src/bin/pg_upgrade/version.c
@@ -9,236 +9,23 @@
 
 #include "postgres_fe.h"
 
-#include "catalog/pg_class_d.h"
 #include "fe_utils/string_utils.h"
 #include "pg_upgrade.h"
 
-
-/*
- * check_for_data_types_usage()
- *	Detect whether there are any stored columns depending on given type(s)
- *
- * If so, write a report to the given file name, and return true.
- *
- * base_query should be a SELECT yielding a single column named "oid",
- * containing the pg_type OIDs of one or more types that are known to have
- * inconsistent on-disk representations across server versions.
- *
- * We check for the type(s) in tables, matviews, and indexes, but not views;
- * there's no storage involved in a view.
- */
-bool
-check_for_data_types_usage(ClusterInfo *cluster,
-						   const char *base_query,
-						   const char *output_path)
-{
-	bool		found = false;
-	FILE	   *script = NULL;
-	int			dbnum;
-
-	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
-	{
-		DbInfo	   *active_db = &cluster->dbarr.dbs[dbnum];
-		PGconn	   *conn = connectToServer(cluster, active_db->db_name);
-		PQExpBufferData querybuf;
-		PGresult   *res;
-		bool		db_used = false;
-		int			ntups;
-		int			rowno;
-		int			i_nspname,
-					i_relname,
-					i_attname;
-
-		/*
-		 * The type(s) of interest might be wrapped in a domain, array,
-		 * composite, or range, and these container types can be nested (to
-		 * varying extents depending on server version, but that's not of
-		 * concern here).  To handle all these cases we need a recursive CTE.
-		 */
-		initPQExpBuffer(&querybuf);
-		appendPQExpBuffer(&querybuf,
-						  "WITH RECURSIVE oids AS ( "
-		/* start with the type(s) returned by base_query */
-						  "	%s "
-						  "	UNION ALL "
-						  "	SELECT * FROM ( "
-		/* inner WITH because we can only reference the CTE once */
-						  "		WITH x AS (SELECT oid FROM oids) "
-		/* domains on any type selected so far */
-						  "			SELECT t.oid FROM pg_catalog.pg_type t, x WHERE typbasetype = x.oid AND typtype = 'd' "
-						  "			UNION ALL "
-		/* arrays over any type selected so far */
-						  "			SELECT t.oid FROM pg_catalog.pg_type t, x WHERE typelem = x.oid AND typtype = 'b' "
-						  "			UNION ALL "
-		/* composite types containing any type selected so far */
-						  "			SELECT t.oid FROM pg_catalog.pg_type t, pg_catalog.pg_class c, pg_catalog.pg_attribute a, x "
-						  "			WHERE t.typtype = 'c' AND "
-						  "				  t.oid = c.reltype AND "
-						  "				  c.oid = a.attrelid AND "
-						  "				  NOT a.attisdropped AND "
-						  "				  a.atttypid = x.oid "
-						  "			UNION ALL "
-		/* ranges containing any type selected so far */
-						  "			SELECT t.oid FROM pg_catalog.pg_type t, pg_catalog.pg_range r, x "
-						  "			WHERE t.typtype = 'r' AND r.rngtypid = t.oid AND r.rngsubtype = x.oid"
-						  "	) foo "
-						  ") "
-		/* now look for stored columns of any such type */
-						  "SELECT n.nspname, c.relname, a.attname "
-						  "FROM	pg_catalog.pg_class c, "
-						  "		pg_catalog.pg_namespace n, "
-						  "		pg_catalog.pg_attribute a "
-						  "WHERE	c.oid = a.attrelid AND "
-						  "		NOT a.attisdropped AND "
-						  "		a.atttypid IN (SELECT oid FROM oids) AND "
-						  "		c.relkind IN ("
-						  CppAsString2(RELKIND_RELATION) ", "
-						  CppAsString2(RELKIND_MATVIEW) ", "
-						  CppAsString2(RELKIND_INDEX) ") AND "
-						  "		c.relnamespace = n.oid AND "
-		/* exclude possible orphaned temp tables */
-						  "		n.nspname !~ '^pg_temp_' AND "
-						  "		n.nspname !~ '^pg_toast_temp_' AND "
-		/* exclude system catalogs, too */
-						  "		n.nspname NOT IN ('pg_catalog', 'information_schema')",
-						  base_query);
-
-		res = executeQueryOrDie(conn, "%s", querybuf.data);
-
-		ntups = PQntuples(res);
-		i_nspname = PQfnumber(res, "nspname");
-		i_relname = PQfnumber(res, "relname");
-		i_attname = PQfnumber(res, "attname");
-		for (rowno = 0; rowno < ntups; rowno++)
-		{
-			found = true;
-			if (script == NULL && (script = fopen_priv(output_path, "w")) == NULL)
-				pg_fatal("could not open file \"%s\": %s", output_path,
-						 strerror(errno));
-			if (!db_used)
-			{
-				fprintf(script, "In database: %s\n", active_db->db_name);
-				db_used = true;
-			}
-			fprintf(script, "  %s.%s.%s\n",
-					PQgetvalue(res, rowno, i_nspname),
-					PQgetvalue(res, rowno, i_relname),
-					PQgetvalue(res, rowno, i_attname));
-		}
-
-		PQclear(res);
-
-		termPQExpBuffer(&querybuf);
-
-		PQfinish(conn);
-	}
-
-	if (script)
-		fclose(script);
-
-	return found;
-}
-
 /*
- * check_for_data_type_usage()
- *	Detect whether there are any stored columns depending on the given type
- *
- * If so, write a report to the given file name, and return true.
- *
- * type_name should be a fully qualified type name.  This is just a
- * trivial wrapper around check_for_data_types_usage() to convert a
- * type name into a base query.
+ * version_hook functions for check_for_data_types_usage in order to determine
+ * whether a data type check should be executed for the cluster in question or
+ * not.
  */
 bool
-check_for_data_type_usage(ClusterInfo *cluster,
-						  const char *type_name,
-						  const char *output_path)
+jsonb_9_4_check_applicable(ClusterInfo *cluster)
 {
-	bool		found;
-	char	   *base_query;
-
-	base_query = psprintf("SELECT '%s'::pg_catalog.regtype AS oid",
-						  type_name);
+	/* JSONB changed its storage format during 9.4 beta */
+	if (GET_MAJOR_VERSION(cluster->major_version) == 904 &&
+		cluster->controldata.cat_ver < JSONB_FORMAT_CHANGE_CAT_VER)
+		return true;
 
-	found = check_for_data_types_usage(cluster, base_query, output_path);
-
-	free(base_query);
-
-	return found;
-}
-
-
-/*
- * old_9_3_check_for_line_data_type_usage()
- *	9.3 -> 9.4
- *	Fully implement the 'line' data type in 9.4, which previously returned
- *	"not enabled" by default and was only functionally enabled with a
- *	compile-time switch; as of 9.4 "line" has a different on-disk
- *	representation format.
- */
-void
-old_9_3_check_for_line_data_type_usage(ClusterInfo *cluster)
-{
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for incompatible \"line\" data type");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_line.txt");
-
-	if (check_for_data_type_usage(cluster, "pg_catalog.line", output_path))
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains the \"line\" data type in user tables.\n"
-				 "This data type changed its internal and input/output format\n"
-				 "between your old and new versions so this\n"
-				 "cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
-}
-
-
-/*
- * old_9_6_check_for_unknown_data_type_usage()
- *	9.6 -> 10
- *	It's no longer allowed to create tables or views with "unknown"-type
- *	columns.  We do not complain about views with such columns, because
- *	they should get silently converted to "text" columns during the DDL
- *	dump and reload; it seems unlikely to be worth making users do that
- *	by hand.  However, if there's a table with such a column, the DDL
- *	reload will fail, so we should pre-detect that rather than failing
- *	mid-upgrade.  Worse, if there's a matview with such a column, the
- *	DDL reload will silently change it to "text" which won't match the
- *	on-disk storage (which is like "cstring").  So we *must* reject that.
- */
-void
-old_9_6_check_for_unknown_data_type_usage(ClusterInfo *cluster)
-{
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for invalid \"unknown\" user columns");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_unknown.txt");
-
-	if (check_for_data_type_usage(cluster, "pg_catalog.unknown", output_path))
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains the \"unknown\" data type in user tables.\n"
-				 "This data type is no longer allowed in tables, so this\n"
-				 "cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
+	return false;
 }
 
 /*
@@ -353,41 +140,6 @@ old_9_6_invalidate_hash_indexes(ClusterInfo *cluster, bool check_mode)
 		check_ok();
 }
 
-/*
- * old_11_check_for_sql_identifier_data_type_usage()
- *	11 -> 12
- *	In 12, the sql_identifier data type was switched from name to varchar,
- *	which does affect the storage (name is by-ref, but not varlena). This
- *	means user tables using sql_identifier for columns are broken because
- *	the on-disk format is different.
- */
-void
-old_11_check_for_sql_identifier_data_type_usage(ClusterInfo *cluster)
-{
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for invalid \"sql_identifier\" user columns");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_sql_identifier.txt");
-
-	if (check_for_data_type_usage(cluster, "information_schema.sql_identifier",
-								  output_path))
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains the \"sql_identifier\" data type in user tables.\n"
-				 "The on-disk format for this data type has changed, so this\n"
-				 "cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
-}
-
-
 /*
  * report_extension_updates()
  *	Report extensions that should be updated.
-- 
2.32.1 (Apple Git-133)

#31

Daniel Gustafsson

daniel@yesql.se

almost 2 years ago

In reply to: Daniel Gustafsson (#30)

1 attachment(s)

Re: Reducing connection overhead in pg_upgrade compat check phase

On 9 Feb 2024, at 00:04, Daniel Gustafsson <daniel@yesql.se> wrote:

On 8 Feb 2024, at 15:16, Daniel Gustafsson <daniel@yesql.se> wrote:

One option could perhaps be to include a version number for <= comparison, and
if set to zero a function pointer to a version check function must be provided?
That would handle the simple cases in a single place without messy logic, and
leave the more convoluted checks with a special case function.

The attached is a draft version of this approach: each check can be defined to
run for all versions, to run up to a threshold version, or to use a callback
which implements a more complicated check.
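
As a rough illustration of the selection logic described above, the following C
sketch shows the three ways a check can opt in. It is not taken from the patch:
DemoCluster, DemoCheck, demo_jsonb_hook, demo_check_applies and
DEMO_FORMAT_CHANGE_CAT_VER are hypothetical names, and the cluster version is
assumed to be already reduced to major-version form (for example 1500 for
PostgreSQL 15). Only MANUAL_CHECK and ALL_VERSIONS mirror the values defined in
the attached patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Special threshold values, mirroring the defines in the patch */
#define MANUAL_CHECK 1
#define ALL_VERSIONS -1

/* Hypothetical stand-in for pg_upgrade's ClusterInfo */
typedef struct
{
	int			major_version;	/* assumed already reduced, e.g. 904 or 1500 */
	int			cat_ver;		/* catalog version of the old cluster */
} DemoCluster;

/* Hypothetical hook type: returns true if the check should run */
typedef bool (*DemoVersionHook) (const DemoCluster *cluster);

/* Hypothetical, trimmed-down version of the per-check metadata */
typedef struct
{
	const char *status;
	int			threshold_version;
	DemoVersionHook version_hook;
} DemoCheck;

/* Illustrative cutoff only, not the real JSONB_FORMAT_CHANGE_CAT_VER */
#define DEMO_FORMAT_CHANGE_CAT_VER 1000

/* A "complicated" condition that a plain <= comparison cannot express */
static bool
demo_jsonb_hook(const DemoCluster *cluster)
{
	return cluster->major_version == 904 &&
		cluster->cat_ver < DEMO_FORMAT_CHANGE_CAT_VER;
}

/* Decide whether a check applies to the given old cluster */
static bool
demo_check_applies(const DemoCheck *check, const DemoCluster *cluster)
{
	if (check->threshold_version == MANUAL_CHECK)
	{
		/* a callback is mandatory for manual checks */
		assert(check->version_hook != NULL);
		return check->version_hook(cluster);
	}

	if (check->threshold_version == ALL_VERSIONS)
		return true;

	/* simple case: run for old clusters at or below the threshold */
	return cluster->major_version <= check->threshold_version;
}
```

With this shape, a check with a threshold of 1500 would run against a 15
cluster and be skipped for a 16 cluster, while the callback is only consulted
for entries marked MANUAL_CHECK.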

Attached again, now pgindented and with documentation on the struct members to
make it easier to add new checks. A repetitive part of the report text was also
moved to a single place.
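
To make the point about the shared report text concrete, here is a minimal
sketch of how a per-check message and a common trailer might be combined. The
function demo_format_report and its buffer-based interface are assumptions for
illustration only; the patch itself appends to a PQExpBuffer instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper: combine the check-specific text with the trailer that
 * is shared by all checks.  Returns false if the output was truncated.
 */
static bool
demo_format_report(char *buf, size_t buflen,
				   const char *report_text, const char *output_path)
{
	int			n;

	assert(buf != NULL && report_text != NULL && output_path != NULL);

	/* the trailer lives in one place instead of in every check's text */
	n = snprintf(buf, buflen, "\n%s\n%s    %s\n",
				 report_text,
				 "A list of the problem columns is in the file:",
				 output_path);

	return n >= 0 && (size_t) n < buflen;
}
```

Each check then only needs to carry the text that is specific to it, and the
file name is appended by the caller.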

--
Daniel Gustafsson

Attachments:

v14-0001-pg_upgrade-run-all-data-type-checks-per-connecti.patch (application/octet-stream; x-unix-mode=0644)

From 2d9ec92dd3af34ab240364445a404fef2d46caa1 Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson <dgustafsson@postgresql.org>
Date: Wed, 7 Feb 2024 13:36:46 +0100
Subject: [PATCH v14] pg_upgrade: run all data type checks per connection

The checks for data type usage were each connecting to all databases
in the cluster and running their query. On clusters which have a lot
of databases this can become unnecessarily expensive. This moves the
checks to run in a single connection instead to minimize connection
setup/teardown overhead.

Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/BB4C76F-D416-4F9F-949E-DBE950D37787@yesql.se
---
 src/bin/pg_upgrade/check.c      | 750 +++++++++++++++++++++-----------
 src/bin/pg_upgrade/pg_upgrade.h |  13 +-
 src/bin/pg_upgrade/version.c    | 266 +----------
 3 files changed, 507 insertions(+), 522 deletions(-)

diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index e36a7328bf..6d640d7bb0 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -10,6 +10,7 @@
 #include "postgres_fe.h"
 
 #include "catalog/pg_authid_d.h"
+#include "catalog/pg_class_d.h"
 #include "catalog/pg_collation.h"
 #include "fe_utils/string_utils.h"
 #include "mb/pg_wchar.h"
@@ -23,13 +24,6 @@ static void check_for_isn_and_int8_passing_mismatch(ClusterInfo *cluster);
 static void check_for_user_defined_postfix_ops(ClusterInfo *cluster);
 static void check_for_incompatible_polymorphics(ClusterInfo *cluster);
 static void check_for_tables_with_oids(ClusterInfo *cluster);
-static void check_for_composite_data_type_usage(ClusterInfo *cluster);
-static void check_for_reg_data_type_usage(ClusterInfo *cluster);
-static void check_for_aclitem_data_type_usage(ClusterInfo *cluster);
-static void check_for_removed_data_type_usage(ClusterInfo *cluster,
-											  const char *version,
-											  const char *datatype);
-static void check_for_jsonb_9_4_usage(ClusterInfo *cluster);
 static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(void);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
@@ -38,6 +32,498 @@ static void check_new_cluster_subscription_configuration(void);
 static void check_old_cluster_for_valid_slots(bool live_check);
 static void check_old_cluster_subscription_state(void);
 
+/*
+ * DataTypesUsageChecks - definitions of data type checks for the old cluster
+ * in order to determine if an upgrade can be performed.  See the comment on
+ * data_types_usage_checks below for a more detailed description.
+ */
+typedef struct
+{
+	/* Status line to print to the user */
+	const char *status;
+	/* Filename to store report to */
+	const char *report_filename;
+	/* Query to extract the oid of the datatype */
+	const char *base_query;
+	/* Text to store to report in case of error */
+	const char *report_text;
+	/* The latest version where the check applies */
+	int			threshold_version;
+	/* A function pointer for determining if the check applies */
+	DataTypesUsageVersionCheck version_hook;
+}			DataTypesUsageChecks;
+
+/*
+ * Special values for threshold_version for indicating that a check applies to
+ * all versions, or that a custom function needs to be invoked to determine
+ * if the check applies.
+ */
+#define MANUAL_CHECK 1
+#define ALL_VERSIONS -1
+
+/*--
+ * Data type usage checks. Each check for problematic data type usage is
+ * defined in this array with metadata, SQL query for finding the data type
+ * and functionality for deciding if the check is applicable to the version
+ * of the old cluster. The struct members are described in detail below:
+ *
+ * status				A one-line string which can be printed to the user to
+ *						inform about progress. Should not end with newline.
+ * report_filename		The filename in which the list of problems detected by
+ *						the check will be printed.
+ * base_query			A query which extracts the Oid of the datatype checked
+ *						for.
+ * report_text			The text which will be printed to the user to explain
+ *						what the check did, and why it failed. The text should
+ *						end with a newline, and does not need to refer to the
+ *						report_filename as that is automatically appended to
+ *						the report with the path to the log folder.
+ * threshold_version	The major version of PostgreSQL for which to run the
+ *						check. Iff the old cluster is less than, or equal to,
+ *						the threshold version then the check will be executed.
+ *						If the old version is greater than the threshold then
+ *						the check is skipped. If the threshold_version is set
+ *						to ALL_VERSIONS then it will be run unconditionally,
+ *						if set to MANUAL_CHECK then the version_hook function
+ *						will be executed in order to determine whether or not
+ *						to run.
+ * version_hook			A function pointer to a version check function of type
+ *						DataTypesUsageVersionCheck which is used to determine
+ *						if the check is applicable to the old cluster. If the
+ *						version_hook returns true then the check will be run,
+ *						else it will be skipped. The function will only be
+ *						executed iff threshold_version is set to MANUAL_CHECK.
+ */
+static DataTypesUsageChecks data_types_usage_checks[] =
+{
+	/*
+	 * Look for composite types that were made during initdb *or* belong to
+	 * information_schema; that's important in case information_schema was
+	 * dropped and reloaded.
+	 *
+	 * The cutoff OID here should match the source cluster's value of
+	 * FirstNormalObjectId.  We hardcode it rather than using that C #define
+	 * because, if that #define is ever changed, our own version's value is
+	 * NOT what to use.  Eventually we may need a test on the source cluster's
+	 * version to select the correct value.
+	 */
+	{
+		.status = gettext_noop("Checking for system-defined composite types in user tables"),
+			.report_filename = "tables_using_composite.txt",
+			.base_query =
+			"SELECT t.oid FROM pg_catalog.pg_type t "
+			"LEFT JOIN pg_catalog.pg_namespace n ON t.typnamespace = n.oid "
+			" WHERE typtype = 'c' AND (t.oid < 16384 OR nspname = 'information_schema')",
+			.report_text =
+			gettext_noop("Your installation contains system-defined composite types in user tables.\n"
+						 "These type OIDs are not stable across PostgreSQL versions,\n"
+						 "so this cluster cannot currently be upgraded.  You can drop the\n"
+						 "problem columns and restart the upgrade.\n"),
+			.threshold_version = ALL_VERSIONS
+	},
+
+	/*
+	 * 9.3 -> 9.4 Fully implement the 'line' data type in 9.4, which
+	 * previously returned "not enabled" by default and was only functionally
+	 * enabled with a compile-time switch; as of 9.4 "line" has a different
+	 * on-disk representation format.
+	 */
+	{
+		.status = gettext_noop("Checking for incompatible \"line\" data type"),
+			.report_filename = "tables_using_line.txt",
+			.base_query =
+			"SELECT 'pg_catalog.line'::pg_catalog.regtype AS oid",
+			.report_text =
+			gettext_noop("Your installation contains the \"line\" data type in user tables.\n"
+						 "This data type changed its internal and input/output format\n"
+						 "between your old and new versions so this\n"
+						 "cluster cannot currently be upgraded.  You can\n"
+						 "drop the problem columns and restart the upgrade.\n"),
+			.threshold_version = 903
+	},
+
+	/*
+	 * pg_upgrade only preserves these system values: pg_class.oid pg_type.oid
+	 * pg_enum.oid
+	 *
+	 * Many of the reg* data types reference system catalog info that is not
+	 * preserved, and hence these data types cannot be used in user tables
+	 * upgraded by pg_upgrade.
+	 */
+	{
+		.status = gettext_noop("Checking for reg* data types in user tables"),
+			.report_filename = "tables_using_reg.txt",
+
+		/*
+		 * Note: older servers will not have all of these reg* types, so we
+		 * have to write the query like this rather than depending on casts to
+		 * regtype.
+		 */
+			.base_query =
+			"SELECT oid FROM pg_catalog.pg_type t "
+			"WHERE t.typnamespace = "
+			"        (SELECT oid FROM pg_catalog.pg_namespace "
+			"         WHERE nspname = 'pg_catalog') "
+			"  AND t.typname IN ( "
+		/* pg_class.oid is preserved, so 'regclass' is OK */
+			"           'regcollation', "
+			"           'regconfig', "
+			"           'regdictionary', "
+			"           'regnamespace', "
+			"           'regoper', "
+			"           'regoperator', "
+			"           'regproc', "
+			"           'regprocedure' "
+		/* pg_authid.oid is preserved, so 'regrole' is OK */
+		/* pg_type.oid is (mostly) preserved, so 'regtype' is OK */
+			"         )",
+			.report_text =
+			gettext_noop("Your installation contains one of the reg* data types in user tables.\n"
+						 "These data types reference system OIDs that are not preserved by\n"
+						 "pg_upgrade, so this cluster cannot currently be upgraded.  You can\n"
+						 "drop the problem columns and restart the upgrade.\n"),
+			.threshold_version = ALL_VERSIONS
+	},
+
+	/*
+	 * PG 16 increased the size of the 'aclitem' type, which breaks the
+	 * on-disk format for existing data.
+	 */
+	{
+		.status = gettext_noop("Checking for incompatible \"aclitem\" data type"),
+			.report_filename = "tables_using_aclitem.txt",
+			.base_query =
+			"SELECT 'pg_catalog.aclitem'::pg_catalog.regtype AS oid",
+			.report_text =
+			gettext_noop("Your installation contains the \"aclitem\" data type in user tables.\n"
+						 "The internal format of \"aclitem\" changed in PostgreSQL version 16\n"
+						 "so this cluster cannot currently be upgraded.  You can drop the\n"
+						 "problem columns and restart the upgrade.\n"),
+			.threshold_version = 1500
+	},
+
+	/*
+	 * It's no longer allowed to create tables or views with "unknown"-type
+	 * columns.  We do not complain about views with such columns, because
+	 * they should get silently converted to "text" columns during the DDL
+	 * dump and reload; it seems unlikely to be worth making users do that by
+	 * hand.  However, if there's a table with such a column, the DDL reload
+	 * will fail, so we should pre-detect that rather than failing
+	 * mid-upgrade.  Worse, if there's a matview with such a column, the DDL
+	 * reload will silently change it to "text" which won't match the on-disk
+	 * storage (which is like "cstring").  So we *must* reject that.
+	 */
+	{
+		.status = gettext_noop("Checking for invalid \"unknown\" user columns"),
+			.report_filename = "tables_using_unknown.txt",
+			.base_query =
+			"SELECT 'pg_catalog.unknown'::pg_catalog.regtype AS oid",
+			.report_text =
+			gettext_noop("Your installation contains the \"unknown\" data type in user tables.\n"
+						 "This data type is no longer allowed in tables, so this cluster\n"
+						 "cannot currently be upgraded.  You can drop the problem columns\n"
+						 "and restart the upgrade.\n"),
+			.threshold_version = 906
+	},
+
+	/*
+	 * PG 12 changed the 'sql_identifier' type storage to be based on name,
+	 * not varchar, which breaks on-disk format for existing data. So we need
+	 * to prevent upgrade when used in user objects (tables, indexes, ...). In
+	 * 12, the sql_identifier data type was switched from varchar to name,
+	 * which does affect the storage (name is by-ref, but not varlena). This
+	 * means user tables using sql_identifier for columns are broken because
+	 * the on-disk format is different.
+	 */
+	{
+		.status = gettext_noop("Checking for invalid \"sql_identifier\" user columns"),
+			.report_filename = "tables_using_sql_identifier.txt",
+			.base_query =
+			"SELECT 'information_schema.sql_identifier'::pg_catalog.regtype AS oid",
+			.report_text =
+			gettext_noop("Your installation contains the \"sql_identifier\" data type in user tables.\n"
+						 "The on-disk format for this data type has changed, so this\n"
+						 "cluster cannot currently be upgraded.  You can drop the problem\n"
+						 "columns and restart the upgrade.\n"),
+			.threshold_version = 1100
+	},
+
+	/*
+	 * JSONB changed its storage format during 9.4 beta, so check for it.
+	 */
+	{
+		.status = gettext_noop("Checking for incompatible \"jsonb\" data type in user tables"),
+			.report_filename = "tables_using_jsonb.txt",
+			.base_query =
+			"SELECT 'pg_catalog.jsonb'::pg_catalog.regtype AS oid",
+			.report_text =
+			gettext_noop("Your installation contains the \"jsonb\" data type in user tables.\n"
+						 "The internal format of \"jsonb\" changed during 9.4 beta so this\n"
+						 "cluster cannot currently be upgraded.  You can drop the problem\n"
+						 "columns and restart the upgrade.\n"),
+			.threshold_version = MANUAL_CHECK,
+			.version_hook = jsonb_9_4_check_applicable
+	},
+
+	/*
+	 * PG 12 removed types abstime, reltime, tinterval.
+	 */
+	{
+		.status = gettext_noop("Checking for removed \"abstime\" data type in user tables"),
+			.report_filename = "tables_using_abstime.txt",
+			.base_query =
+			"SELECT 'pg_catalog.abstime'::pg_catalog.regtype AS oid",
+			.report_text =
+			gettext_noop("Your installation contains the \"abstime\" data type in user tables.\n"
+						 "The \"abstime\" type has been removed in PostgreSQL version 12,\n"
+						 "so this cluster cannot currently be upgraded.  You can drop the\n"
+						 "problem columns, or change them to another data type, and restart\n"
+						 "the upgrade.\n"),
+			.threshold_version = 1100
+	},
+	{
+		.status = gettext_noop("Checking for removed \"reltime\" data type in user tables"),
+			.report_filename = "tables_using_reltime.txt",
+			.base_query =
+			"SELECT 'pg_catalog.reltime'::pg_catalog.regtype AS oid",
+			.report_text =
+			gettext_noop("Your installation contains the \"reltime\" data type in user tables.\n"
+						 "The \"reltime\" type has been removed in PostgreSQL version 12,\n"
+						 "so this cluster cannot currently be upgraded.  You can drop the\n"
+						 "problem columns, or change them to another data type, and restart\n"
+						 "the upgrade.\n"),
+			.threshold_version = 1100
+	},
+	{
+		.status = gettext_noop("Checking for removed \"tinterval\" data type in user tables"),
+			.report_filename = "tables_using_tinterval.txt",
+			.base_query =
+			"SELECT 'pg_catalog.tinterval'::pg_catalog.regtype AS oid",
+			.report_text =
+			gettext_noop("Your installation contains the \"tinterval\" data type in user tables.\n"
+						 "The \"tinterval\" type has been removed in PostgreSQL version 12,\n"
+						 "so this cluster cannot currently be upgraded.  You can drop the\n"
+						 "problem columns, or change them to another data type, and restart\n"
+						 "the upgrade.\n"),
+			.threshold_version = 1100
+	},
+
+	/* End of checks marker, must remain last */
+	{
+		NULL, NULL, NULL, NULL, 0, NULL
+	}
+};
+
+/*
+ * check_for_data_types_usage()
+ *	Detect whether there are any stored columns depending on given type(s)
+ *
+ * If so, write a report to the given file name and signal a failure to the
+ * user.
+ *
+ * The checks to run are defined in a DataTypesUsageChecks structure where
+ * each check has metadata for explaining errors to the user, a base_query,
+ * a report filename and a function pointer hook for validating if the check
+ * should be executed given the cluster at hand.
+ *
+ * base_query should be a SELECT yielding a single column named "oid",
+ * containing the pg_type OIDs of one or more types that are known to have
+ * inconsistent on-disk representations across server versions.
+ *
+ * We check for the type(s) in tables, matviews, and indexes, but not views;
+ * there's no storage involved in a view.
+ */
+static void
+check_for_data_types_usage(ClusterInfo *cluster, DataTypesUsageChecks * checks)
+{
+	bool		found = false;
+	bool	   *results;
+	PQExpBufferData report;
+	DataTypesUsageChecks *tmp = checks;
+	int			n_data_types_usage_checks = 0;
+
+	prep_status("Checking for data type usage");
+
+	/* Gather number of checks to perform */
+	while (tmp->status != NULL)
+	{
+		n_data_types_usage_checks++;
+		tmp++;
+	}
+
+	/* Prepare an array to store the results of checks in */
+	results = pg_malloc0(sizeof(bool) * n_data_types_usage_checks);
+
+	/*
+	 * Connect to each database in the cluster and run all defined checks
+	 * against that database before trying the next one.
+	 */
+	for (int dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
+	{
+		DbInfo	   *active_db = &cluster->dbarr.dbs[dbnum];
+		PGconn	   *conn = connectToServer(cluster, active_db->db_name);
+
+		for (int checknum = 0; checknum < n_data_types_usage_checks; checknum++)
+		{
+			PGresult   *res;
+			int			ntups;
+			int			i_nspname;
+			int			i_relname;
+			int			i_attname;
+			FILE	   *script = NULL;
+			bool		db_used = false;
+			char		output_path[MAXPGPATH];
+			DataTypesUsageChecks *cur_check = &checks[checknum];
+
+			if (cur_check->threshold_version == MANUAL_CHECK)
+			{
+				Assert(cur_check->version_hook);
+
+				/*
+				 * Make sure that the check applies to the current cluster
+				 * version and skip if not. A check hook must be defined
+				 * whenever threshold_version is set to MANUAL_CHECK.
+				 */
+				if (!cur_check->version_hook(cluster))
+					continue;
+			}
+			else if (cur_check->threshold_version != ALL_VERSIONS)
+			{
+				if (GET_MAJOR_VERSION(cluster->major_version) > cur_check->threshold_version)
+					continue;
+			}
+			else
+				Assert(cur_check->threshold_version == ALL_VERSIONS);
+
+			snprintf(output_path, sizeof(output_path), "%s/%s",
+					 log_opts.basedir,
+					 cur_check->report_filename);
+
+			/*
+			 * The type(s) of interest might be wrapped in a domain, array,
+			 * composite, or range, and these container types can be nested
+			 * (to varying extents depending on server version, but that's not
+			 * of concern here).  To handle all these cases we need a
+			 * recursive CTE.
+			 */
+			res = executeQueryOrDie(conn,
+									"WITH RECURSIVE oids AS ( "
+			/* start with the type(s) returned by base_query */
+									"	%s "
+									"	UNION ALL "
+									"	SELECT * FROM ( "
+			/* inner WITH because we can only reference the CTE once */
+									"		WITH x AS (SELECT oid FROM oids) "
+			/* domains on any type selected so far */
+									"			SELECT t.oid FROM pg_catalog.pg_type t, x WHERE typbasetype = x.oid AND typtype = 'd' "
+									"			UNION ALL "
+			/* arrays over any type selected so far */
+									"			SELECT t.oid FROM pg_catalog.pg_type t, x WHERE typelem = x.oid AND typtype = 'b' "
+									"			UNION ALL "
+			/* composite types containing any type selected so far */
+									"			SELECT t.oid FROM pg_catalog.pg_type t, pg_catalog.pg_class c, pg_catalog.pg_attribute a, x "
+									"			WHERE t.typtype = 'c' AND "
+									"				  t.oid = c.reltype AND "
+									"				  c.oid = a.attrelid AND "
+									"				  NOT a.attisdropped AND "
+									"				  a.atttypid = x.oid "
+									"			UNION ALL "
+			/* ranges containing any type selected so far */
+									"			SELECT t.oid FROM pg_catalog.pg_type t, pg_catalog.pg_range r, x "
+									"			WHERE t.typtype = 'r' AND r.rngtypid = t.oid AND r.rngsubtype = x.oid"
+									"	) foo "
+									") "
+			/* now look for stored columns of any such type */
+									"SELECT n.nspname, c.relname, a.attname "
+									"FROM	pg_catalog.pg_class c, "
+									"		pg_catalog.pg_namespace n, "
+									"		pg_catalog.pg_attribute a "
+									"WHERE	c.oid = a.attrelid AND "
+									"		NOT a.attisdropped AND "
+									"		a.atttypid IN (SELECT oid FROM oids) AND "
+									"		c.relkind IN ("
+									CppAsString2(RELKIND_RELATION) ", "
+									CppAsString2(RELKIND_MATVIEW) ", "
+									CppAsString2(RELKIND_INDEX) ") AND "
+									"		c.relnamespace = n.oid AND "
+			/* exclude possible orphaned temp tables */
+									"		n.nspname !~ '^pg_temp_' AND "
+									"		n.nspname !~ '^pg_toast_temp_' AND "
+			/* exclude system catalogs, too */
+									"		n.nspname NOT IN ('pg_catalog', 'information_schema')",
+									cur_check->base_query);
+
+			ntups = PQntuples(res);
+
+			/*
+			 * The datatype was found, so extract the data and log to the
+			 * requested filename. We need to open the file for appending
+			 * since the check might have already found the type in another
+			 * database earlier in the loop.
+			 */
+			if (ntups)
+			{
+				/*
+				 * Make sure we have a buffer to save reports to now that we
+				 * found a first failing check.
+				 */
+				if (!found)
+					initPQExpBuffer(&report);
+				found = true;
+
+				/*
+				 * If this is the first time we see an error for the check in
+				 * question then print a status message of the failure.
+				 */
+				if (!results[checknum])
+				{
+					pg_log(PG_REPORT, "    failed check: %s", _(cur_check->status));
+					appendPQExpBuffer(&report, "\n%s\n%s    %s\n",
+									  _(cur_check->report_text),
+									  _("A list of the problem columns is in the file:"),
+									  output_path);
+				}
+				results[checknum] = true;
+
+				i_nspname = PQfnumber(res, "nspname");
+				i_relname = PQfnumber(res, "relname");
+				i_attname = PQfnumber(res, "attname");
+
+				for (int rowno = 0; rowno < ntups; rowno++)
+				{
+					if (script == NULL && (script = fopen_priv(output_path, "a")) == NULL)
+						pg_fatal("could not open file \"%s\": %s",
+								 output_path,
+								 strerror(errno));
+					if (!db_used)
+					{
+						fprintf(script, "In database: %s\n", active_db->db_name);
+						db_used = true;
+					}
+					fprintf(script, "  %s.%s.%s\n",
+							PQgetvalue(res, rowno, i_nspname),
+							PQgetvalue(res, rowno, i_relname),
+							PQgetvalue(res, rowno, i_attname));
+				}
+
+				if (script)
+				{
+					fclose(script);
+					script = NULL;
+				}
+			}
+
+			PQclear(res);
+		}
+
+		PQfinish(conn);
+	}
+
+	if (found)
+		pg_fatal("Data type checks failed: %s", report.data);
+
+	check_ok();
+}
 
 /*
  * fix_path_separator
@@ -110,8 +596,6 @@ check_and_dump_old_cluster(bool live_check)
 	check_is_install_user(&old_cluster);
 	check_proper_datallowconn(&old_cluster);
 	check_for_prepared_transactions(&old_cluster);
-	check_for_composite_data_type_usage(&old_cluster);
-	check_for_reg_data_type_usage(&old_cluster);
 	check_for_isn_and_int8_passing_mismatch(&old_cluster);
 
 	if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
@@ -129,22 +613,7 @@ check_and_dump_old_cluster(bool live_check)
 		check_old_cluster_subscription_state();
 	}
 
-	/*
-	 * PG 16 increased the size of the 'aclitem' type, which breaks the
-	 * on-disk format for existing data.
-	 */
-	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1500)
-		check_for_aclitem_data_type_usage(&old_cluster);
-
-	/*
-	 * PG 12 removed types abstime, reltime, tinterval.
-	 */
-	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1100)
-	{
-		check_for_removed_data_type_usage(&old_cluster, "12", "abstime");
-		check_for_removed_data_type_usage(&old_cluster, "12", "reltime");
-		check_for_removed_data_type_usage(&old_cluster, "12", "tinterval");
-	}
+	check_for_data_types_usage(&old_cluster, data_types_usage_checks);
 
 	/*
 	 * PG 14 changed the function signature of encoding conversion functions.
@@ -176,21 +645,12 @@ check_and_dump_old_cluster(bool live_check)
 	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1100)
 		check_for_tables_with_oids(&old_cluster);
 
-	/*
-	 * PG 12 changed the 'sql_identifier' type storage to be based on name,
-	 * not varchar, which breaks on-disk format for existing data. So we need
-	 * to prevent upgrade when used in user objects (tables, indexes, ...).
-	 */
-	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1100)
-		old_11_check_for_sql_identifier_data_type_usage(&old_cluster);
-
 	/*
 	 * Pre-PG 10 allowed tables with 'unknown' type columns and non WAL logged
 	 * hash indexes
 	 */
 	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 906)
 	{
-		old_9_6_check_for_unknown_data_type_usage(&old_cluster);
 		if (user_opts.check)
 			old_9_6_invalidate_hash_indexes(&old_cluster, true);
 	}
@@ -199,14 +659,6 @@ check_and_dump_old_cluster(bool live_check)
 	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 905)
 		check_for_pg_role_prefix(&old_cluster);
 
-	if (GET_MAJOR_VERSION(old_cluster.major_version) == 904 &&
-		old_cluster.controldata.cat_ver < JSONB_FORMAT_CHANGE_CAT_VER)
-		check_for_jsonb_9_4_usage(&old_cluster);
-
-	/* Pre-PG 9.4 had a different 'line' data type internal format */
-	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 903)
-		old_9_3_check_for_line_data_type_usage(&old_cluster);
-
 	/*
 	 * While not a check option, we do this now because this is the only time
 	 * the old server is running.
@@ -1124,220 +1576,6 @@ check_for_tables_with_oids(ClusterInfo *cluster)
 }
 
 
-/*
- * check_for_composite_data_type_usage()
- *	Check for system-defined composite types used in user tables.
- *
- *	The OIDs of rowtypes of system catalogs and information_schema views
- *	can change across major versions; unlike user-defined types, we have
- *	no mechanism for forcing them to be the same in the new cluster.
- *	Hence, if any user table uses one, that's problematic for pg_upgrade.
- */
-static void
-check_for_composite_data_type_usage(ClusterInfo *cluster)
-{
-	bool		found;
-	Oid			firstUserOid;
-	char		output_path[MAXPGPATH];
-	char	   *base_query;
-
-	prep_status("Checking for system-defined composite types in user tables");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_composite.txt");
-
-	/*
-	 * Look for composite types that were made during initdb *or* belong to
-	 * information_schema; that's important in case information_schema was
-	 * dropped and reloaded.
-	 *
-	 * The cutoff OID here should match the source cluster's value of
-	 * FirstNormalObjectId.  We hardcode it rather than using that C #define
-	 * because, if that #define is ever changed, our own version's value is
-	 * NOT what to use.  Eventually we may need a test on the source cluster's
-	 * version to select the correct value.
-	 */
-	firstUserOid = 16384;
-
-	base_query = psprintf("SELECT t.oid FROM pg_catalog.pg_type t "
-						  "LEFT JOIN pg_catalog.pg_namespace n ON t.typnamespace = n.oid "
-						  " WHERE typtype = 'c' AND (t.oid < %u OR nspname = 'information_schema')",
-						  firstUserOid);
-
-	found = check_for_data_types_usage(cluster, base_query, output_path);
-
-	free(base_query);
-
-	if (found)
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains system-defined composite types in user tables.\n"
-				 "These type OIDs are not stable across PostgreSQL versions,\n"
-				 "so this cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
-}
-
-/*
- * check_for_reg_data_type_usage()
- *	pg_upgrade only preserves these system values:
- *		pg_class.oid
- *		pg_type.oid
- *		pg_enum.oid
- *
- *	Many of the reg* data types reference system catalog info that is
- *	not preserved, and hence these data types cannot be used in user
- *	tables upgraded by pg_upgrade.
- */
-static void
-check_for_reg_data_type_usage(ClusterInfo *cluster)
-{
-	bool		found;
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for reg* data types in user tables");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_reg.txt");
-
-	/*
-	 * Note: older servers will not have all of these reg* types, so we have
-	 * to write the query like this rather than depending on casts to regtype.
-	 */
-	found = check_for_data_types_usage(cluster,
-									   "SELECT oid FROM pg_catalog.pg_type t "
-									   "WHERE t.typnamespace = "
-									   "        (SELECT oid FROM pg_catalog.pg_namespace "
-									   "         WHERE nspname = 'pg_catalog') "
-									   "  AND t.typname IN ( "
-	/* pg_class.oid is preserved, so 'regclass' is OK */
-									   "           'regcollation', "
-									   "           'regconfig', "
-									   "           'regdictionary', "
-									   "           'regnamespace', "
-									   "           'regoper', "
-									   "           'regoperator', "
-									   "           'regproc', "
-									   "           'regprocedure' "
-	/* pg_authid.oid is preserved, so 'regrole' is OK */
-	/* pg_type.oid is (mostly) preserved, so 'regtype' is OK */
-									   "         )",
-									   output_path);
-
-	if (found)
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains one of the reg* data types in user tables.\n"
-				 "These data types reference system OIDs that are not preserved by\n"
-				 "pg_upgrade, so this cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
-}
-
-/*
- * check_for_aclitem_data_type_usage
- *
- *	aclitem changed its storage format in 16, so check for it.
- */
-static void
-check_for_aclitem_data_type_usage(ClusterInfo *cluster)
-{
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for incompatible \"%s\" data type in user tables",
-				"aclitem");
-
-	snprintf(output_path, sizeof(output_path), "tables_using_aclitem.txt");
-
-	if (check_for_data_type_usage(cluster, "pg_catalog.aclitem", output_path))
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains the \"aclitem\" data type in user tables.\n"
-				 "The internal format of \"aclitem\" changed in PostgreSQL version 16\n"
-				 "so this cluster cannot currently be upgraded.  You can drop the\n"
-				 "problem columns and restart the upgrade.  A list of the problem\n"
-				 "columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
-}
-
-/*
- * check_for_removed_data_type_usage
- *
- *	Check for in-core data types that have been removed.  Callers know
- *	the exact list.
- */
-static void
-check_for_removed_data_type_usage(ClusterInfo *cluster, const char *version,
-								  const char *datatype)
-{
-	char		output_path[MAXPGPATH];
-	char		typename[NAMEDATALEN];
-
-	prep_status("Checking for removed \"%s\" data type in user tables",
-				datatype);
-
-	snprintf(output_path, sizeof(output_path), "tables_using_%s.txt",
-			 datatype);
-	snprintf(typename, sizeof(typename), "pg_catalog.%s", datatype);
-
-	if (check_for_data_type_usage(cluster, typename, output_path))
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains the \"%s\" data type in user tables.\n"
-				 "The \"%s\" type has been removed in PostgreSQL version %s,\n"
-				 "so this cluster cannot currently be upgraded.  You can drop the\n"
-				 "problem columns, or change them to another data type, and restart\n"
-				 "the upgrade.  A list of the problem columns is in the file:\n"
-				 "    %s", datatype, datatype, version, output_path);
-	}
-	else
-		check_ok();
-}
-
-
-/*
- * check_for_jsonb_9_4_usage()
- *
- *	JSONB changed its storage format during 9.4 beta, so check for it.
- */
-static void
-check_for_jsonb_9_4_usage(ClusterInfo *cluster)
-{
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for incompatible \"jsonb\" data type");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_jsonb.txt");
-
-	if (check_for_data_type_usage(cluster, "pg_catalog.jsonb", output_path))
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains the \"jsonb\" data type in user tables.\n"
-				 "The internal format of \"jsonb\" changed during 9.4 beta so this\n"
-				 "cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
-}
-
 /*
  * check_for_pg_role_prefix()
  *
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index d9a848cbfd..7daa67f809 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -351,6 +351,9 @@ typedef struct
 } OSInfo;
 
 
+/* Function signature for data type check version hook */
+typedef bool (*DataTypesUsageVersionCheck) (ClusterInfo *cluster);
+
 /*
  * Global variables
  */
@@ -475,18 +478,10 @@ unsigned int str2uint(const char *str);
 
 /* version.c */
 
-bool		check_for_data_types_usage(ClusterInfo *cluster,
-									   const char *base_query,
-									   const char *output_path);
-bool		check_for_data_type_usage(ClusterInfo *cluster,
-									  const char *type_name,
-									  const char *output_path);
-void		old_9_3_check_for_line_data_type_usage(ClusterInfo *cluster);
-void		old_9_6_check_for_unknown_data_type_usage(ClusterInfo *cluster);
+bool		jsonb_9_4_check_applicable(ClusterInfo *cluster);
 void		old_9_6_invalidate_hash_indexes(ClusterInfo *cluster,
 											bool check_mode);
 
-void		old_11_check_for_sql_identifier_data_type_usage(ClusterInfo *cluster);
 void		report_extension_updates(ClusterInfo *cluster);
 
 /* parallel.c */
diff --git a/src/bin/pg_upgrade/version.c b/src/bin/pg_upgrade/version.c
index 13b2c0f012..e3c7b4109c 100644
--- a/src/bin/pg_upgrade/version.c
+++ b/src/bin/pg_upgrade/version.c
@@ -9,236 +9,23 @@
 
 #include "postgres_fe.h"
 
-#include "catalog/pg_class_d.h"
 #include "fe_utils/string_utils.h"
 #include "pg_upgrade.h"
 
-
-/*
- * check_for_data_types_usage()
- *	Detect whether there are any stored columns depending on given type(s)
- *
- * If so, write a report to the given file name, and return true.
- *
- * base_query should be a SELECT yielding a single column named "oid",
- * containing the pg_type OIDs of one or more types that are known to have
- * inconsistent on-disk representations across server versions.
- *
- * We check for the type(s) in tables, matviews, and indexes, but not views;
- * there's no storage involved in a view.
- */
-bool
-check_for_data_types_usage(ClusterInfo *cluster,
-						   const char *base_query,
-						   const char *output_path)
-{
-	bool		found = false;
-	FILE	   *script = NULL;
-	int			dbnum;
-
-	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
-	{
-		DbInfo	   *active_db = &cluster->dbarr.dbs[dbnum];
-		PGconn	   *conn = connectToServer(cluster, active_db->db_name);
-		PQExpBufferData querybuf;
-		PGresult   *res;
-		bool		db_used = false;
-		int			ntups;
-		int			rowno;
-		int			i_nspname,
-					i_relname,
-					i_attname;
-
-		/*
-		 * The type(s) of interest might be wrapped in a domain, array,
-		 * composite, or range, and these container types can be nested (to
-		 * varying extents depending on server version, but that's not of
-		 * concern here).  To handle all these cases we need a recursive CTE.
-		 */
-		initPQExpBuffer(&querybuf);
-		appendPQExpBuffer(&querybuf,
-						  "WITH RECURSIVE oids AS ( "
-		/* start with the type(s) returned by base_query */
-						  "	%s "
-						  "	UNION ALL "
-						  "	SELECT * FROM ( "
-		/* inner WITH because we can only reference the CTE once */
-						  "		WITH x AS (SELECT oid FROM oids) "
-		/* domains on any type selected so far */
-						  "			SELECT t.oid FROM pg_catalog.pg_type t, x WHERE typbasetype = x.oid AND typtype = 'd' "
-						  "			UNION ALL "
-		/* arrays over any type selected so far */
-						  "			SELECT t.oid FROM pg_catalog.pg_type t, x WHERE typelem = x.oid AND typtype = 'b' "
-						  "			UNION ALL "
-		/* composite types containing any type selected so far */
-						  "			SELECT t.oid FROM pg_catalog.pg_type t, pg_catalog.pg_class c, pg_catalog.pg_attribute a, x "
-						  "			WHERE t.typtype = 'c' AND "
-						  "				  t.oid = c.reltype AND "
-						  "				  c.oid = a.attrelid AND "
-						  "				  NOT a.attisdropped AND "
-						  "				  a.atttypid = x.oid "
-						  "			UNION ALL "
-		/* ranges containing any type selected so far */
-						  "			SELECT t.oid FROM pg_catalog.pg_type t, pg_catalog.pg_range r, x "
-						  "			WHERE t.typtype = 'r' AND r.rngtypid = t.oid AND r.rngsubtype = x.oid"
-						  "	) foo "
-						  ") "
-		/* now look for stored columns of any such type */
-						  "SELECT n.nspname, c.relname, a.attname "
-						  "FROM	pg_catalog.pg_class c, "
-						  "		pg_catalog.pg_namespace n, "
-						  "		pg_catalog.pg_attribute a "
-						  "WHERE	c.oid = a.attrelid AND "
-						  "		NOT a.attisdropped AND "
-						  "		a.atttypid IN (SELECT oid FROM oids) AND "
-						  "		c.relkind IN ("
-						  CppAsString2(RELKIND_RELATION) ", "
-						  CppAsString2(RELKIND_MATVIEW) ", "
-						  CppAsString2(RELKIND_INDEX) ") AND "
-						  "		c.relnamespace = n.oid AND "
-		/* exclude possible orphaned temp tables */
-						  "		n.nspname !~ '^pg_temp_' AND "
-						  "		n.nspname !~ '^pg_toast_temp_' AND "
-		/* exclude system catalogs, too */
-						  "		n.nspname NOT IN ('pg_catalog', 'information_schema')",
-						  base_query);
-
-		res = executeQueryOrDie(conn, "%s", querybuf.data);
-
-		ntups = PQntuples(res);
-		i_nspname = PQfnumber(res, "nspname");
-		i_relname = PQfnumber(res, "relname");
-		i_attname = PQfnumber(res, "attname");
-		for (rowno = 0; rowno < ntups; rowno++)
-		{
-			found = true;
-			if (script == NULL && (script = fopen_priv(output_path, "w")) == NULL)
-				pg_fatal("could not open file \"%s\": %s", output_path,
-						 strerror(errno));
-			if (!db_used)
-			{
-				fprintf(script, "In database: %s\n", active_db->db_name);
-				db_used = true;
-			}
-			fprintf(script, "  %s.%s.%s\n",
-					PQgetvalue(res, rowno, i_nspname),
-					PQgetvalue(res, rowno, i_relname),
-					PQgetvalue(res, rowno, i_attname));
-		}
-
-		PQclear(res);
-
-		termPQExpBuffer(&querybuf);
-
-		PQfinish(conn);
-	}
-
-	if (script)
-		fclose(script);
-
-	return found;
-}
-
 /*
- * check_for_data_type_usage()
- *	Detect whether there are any stored columns depending on the given type
- *
- * If so, write a report to the given file name, and return true.
- *
- * type_name should be a fully qualified type name.  This is just a
- * trivial wrapper around check_for_data_types_usage() to convert a
- * type name into a base query.
+ * version_hook functions for check_for_data_types_usage in order to determine
+ * whether a data type check should be executed for the cluster in question or
+ * not.
  */
 bool
-check_for_data_type_usage(ClusterInfo *cluster,
-						  const char *type_name,
-						  const char *output_path)
+jsonb_9_4_check_applicable(ClusterInfo *cluster)
 {
-	bool		found;
-	char	   *base_query;
-
-	base_query = psprintf("SELECT '%s'::pg_catalog.regtype AS oid",
-						  type_name);
+	/* JSONB changed its storage format during 9.4 beta */
+	if (GET_MAJOR_VERSION(cluster->major_version) == 904 &&
+		cluster->controldata.cat_ver < JSONB_FORMAT_CHANGE_CAT_VER)
+		return true;
 
-	found = check_for_data_types_usage(cluster, base_query, output_path);
-
-	free(base_query);
-
-	return found;
-}
-
-
-/*
- * old_9_3_check_for_line_data_type_usage()
- *	9.3 -> 9.4
- *	Fully implement the 'line' data type in 9.4, which previously returned
- *	"not enabled" by default and was only functionally enabled with a
- *	compile-time switch; as of 9.4 "line" has a different on-disk
- *	representation format.
- */
-void
-old_9_3_check_for_line_data_type_usage(ClusterInfo *cluster)
-{
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for incompatible \"line\" data type");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_line.txt");
-
-	if (check_for_data_type_usage(cluster, "pg_catalog.line", output_path))
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains the \"line\" data type in user tables.\n"
-				 "This data type changed its internal and input/output format\n"
-				 "between your old and new versions so this\n"
-				 "cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
-}
-
-
-/*
- * old_9_6_check_for_unknown_data_type_usage()
- *	9.6 -> 10
- *	It's no longer allowed to create tables or views with "unknown"-type
- *	columns.  We do not complain about views with such columns, because
- *	they should get silently converted to "text" columns during the DDL
- *	dump and reload; it seems unlikely to be worth making users do that
- *	by hand.  However, if there's a table with such a column, the DDL
- *	reload will fail, so we should pre-detect that rather than failing
- *	mid-upgrade.  Worse, if there's a matview with such a column, the
- *	DDL reload will silently change it to "text" which won't match the
- *	on-disk storage (which is like "cstring").  So we *must* reject that.
- */
-void
-old_9_6_check_for_unknown_data_type_usage(ClusterInfo *cluster)
-{
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for invalid \"unknown\" user columns");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_unknown.txt");
-
-	if (check_for_data_type_usage(cluster, "pg_catalog.unknown", output_path))
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains the \"unknown\" data type in user tables.\n"
-				 "This data type is no longer allowed in tables, so this\n"
-				 "cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
+	return false;
 }
 
 /*
@@ -353,41 +140,6 @@ old_9_6_invalidate_hash_indexes(ClusterInfo *cluster, bool check_mode)
 		check_ok();
 }
 
-/*
- * old_11_check_for_sql_identifier_data_type_usage()
- *	11 -> 12
- *	In 12, the sql_identifier data type was switched from name to varchar,
- *	which does affect the storage (name is by-ref, but not varlena). This
- *	means user tables using sql_identifier for columns are broken because
- *	the on-disk format is different.
- */
-void
-old_11_check_for_sql_identifier_data_type_usage(ClusterInfo *cluster)
-{
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for invalid \"sql_identifier\" user columns");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_sql_identifier.txt");
-
-	if (check_for_data_type_usage(cluster, "information_schema.sql_identifier",
-								  output_path))
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains the \"sql_identifier\" data type in user tables.\n"
-				 "The on-disk format for this data type has changed, so this\n"
-				 "cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
-}
-
-
 /*
  * report_extension_updates()
  *	Report extensions that should be updated.
-- 
2.32.1 (Apple Git-133)
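
The saving that the patch above aims for can be illustrated by counting
connections. What follows is a minimal, self-contained sketch and not code
from the patch: mock_connect() is a hypothetical stand-in for
connectToServer(), and the two functions only model the loop shapes before
and after the change, assuming every check applies to every database.

```c
#include <assert.h>

/* Hypothetical stand-in for connectToServer(): it only counts calls. */
static int	connections_opened = 0;
static int	queries_run = 0;

static void
mock_connect(void)
{
	connections_opened++;
}

/* Old shape: each check loops over all databases and connects on its own. */
int
connections_per_check(int ndbs, int nchecks)
{
	assert(ndbs >= 0 && nchecks >= 0);

	connections_opened = 0;
	queries_run = 0;
	for (int check = 0; check < nchecks; check++)
	{
		for (int db = 0; db < ndbs; db++)
		{
			mock_connect();		/* one connection per (check, database) */
			queries_run++;
		}
	}
	return connections_opened;
}

/* New shape: connect once per database and run all checks on it. */
int
connections_per_database(int ndbs, int nchecks)
{
	assert(ndbs >= 0 && nchecks >= 0);

	connections_opened = 0;
	queries_run = 0;
	for (int db = 0; db < ndbs; db++)
	{
		mock_connect();			/* one connection per database */
		for (int check = 0; check < nchecks; check++)
			queries_run++;		/* the query for this check would run here */
	}
	return connections_opened;
}
```

Both shapes run the same number of queries; only the number of connection
setups differs, which is the overhead the patch amortizes.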

#32

Daniel Gustafsson

daniel@yesql.se

almost 2 years ago

In reply to: Daniel Gustafsson (#31)

1 attachment(s)

Re: Reducing connection overhead in pg_upgrade compat check phase

Attached is a fresh rebase with only minor cosmetic touch-ups which I would
like to go ahead with during this CF.

Peter: does this address the comments you had on translation and code
duplication?

--
Daniel Gustafsson

Attachments:

v15-0001-pg_upgrade-run-all-data-type-checks-per-connecti.patch (application/octet-stream)

From 428da9b257e73ef01abe3f2ffff218066b0afdaa Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson <dgustafsson@postgresql.org>
Date: Mon, 18 Mar 2024 10:54:45 +0100
Subject: [PATCH v15] pg_upgrade: run all data type checks per connection

The checks for data type usage were each connecting to all databases
in the cluster and running their query. On clusters which have a lot
of databases this can become unnecessarily expensive. This moves the
checks to run in a single connection instead to minimize setup and
teardown overhead.

Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/BB4C76F-D416-4F9F-949E-DBE950D37787@yesql.se
---
 src/bin/pg_upgrade/check.c      | 749 +++++++++++++++++++++-----------
 src/bin/pg_upgrade/pg_upgrade.h |  13 +-
 src/bin/pg_upgrade/version.c    | 265 +----------
 3 files changed, 506 insertions(+), 521 deletions(-)
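
The check.c changes in this patch describe each data type check with a
threshold_version and an optional version_hook. As a rough illustration of
the rule stated in the comment on data_types_usage_checks, the sketch below
decides whether a check applies to an old cluster. It is an assumption-laden
simplification and not the code from the patch: check_sketch, check_applies()
and only_9_4() are hypothetical names, and the hook here takes a plain major
version number where the real hook takes a ClusterInfo pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Same special values as in the patch */
#define MANUAL_CHECK 1
#define ALL_VERSIONS -1

/* Hypothetical, simplified hook signature (the patch passes ClusterInfo *) */
typedef bool (*version_hook_fn) (int old_major_version);

/* Hypothetical reduced form of the struct: only the version-related fields */
typedef struct
{
	int			threshold_version;
	version_hook_fn version_hook;
} check_sketch;

/* Example hook: applies only to a 9.4 old cluster (ignores catalog version) */
bool
only_9_4(int old_major_version)
{
	return old_major_version == 904;
}

/* Return true if the check should be run against the old cluster */
bool
check_applies(const check_sketch *check, int old_major_version)
{
	assert(check != NULL);

	if (check->threshold_version == ALL_VERSIONS)
		return true;			/* run unconditionally */

	if (check->threshold_version == MANUAL_CHECK)
	{
		assert(check->version_hook != NULL);
		return check->version_hook(old_major_version);
	}

	/* run if the old cluster is at or below the threshold */
	return old_major_version <= check->threshold_version;
}
```

Version numbers follow the GET_MAJOR_VERSION convention used in the patch,
for example 903 for 9.3 and 1100 for 11.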

diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 8ce6c674e3..c198896c9f 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -10,6 +10,7 @@
 #include "postgres_fe.h"
 
 #include "catalog/pg_authid_d.h"
+#include "catalog/pg_class_d.h"
 #include "catalog/pg_collation.h"
 #include "fe_utils/string_utils.h"
 #include "mb/pg_wchar.h"
@@ -23,13 +24,6 @@ static void check_for_isn_and_int8_passing_mismatch(ClusterInfo *cluster);
 static void check_for_user_defined_postfix_ops(ClusterInfo *cluster);
 static void check_for_incompatible_polymorphics(ClusterInfo *cluster);
 static void check_for_tables_with_oids(ClusterInfo *cluster);
-static void check_for_composite_data_type_usage(ClusterInfo *cluster);
-static void check_for_reg_data_type_usage(ClusterInfo *cluster);
-static void check_for_aclitem_data_type_usage(ClusterInfo *cluster);
-static void check_for_removed_data_type_usage(ClusterInfo *cluster,
-											  const char *version,
-											  const char *datatype);
-static void check_for_jsonb_9_4_usage(ClusterInfo *cluster);
 static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(void);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
@@ -38,6 +32,497 @@ static void check_new_cluster_subscription_configuration(void);
 static void check_old_cluster_for_valid_slots(bool live_check);
 static void check_old_cluster_subscription_state(void);
 
+/*
+ * DataTypesUsageChecks - definitions of data type checks for the old cluster
+ * in order to determine if an upgrade can be performed.  See the comment on
+ * data_types_usage_checks below for a more detailed description.
+ */
+typedef struct
+{
+	/* Status line to print to the user */
+	const char *status;
+	/* Filename to store report to */
+	const char *report_filename;
+	/* Query to extract the oid of the datatype */
+	const char *base_query;
+	/* Text to store to report in case of error */
+	const char *report_text;
+	/* The latest version where the check applies */
+	int			threshold_version;
+	/* A function pointer for determining if the check applies */
+	DataTypesUsageVersionCheck version_hook;
+}			DataTypesUsageChecks;
+
+/*
+ * Special values for threshold_version for indicating that a check applies to
+ * all versions, or that a custom function needs to be invoked to determine
+ * if the check applies.
+ */
+#define MANUAL_CHECK 1
+#define ALL_VERSIONS -1
+
+/*--
+ * Data type usage checks. Each check for problematic data type usage is
+ * defined in this array with metadata, SQL query for finding the data type
+ * and functionality for deciding if the check is applicable to the version
+ * of the old cluster. The struct members are described in detail below:
+ *
+ * status				A one-line string which can be printed to the user to
+ *						inform about progress. Should not end with newline.
+ * report_filename		The filename in which the list of problems detected by
+ *						the check will be printed.
+ * base_query			A query which extracts the Oid of the datatype checked
+ *						for.
+ * report_text			The text which will be printed to the user to explain
+ *						what the check did, and why it failed. The text should
+ *						end with a newline, and does not need to refer to the
+ *						report_filename as that is automatically appended to
+ *						the report with the path to the log folder.
+ * threshold_version	The major version of PostgreSQL for which to run the
+ *						check. Iff the old cluster is less than, or equal to,
+ *						the threshold version then the check will be executed.
+ *						If the old version is greater than the threshold then
+ *						the check is skipped. If the threshold_version is set
+ *						to ALL_VERSIONS then it will be run unconditionally,
+ *						if set to MANUAL_CHECK then the version_hook function
+ *						will be executed in order to determine whether or not
+ *						to run.
+ * version_hook			A function pointer to a version check function of type
+ *						DataTypesUsageVersionCheck which is used to determine
+ *						if the check is applicable to the old cluster. If the
+ *						version_hook returns true then the check will be run,
+ *						else it will be skipped. The function will only be
+ *						executed if threshold_version is set to MANUAL_CHECK.
+ */
+static DataTypesUsageChecks data_types_usage_checks[] =
+{
+	/*
+	 * Look for composite types that were made during initdb *or* belong to
+	 * information_schema; that's important in case information_schema was
+	 * dropped and reloaded.
+	 *
+	 * The cutoff OID here should match the source cluster's value of
+	 * FirstNormalObjectId.  We hardcode it rather than using that C #define
+	 * because, if that #define is ever changed, our own version's value is
+	 * NOT what to use.  Eventually we may need a test on the source cluster's
+	 * version to select the correct value.
+	 */
+	{
+		.status = gettext_noop("Checking for system-defined composite types in user tables"),
+			.report_filename = "tables_using_composite.txt",
+			.base_query =
+			"SELECT t.oid FROM pg_catalog.pg_type t "
+			"LEFT JOIN pg_catalog.pg_namespace n ON t.typnamespace = n.oid "
+			" WHERE typtype = 'c' AND (t.oid < 16384 OR nspname = 'information_schema')",
+			.report_text =
+			gettext_noop("Your installation contains system-defined composite types in user tables.\n"
+						 "These type OIDs are not stable across PostgreSQL versions,\n"
+						 "so this cluster cannot currently be upgraded.  You can drop the\n"
+						 "problem columns and restart the upgrade.\n"),
+			.threshold_version = ALL_VERSIONS
+	},
+
+	/*
+	 * 9.3 -> 9.4 Fully implement the 'line' data type in 9.4, which
+	 * previously returned "not enabled" by default and was only functionally
+	 * enabled with a compile-time switch; as of 9.4 "line" has a different
+	 * on-disk representation format.
+	 */
+	{
+		.status = gettext_noop("Checking for incompatible \"line\" data type"),
+			.report_filename = "tables_using_line.txt",
+			.base_query =
+			"SELECT 'pg_catalog.line'::pg_catalog.regtype AS oid",
+			.report_text =
+			gettext_noop("Your installation contains the \"line\" data type in user tables.\n"
+						 "This data type changed its internal and input/output format\n"
+						 "between your old and new versions so this\n"
+						 "cluster cannot currently be upgraded.  You can\n"
+						 "drop the problem columns and restart the upgrade.\n"),
+			.threshold_version = 903
+	},
+
+	/*
+	 * pg_upgrade only preserves these system values: pg_class.oid,
+	 * pg_type.oid and pg_enum.oid.
+	 *
+	 * Many of the reg* data types reference system catalog info that is not
+	 * preserved, and hence these data types cannot be used in user tables
+	 * upgraded by pg_upgrade.
+	 */
+	{
+		.status = gettext_noop("Checking for reg* data types in user tables"),
+			.report_filename = "tables_using_reg.txt",
+
+		/*
+		 * Note: older servers will not have all of these reg* types, so we
+		 * have to write the query like this rather than depending on casts to
+		 * regtype.
+		 */
+			.base_query =
+			"SELECT oid FROM pg_catalog.pg_type t "
+			"WHERE t.typnamespace = "
+			"        (SELECT oid FROM pg_catalog.pg_namespace "
+			"         WHERE nspname = 'pg_catalog') "
+			"  AND t.typname IN ( "
+		/* pg_class.oid is preserved, so 'regclass' is OK */
+			"           'regcollation', "
+			"           'regconfig', "
+			"           'regdictionary', "
+			"           'regnamespace', "
+			"           'regoper', "
+			"           'regoperator', "
+			"           'regproc', "
+			"           'regprocedure' "
+		/* pg_authid.oid is preserved, so 'regrole' is OK */
+		/* pg_type.oid is (mostly) preserved, so 'regtype' is OK */
+			"         )",
+			.report_text =
+			gettext_noop("Your installation contains one of the reg* data types in user tables.\n"
+						 "These data types reference system OIDs that are not preserved by\n"
+						 "pg_upgrade, so this cluster cannot currently be upgraded.  You can\n"
+						 "drop the problem columns and restart the upgrade.\n"),
+			.threshold_version = ALL_VERSIONS
+	},
+
+	/*
+	 * PG 16 increased the size of the 'aclitem' type, which breaks the
+	 * on-disk format for existing data.
+	 */
+	{
+		.status = gettext_noop("Checking for incompatible \"aclitem\" data type"),
+			.report_filename = "tables_using_aclitem.txt",
+			.base_query =
+			"SELECT 'pg_catalog.aclitem'::pg_catalog.regtype AS oid",
+			.report_text =
+			gettext_noop("Your installation contains the \"aclitem\" data type in user tables.\n"
+						 "The internal format of \"aclitem\" changed in PostgreSQL version 16\n"
+						 "so this cluster cannot currently be upgraded.  You can drop the\n"
+						 "problem columns and restart the upgrade.\n"),
+			.threshold_version = 1500
+	},
+
+	/*
+	 * It's no longer allowed to create tables or views with "unknown"-type
+	 * columns.  We do not complain about views with such columns, because
+	 * they should get silently converted to "text" columns during the DDL
+	 * dump and reload; it seems unlikely to be worth making users do that by
+	 * hand.  However, if there's a table with such a column, the DDL reload
+	 * will fail, so we should pre-detect that rather than failing
+	 * mid-upgrade.  Worse, if there's a matview with such a column, the DDL
+	 * reload will silently change it to "text" which won't match the on-disk
+	 * storage (which is like "cstring").  So we *must* reject that.
+	 */
+	{
+		.status = gettext_noop("Checking for invalid \"unknown\" user columns"),
+			.report_filename = "tables_using_unknown.txt",
+			.base_query =
+			"SELECT 'pg_catalog.unknown'::pg_catalog.regtype AS oid",
+			.report_text =
+			gettext_noop("Your installation contains the \"unknown\" data type in user tables.\n"
+						 "This data type is no longer allowed in tables, so this cluster\n"
+						 "cannot currently be upgraded.  You can drop the problem columns\n"
+						 "and restart the upgrade.\n"),
+			.threshold_version = 906
+	},
+
+	/*
+	 * PG 12 changed the 'sql_identifier' type storage to be based on name,
+	 * not varchar, which breaks on-disk format for existing data. So we need
+	 * to prevent upgrade when used in user objects (tables, indexes, ...). In
+	 * 12, the sql_identifier data type was switched from varchar to name,
+	 * which does affect the storage (name is by-ref, but not varlena). This
+	 * means user tables using sql_identifier for columns are broken because
+	 * the on-disk format is different.
+	 */
+	{
+		.status = gettext_noop("Checking for invalid \"sql_identifier\" user columns"),
+			.report_filename = "tables_using_sql_identifier.txt",
+			.base_query =
+			"SELECT 'information_schema.sql_identifier'::pg_catalog.regtype AS oid",
+			.report_text =
+			gettext_noop("Your installation contains the \"sql_identifier\" data type in user tables.\n"
+						 "The on-disk format for this data type has changed, so this\n"
+						 "cluster cannot currently be upgraded.  You can drop the problem\n"
+						 "columns and restart the upgrade.\n"),
+			.threshold_version = 1100
+	},
+
+	/*
+	 * JSONB changed its storage format during 9.4 beta, so check for it.
+	 */
+	{
+		.status = gettext_noop("Checking for incompatible \"jsonb\" data type in user tables"),
+			.report_filename = "tables_using_jsonb.txt",
+			.base_query =
+			"SELECT 'pg_catalog.jsonb'::pg_catalog.regtype AS oid",
+			.report_text =
+			gettext_noop("Your installation contains the \"jsonb\" data type in user tables.\n"
+						 "The internal format of \"jsonb\" changed during 9.4 beta so this\n"
+						 "cluster cannot currently be upgraded.  You can drop the problem\n"
+						 "columns and restart the upgrade.\n"),
+			.threshold_version = MANUAL_CHECK,
+			.version_hook = jsonb_9_4_check_applicable
+	},
+
+	/*
+	 * PG 12 removed types abstime, reltime, tinterval.
+	 */
+	{
+		.status = gettext_noop("Checking for removed \"abstime\" data type in user tables"),
+			.report_filename = "tables_using_abstime.txt",
+			.base_query =
+			"SELECT 'pg_catalog.abstime'::pg_catalog.regtype AS oid",
+			.report_text =
+			gettext_noop("Your installation contains the \"abstime\" data type in user tables.\n"
+						 "The \"abstime\" type has been removed in PostgreSQL version 12,\n"
+						 "so this cluster cannot currently be upgraded.  You can drop the\n"
+						 "problem columns, or change them to another data type, and restart\n"
+						 "the upgrade.\n"),
+			.threshold_version = 1100
+	},
+	{
+		.status = gettext_noop("Checking for removed \"reltime\" data type in user tables"),
+			.report_filename = "tables_using_reltime.txt",
+			.base_query =
+			"SELECT 'pg_catalog.reltime'::pg_catalog.regtype AS oid",
+			.report_text =
+			gettext_noop("Your installation contains the \"reltime\" data type in user tables.\n"
+						 "The \"reltime\" type has been removed in PostgreSQL version 12,\n"
+						 "so this cluster cannot currently be upgraded.  You can drop the\n"
+						 "problem columns, or change them to another data type, and restart\n"
+						 "the upgrade.\n"),
+			.threshold_version = 1100
+	},
+	{
+		.status = gettext_noop("Checking for removed \"tinterval\" data type in user tables"),
+			.report_filename = "tables_using_tinterval.txt",
+			.base_query =
+			"SELECT 'pg_catalog.tinterval'::pg_catalog.regtype AS oid",
+			.report_text =
+			gettext_noop("Your installation contains the \"tinterval\" data type in user tables.\n"
+						 "The \"tinterval\" type has been removed in PostgreSQL version 12,\n"
+						 "so this cluster cannot currently be upgraded.  You can drop the\n"
+						 "problem columns, or change them to another data type, and restart\n"
+						 "the upgrade.\n"),
+			.threshold_version = 1100
+	},
+
+	/* End of checks marker, must remain last */
+	{
+		NULL, NULL, NULL, NULL, 0, NULL
+	}
+};
+
+/*
+ * check_for_data_types_usage()
+ *	Detect whether there are any stored columns depending on given type(s)
+ *
+ * If so, write a report to the given file name and signal a failure to the
+ * user.
+ *
+ * The checks to run are defined in a DataTypesUsageChecks structure where
+ * each check has metadata for explaining errors to the user, a base_query,
+ * a report filename and a function pointer hook for validating if the check
+ * should be executed given the cluster at hand.
+ *
+ * base_query should be a SELECT yielding a single column named "oid",
+ * containing the pg_type OIDs of one or more types that are known to have
+ * inconsistent on-disk representations across server versions.
+ *
+ * We check for the type(s) in tables, matviews, and indexes, but not views;
+ * there's no storage involved in a view.
+ */
+static void
+check_for_data_types_usage(ClusterInfo *cluster, DataTypesUsageChecks * checks)
+{
+	bool		found = false;
+	bool	   *results;
+	PQExpBufferData report;
+	DataTypesUsageChecks *tmp = checks;
+	int			n_data_types_usage_checks = 0;
+
+	prep_status("Checking for data type usage");
+
+	/* Gather number of checks to perform */
+	while (tmp->status != NULL)
+	{
+		n_data_types_usage_checks++;
+		tmp++;
+	}
+
+	/* Prepare an array to store the results of checks in */
+	results = pg_malloc0(sizeof(bool) * n_data_types_usage_checks);
+
+	/*
+	 * Connect to each database in the cluster and run all defined checks
+	 * against that database before trying the next one.
+	 */
+	for (int dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
+	{
+		DbInfo	   *active_db = &cluster->dbarr.dbs[dbnum];
+		PGconn	   *conn = connectToServer(cluster, active_db->db_name);
+
+		for (int checknum = 0; checknum < n_data_types_usage_checks; checknum++)
+		{
+			PGresult   *res;
+			int			ntups;
+			int			i_nspname;
+			int			i_relname;
+			int			i_attname;
+			FILE	   *script = NULL;
+			bool		db_used = false;
+			char		output_path[MAXPGPATH];
+			DataTypesUsageChecks *cur_check = &checks[checknum];
+
+			if (cur_check->threshold_version == MANUAL_CHECK)
+			{
+				Assert(cur_check->version_hook);
+
+				/*
+				 * Make sure that the check applies to the current cluster
+				 * version and skip if not. A check marked MANUAL_CHECK must
+				 * define a version hook, which the Assert above enforces.
+				 */
+				if (!cur_check->version_hook(cluster))
+					continue;
+			}
+			else if (cur_check->threshold_version != ALL_VERSIONS)
+			{
+				if (GET_MAJOR_VERSION(cluster->major_version) > cur_check->threshold_version)
+					continue;
+			}
+			else
+				Assert(cur_check->threshold_version == ALL_VERSIONS);
+
+			snprintf(output_path, sizeof(output_path), "%s/%s",
+					 log_opts.basedir,
+					 cur_check->report_filename);
+
+			/*
+			 * The type(s) of interest might be wrapped in a domain, array,
+			 * composite, or range, and these container types can be nested
+			 * (to varying extents depending on server version, but that's not
+			 * of concern here).  To handle all these cases we need a
+			 * recursive CTE.
+			 */
+			res = executeQueryOrDie(conn,
+									"WITH RECURSIVE oids AS ( "
+			/* start with the type(s) returned by base_query */
+									"	%s "
+									"	UNION ALL "
+									"	SELECT * FROM ( "
+			/* inner WITH because we can only reference the CTE once */
+									"		WITH x AS (SELECT oid FROM oids) "
+			/* domains on any type selected so far */
+									"			SELECT t.oid FROM pg_catalog.pg_type t, x WHERE typbasetype = x.oid AND typtype = 'd' "
+									"			UNION ALL "
+			/* arrays over any type selected so far */
+									"			SELECT t.oid FROM pg_catalog.pg_type t, x WHERE typelem = x.oid AND typtype = 'b' "
+									"			UNION ALL "
+			/* composite types containing any type selected so far */
+									"			SELECT t.oid FROM pg_catalog.pg_type t, pg_catalog.pg_class c, pg_catalog.pg_attribute a, x "
+									"			WHERE t.typtype = 'c' AND "
+									"				  t.oid = c.reltype AND "
+									"				  c.oid = a.attrelid AND "
+									"				  NOT a.attisdropped AND "
+									"				  a.atttypid = x.oid "
+									"			UNION ALL "
+			/* ranges containing any type selected so far */
+									"			SELECT t.oid FROM pg_catalog.pg_type t, pg_catalog.pg_range r, x "
+									"			WHERE t.typtype = 'r' AND r.rngtypid = t.oid AND r.rngsubtype = x.oid"
+									"	) foo "
+									") "
+			/* now look for stored columns of any such type */
+									"SELECT n.nspname, c.relname, a.attname "
+									"FROM	pg_catalog.pg_class c, "
+									"		pg_catalog.pg_namespace n, "
+									"		pg_catalog.pg_attribute a "
+									"WHERE	c.oid = a.attrelid AND "
+									"		NOT a.attisdropped AND "
+									"		a.atttypid IN (SELECT oid FROM oids) AND "
+									"		c.relkind IN ("
+									CppAsString2(RELKIND_RELATION) ", "
+									CppAsString2(RELKIND_MATVIEW) ", "
+									CppAsString2(RELKIND_INDEX) ") AND "
+									"		c.relnamespace = n.oid AND "
+			/* exclude possible orphaned temp tables */
+									"		n.nspname !~ '^pg_temp_' AND "
+									"		n.nspname !~ '^pg_toast_temp_' AND "
+			/* exclude system catalogs, too */
+									"		n.nspname NOT IN ('pg_catalog', 'information_schema')",
+									cur_check->base_query);
+
+			ntups = PQntuples(res);
+
+			/*
+			 * The datatype was found, so extract the data and log to the
+			 * requested filename. We need to open the file for appending
+			 * since the check might have already found the type in another
+			 * database earlier in the loop.
+			 */
+			if (ntups)
+			{
+				/*
+				 * Make sure we have a buffer to save reports to now that we
+				 * found a first failing check.
+				 */
+				if (!found)
+					initPQExpBuffer(&report);
+				found = true;
+
+				/*
+				 * If this is the first time we see an error for the check in
+				 * question then print a status message of the failure.
+				 */
+				if (!results[checknum])
+				{
+					pg_log(PG_REPORT, "    failed check: %s", _(cur_check->status));
+					appendPQExpBuffer(&report, "\n%s\n%s    %s\n",
+									  _(cur_check->report_text),
+									  _("A list of the problem columns is in the file:"),
+									  output_path);
+				}
+				results[checknum] = true;
+
+				i_nspname = PQfnumber(res, "nspname");
+				i_relname = PQfnumber(res, "relname");
+				i_attname = PQfnumber(res, "attname");
+
+				for (int rowno = 0; rowno < ntups; rowno++)
+				{
+					if (script == NULL && (script = fopen_priv(output_path, "a")) == NULL)
+						pg_fatal("could not open file \"%s\": %m", output_path);
+
+					if (!db_used)
+					{
+						fprintf(script, "In database: %s\n", active_db->db_name);
+						db_used = true;
+					}
+					fprintf(script, "  %s.%s.%s\n",
+							PQgetvalue(res, rowno, i_nspname),
+							PQgetvalue(res, rowno, i_relname),
+							PQgetvalue(res, rowno, i_attname));
+				}
+
+				if (script)
+				{
+					fclose(script);
+					script = NULL;
+				}
+			}
+
+			PQclear(res);
+		}
+
+		PQfinish(conn);
+	}
+
+	if (found)
+		pg_fatal("Data type checks failed: %s", report.data);
+
+	check_ok();
+}
 
 /*
  * fix_path_separator
@@ -110,8 +595,6 @@ check_and_dump_old_cluster(bool live_check)
 	check_is_install_user(&old_cluster);
 	check_proper_datallowconn(&old_cluster);
 	check_for_prepared_transactions(&old_cluster);
-	check_for_composite_data_type_usage(&old_cluster);
-	check_for_reg_data_type_usage(&old_cluster);
 	check_for_isn_and_int8_passing_mismatch(&old_cluster);
 
 	if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
@@ -129,22 +612,7 @@ check_and_dump_old_cluster(bool live_check)
 		check_old_cluster_subscription_state();
 	}
 
-	/*
-	 * PG 16 increased the size of the 'aclitem' type, which breaks the
-	 * on-disk format for existing data.
-	 */
-	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1500)
-		check_for_aclitem_data_type_usage(&old_cluster);
-
-	/*
-	 * PG 12 removed types abstime, reltime, tinterval.
-	 */
-	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1100)
-	{
-		check_for_removed_data_type_usage(&old_cluster, "12", "abstime");
-		check_for_removed_data_type_usage(&old_cluster, "12", "reltime");
-		check_for_removed_data_type_usage(&old_cluster, "12", "tinterval");
-	}
+	check_for_data_types_usage(&old_cluster, data_types_usage_checks);
 
 	/*
 	 * PG 14 changed the function signature of encoding conversion functions.
@@ -176,21 +644,12 @@ check_and_dump_old_cluster(bool live_check)
 	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1100)
 		check_for_tables_with_oids(&old_cluster);
 
-	/*
-	 * PG 12 changed the 'sql_identifier' type storage to be based on name,
-	 * not varchar, which breaks on-disk format for existing data. So we need
-	 * to prevent upgrade when used in user objects (tables, indexes, ...).
-	 */
-	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1100)
-		old_11_check_for_sql_identifier_data_type_usage(&old_cluster);
-
 	/*
 	 * Pre-PG 10 allowed tables with 'unknown' type columns and non WAL logged
 	 * hash indexes
 	 */
 	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 906)
 	{
-		old_9_6_check_for_unknown_data_type_usage(&old_cluster);
 		if (user_opts.check)
 			old_9_6_invalidate_hash_indexes(&old_cluster, true);
 	}
@@ -199,14 +658,6 @@ check_and_dump_old_cluster(bool live_check)
 	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 905)
 		check_for_pg_role_prefix(&old_cluster);
 
-	if (GET_MAJOR_VERSION(old_cluster.major_version) == 904 &&
-		old_cluster.controldata.cat_ver < JSONB_FORMAT_CHANGE_CAT_VER)
-		check_for_jsonb_9_4_usage(&old_cluster);
-
-	/* Pre-PG 9.4 had a different 'line' data type internal format */
-	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 903)
-		old_9_3_check_for_line_data_type_usage(&old_cluster);
-
 	/*
 	 * While not a check option, we do this now because this is the only time
 	 * the old server is running.
@@ -1122,220 +1573,6 @@ check_for_tables_with_oids(ClusterInfo *cluster)
 }
 
 
-/*
- * check_for_composite_data_type_usage()
- *	Check for system-defined composite types used in user tables.
- *
- *	The OIDs of rowtypes of system catalogs and information_schema views
- *	can change across major versions; unlike user-defined types, we have
- *	no mechanism for forcing them to be the same in the new cluster.
- *	Hence, if any user table uses one, that's problematic for pg_upgrade.
- */
-static void
-check_for_composite_data_type_usage(ClusterInfo *cluster)
-{
-	bool		found;
-	Oid			firstUserOid;
-	char		output_path[MAXPGPATH];
-	char	   *base_query;
-
-	prep_status("Checking for system-defined composite types in user tables");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_composite.txt");
-
-	/*
-	 * Look for composite types that were made during initdb *or* belong to
-	 * information_schema; that's important in case information_schema was
-	 * dropped and reloaded.
-	 *
-	 * The cutoff OID here should match the source cluster's value of
-	 * FirstNormalObjectId.  We hardcode it rather than using that C #define
-	 * because, if that #define is ever changed, our own version's value is
-	 * NOT what to use.  Eventually we may need a test on the source cluster's
-	 * version to select the correct value.
-	 */
-	firstUserOid = 16384;
-
-	base_query = psprintf("SELECT t.oid FROM pg_catalog.pg_type t "
-						  "LEFT JOIN pg_catalog.pg_namespace n ON t.typnamespace = n.oid "
-						  " WHERE typtype = 'c' AND (t.oid < %u OR nspname = 'information_schema')",
-						  firstUserOid);
-
-	found = check_for_data_types_usage(cluster, base_query, output_path);
-
-	free(base_query);
-
-	if (found)
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains system-defined composite types in user tables.\n"
-				 "These type OIDs are not stable across PostgreSQL versions,\n"
-				 "so this cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
-}
-
-/*
- * check_for_reg_data_type_usage()
- *	pg_upgrade only preserves these system values:
- *		pg_class.oid
- *		pg_type.oid
- *		pg_enum.oid
- *
- *	Many of the reg* data types reference system catalog info that is
- *	not preserved, and hence these data types cannot be used in user
- *	tables upgraded by pg_upgrade.
- */
-static void
-check_for_reg_data_type_usage(ClusterInfo *cluster)
-{
-	bool		found;
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for reg* data types in user tables");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_reg.txt");
-
-	/*
-	 * Note: older servers will not have all of these reg* types, so we have
-	 * to write the query like this rather than depending on casts to regtype.
-	 */
-	found = check_for_data_types_usage(cluster,
-									   "SELECT oid FROM pg_catalog.pg_type t "
-									   "WHERE t.typnamespace = "
-									   "        (SELECT oid FROM pg_catalog.pg_namespace "
-									   "         WHERE nspname = 'pg_catalog') "
-									   "  AND t.typname IN ( "
-	/* pg_class.oid is preserved, so 'regclass' is OK */
-									   "           'regcollation', "
-									   "           'regconfig', "
-									   "           'regdictionary', "
-									   "           'regnamespace', "
-									   "           'regoper', "
-									   "           'regoperator', "
-									   "           'regproc', "
-									   "           'regprocedure' "
-	/* pg_authid.oid is preserved, so 'regrole' is OK */
-	/* pg_type.oid is (mostly) preserved, so 'regtype' is OK */
-									   "         )",
-									   output_path);
-
-	if (found)
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains one of the reg* data types in user tables.\n"
-				 "These data types reference system OIDs that are not preserved by\n"
-				 "pg_upgrade, so this cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
-}
-
-/*
- * check_for_aclitem_data_type_usage
- *
- *	aclitem changed its storage format in 16, so check for it.
- */
-static void
-check_for_aclitem_data_type_usage(ClusterInfo *cluster)
-{
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for incompatible \"%s\" data type in user tables",
-				"aclitem");
-
-	snprintf(output_path, sizeof(output_path), "tables_using_aclitem.txt");
-
-	if (check_for_data_type_usage(cluster, "pg_catalog.aclitem", output_path))
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains the \"aclitem\" data type in user tables.\n"
-				 "The internal format of \"aclitem\" changed in PostgreSQL version 16\n"
-				 "so this cluster cannot currently be upgraded.  You can drop the\n"
-				 "problem columns and restart the upgrade.  A list of the problem\n"
-				 "columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
-}
-
-/*
- * check_for_removed_data_type_usage
- *
- *	Check for in-core data types that have been removed.  Callers know
- *	the exact list.
- */
-static void
-check_for_removed_data_type_usage(ClusterInfo *cluster, const char *version,
-								  const char *datatype)
-{
-	char		output_path[MAXPGPATH];
-	char		typename[NAMEDATALEN];
-
-	prep_status("Checking for removed \"%s\" data type in user tables",
-				datatype);
-
-	snprintf(output_path, sizeof(output_path), "tables_using_%s.txt",
-			 datatype);
-	snprintf(typename, sizeof(typename), "pg_catalog.%s", datatype);
-
-	if (check_for_data_type_usage(cluster, typename, output_path))
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains the \"%s\" data type in user tables.\n"
-				 "The \"%s\" type has been removed in PostgreSQL version %s,\n"
-				 "so this cluster cannot currently be upgraded.  You can drop the\n"
-				 "problem columns, or change them to another data type, and restart\n"
-				 "the upgrade.  A list of the problem columns is in the file:\n"
-				 "    %s", datatype, datatype, version, output_path);
-	}
-	else
-		check_ok();
-}
-
-
-/*
- * check_for_jsonb_9_4_usage()
- *
- *	JSONB changed its storage format during 9.4 beta, so check for it.
- */
-static void
-check_for_jsonb_9_4_usage(ClusterInfo *cluster)
-{
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for incompatible \"jsonb\" data type");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_jsonb.txt");
-
-	if (check_for_data_type_usage(cluster, "pg_catalog.jsonb", output_path))
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains the \"jsonb\" data type in user tables.\n"
-				 "The internal format of \"jsonb\" changed during 9.4 beta so this\n"
-				 "cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
-}
-
 /*
  * check_for_pg_role_prefix()
  *
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index c0bfb002d2..92bcb693fb 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -352,6 +352,9 @@ typedef struct
 } OSInfo;
 
 
+/* Function signature for data type check version hook */
+typedef bool (*DataTypesUsageVersionCheck) (ClusterInfo *cluster);
+
 /*
  * Global variables
  */
@@ -479,18 +482,10 @@ unsigned int str2uint(const char *str);
 
 /* version.c */
 
-bool		check_for_data_types_usage(ClusterInfo *cluster,
-									   const char *base_query,
-									   const char *output_path);
-bool		check_for_data_type_usage(ClusterInfo *cluster,
-									  const char *type_name,
-									  const char *output_path);
-void		old_9_3_check_for_line_data_type_usage(ClusterInfo *cluster);
-void		old_9_6_check_for_unknown_data_type_usage(ClusterInfo *cluster);
+bool		jsonb_9_4_check_applicable(ClusterInfo *cluster);
 void		old_9_6_invalidate_hash_indexes(ClusterInfo *cluster,
 											bool check_mode);
 
-void		old_11_check_for_sql_identifier_data_type_usage(ClusterInfo *cluster);
 void		report_extension_updates(ClusterInfo *cluster);
 
 /* parallel.c */
diff --git a/src/bin/pg_upgrade/version.c b/src/bin/pg_upgrade/version.c
index 9dc1399f36..2de6dffccd 100644
--- a/src/bin/pg_upgrade/version.c
+++ b/src/bin/pg_upgrade/version.c
@@ -9,235 +9,23 @@
 
 #include "postgres_fe.h"
 
-#include "catalog/pg_class_d.h"
 #include "fe_utils/string_utils.h"
 #include "pg_upgrade.h"
 
-
 /*
- * check_for_data_types_usage()
- *	Detect whether there are any stored columns depending on given type(s)
- *
- * If so, write a report to the given file name, and return true.
- *
- * base_query should be a SELECT yielding a single column named "oid",
- * containing the pg_type OIDs of one or more types that are known to have
- * inconsistent on-disk representations across server versions.
- *
- * We check for the type(s) in tables, matviews, and indexes, but not views;
- * there's no storage involved in a view.
+ * version_hook functions for check_for_data_types_usage in order to determine
+ * whether a data type check should be executed for the cluster in question or
+ * not.
  */
 bool
-check_for_data_types_usage(ClusterInfo *cluster,
-						   const char *base_query,
-						   const char *output_path)
+jsonb_9_4_check_applicable(ClusterInfo *cluster)
 {
-	bool		found = false;
-	FILE	   *script = NULL;
-	int			dbnum;
+	/* JSONB changed its storage format during 9.4 beta */
+	if (GET_MAJOR_VERSION(cluster->major_version) == 904 &&
+		cluster->controldata.cat_ver < JSONB_FORMAT_CHANGE_CAT_VER)
+		return true;
 
-	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
-	{
-		DbInfo	   *active_db = &cluster->dbarr.dbs[dbnum];
-		PGconn	   *conn = connectToServer(cluster, active_db->db_name);
-		PQExpBufferData querybuf;
-		PGresult   *res;
-		bool		db_used = false;
-		int			ntups;
-		int			rowno;
-		int			i_nspname,
-					i_relname,
-					i_attname;
-
-		/*
-		 * The type(s) of interest might be wrapped in a domain, array,
-		 * composite, or range, and these container types can be nested (to
-		 * varying extents depending on server version, but that's not of
-		 * concern here).  To handle all these cases we need a recursive CTE.
-		 */
-		initPQExpBuffer(&querybuf);
-		appendPQExpBuffer(&querybuf,
-						  "WITH RECURSIVE oids AS ( "
-		/* start with the type(s) returned by base_query */
-						  "	%s "
-						  "	UNION ALL "
-						  "	SELECT * FROM ( "
-		/* inner WITH because we can only reference the CTE once */
-						  "		WITH x AS (SELECT oid FROM oids) "
-		/* domains on any type selected so far */
-						  "			SELECT t.oid FROM pg_catalog.pg_type t, x WHERE typbasetype = x.oid AND typtype = 'd' "
-						  "			UNION ALL "
-		/* arrays over any type selected so far */
-						  "			SELECT t.oid FROM pg_catalog.pg_type t, x WHERE typelem = x.oid AND typtype = 'b' "
-						  "			UNION ALL "
-		/* composite types containing any type selected so far */
-						  "			SELECT t.oid FROM pg_catalog.pg_type t, pg_catalog.pg_class c, pg_catalog.pg_attribute a, x "
-						  "			WHERE t.typtype = 'c' AND "
-						  "				  t.oid = c.reltype AND "
-						  "				  c.oid = a.attrelid AND "
-						  "				  NOT a.attisdropped AND "
-						  "				  a.atttypid = x.oid "
-						  "			UNION ALL "
-		/* ranges containing any type selected so far */
-						  "			SELECT t.oid FROM pg_catalog.pg_type t, pg_catalog.pg_range r, x "
-						  "			WHERE t.typtype = 'r' AND r.rngtypid = t.oid AND r.rngsubtype = x.oid"
-						  "	) foo "
-						  ") "
-		/* now look for stored columns of any such type */
-						  "SELECT n.nspname, c.relname, a.attname "
-						  "FROM	pg_catalog.pg_class c, "
-						  "		pg_catalog.pg_namespace n, "
-						  "		pg_catalog.pg_attribute a "
-						  "WHERE	c.oid = a.attrelid AND "
-						  "		NOT a.attisdropped AND "
-						  "		a.atttypid IN (SELECT oid FROM oids) AND "
-						  "		c.relkind IN ("
-						  CppAsString2(RELKIND_RELATION) ", "
-						  CppAsString2(RELKIND_MATVIEW) ", "
-						  CppAsString2(RELKIND_INDEX) ") AND "
-						  "		c.relnamespace = n.oid AND "
-		/* exclude possible orphaned temp tables */
-						  "		n.nspname !~ '^pg_temp_' AND "
-						  "		n.nspname !~ '^pg_toast_temp_' AND "
-		/* exclude system catalogs, too */
-						  "		n.nspname NOT IN ('pg_catalog', 'information_schema')",
-						  base_query);
-
-		res = executeQueryOrDie(conn, "%s", querybuf.data);
-
-		ntups = PQntuples(res);
-		i_nspname = PQfnumber(res, "nspname");
-		i_relname = PQfnumber(res, "relname");
-		i_attname = PQfnumber(res, "attname");
-		for (rowno = 0; rowno < ntups; rowno++)
-		{
-			found = true;
-			if (script == NULL && (script = fopen_priv(output_path, "w")) == NULL)
-				pg_fatal("could not open file \"%s\": %m", output_path);
-			if (!db_used)
-			{
-				fprintf(script, "In database: %s\n", active_db->db_name);
-				db_used = true;
-			}
-			fprintf(script, "  %s.%s.%s\n",
-					PQgetvalue(res, rowno, i_nspname),
-					PQgetvalue(res, rowno, i_relname),
-					PQgetvalue(res, rowno, i_attname));
-		}
-
-		PQclear(res);
-
-		termPQExpBuffer(&querybuf);
-
-		PQfinish(conn);
-	}
-
-	if (script)
-		fclose(script);
-
-	return found;
-}
-
-/*
- * check_for_data_type_usage()
- *	Detect whether there are any stored columns depending on the given type
- *
- * If so, write a report to the given file name, and return true.
- *
- * type_name should be a fully qualified type name.  This is just a
- * trivial wrapper around check_for_data_types_usage() to convert a
- * type name into a base query.
- */
-bool
-check_for_data_type_usage(ClusterInfo *cluster,
-						  const char *type_name,
-						  const char *output_path)
-{
-	bool		found;
-	char	   *base_query;
-
-	base_query = psprintf("SELECT '%s'::pg_catalog.regtype AS oid",
-						  type_name);
-
-	found = check_for_data_types_usage(cluster, base_query, output_path);
-
-	free(base_query);
-
-	return found;
-}
-
-
-/*
- * old_9_3_check_for_line_data_type_usage()
- *	9.3 -> 9.4
- *	Fully implement the 'line' data type in 9.4, which previously returned
- *	"not enabled" by default and was only functionally enabled with a
- *	compile-time switch; as of 9.4 "line" has a different on-disk
- *	representation format.
- */
-void
-old_9_3_check_for_line_data_type_usage(ClusterInfo *cluster)
-{
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for incompatible \"line\" data type");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_line.txt");
-
-	if (check_for_data_type_usage(cluster, "pg_catalog.line", output_path))
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains the \"line\" data type in user tables.\n"
-				 "This data type changed its internal and input/output format\n"
-				 "between your old and new versions so this\n"
-				 "cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
-}
-
-
-/*
- * old_9_6_check_for_unknown_data_type_usage()
- *	9.6 -> 10
- *	It's no longer allowed to create tables or views with "unknown"-type
- *	columns.  We do not complain about views with such columns, because
- *	they should get silently converted to "text" columns during the DDL
- *	dump and reload; it seems unlikely to be worth making users do that
- *	by hand.  However, if there's a table with such a column, the DDL
- *	reload will fail, so we should pre-detect that rather than failing
- *	mid-upgrade.  Worse, if there's a matview with such a column, the
- *	DDL reload will silently change it to "text" which won't match the
- *	on-disk storage (which is like "cstring").  So we *must* reject that.
- */
-void
-old_9_6_check_for_unknown_data_type_usage(ClusterInfo *cluster)
-{
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for invalid \"unknown\" user columns");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_unknown.txt");
-
-	if (check_for_data_type_usage(cluster, "pg_catalog.unknown", output_path))
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains the \"unknown\" data type in user tables.\n"
-				 "This data type is no longer allowed in tables, so this\n"
-				 "cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
+	return false;
 }
 
 /*
@@ -351,41 +139,6 @@ old_9_6_invalidate_hash_indexes(ClusterInfo *cluster, bool check_mode)
 		check_ok();
 }
 
-/*
- * old_11_check_for_sql_identifier_data_type_usage()
- *	11 -> 12
- *	In 12, the sql_identifier data type was switched from name to varchar,
- *	which does affect the storage (name is by-ref, but not varlena). This
- *	means user tables using sql_identifier for columns are broken because
- *	the on-disk format is different.
- */
-void
-old_11_check_for_sql_identifier_data_type_usage(ClusterInfo *cluster)
-{
-	char		output_path[MAXPGPATH];
-
-	prep_status("Checking for invalid \"sql_identifier\" user columns");
-
-	snprintf(output_path, sizeof(output_path), "%s/%s",
-			 log_opts.basedir,
-			 "tables_using_sql_identifier.txt");
-
-	if (check_for_data_type_usage(cluster, "information_schema.sql_identifier",
-								  output_path))
-	{
-		pg_log(PG_REPORT, "fatal");
-		pg_fatal("Your installation contains the \"sql_identifier\" data type in user tables.\n"
-				 "The on-disk format for this data type has changed, so this\n"
-				 "cluster cannot currently be upgraded.  You can\n"
-				 "drop the problem columns and restart the upgrade.\n"
-				 "A list of the problem columns is in the file:\n"
-				 "    %s", output_path);
-	}
-	else
-		check_ok();
-}
-
-
 /*
  * report_extension_updates()
  *	Report extensions that should be updated.
-- 
2.32.1 (Apple Git-133)
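
The table-driven design in the patch can be illustrated with a minimal, standalone sketch. It is not the pg_upgrade code: the `Check` struct, `demo_checks`, `count_checks`, `check_applies`, `count_applicable`, and the values chosen for `MANUAL_CHECK`, `ALL_VERSIONS` and `DEMO_JSONB_CAT_VER` are assumptions made for illustration. Only the three base queries and the two numeric thresholds are copied from the patch. The sketch shows how a NULL-terminated table is counted and how each entry is gated on the old cluster's version.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed sentinel values; the patch defines its own elsewhere. */
#define MANUAL_CHECK 1
#define ALL_VERSIONS (-1)

/* Placeholder catalog version used only by this sketch. */
#define DEMO_JSONB_CAT_VER 201409291u

/* Hypothetical hook signature: the patch passes a ClusterInfo pointer. */
typedef bool (*VersionHook) (int old_major, unsigned int cat_ver);

typedef struct
{
	const char *status;			/* NULL marks the end of the table */
	const char *base_query;
	int			threshold_version;
	VersionHook version_hook;
} Check;

/* Applies only to 9.4 clusters created before the format change. */
static bool
jsonb_hook(int old_major, unsigned int cat_ver)
{
	return old_major == 904 && cat_ver < DEMO_JSONB_CAT_VER;
}

const Check demo_checks[] = {
	{"aclitem", "SELECT 'pg_catalog.aclitem'::pg_catalog.regtype AS oid", 1500, NULL},
	{"unknown", "SELECT 'pg_catalog.unknown'::pg_catalog.regtype AS oid", 906, NULL},
	{"jsonb", "SELECT 'pg_catalog.jsonb'::pg_catalog.regtype AS oid", MANUAL_CHECK, jsonb_hook},
	{NULL, NULL, 0, NULL}		/* end marker, must remain last */
};

/* Walk the table until the end marker to learn how many checks exist. */
size_t
count_checks(const Check *checks)
{
	size_t		n = 0;

	while (checks[n].status != NULL)
		n++;
	return n;
}

/* Decide whether one check should run against the old cluster. */
bool
check_applies(const Check *check, int old_major, unsigned int cat_ver)
{
	if (check->threshold_version == MANUAL_CHECK)
	{
		assert(check->version_hook != NULL);
		return check->version_hook(old_major, cat_ver);
	}
	if (check->threshold_version == ALL_VERSIONS)
		return true;
	/* Run the check for clusters at or below the threshold version. */
	return old_major <= check->threshold_version;
}

/* Count how many table entries would run for the given old cluster. */
size_t
count_applicable(const Check *checks, int old_major, unsigned int cat_ver)
{
	size_t		n = 0;

	for (size_t i = 0; checks[i].status != NULL; i++)
		if (check_applies(&checks[i], old_major, cat_ver))
			n++;
	return n;
}
```

Under these assumptions, an old cluster at major version 906 would run the aclitem and unknown entries and skip jsonb, because the hook only accepts 904 with a catalog version older than the placeholder. Adding a new data type check then amounts to adding one more entry above the end marker.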

#33

Peter Eisentraut

peter@eisentraut.org

almost 2 years ago

In reply to: Daniel Gustafsson (#32)

Re: Reducing connection overhead in pg_upgrade compat check phase

On 18.03.24 13:11, Daniel Gustafsson wrote:

Attached is a fresh rebase with only minor cosmetic touch-ups which I would
like to go ahead with during this CF.

Peter: does this address the comments you had on translation and code
duplication?

Yes, this looks good.

#34

Daniel Gustafsson

daniel@yesql.se

almost 2 years ago

In reply to: Peter Eisentraut (#33)

Re: Reducing connection overhead in pg_upgrade compat check phase

On 19 Mar 2024, at 08:07, Peter Eisentraut <peter@eisentraut.org> wrote:

On 18.03.24 13:11, Daniel Gustafsson wrote:

Attached is a fresh rebase with only minor cosmetic touch-ups which I would
like to go ahead with during this CF.
Peter: does this address the comments you had on translation and code
duplication?

Yes, this looks good.

Thanks for the review! I took another look at this and pushed it.

--
Daniel Gustafsson
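
The connection amortization that motivates the patch can be shown with a small counting sketch. It opens no real connections: the `Cost` struct and the two functions are hypothetical, each increment stands in for a `connectToServer` call or a check query, and the sketch assumes every check applies to every database, which the real code does not, since checks are gated on the old cluster's version.

```c
#include <stddef.h>

/* Hypothetical tally of simulated connections and check queries. */
typedef struct
{
	size_t		connections;
	size_t		queries;
} Cost;

/* Old layout: each check loops over all databases and connects every time. */
Cost
per_check_connections(size_t ndbs, size_t nchecks)
{
	Cost		cost = {0, 0};

	for (size_t check = 0; check < nchecks; check++)
	{
		for (size_t db = 0; db < ndbs; db++)
		{
			cost.connections++;	/* one connection per (check, database) */
			cost.queries++;
		}
	}
	return cost;
}

/* New layout: connect once per database, then run every check on it. */
Cost
per_database_connections(size_t ndbs, size_t nchecks)
{
	Cost		cost = {0, 0};

	for (size_t db = 0; db < ndbs; db++)
	{
		cost.connections++;		/* one connection per database */
		for (size_t check = 0; check < nchecks; check++)
			cost.queries++;
	}
	return cost;
}
```

Both layouts issue the same number of queries; only the number of connections changes, from `ndbs * nchecks` to `ndbs`. This counts connections only and says nothing about how much wall-clock time is saved.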