[18] separate collation and ctype versions, and cleanup of pg_database locale fields

Started by Jeff Davisover 1 year ago3 messages
#1Jeff Davis
pgsql@j-davis.com

Definitions:

- collation is text ordering and comparison
- ctype affects case mapping (e.g. LOWER()) and pattern
matching/regexes

Currently, there is only one version field, and it represents the
version of the collation. So, if your provider is libc and datcollate
is "C" and datctype is "en_US.utf8", then the datcollversion will
always be NULL. Other providers use datcolllocale, which is only one
field, so it doesn't matter.

Given the discussion here:

/messages/by-id/1078884.1721762815@sss.pgh.pa.us

it seems like it may be a good idea to version collation and ctype
separately. The ctype version is, more or less, the Unicode version,
and we know what that is for the builtin provider as well as ICU.

(Aside: ICU could theoretically report the same Unicode version and
still make some change that would affect us, but I have not observed
that to be the case. I use exhaustive code point coverage to test that
our Unicode functions return the same results as the corresponding ICU
functions when the Unicode version matches.)

Adding more collation fields is getting to be messy, though, because
they all have to be present in pg_database, as well. It's hard to move
those fields into pg_collation, because that's not a shared catalog, so
that could cause problems with CREATE/ALTER DATABASE. Is it worth
thinking about how we can clean this up, or should we just put up with
the idea that almost half the fields in pg_database will be locale-
related?

Regards,
Jeff Davis

#2Jeff Davis
pgsql@j-davis.com
In reply to: Jeff Davis (#1)
1 attachment(s)
Re: [18] separate collation and ctype versions, and cleanup of pg_database locale fields

On Thu, 2024-07-25 at 13:29 -0700, Jeff Davis wrote:

it may be a good idea to version collation and ctype
separately. The ctype version is, more or less, the Unicode version,
and we know what that is for the builtin provider as well as ICU.

Attached a rough patch for the purposes of discussion. It tracks the
ctype version separately, but doesn't do anything with it yet.

The main problem is that it's one more slightly confusing thing to
understand, especially in pg_database because it's the ctype version of
the database default collation, not necessarily datctype.

Maybe we can do something with the naming or catalog representation to
make this more clear?

Regards,
Jeff Davis

Attachments:

v1-0001-Add-datctypeversion-and-collctypeversion.patchtext/x-patch; charset=UTF-8; name=v1-0001-Add-datctypeversion-and-collctypeversion.patchDownload
From c4f637cebf96c9243ee866e15066b67838745c58 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Fri, 26 Jul 2024 17:28:02 -0700
Subject: [PATCH v1] Add datctypeversion and collctypeversion.

The ctype version can be distinct from the collation version in some
cases, so track it separately in the catalog. For the builtin
"C.UTF-8" locale, the ctype version is the version of Unicode that
Postgres was built with. For ICU, the ctype version is the version of
Unicode that ICU was built with. For libc, it's the same as the
collation version.

These catalog fields are not used yet, but maintain them for the
future.
---
 src/backend/catalog/pg_collation.c   |   5 +
 src/backend/commands/collationcmds.c |  95 ++++++++++++++++
 src/backend/commands/dbcommands.c    | 158 ++++++++++++++++++++++++---
 src/backend/utils/adt/pg_locale.c    |  38 +++++++
 src/bin/initdb/initdb.c              |   2 +
 src/include/catalog/pg_collation.h   |   2 +
 src/include/catalog/pg_database.h    |   3 +
 src/include/catalog/pg_proc.dat      |  10 ++
 src/include/utils/pg_locale.h        |   1 +
 9 files changed, 297 insertions(+), 17 deletions(-)

diff --git a/src/backend/catalog/pg_collation.c b/src/backend/catalog/pg_collation.c
index 7f2f7012299..822bccbd715 100644
--- a/src/backend/catalog/pg_collation.c
+++ b/src/backend/catalog/pg_collation.c
@@ -48,6 +48,7 @@ CollationCreate(const char *collname, Oid collnamespace,
 				const char *colllocale,
 				const char *collicurules,
 				const char *collversion,
+				const char *collctypeversion,
 				bool if_not_exists,
 				bool quiet)
 {
@@ -202,6 +203,10 @@ CollationCreate(const char *collname, Oid collnamespace,
 		values[Anum_pg_collation_collversion - 1] = CStringGetTextDatum(collversion);
 	else
 		nulls[Anum_pg_collation_collversion - 1] = true;
+	if (collctypeversion)
+		values[Anum_pg_collation_collctypeversion - 1] = CStringGetTextDatum(collctypeversion);
+	else
+		nulls[Anum_pg_collation_collctypeversion - 1] = true;
 
 	tup = heap_form_tuple(tupDesc, values, nulls);
 
diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index 63ef9a08411..d8525dbfceb 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -64,6 +64,7 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 	DefElem    *deterministicEl = NULL;
 	DefElem    *rulesEl = NULL;
 	DefElem    *versionEl = NULL;
+	DefElem    *ctypeversionEl = NULL;
 	char	   *collcollate;
 	char	   *collctype;
 	const char *colllocale;
@@ -72,6 +73,7 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 	int			collencoding;
 	char		collprovider;
 	char	   *collversion = NULL;
+	char	   *collctypeversion = NULL;
 	Oid			newoid;
 	ObjectAddress address;
 
@@ -103,6 +105,8 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 			defelp = &rulesEl;
 		else if (strcmp(defel->defname, "version") == 0)
 			defelp = &versionEl;
+		else if (strcmp(defel->defname, "ctype_version") == 0)
+			defelp = &ctypeversionEl;
 		else
 		{
 			ereport(ERROR,
@@ -211,6 +215,9 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 		if (versionEl)
 			collversion = defGetString(versionEl);
 
+		if (ctypeversionEl)
+			collctypeversion = defGetString(ctypeversionEl);
+
 		if (collproviderstr)
 		{
 			if (pg_strcasecmp(collproviderstr, "builtin") == 0)
@@ -360,6 +367,18 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 		collversion = get_collation_actual_version(collprovider, locale);
 	}
 
+	if (!collctypeversion)
+	{
+		const char *locale;
+
+		if (collprovider == COLLPROVIDER_LIBC)
+			locale = collctype;
+		else
+			locale = colllocale;
+
+		collctypeversion = get_ctype_actual_version(collprovider, locale);
+	}
+
 	newoid = CollationCreate(collName,
 							 collNamespace,
 							 GetUserId(),
@@ -371,6 +390,7 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 							 colllocale,
 							 collicurules,
 							 collversion,
+							 collctypeversion,
 							 if_not_exists,
 							 false);	/* not quiet */
 
@@ -578,6 +598,77 @@ pg_collation_actual_version(PG_FUNCTION_ARGS)
 }
 
 
+Datum
+pg_ctype_actual_version(PG_FUNCTION_ARGS)
+{
+	Oid			collid = PG_GETARG_OID(0);
+	char		provider;
+	char	   *locale;
+	char	   *version;
+	Datum		datum;
+
+	if (collid == DEFAULT_COLLATION_OID)
+	{
+		/* retrieve from pg_database */
+
+		HeapTuple	dbtup = SearchSysCache1(DATABASEOID, ObjectIdGetDatum(MyDatabaseId));
+
+		if (!HeapTupleIsValid(dbtup))
+			ereport(ERROR,
+					(errcode(ERRCODE_UNDEFINED_OBJECT),
+					 errmsg("database with OID %u does not exist", MyDatabaseId)));
+
+		provider = ((Form_pg_database) GETSTRUCT(dbtup))->datlocprovider;
+
+		if (provider == COLLPROVIDER_LIBC)
+		{
+			datum = SysCacheGetAttrNotNull(DATABASEOID, dbtup, Anum_pg_database_datcollate);
+			locale = TextDatumGetCString(datum);
+		}
+		else
+		{
+			datum = SysCacheGetAttrNotNull(DATABASEOID, dbtup, Anum_pg_database_datlocale);
+			locale = TextDatumGetCString(datum);
+		}
+
+		ReleaseSysCache(dbtup);
+	}
+	else
+	{
+		/* retrieve from pg_collation */
+
+		HeapTuple	colltp = SearchSysCache1(COLLOID, ObjectIdGetDatum(collid));
+
+		if (!HeapTupleIsValid(colltp))
+			ereport(ERROR,
+					(errcode(ERRCODE_UNDEFINED_OBJECT),
+					 errmsg("collation with OID %u does not exist", collid)));
+
+		provider = ((Form_pg_collation) GETSTRUCT(colltp))->collprovider;
+		Assert(provider != COLLPROVIDER_DEFAULT);
+
+		if (provider == COLLPROVIDER_LIBC)
+		{
+			datum = SysCacheGetAttrNotNull(COLLOID, colltp, Anum_pg_collation_collcollate);
+			locale = TextDatumGetCString(datum);
+		}
+		else
+		{
+			datum = SysCacheGetAttrNotNull(COLLOID, colltp, Anum_pg_collation_colllocale);
+			locale = TextDatumGetCString(datum);
+		}
+
+		ReleaseSysCache(colltp);
+	}
+
+	version = get_ctype_actual_version(provider, locale);
+	if (version)
+		PG_RETURN_TEXT_P(cstring_to_text(version));
+	else
+		PG_RETURN_NULL();
+}
+
+
 /* will we use "locale -a" in pg_import_system_collations? */
 #if !defined(WIN32)
 #define READ_LOCALE_A_OUTPUT
@@ -744,6 +835,7 @@ create_collation_from_locale(const char *locale, int nspid,
 							 COLLPROVIDER_LIBC, true, enc,
 							 locale, locale, NULL, NULL,
 							 get_collation_actual_version(COLLPROVIDER_LIBC, locale),
+							 get_ctype_actual_version(COLLPROVIDER_LIBC, locale),
 							 true, true);
 	if (OidIsValid(collid))
 	{
@@ -819,6 +911,7 @@ win32_read_locale(LPWSTR pStr, DWORD dwFlags, LPARAM lparam)
 								 COLLPROVIDER_LIBC, true, enc,
 								 localebuf, localebuf, NULL, NULL,
 								 get_collation_actual_version(COLLPROVIDER_LIBC, localebuf),
+								 get_ctype_actual_version(COLLPROVIDER_LIBC, localebuf),
 								 true, true);
 		if (OidIsValid(collid))
 		{
@@ -953,6 +1046,7 @@ pg_import_system_collations(PG_FUNCTION_ARGS)
 									 COLLPROVIDER_LIBC, true, enc,
 									 locale, locale, NULL, NULL,
 									 get_collation_actual_version(COLLPROVIDER_LIBC, locale),
+									 get_ctype_actual_version(COLLPROVIDER_LIBC, locale),
 									 true, true);
 			if (OidIsValid(collid))
 			{
@@ -1013,6 +1107,7 @@ pg_import_system_collations(PG_FUNCTION_ARGS)
 									 COLLPROVIDER_ICU, true, -1,
 									 NULL, NULL, langtag, NULL,
 									 get_collation_actual_version(COLLPROVIDER_ICU, langtag),
+									 get_ctype_actual_version(COLLPROVIDER_ICU, langtag),
 									 true, true);
 			if (OidIsValid(collid))
 			{
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index 7026352bc99..b8bcfd9f788 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -120,7 +120,8 @@ static bool get_db_info(const char *name, LOCKMODE lockmode,
 						Oid *dbTablespace, char **dbCollate, char **dbCtype, char **dbLocale,
 						char **dbIcurules,
 						char *dbLocProvider,
-						char **dbCollversion);
+						char **dbCollversion,
+						char **dbCtypeversion);
 static void remove_dbtablespaces(Oid db_id);
 static bool check_db_file_conflict(Oid db_id);
 static int	errdetail_busy_db(int notherbackends, int npreparedxacts);
@@ -690,6 +691,7 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 	char	   *src_icurules = NULL;
 	char		src_locprovider = '\0';
 	char	   *src_collversion = NULL;
+	char	   *src_ctypeversion = NULL;
 	bool		src_istemplate;
 	bool		src_hasloginevt = false;
 	bool		src_allowconn;
@@ -719,6 +721,7 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 	DefElem    *dallowconnections = NULL;
 	DefElem    *dconnlimit = NULL;
 	DefElem    *dcollversion = NULL;
+	DefElem	   *dctypeversion = NULL;
 	DefElem    *dstrategy = NULL;
 	char	   *dbname = stmt->dbname;
 	char	   *dbowner = NULL;
@@ -734,6 +737,7 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 	bool		dballowconnections = true;
 	int			dbconnlimit = DATCONNLIMIT_UNLIMITED;
 	char	   *dbcollversion = NULL;
+	char	   *dbctypeversion = NULL;
 	int			notherbackends;
 	int			npreparedxacts;
 	CreateDBStrategy dbstrategy = CREATEDB_WAL_LOG;
@@ -834,6 +838,12 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 				errorConflictingDefElem(defel, pstate);
 			dcollversion = defel;
 		}
+		else if (strcmp(defel->defname, "ctype_version") == 0)
+		{
+			if (dctypeversion)
+				errorConflictingDefElem(defel, pstate);
+			dctypeversion = defel;
+		}
 		else if (strcmp(defel->defname, "location") == 0)
 		{
 			ereport(WARNING,
@@ -957,6 +967,8 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 	}
 	if (dcollversion)
 		dbcollversion = defGetString(dcollversion);
+	if (dctypeversion)
+		dbctypeversion = defGetString(dctypeversion);
 
 	/* obtain OID of proposed owner */
 	if (dbowner)
@@ -995,7 +1007,7 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 					 &src_istemplate, &src_allowconn, &src_hasloginevt,
 					 &src_frozenxid, &src_minmxid, &src_deftablespace,
 					 &src_collate, &src_ctype, &src_locale, &src_icurules, &src_locprovider,
-					 &src_collversion))
+					 &src_collversion, &src_ctypeversion))
 		ereport(ERROR,
 				(errcode(ERRCODE_UNDEFINED_DATABASE),
 				 errmsg("template database \"%s\" does not exist",
@@ -1270,6 +1282,8 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 
 	if (dbcollversion == NULL)
 		dbcollversion = src_collversion;
+	if (dbctypeversion == NULL)
+		dbctypeversion = src_ctypeversion;
 
 	/*
 	 * Normally, we copy the collation version from the template database.
@@ -1288,6 +1302,23 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 		dbcollversion = get_collation_actual_version(dblocprovider, locale);
 	}
 
+	/*
+	 * Normally, we copy the ctype version from the template database.
+	 * This last resort only applies if the template database does not have a
+	 * ctype version, which is normally only the case for template0.
+	 */
+	if (dbctypeversion == NULL)
+	{
+		const char *locale;
+
+		if (dblocprovider == COLLPROVIDER_LIBC)
+			locale = dbctype;
+		else
+			locale = dblocale;
+
+		dbctypeversion = get_ctype_actual_version(dblocprovider, locale);
+	}
+
 	/* Resolve default tablespace for new database */
 	if (dtablespacename && dtablespacename->arg)
 	{
@@ -1456,6 +1487,10 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 		new_record[Anum_pg_database_datcollversion - 1] = CStringGetTextDatum(dbcollversion);
 	else
 		new_record_nulls[Anum_pg_database_datcollversion - 1] = true;
+	if (dbctypeversion)
+		new_record[Anum_pg_database_datctypeversion - 1] = CStringGetTextDatum(dbctypeversion);
+	else
+		new_record_nulls[Anum_pg_database_datctypeversion - 1] = true;
 
 	/*
 	 * We deliberately set datacl to default (NULL), rather than copying it
@@ -1666,7 +1701,7 @@ dropdb(const char *dbname, bool missing_ok, bool force)
 	pgdbrel = table_open(DatabaseRelationId, RowExclusiveLock);
 
 	if (!get_db_info(dbname, AccessExclusiveLock, &db_id, NULL, NULL,
-					 &db_istemplate, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL))
+					 &db_istemplate, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL))
 	{
 		if (!missing_ok)
 		{
@@ -1881,7 +1916,7 @@ RenameDatabase(const char *oldname, const char *newname)
 	rel = table_open(DatabaseRelationId, RowExclusiveLock);
 
 	if (!get_db_info(oldname, AccessExclusiveLock, &db_id, NULL, NULL, NULL,
-					 NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL))
+					 NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL))
 		ereport(ERROR,
 				(errcode(ERRCODE_UNDEFINED_DATABASE),
 				 errmsg("database \"%s\" does not exist", oldname)));
@@ -1991,7 +2026,7 @@ movedb(const char *dbname, const char *tblspcname)
 	pgdbrel = table_open(DatabaseRelationId, RowExclusiveLock);
 
 	if (!get_db_info(dbname, AccessExclusiveLock, &db_id, NULL, NULL, NULL,
-					 NULL, NULL, NULL, NULL, &src_tblspcoid, NULL, NULL, NULL, NULL, NULL, NULL))
+					 NULL, NULL, NULL, NULL, &src_tblspcoid, NULL, NULL, NULL, NULL, NULL, NULL, NULL))
 		ereport(ERROR,
 				(errcode(ERRCODE_UNDEFINED_DATABASE),
 				 errmsg("database \"%s\" does not exist", dbname)));
@@ -2507,8 +2542,11 @@ AlterDatabaseRefreshColl(AlterDatabaseRefreshCollStmt *stmt)
 	ObjectAddress address;
 	Datum		datum;
 	bool		isnull;
-	char	   *oldversion;
-	char	   *newversion;
+	bool		changed = false;
+	char	   *oldcollversion;
+	char	   *oldctypeversion;
+	char	   *newcollversion;
+	char	   *newctypeversion;
 
 	rel = table_open(DatabaseRelationId, RowExclusiveLock);
 	ScanKeyInit(&scankey,
@@ -2531,7 +2569,10 @@ AlterDatabaseRefreshColl(AlterDatabaseRefreshCollStmt *stmt)
 					   stmt->dbname);
 
 	datum = heap_getattr(tuple, Anum_pg_database_datcollversion, RelationGetDescr(rel), &isnull);
-	oldversion = isnull ? NULL : TextDatumGetCString(datum);
+	oldcollversion = isnull ? NULL : TextDatumGetCString(datum);
+
+	datum = heap_getattr(tuple, Anum_pg_database_datctypeversion, RelationGetDescr(rel), &isnull);
+	oldctypeversion = isnull ? NULL : TextDatumGetCString(datum);
 
 	if (datForm->datlocprovider == COLLPROVIDER_LIBC)
 	{
@@ -2546,31 +2587,72 @@ AlterDatabaseRefreshColl(AlterDatabaseRefreshCollStmt *stmt)
 			elog(ERROR, "unexpected null in pg_database");
 	}
 
-	newversion = get_collation_actual_version(datForm->datlocprovider,
-											  TextDatumGetCString(datum));
+	newcollversion = get_collation_actual_version(datForm->datlocprovider,
+												  TextDatumGetCString(datum));
+
+	if (datForm->datlocprovider == COLLPROVIDER_LIBC)
+	{
+		datum = heap_getattr(tuple, Anum_pg_database_datctype, RelationGetDescr(rel), &isnull);
+		if (isnull)
+			elog(ERROR, "unexpected null in pg_database");
+	}
+	else
+	{
+		datum = heap_getattr(tuple, Anum_pg_database_datlocale, RelationGetDescr(rel), &isnull);
+		if (isnull)
+			elog(ERROR, "unexpected null in pg_database");
+	}
+
+	newctypeversion = get_ctype_actual_version(datForm->datlocprovider,
+											   TextDatumGetCString(datum));
 
 	/* cannot change from NULL to non-NULL or vice versa */
-	if ((!oldversion && newversion) || (oldversion && !newversion))
+	if ((!oldcollversion && newcollversion) || (oldcollversion && !newcollversion))
 		elog(ERROR, "invalid collation version change");
-	else if (oldversion && newversion && strcmp(newversion, oldversion) != 0)
+	if ((!oldctypeversion && newctypeversion) || (oldctypeversion && !newctypeversion))
+		elog(ERROR, "invalid ctype version change");
+
+	if (oldcollversion && newcollversion && strcmp(newcollversion, oldcollversion) != 0)
 	{
 		bool		nulls[Natts_pg_database] = {0};
 		bool		replaces[Natts_pg_database] = {0};
 		Datum		values[Natts_pg_database] = {0};
 
 		ereport(NOTICE,
-				(errmsg("changing version from %s to %s",
-						oldversion, newversion)));
+				(errmsg("changing collation version from %s to %s",
+						oldcollversion, newcollversion)));
 
-		values[Anum_pg_database_datcollversion - 1] = CStringGetTextDatum(newversion);
+		values[Anum_pg_database_datcollversion - 1] = CStringGetTextDatum(newcollversion);
 		replaces[Anum_pg_database_datcollversion - 1] = true;
 
 		tuple = heap_modify_tuple(tuple, RelationGetDescr(rel),
 								  values, nulls, replaces);
 		CatalogTupleUpdate(rel, &tuple->t_self, tuple);
 		heap_freetuple(tuple);
+		changed = true;
 	}
-	else
+
+	if (oldctypeversion && newctypeversion && strcmp(newctypeversion, oldctypeversion) != 0)
+	{
+		bool		nulls[Natts_pg_database] = {0};
+		bool		replaces[Natts_pg_database] = {0};
+		Datum		values[Natts_pg_database] = {0};
+
+		ereport(NOTICE,
+				(errmsg("changing ctype version from %s to %s",
+						oldctypeversion, newctypeversion)));
+
+		values[Anum_pg_database_datctypeversion - 1] = CStringGetTextDatum(newctypeversion);
+		replaces[Anum_pg_database_datctypeversion - 1] = true;
+
+		tuple = heap_modify_tuple(tuple, RelationGetDescr(rel),
+								  values, nulls, replaces);
+		CatalogTupleUpdate(rel, &tuple->t_self, tuple);
+		heap_freetuple(tuple);
+		changed = true;
+	}
+
+	if (!changed)
 		ereport(NOTICE,
 				(errmsg("version has not changed")));
 
@@ -2758,6 +2840,39 @@ pg_database_collation_actual_version(PG_FUNCTION_ARGS)
 		PG_RETURN_NULL();
 }
 
+Datum
+pg_database_ctype_actual_version(PG_FUNCTION_ARGS)
+{
+	Oid			dbid = PG_GETARG_OID(0);
+	HeapTuple	tp;
+	char		datlocprovider;
+	Datum		datum;
+	char	   *version;
+
+	tp = SearchSysCache1(DATABASEOID, ObjectIdGetDatum(dbid));
+	if (!HeapTupleIsValid(tp))
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("database with OID %u does not exist", dbid)));
+
+	datlocprovider = ((Form_pg_database) GETSTRUCT(tp))->datlocprovider;
+
+	if (datlocprovider == COLLPROVIDER_LIBC)
+		datum = SysCacheGetAttrNotNull(DATABASEOID, tp, Anum_pg_database_datcollate);
+	else
+		datum = SysCacheGetAttrNotNull(DATABASEOID, tp, Anum_pg_database_datlocale);
+
+	version = get_ctype_actual_version(datlocprovider,
+									   TextDatumGetCString(datum));
+
+	ReleaseSysCache(tp);
+
+	if (version)
+		PG_RETURN_TEXT_P(cstring_to_text(version));
+	else
+		PG_RETURN_NULL();
+}
+
 
 /*
  * Helper functions
@@ -2777,7 +2892,8 @@ get_db_info(const char *name, LOCKMODE lockmode,
 			Oid *dbTablespace, char **dbCollate, char **dbCtype, char **dbLocale,
 			char **dbIcurules,
 			char *dbLocProvider,
-			char **dbCollversion)
+			char **dbCollversion,
+			char **dbCtypeversion)
 {
 	bool		result = false;
 	Relation	relation;
@@ -2909,6 +3025,14 @@ get_db_info(const char *name, LOCKMODE lockmode,
 					else
 						*dbCollversion = TextDatumGetCString(datum);
 				}
+				if (dbCtypeversion)
+				{
+					datum = SysCacheGetAttr(DATABASEOID, tuple, Anum_pg_database_datctypeversion, &isnull);
+					if (isnull)
+						*dbCtypeversion = NULL;
+					else
+						*dbCtypeversion = TextDatumGetCString(datum);
+				}
 				ReleaseSysCache(tuple);
 				result = true;
 				break;
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 38c40a40489..ddcec0b7091 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -56,6 +56,7 @@
 
 #include "access/htup_details.h"
 #include "catalog/pg_collation.h"
+#include "common/unicode_version.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "utils/builtins.h"
@@ -1829,6 +1830,43 @@ get_collation_actual_version(char collprovider, const char *collcollate)
 	return collversion;
 }
 
+/*
+ * Get provider-specific ctype version.
+ */
+char *
+get_ctype_actual_version(char collprovider, const char *collctype)
+{
+	if (collprovider == COLLPROVIDER_BUILTIN)
+	{
+		if (strcmp(collctype, "C") == 0)
+			return "1";
+		else if (strcmp(collctype, "C.UTF-8") == 0)
+			return PG_UNICODE_VERSION;
+		else
+			ereport(ERROR,
+					(errcode(ERRCODE_WRONG_OBJECT_TYPE),
+					 errmsg("invalid locale name \"%s\" for builtin provider",
+							collctype)));
+	}
+	else if (collprovider == COLLPROVIDER_ICU)
+	{
+#ifdef USE_ICU
+		return U_UNICODE_VERSION;
+#else
+		/* could get here if a collation was created by a build with ICU */
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("ICU is not supported in this build")));
+#endif
+	}
+	else if (collprovider == COLLPROVIDER_LIBC)
+	{
+		return get_collation_actual_version(collprovider, collctype);
+	}
+
+	return NULL;
+}
+
 /*
  * pg_strncoll_libc_win32_utf8
  *
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index f00718a0150..e89df01385d 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -1964,11 +1964,13 @@ make_template0(FILE *cmdfd)
 	 * a new database from it.
 	 */
 	PG_CMD_PUTS("UPDATE pg_database SET datcollversion = NULL WHERE datname = 'template0';\n\n");
+	PG_CMD_PUTS("UPDATE pg_database SET datctypeversion = NULL WHERE datname = 'template0';\n\n");
 
 	/*
 	 * While we are here, do set the collation version on template1.
 	 */
 	PG_CMD_PUTS("UPDATE pg_database SET datcollversion = pg_database_collation_actual_version(oid) WHERE datname = 'template1';\n\n");
+	PG_CMD_PUTS("UPDATE pg_database SET datctypeversion = pg_database_ctype_actual_version(oid) WHERE datname = 'template1';\n\n");
 
 	/*
 	 * Explicitly revoke public create-schema and create-temp-table privileges
diff --git a/src/include/catalog/pg_collation.h b/src/include/catalog/pg_collation.h
index 5ce289d74bd..412f8eb24e3 100644
--- a/src/include/catalog/pg_collation.h
+++ b/src/include/catalog/pg_collation.h
@@ -47,6 +47,7 @@ CATALOG(pg_collation,3456,CollationRelationId)
 	text		collversion BKI_DEFAULT(_null_);	/* provider-dependent
 													 * version of collation
 													 * data */
+	text		collctypeversion BKI_DEFAULT(_null_);
 #endif
 } FormData_pg_collation;
 
@@ -100,6 +101,7 @@ extern Oid	CollationCreate(const char *collname, Oid collnamespace,
 							const char *colllocale,
 							const char *collicurules,
 							const char *collversion,
+							const char *collctypeversion,
 							bool if_not_exists,
 							bool quiet);
 
diff --git a/src/include/catalog/pg_database.h b/src/include/catalog/pg_database.h
index dbd4379ffa5..8d178371f84 100644
--- a/src/include/catalog/pg_database.h
+++ b/src/include/catalog/pg_database.h
@@ -83,6 +83,9 @@ CATALOG(pg_database,1262,DatabaseRelationId) BKI_SHARED_RELATION BKI_ROWTYPE_OID
 	/* provider-dependent version of collation data */
 	text		datcollversion BKI_DEFAULT(_null_);
 
+	/* provider-dependent version for ctype */
+	text		datctypeversion BKI_DEFAULT(_null_);
+
 	/* access permissions */
 	aclitem		datacl[1];
 #endif
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 06b2f4ba66c..20bc5c9bea4 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -12068,11 +12068,21 @@
   proname => 'pg_collation_actual_version', procost => '100',
   provolatile => 'v', prorettype => 'text', proargtypes => 'oid',
   prosrc => 'pg_collation_actual_version' },
+{ oid => '3814',
+  descr => 'get actual version of ctype from locale provider',
+  proname => 'pg_ctype_actual_version', procost => '100',
+  provolatile => 'v', prorettype => 'text', proargtypes => 'oid',
+  prosrc => 'pg_ctype_actual_version' },
 { oid => '6249',
   descr => 'get actual version of database collation from operating system',
   proname => 'pg_database_collation_actual_version', procost => '100',
   provolatile => 'v', prorettype => 'text', proargtypes => 'oid',
   prosrc => 'pg_database_collation_actual_version' },
+{ oid => '6313',
+  descr => 'get actual version of database collation from locale provider',
+  proname => 'pg_database_ctype_actual_version', procost => '100',
+  provolatile => 'v', prorettype => 'text', proargtypes => 'oid',
+  prosrc => 'pg_database_ctype_actual_version' },
 
 # system management/monitoring related functions
 { oid => '3353', descr => 'list files in the log directory',
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index 040968d6ff2..35cf0b39f47 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -103,6 +103,7 @@ extern bool pg_locale_deterministic(pg_locale_t locale);
 extern pg_locale_t pg_newlocale_from_collation(Oid collid);
 
 extern char *get_collation_actual_version(char collprovider, const char *collcollate);
+extern char *get_ctype_actual_version(char collprovider, const char *collctype);
 extern int	pg_strcoll(const char *arg1, const char *arg2, pg_locale_t locale);
 extern int	pg_strncoll(const char *arg1, size_t len1,
 						const char *arg2, size_t len2, pg_locale_t locale);
-- 
2.34.1

#3Jeff Davis
pgsql@j-davis.com
In reply to: Jeff Davis (#2)
Re: [18] separate collation and ctype versions, and cleanup of pg_database locale fields

On Sat, 2024-07-27 at 08:34 -0700, Jeff Davis wrote:

Attached a rough patch for the purposes of discussion. It tracks the
ctype version separately, but doesn't do anything with it yet.

I'm withdrawing this patch due to a lack of discussion.

I may make a related proposal for v19.

Regards,
Jeff Davis